[mira_talk] Re: Very long transcripts

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 30 Oct 2012 20:39:43 +0100

On Oct 30, 2012, at 14:35 , Jackie Lighten <jackie.lighten@xxxxxx> wrote:
> I have performed an accurate de novo assembly with poly-a/t trimming.
> I get all reads assembled, and no singlets, into around 66k contigs. Around 
> 27k of these are large contigs, with the largest being ~25k bases long. This 
> does not make much sense to me as I constructed a 3' target cDNA library (454 
> FLX). I can envisage multiple open reading frames may create longer 
> transcripts but 25k seems dodgy to me.
> Any thoughts?

Yes. Have a look at those contigs :-) No joke, this always brings the best 
insights.
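One quick way to start looking is to list the longest contigs first. The sketch below assumes the contigs have been exported to a plain FASTA file. The function names are hypothetical, and this is not a MIRA tool.

```python
import io
from typing import Dict, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse a FASTA stream into a {name: sequence} dict (name = first word of header)."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def longest_contigs(records: Dict[str, str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n longest contigs as (name, length), longest first."""
    lengths = [(name, len(seq)) for name, seq in records.items()]
    lengths.sort(key=lambda item: item[1], reverse=True)
    return lengths[:n]


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("your_contigs.fasta") (hypothetical name).
    demo = io.StringIO(">c1\nACGT\n>c2\nACGTACGTAC\nGT\n>c3\nA\n")
    for name, length in longest_contigs(read_fasta(demo), n=2):
        print(name, length)
```

The longest few contigs are then the ones to open in an assembly viewer. That is where gDNA, introns or chimeric joins usually show up.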

Possible reasons:
- PKS (polyketide synthase) genes. These can be up to 45-50 kb long, maybe even longer
- contamination of the cDNA with gDNA
- introns. Especially for highly expressed genes, there is a higher chance of 
having sequenced unprocessed (unspliced) mRNA
- unclipped adaptors that "join" contigs
- assembly "errors": short overlaps of just a couple of bases
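For the unclipped-adaptor case, a crude first check is to search the contigs for the adaptor sequence away from their ends. An adaptor sitting in the middle of a long contig hints at a join. This is only a sketch, under several assumptions. It does exact string matching on both strands and would miss adaptors carrying sequencing errors, which proper clipping tools handle with tolerant alignment. The adaptor string, the 50-base margin and the function names are all hypothetical placeholders.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def internal_adaptor_hits(
    records: Dict[str, str], adaptor: str, margin: int = 50
) -> List[Tuple[str, int, str]]:
    """Report exact adaptor matches (either strand) lying more than `margin`
    bases away from both contig ends, as (contig, 0-based start, strand)."""
    probes = {"+": adaptor.upper(), "-": reverse_complement(adaptor.upper())}
    hits: List[Tuple[str, int, str]] = []
    for name, seq in records.items():
        seq = seq.upper()
        for strand, probe in probes.items():
            start = seq.find(probe)
            while start != -1:
                end = start + len(probe)
                # Only internal hits are suspicious; hits at the ends are ordinary leftovers.
                if start > margin and len(seq) - end > margin:
                    hits.append((name, start, strand))
                start = seq.find(probe, start + 1)
    return hits
```

Any contig reported here is worth inspecting by eye before trusting its length.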

B.
