On Oct 30, 2012, at 14:35, Jackie Lighten <jackie.lighten@xxxxxx> wrote:
> I have performed an accurate de novo assembly with poly-a/t trimming.
> I get all reads assembled, and no singlets, into around 66k contigs. Around
> 27k of these are large contigs, with the largest being ~25k bases long. This
> does not make much sense to me as I constructed a 3' target cDNA library (454
> FLX). I can envisage multiple open reading frames may create longer
> transcripts but 25k seems dodgy to me.
> Any thoughts?

Yes. Have a look at those contigs :-)

No joke, this always brings the best insights. Possible reasons:

- PKS genes. These can be up to 45-50 kb long, maybe even longer
- contamination of the cDNA with gDNA
- introns. Especially for highly expressed genes, one has a higher chance of
  having sequenced unedited mRNA, i.e. transcripts with the introns still in
  them
- unclipped adaptors which "join" contigs
- assembly "errors": short overlaps of just a couple of bases

B.
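
As a starting point for "having a look", here is a minimal sketch in Python. It assumes the contigs have been exported as plain FASTA; the function names and the adaptor sequence passed in are hypothetical placeholders, not part of any assembler. It lists the longest contigs and reports where an adaptor, or its reverse complement, occurs inside a contig, which is what one would expect if unclipped adaptors had joined two transcripts.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {contig_name: sequence} (sequence upper-cased)."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def long_contigs(contigs: Dict[str, str], min_len: int) -> List[Tuple[str, int]]:
    """Return (name, length) for contigs of at least min_len bases, longest first."""
    hits = [(n, len(s)) for n, s in contigs.items() if len(s) >= min_len]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_all(seq: str, pattern: str) -> List[int]:
    """All 0-based start positions of pattern in seq, overlaps included."""
    positions = []
    pos = seq.find(pattern)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(pattern, pos + 1)
    return positions


def adaptor_hits(seq: str, adaptor: str) -> List[int]:
    """Positions where the adaptor or its reverse complement occurs in seq."""
    adaptor = adaptor.upper()
    return sorted(set(find_all(seq, adaptor) + find_all(seq, revcomp(adaptor))))
```

Typical use would be to open the FASTA file, pass the file object to `read_fasta`, call `long_contigs` with a threshold well above the expected transcript length, and then run `adaptor_hits` on each of those contigs with the adaptor actually used for the library. Exact string matching is a deliberate simplification: it misses adaptors with sequencing errors, so an empty result does not rule that cause out.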