[mira_talk] Re: Very long transcripts

From: "Jackie Lighten" <jackie.lighten@xxxxxx>
To: <mira_talk@xxxxxxxxxxxxx>
Date: Wed, 31 Oct 2012 15:32:15 -0300
Thanks Martin.

I am less worried about the issue of long transcripts as I get the vast 
majority of my cDNA library aligning nicely to a reference of a very closely 
related species.
But let me take a closer look at what I have and I may get back to you.

Thanks,

Jack

-----Original Message-----
From: mira_talk-bounce@xxxxxxxxxxxxx [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On 
Behalf Of Martin Mokrejs
Sent: October-31-12 8:01 AM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: Very long transcripts

Hi Jackie,
  you haven't said what lab protocol was used to prepare the sample for 
sequencing.
The numbers sound like a complete disaster, so I guess you used the 
MINT/SMART protocols (especially as you mentioned polyA/T trimming and an 
intended 3'-end-only cDNA library). I have analyzed a few dozen such datasets 
and FIXED them. Since I started working on these I have hit so many issues 
that, honestly, my only advice is not to work with such datasets unless you 
clean them up. The question is what you should do ... ;-) I found a way 
through ... If you want to reinvent the wheel, reserve 6-8 months of your time 
as a molecular biologist and hire a programmer.
  On the other hand, if you used the Roche protocol with its random hexamers, 
you shouldn't have that many issues, but I have still seen such datasets with 
polyA/T tails in reads. In theory, these should NOT appear in cDNA prepared 
with the Roche protocol, but somehow they do! ;-)
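For illustration only, here is a minimal sketch of naive poly-A/T tail clipping on a single read. It assumes a tail is simply a run of at least `min_run` A's at the 3' end (or T's at the 5' end); the function name `clip_poly_at` and the threshold of 8 are hypothetical. It is neither MIRA's own clipping nor the unpublished cleanup approach described in this thread, and it ignores sequencing errors inside the run.

```python
def clip_poly_at(read: str, min_run: int = 8) -> str:
    """Clip a 3' poly-A run and/or a 5' poly-T run from a read.

    A run is clipped only if it is at least `min_run` bases long.
    min_run = 8 is a hypothetical threshold, not a MIRA default.
    """
    seq = read.upper()
    # Length of the run of A's at the 3' end.
    tail = len(seq) - len(seq.rstrip("A"))
    end = len(seq) - tail if tail >= min_run else len(seq)
    # Length of the run of T's at the 5' end.
    head = len(seq) - len(seq.lstrip("T"))
    start = head if head >= min_run else 0
    if start >= end:
        return ""  # nothing left once the homopolymer runs are removed
    return read[start:end]
```

For example, `clip_poly_at("ACGTACGTAAAAAAAAAAAA")` keeps only `"ACGTACGT"`, while a read ending in just four A's is returned unchanged.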
  I think I mentioned this here on the list a while ago: I can offer a 
commercial service for cleanup and correction of SFF files. So many alignments 
have to be done that fixing 300k reads (1/4 of an XLR plate) takes several 
weeks on the fastest machine I could get. Hence, a paid service. I really do 
know why it takes so long ;-); not a single adapter-trimming tool available 
does anything even remotely similar, and until my approach is published I 
won't say more about it.

  In general, transcripts are 2-5 kb long; some aberrant transcripts with an 
extended 3'-UTR are longer. That's the current view of transcriptomes. Unless 
you have a truly obscure organism, the assembly is just wrong.
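As a first look at the contigs, one could list those that exceed the usual transcript length. The sketch below is an assumption-laden illustration: it parses plain FASTA and uses 5,000 bases (taken from the rough 2-5 kb figure above) as a hypothetical cutoff. Genuinely long genes (e.g. PKS genes) will also be flagged, so a hit means "inspect this contig", not "this contig is wrong".

```python
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {name: sequence}; name = first word of the header."""
    chunks: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def suspiciously_long(contigs: Dict[str, str],
                      max_len: int = 5000) -> List[Tuple[str, int]]:
    """Return (name, length) of contigs longer than max_len, longest first.

    max_len = 5000 is a hypothetical cutoff based on the 2-5 kb rule of thumb.
    """
    hits = [(n, len(s)) for n, s in contigs.items() if len(s) > max_len]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Typical use would be `suspiciously_long(read_fasta(open("contigs.fasta")))`, where the file name is only a placeholder for the assembler's contig output.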

  As a quick check I can look into the assembled contigs for adapters and 
..., but for a final solution I would need the SFF files. I can work with 
FASTA/FASTQ files, but that is suboptimal. So, once again: how many raw reads 
are to be analyzed? What protocol?
Otherwise, you'll have to wait; meanwhile, good luck with your efforts. ;-) 
Best, Martin


Bastien Chevreux wrote:
> On Oct 30, 2012, at 14:35, Jackie Lighten <jackie.lighten@xxxxxx> wrote:
>> I have performed an accurate de novo assembly with poly-A/T trimming.
>> All reads assemble (no singlets) into around 66k contigs. Around 27k of 
>> these are large contigs, the largest being ~25 kb long. This does not make 
>> much sense to me, as I constructed a 3'-targeted cDNA library (454 FLX). I 
>> can imagine that multiple open reading frames might create longer 
>> transcripts, but 25k seems dodgy to me.
>> Any thoughts?
> 
> Yes. Have a look at those contigs :-) No joke, this always brings the best 
> insights.
> 
> Possible reasons:
> - PKS (polyketide synthase) genes. These can be 45-50 kb long, maybe even longer
> - contamination of the cDNA with gDNA
> - introns. Especially for highly expressed genes, there is a higher 
> chance of sequencing unedited (unspliced) mRNA
> - unclipped adaptors which "join" contigs
> - assembly "errors": short overlaps of just a couple of bases
> 
> B.
> 
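To check the contigs for unclipped adaptors, as suggested above, a simple first pass is an exact search for a known adapter sequence and its reverse complement. This is a minimal sketch under stated assumptions: the adapter string must be supplied by the user (none is given in this thread), only exact matches are found, and the function names are hypothetical. Real adapter remnants often carry sequencing errors, so this underestimates contamination compared to an alignment-based search.

```python
from typing import Dict, List, Tuple

# Complement table for uppercase DNA; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_adapter_hits(contigs: Dict[str, str],
                      adapter: str) -> List[Tuple[str, int, str]]:
    """Report every exact occurrence of `adapter` or its reverse complement.

    Returns (contig name, 0-based position, strand) tuples, strand "+" for
    the adapter as given and "-" for its reverse complement. A palindromic
    adapter is reported once per strand at the same position.
    """
    probes = {"+": adapter.upper(), "-": revcomp(adapter.upper())}
    hits: List[Tuple[str, int, str]] = []
    for name, seq in contigs.items():
        seq = seq.upper()
        for strand, probe in probes.items():
            pos = seq.find(probe)
            while pos != -1:  # also catches overlapping occurrences
                hits.append((name, pos, strand))
                pos = seq.find(probe, pos + 1)
    return hits
```

A contig carrying an adapter hit in its interior is a candidate for the "adaptor joins two transcripts" case and worth splitting at that position before re-assessing its length.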

--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
