[mira_talk] Re: tweaking Manifest for polyploid genome

  • From: Martin Mokrejs <mmokrejs@xxxxxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 24 Feb 2014 17:51:59 +0100

Hi Juan,

Gutierrez, Juan wrote:
> Thanks Hanquan and Martin for your answers.
> 
> There are 3 types of reads in the assembly:
> 
> 869,000 454 reads, about 500bp long on average.

  Is that after trimming? Was that really 1kb sequencing on FLX+? Did you say 
"orphaned" plant transcriptome? Titanium-based shotgun sequencing usually gave 
worse numbers on average. And since you wrote initially that this is RNA-seq, 
it is even worse: the homopolymer tails typically kill the basecalling 
quality estimates, so the basecalled sequence is worse than it claims to be. I 
just doubt you are talking about high-quality sequence, sorry to say that. ;-)
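
One quick way to settle the "is that after trimming?" question is to look at 
the actual length distribution of the trimmed reads. The following is only a 
minimal sketch, assuming the trimmed 454 reads are available as a plain FASTQ 
file with exactly four lines per record (no wrapped sequence lines). The 
function names and the 90bp default threshold are illustrative and are not 
part of MIRA or any other tool mentioned in this thread.

```python
import io
from statistics import mean


def read_lengths(handle):
    """Yield sequence lengths from a FASTQ stream (assumes 4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            yield len(line.strip())


def summarize(lengths, min_len=90):
    """Return basic length statistics; min_len is a hypothetical cutoff."""
    lengths = list(lengths)
    if not lengths:
        return {"reads": 0}
    return {
        "reads": len(lengths),
        "mean": mean(lengths),
        "shortest": min(lengths),
        "longest": max(lengths),
        "below_min": sum(1 for n in lengths if n < min_len),
    }


if __name__ == "__main__":
    # Tiny in-memory example; for real data, open the FASTQ file instead.
    demo = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
    print(summarize(read_lengths(demo), min_len=5))
```

Running this on the reads before and after trimming would show whether the 
500bp average refers to raw or trimmed sequence.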

> 32,254,100 Illumina 100bp paired-end reads (approx total 62 million reads)
> 3,897,411 Illumina 100bp single reads (result of the trimming process where 
> the other read of the pair was of bad quality)
> 
> All reads are longer than 90bp (I removed the ones shorter than that after 
> trimming)
> 
> I am pretty confident in the trimming I did and all remaining reads are of 
> extremely high quality. I agree with Martin that the reads do not map, but I 
> think it is because of the similarity among the 3 different copies of each 
> gene. They just do not know where to map unequivocally.

You did not say what lab protocol was used for 454 and did not say whether MIDs 
were used. Somewhere in your manifest file was "RL12" so I concluded RLMID12. 
Wrong guess?

> 
> I am thinking that I might have been very strict on the parameters. Maybe the 
> right approach for this kind of repetitive assembly is being less strict on 
> the parameters and let Mira resolve the uncertainties?

Martin

> 
> Juan 
>  
> 
> -----Original Message-----
> From: mira_talk-bounce@xxxxxxxxxxxxx [mailto:mira_talk-bounce@xxxxxxxxxxxxx] 
> On Behalf Of Martin Mokrejs
> Sent: Monday, February 24, 2014 8:16 AM
> To: mira_talk@xxxxxxxxxxxxx
> Subject: [mira_talk] Re: tweaking Manifest for polyploid genome
> 
> Hi Juan,
>   how many 454 reads do you have on input? I see max coverage 180830 for the 
> 454 technology. Also, from "Coverage assessment" I see that 454 is covered at 
> only 0.78x and Solexa at 28.34x. I suspect your main issue is proper trimming 
> of the raw reads. It looks like the reads just do not assemble, or you 
> sequenced too little material using 454. Ah, I see in your Manifest file you 
> used the RapidLib approach and MIDs ... poor boy, how did you remove them?
> 
>   Maybe you would appreciate my help as a paid service; please see 
> http://www.bioinformatics.cz . There are plenty of tricks needed to get
> 454 trimming right and I don't know of any other tool (except mine) ;) doing 
> that right, nor even of a tool doing the proper queries for all adapters, 
> primers, artifacts. However, a lot of effort had to be put into the wrapper 
> code to manage and interpret the candidate alignments.
> Having just the right queries is not enough. 3 years of work, 25k lines of 
> code in Python. My apologies if this is considered an ad, I couldn't 
> resist.
> Martin
> 
> 
> Gutierrez, Juan wrote:
>> Hi,
>>
>>  
>>
>> I am trying to do RNA-seq de novo assembly on a polyploid (hexaploid) genome 
>> using a combination of 454 and Illumina 100bp paired-end reads. The 
>> three copies of each gene are highly similar to one another. I am 
>> having trouble separating the three copies into three 
>> different full-length assemblies. Most of the time I just get a 
>> fragment of each of the three copies. I am guessing that when Mira 
>> finds a difference between highly similar transcripts, it just can’t 
>> assess whether it is a polymorphism between 2 of the copies or a 
>> sequencing error. In any case, Mira seems to end the assembly way 
>> before reaching the end of the transcript.
>>
>>  
>>
>> I have prepared and run 2 different Manifests. I am getting better 
>> results with Manifest.conf (fewer but longer contigs) than 
>> with Manifest2.conf (more but shorter contigs), so I 
>> suppose that I could eventually separate the 3 copies of each gene 
>> by fine-tuning the parameters.
>>
>>  
>>
>> Any suggestion would be greatly appreciated,
>>
>> Thanks so much!
>>
>> Juan

