[mira_talk] Re: tweaking Manifest for polyploid genome

  • From: "Gutierrez, Juan" <Juan.Gutierrez@xxxxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Mon, 24 Feb 2014 22:37:23 +0000

Thanks Adrian, 

The Illumina coverage is 80-100x, so pretty decent. I did an assembly with only 
Illumina reads and yes, the numbers were better, but it did not fully recover 
the genes and it shuffled SNPs among the 3 homologous genes. So I thought that 
adding the long 454 reads could help resolve the gene copies over their full 
length. Paradoxically, I also get better assemblies using the 454 reads alone; it 
is just that I can't get both types of reads to assemble together.

Juan 


-----Original Message-----
From: mira_talk-bounce@xxxxxxxxxxxxx [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On 
Behalf Of Adrian Pelin
Sent: Monday, February 24, 2014 11:49 AM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: tweaking Manifest for polyploid genome

What is your coverage for the Illumina PE reads? In my experience, 454 + 
Illumina assemblies with MIRA have worse statistics than Illumina-only 
assemblies. If your Illumina coverage is sufficient, try assembling the Illumina 
reads alone, without the 454 reads.

On 2/24/2014 11:30 AM, Gutierrez, Juan wrote:
> Thanks Hanquan and Martin for your answers.
>
> There are 3 types of reads in the assembly:
>
> 869,000 454 reads, about 500bp long on average.
> 32,254,100 Illumina 100bp paired-end reads (approx. 62 million reads in 
> total)
> 3,897,411 Illumina 100bp single reads (left over from trimming, where the 
> other mate of the pair was of bad quality)
>
> All reads are at least 90bp long (I removed the shorter ones after 
> trimming)
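The length filter described above (drop reads shorter than 90bp, keep a read as a single when its mate fails) can be sketched as follows. This is a minimal illustration, assuming plain 4-line FASTQ records and two mate files listed in the same order; the function names are hypothetical, and this is not how MIRA or any particular trimming tool implements it.

```python
# Sketch: drop reads shorter than MIN_LEN from a pair of FASTQ files while
# keeping mates in sync. Assumes 4-line FASTQ records and that both files
# list mates in the same order. All helper names are hypothetical.
import io
from itertools import islice
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities
MIN_LEN = 90


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) != 4 or not lines[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def split_by_length(
    r1: Iterable[Record], r2: Iterable[Record], min_len: int = MIN_LEN
) -> Tuple[List[Tuple[Record, Record]], List[Record]]:
    """Keep pairs where both mates pass; rescue a lone passing mate as a single."""
    pairs: List[Tuple[Record, Record]] = []
    singles: List[Record] = []
    for a, b in zip(r1, r2):  # zip stops at the shorter file; counts should match
        a_ok, b_ok = len(a[1]) >= min_len, len(b[1]) >= min_len
        if a_ok and b_ok:
            pairs.append((a, b))
        elif a_ok:
            singles.append(a)
        elif b_ok:
            singles.append(b)
    return pairs, singles


def fq(name: str, seq: str) -> str:
    """Build one FASTQ record string with dummy qualities (demo only)."""
    return f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"


if __name__ == "__main__":
    r1 = io.StringIO(fq("p1/1", "A" * 100) + fq("p2/1", "C" * 50))
    r2 = io.StringIO(fq("p1/2", "G" * 95) + fq("p2/2", "T" * 100))
    pairs, singles = split_by_length(read_fastq(r1), read_fastq(r2))
    print(len(pairs), len(singles))  # 1 1
```

The lists keep the example short; for tens of millions of pairs you would write each surviving pair or single to its output file as you go instead of collecting everything in memory.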
>
> I am pretty confident in the trimming I did, and all remaining reads are of 
> extremely high quality. I agree with Martin that the reads do not map, but I 
> think it is because of the similarity among the 3 different copies of each 
> gene: the reads just cannot be placed unambiguously on one copy.
>
> I am thinking that I might have been too strict with the parameters. Maybe the 
> right approach for this kind of repetitive assembly is to be less strict with 
> the parameters and let MIRA resolve the uncertainties?
>
> Juan
>   
>
> -----Original Message-----
> From: mira_talk-bounce@xxxxxxxxxxxxx 
> [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On Behalf Of Martin Mokrejs
> Sent: Monday, February 24, 2014 8:16 AM
> To: mira_talk@xxxxxxxxxxxxx
> Subject: [mira_talk] Re: tweaking Manifest for polyploid genome
>
> Hi Juan,
>    how many 454 reads do you have as input? I see a max coverage of 180830 for 
> the 454 technology. Also, from the "Coverage assessment" I see that 454 is 
> covered at only 0.78x and Solexa at 28.34x. This makes me suspect that your 
> main issue is proper trimming of the raw reads. It looks like the reads just do 
> not assemble, or you sequenced too little material with 454. Ah, I see in your 
> Manifest file that you used the RapidLib approach and MIDs ... poor boy, how did 
> you remove them?
>
>    Maybe you would appreciate my help as a paid service; please see 
> http://www.bioinformatics.cz . Plenty of tricks are needed to get 454 trimming 
> right, and I don't know of any other tool (except mine) ;) that does it right, 
> nor even of a tool that runs the proper queries for all adapters, primers and 
> artifacts. And having just the right queries is not enough: a lot of effort had 
> to be put into the wrapper code to manage and interpret the candidate 
> alignments. 3 years of work, 25k lines of code in Python. My apologies if this 
> is considered an ad, I couldn't resist.
> Martin
>
>
> Gutierrez, Juan wrote:
>> Hi,
>>
>>   
>>
>> I am trying to do de novo RNA-seq assembly of a polyploid (hexaploid) 
>> genome using a combination of 454 and Illumina 100bp paired-end reads. The 
>> three copies of each gene are highly similar to one another. I am having 
>> trouble separating each of the three copies into three different 
>> full-length assemblies. Most of the time I just get a fragment of each of 
>> the three copies. I am guessing that when MIRA finds a difference between 
>> highly similar transcripts, it just can't tell whether it is a polymorphism 
>> between 2 of the copies or a sequencing error. In any case, MIRA seems to 
>> end the assembly well before reaching the end of the transcript.
>>
>>   
>>
>> I have prepared and run 2 different Manifests. I am getting better 
>> results with Manifest.conf (fewer but longer contigs) than with 
>> Manifest2.conf (more but shorter contigs), so I suppose that I could 
>> eventually separate the 3 copies of each gene by fine-tuning the 
>> parameters.
>>
>>   
>>
>> Any suggestion would be greatly appreciated,
>>
>> Thanks so much!
>>
>> Juan
>>
>>
>>
>>
>>
>> This electronic message contains information generated by the USDA solely 
>> for the intended recipients. Any unauthorized interception of this message 
>> or the use or disclosure of the information it contains may violate the law 
>> and subject the violator to civil or criminal penalties. If you believe you 
>> have received this message in error, please notify the sender and delete the 
>> email immediately.
> --
> You have received this mail because you are subscribed to the 
> mira_talk mailing list. For information on how to subscribe or 
> unsubscribe, please visit 
> http://www.chevreux.org/mira_mailinglists.html
>

