[mira_talk] Re: Metagenome assembly

From: Bastien Chevreux <bach@xxxxxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Thu, 26 Apr 2012 22:53:45 +0200

On Apr 25, 2012, at 22:44 , Shaun Tyler wrote:
> Does anyone have experience assembling metagenome data with Mira.  I have a 
> feeling this might be one of those applications that will give Mira a nervous 
> breakdown.  The data is 100 bp paired end Illumina reads from libraries 
> derived from nasal swabs.  There is slightly in excess of 2 Gbp of data per 
> sample (25 M reads or so).
> 
25M reads alone are not a big problem. I've done RNASeq assemblies with 40 to 
50M reads, and I know at least two users who ventured into the 100M area (but I 
would not recommend doing that). You just need a machine with enough memory.

However, I fear that some aspects of metagenomes will indeed lead to problems. 
If you assemble the data in "genome" mode, I think MIRA will have a hard time 
guessing the "coverage" of this "genome", and that will lead to 
misassemblies. If you assemble in EST mode, things will probably go faster, but 
there again I am almost sure misassemblies will happen.

The thing is: in metagenomes, there is no such thing as an "average coverage" 
because this "average coverage" will be mainly driven by population ratios. I 
have no idea how to get around this.
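To make the point concrete, here is a small Python sketch with an invented three-species community. The genome sizes and read fractions are made up for illustration and are not taken from the nasal-swab data; only the read length (100 bp) and read count (about 25M) come from the question. Treating all reads as if they came from one "genome" gives a pooled average coverage that matches none of the actual members.

```python
"""Why a single 'average coverage' is misleading for a metagenome.

The community below is HYPOTHETICAL: genome sizes and read fractions
are invented for illustration only.
"""

READ_LEN = 100            # bp, as in the 100 bp Illumina reads
TOTAL_READS = 25_000_000  # roughly 25M reads per sample

# name -> (genome size in bp, fraction of all reads from that genome)
COMMUNITY = {
    "abundant_species": (2_000_000, 0.80),
    "medium_species": (3_000_000, 0.15),
    "rare_species": (5_000_000, 0.05),
}


def per_genome_coverage(community, total_reads, read_len):
    """Coverage each genome actually gets: its sequenced bases / its size."""
    return {
        name: total_reads * frac * read_len / size
        for name, (size, frac) in community.items()
    }


def pooled_average_coverage(community, total_reads, read_len):
    """Coverage if all reads were assumed to come from a single 'genome'."""
    total_size = sum(size for size, _ in community.values())
    return total_reads * read_len / total_size


if __name__ == "__main__":
    pooled = pooled_average_coverage(COMMUNITY, TOTAL_READS, READ_LEN)
    print(f"pooled 'average' coverage: {pooled:.0f}x")
    for name, cov in per_genome_coverage(COMMUNITY, TOTAL_READS, READ_LEN).items():
        print(f"{name}: {cov:.0f}x ({cov / pooled:.1f} times the pooled average)")
```

With these made-up numbers the pooled average is 250x, while the individual genomes sit at 1000x, 125x and 25x. Anything that calibrates "expected" coverage against a single average would, in this toy case, see the entire abundant genome at four times the expected value and the rare one at a tenth of it, purely because of the population ratios.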

In any case, if you run trial assemblies, set 
  -SK:bph=31

This will probably greatly reduce misassemblies at the expense of genomes with 
low abundance being less well assembled.
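A toy Python sketch of why a longer hash helps, assuming that `-SK:bph` sets the length of the exact-match word (k-mer) used to seed candidate overlaps. The two reads are invented: unrelated homopolymer flanks (deliberately unrealistic, to keep the counting easy) around an identical 25 bp stretch, standing in for a short region conserved between two different organisms. The shorter word length of 17 is chosen arbitrarily for comparison.

```python
"""Toy illustration: longer exact-match words (k-mers) ignore short shared stretches.

ASSUMPTION: -SK:bph is the length of the exact-match word used to seed
candidate overlaps. The reads below are invented, not real data.
"""

# Hypothetical 25 bp stretch conserved between two different organisms
SHARED_MOTIF = "ACGTCAGTGCATGACTGCAGTCTAG"

# Two hypothetical reads from different genomes: unrelated flanks, same motif
READ_A = "A" * 30 + SHARED_MOTIF + "C" * 30
READ_B = "G" * 30 + SHARED_MOTIF + "T" * 30


def kmers(seq, k):
    """Return the set of all distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a, b, k):
    """Exact k-mers present in both sequences (candidate overlap seeds)."""
    return kmers(a, k) & kmers(b, k)


if __name__ == "__main__":
    for k in (17, 31):
        n = len(shared_kmers(READ_A, READ_B, k))
        print(f"k={k}: {n} shared k-mers")
```

Here the reads share nine 17-mers, which could seed a false overlap between the two organisms, but no 31-mers, because the common stretch is shorter than 31 bp. The flip side is the cost mentioned above: a 31-mer seed needs 31 consecutive matching, error-free bases in both reads, so genuine overlaps in low-coverage genomes are found less often.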

Would love to hear back from you on that.

B.

References:
- [mira_talk] Metagenome assembly
  - From: Shaun Tyler
