[mira_talk] Re: Metagenome assembly

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 26 Apr 2012 22:53:45 +0200

On Apr 25, 2012, at 22:44 , Shaun Tyler wrote:
> Does anyone have experience assembling metagenome data with Mira?  I have a 
> feeling this might be one of those applications that will give Mira a nervous 
> breakdown.  The data is 100 bp paired end Illumina reads from libraries 
> derived from nasal swabs.  There is slightly in excess of 2 Gbp of data per 
> sample (25 M reads or so).
> 
25 M reads alone are not a big problem: I've done RNASeq assemblies with 40 to 
50 M, and I know at least two users who ventured into the 100 M range (though 
I would not recommend that). You just need a machine which is big enough.

However, I fear that some aspects of metagenomes will indeed lead to problems. 
If you assemble the data in "genome" mode, I think MIRA will have a hard time 
guessing the "coverage" of this "genome" ... and that will lead to 
misassemblies. If you assemble in EST mode, things will probably go faster, but 
there again I am almost sure misassemblies will happen.

The thing is: in metagenomes, there is no such thing as an "average coverage" 
because this "average coverage" will be mainly driven by population ratios. I 
have no idea how to get around this.
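
To make the point concrete, here is a rough sketch in Python of how the 
expected coverage of each organism follows from its share of the reads. The 
species names, genome sizes and read fractions are invented purely for 
illustration; only the ~2 Gbp total comes from the original question.

```python
# Hypothetical community: names, genome sizes and read fractions are made up.
TOTAL_BASES = 2_000_000_000  # ~2 Gbp per sample, as stated in the question

community = {
    "abundant_species": {"genome_size": 2_500_000, "read_fraction": 0.90},
    "moderate_species": {"genome_size": 2_000_000, "read_fraction": 0.099},
    "rare_species": {"genome_size": 5_000_000, "read_fraction": 0.001},
}


def expected_coverage(total_bases, genome_size, read_fraction):
    """Expected fold coverage of one genome given its share of the reads."""
    return total_bases * read_fraction / genome_size


coverages = {
    name: expected_coverage(TOTAL_BASES, p["genome_size"], p["read_fraction"])
    for name, p in community.items()
}

for name, cov in coverages.items():
    print(f"{name}: ~{cov:.1f}x")
```

With these made-up numbers the three genomes end up at roughly 720x, 99x and 
0.4x, so a single "average coverage" estimate describes none of them well.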

In any case: if you are making trials, set 
  -SK:bph=31

This will probably greatly reduce misassemblies at the expense of genomes with 
low abundance being less well assembled.
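
A small sketch of that trade-off, assuming that bph ("bases per hash") is the 
length of the k-mers used to find candidate overlaps between reads. The 
helper below is hypothetical and not part of MIRA, and the value 17 is just an 
arbitrary shorter length for comparison, not a claim about MIRA's default.

```python
def shared_kmers(overlap_len, k):
    """Number of k-mers two reads share across an error-free overlap.

    An overlap shorter than k contains no complete k-mer, so it cannot be
    detected by k-mer matching at all.
    """
    return max(0, overlap_len - k + 1)


# Compare an arbitrary shorter k-mer (17) with the suggested 31
# for a few overlap lengths between 100 bp reads.
for k in (17, 31):
    for overlap in (20, 40, 80):
        print(f"k={k:2d} overlap={overlap:2d} bp -> "
              f"{shared_kmers(overlap, k)} shared k-mers")
```

Longer k-mers demand longer exact matches, which should help keep similar 
sequences from different organisms apart. Sparsely covered genomes, where 
reads overlap only by short stretches, then offer fewer (or no) matching 
k-mers to start from.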

Would love to hear back from you on that.

B.
