On Apr 25, 2012, at 22:44 , Shaun Tyler wrote:
> Does anyone have experience assembling metagenome data with Mira. I have a
> feeling this might be one of those applications that will give Mira a nervous
> breakdown. The data is 100 bp paired end Illumina reads from libraries
> derived from nasal swabs. There is slightly in excess of 2 Gbp of data per
> sample (25 M reads or so).
>

25m reads alone are not a big problem. I've done RNASeq assemblies with 40 to
50m reads, and I know at least two users who ventured into the 100m area (but
I'd not recommend doing that). You just need a machine which is big enough.

However, I fear that some aspects of metagenomes will indeed lead to problems.
If you assemble the data in "genome" mode, I think MIRA will have a hard time
guessing the "coverage" of this "genome" ... and that will lead to
misassemblies. If you assemble in EST mode, things will probably go faster,
but there again I am almost sure misassemblies will happen.

The thing is: in metagenomes, there is no such thing as an "average coverage",
because this "average coverage" will be mainly driven by population ratios. I
have no idea how to get around this.

In any case: if you are making trials, set

  -SK:bph=31

This will probably greatly reduce misassemblies, at the expense of genomes
with low abundance being less well assembled.

Would love to hear back from you on that.

B.
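
A rough illustration of the coverage point above, as a minimal sketch only. The total of about 2 Gbp per sample comes from the original question; the three species, their genome sizes and their shares of the sequenced bases are made-up values chosen purely for illustration, and the function name is hypothetical. It only shows the arithmetic of why a single "average coverage" is not meaningful for a mixed population.

```python
# Roughly 2 Gbp of data per sample, as stated in the original question.
TOTAL_BASES = 2_000_000_000

# Hypothetical community: (name, genome size in bp, share of sequenced bases).
# These are invented illustration values, not measurements.
community = [
    ("abundant_species", 2_000_000, 0.90),
    ("moderate_species", 2_500_000, 0.09),
    ("rare_species", 5_000_000, 0.01),
]


def expected_coverage(total_bases, genome_size, base_fraction):
    """Expected mean coverage: bases drawn from this genome / genome size."""
    return total_bases * base_fraction / genome_size


coverages = {
    name: expected_coverage(TOTAL_BASES, size, fraction)
    for name, size, fraction in community
}

for name, cov in coverages.items():
    print(f"{name}: ~{cov:.0f}x")
```

With these assumed numbers the per-genome coverage spans roughly 4x to 900x within one sample, so any single coverage estimate will be far off for most of the organisms present.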