On Apr 21, 2011, at 18:07, George Marselis wrote:

> One of my researchers has a couple of questions. I am pasting verbatim:
>
> 1. Is there a parallel version that can use multiple nodes on a cluster to
> distribute the analysis?

Sorry, no.

> 2. How can we feed in a large amount of paired-end data with different
> insert sizes (e.g. 16 libraries with different inserts and some with
> multiple PE sets; altogether 76 fastq files; around 200 GB on disk) +
> 50 GB of long reads?

Forget it: MIRA will not work with that amount of data.

> 3. Can we feed MIRA pre-assembled contigs, e.g. from SOAPdenovo, along
> with the original PE libraries so that the contigs can be extended? There
> seems to be a limit of about 2 kbp per read currently acceptable to MIRA.

The limit is more like 15 to 20 kbp. Anything longer than that needs to be
fragmented.

> 4. Is this known bug in MIRA solved? Mapping of paired-end reads with one
> read in a non-repetitive area and the other in a repeat is not as
> effective as it should be (taken from
> http://mira-assembler.sourceforge.net/docs/chap_solexa_part.html#sect_sxa_known_bugs___problems)

No, it is still there. It is on my TODO list, though I do not know when I
will have time to look at it.

> I think the answer to the first question is "no, launch multiple instances
> instead".

Indeed, but that is only possible for EST assemblies that were
pre-clustered. For genome assemblies it makes less sense, as everything
somehow connects to everything else.

B.
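For question 3, the advice is to fragment contigs longer than roughly 15 to 20 kbp before handing them to MIRA. The script below is a minimal sketch of that step, not part of MIRA. It assumes plain FASTA input and splits each contig into pieces of at most `max_len` bases (15,000 by default, the lower end of the quoted range). The 1,000-base overlap between consecutive pieces, the `_partN` naming scheme, and the helper names `read_fasta`, `fragment_sequence` and `format_fasta` are all hypothetical choices made for illustration; MIRA itself does not require them.

```python
import sys


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Keep only the first word of the header as the sequence name.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fragment_sequence(name, seq, max_len=15000, overlap=1000):
    """Split seq into pieces of at most max_len bases.

    Consecutive pieces share `overlap` bases (a hypothetical choice, so the
    original contig can still be traced across fragment borders).
    """
    if max_len <= overlap:
        raise ValueError("max_len must be larger than overlap")
    if len(seq) <= max_len:
        return [(name, seq)]
    step = max_len - overlap
    pieces, start, part = [], 0, 1
    while True:
        end = min(start + max_len, len(seq))
        pieces.append((f"{name}_part{part}", seq[start:end]))
        if end == len(seq):
            break
        start += step
        part += 1
    return pieces


def format_fasta(records, width=60):
    """Return records as FASTA text with sequence lines of `width` bases."""
    out = []
    for name, seq in records:
        out.append(f">{name}")
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "".join(line + "\n" for line in out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python fragment_contigs.py contigs.fasta > fragmented.fasta
    with open(sys.argv[1]) as fh:
        fragments = [p for n, s in read_fasta(fh) for p in fragment_sequence(n, s)]
    sys.stdout.write(format_fasta(fragments))
```

Fragmenting with a small overlap rather than cutting end to end is a judgement call: it keeps each piece under the length limit while leaving shared sequence at every cut point. Whether that helps or hurts the assembly should be checked against the MIRA documentation for your version.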