[mira_talk] Re: Questions from one of my researchers

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sat, 23 Apr 2011 17:50:07 +0200

On Apr 21, 2011, at 18:07 , George Marselis wrote:
> One of my researchers has a couple of questions. I am pasting verbatim:
> 1. Is there a parallel version that can use multiple nodes on a cluster to
> distribute the analysis?

Sorry, no.

> 2. How can we feed in large amount of Paired Ended data with different
> insert sizes (e.g. 16 libraries with different inserts and some with
> multiple PE sets; altogether 76 fastq files; around 200GB size on disk) +
> 50 GB long reads).

Forget it, MIRA will not work with that amount of data.

> 3. Can we feed to mira pre-assembled contigs e.g. from soapdenovo along
> with the original PE libraries so that contigs can be extended; there
> seems to be a limit of 2k reads currently acceptable to mira.

The limit is more like 15 to 20 kbp. Sequences longer than that need to be
fragmented.
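
A minimal sketch of what such fragmenting could look like, assuming the contigs are already loaded as plain strings. The function name, the 15000 bp piece length and the 1000 bp overlap are illustrative assumptions, not MIRA settings or recommendations; the overlap is only there so that neighbouring pieces still share sequence and could be rejoined.

```python
def fragment_sequence(seq, max_len=15000, overlap=1000):
    """Split seq into pieces of at most max_len bases.

    Consecutive pieces share `overlap` bases. Both defaults are
    illustrative assumptions, not values prescribed by MIRA.
    """
    if overlap < 0 or max_len <= overlap:
        raise ValueError("need 0 <= overlap < max_len")
    if len(seq) <= max_len:
        return [seq]

    step = max_len - overlap  # distance between consecutive piece starts
    pieces = []
    start = 0
    while start < len(seq):
        pieces.append(seq[start:start + max_len])
        if start + max_len >= len(seq):
            break  # this piece already reaches the end of the sequence
        start += step
    return pieces


if __name__ == "__main__":
    contig = "ACGT" * 10000  # hypothetical 40 kbp contig
    for i, piece in enumerate(fragment_sequence(contig)):
        print(i, len(piece))
```

Each piece would then be written out as its own sequence (with its own name) before being handed to the assembler.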

> 4. Is this known bug in mira solved? mapping of paired-end reads with one
> read being in non-repetitive area and the other in a repeat is not as
> effective as it should be (taken from
> http://mira-assembler.sourceforge.net/docs/chap_solexa_part.html#sect_sxa_k
> nown_bugs___problems)

No, it is still there. It is on my TODO list, though I do not know when I will
have time to look at it.

> I think the answer to the first question is "no, launch multiple instances
> instead". 

Indeed, but that is only possible for EST assemblies which were pre-clustered.
For genome assemblies it makes less sense, as everything somehow connects to
everything else.

B.

