[mira_talk] Long running time for a mapping assembly

From: Henrik Lantz <Henrik.Lantz@xxxxxx>
To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
Date: Mon, 23 Jan 2012 09:49:24 +0100

Dear all,

Recently I ran a mapping assembly of Illumina data on a backbone assembled by 
Newbler, and it took over three weeks on 8 nodes on a 128 GB machine. This 
compares poorly with the example given in the Mira documentation of a yeast 
genome with similar coverage, which took around 3.5 hours. Granted, I have 
longer reads, and the fungus I am working with probably has more repeats than 
a yeast, but I still can't help feeling I am doing something wrong.

This is the command I used:
/media/vol1/bioinformatics/mira_dev/bin/mira --project=0525 
--job=mapping,genome,accurate,solexa -GE:not=8 -SK:mmhr=2 -AS:nop=1 
-SB:bsn=All454 -MI:somrnl=0 SOLEXA_SETTINGS -SB:dsn=0525 >&log_assembly.txt

The organism is a fungus with a genome size of around 21.5 Mb. The Illumina 
data is paired-end, and Mira reports an Illumina coverage of 32.06 for the 
three-week run. I have some slight megahub problems, and I therefore include 
the -SK:mmhr=2 setting, as recommended by Bastien in reply to an earlier 
question of mine. I used Mira V3.4rc3.
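
For scale, here is a rough back-of-the-envelope sketch of how much Illumina 
sequence these numbers imply. It assumes that coverage is simply total bases 
divided by genome size; the variable names are only illustrative and do not 
come from Mira.

```python
# Rough estimate of the amount of Illumina sequence implied by the figures above.
# Assumption: coverage = total bases / genome size.
genome_size_bp = 21.5e6   # genome size of around 21.5 Mb
coverage = 32.06          # Illumina coverage reported by Mira

total_bases = genome_size_bp * coverage
print(f"approx. {total_bases / 1e6:.0f} Mb of Illumina sequence")
```

That works out to roughly 689 Mb of Illumina sequence for the mapping.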

Is the three weeks running time expected, or am I doing something wrong? In the 
best of all worlds I would like to decrease the running time and also include 
the PE information, which I am not doing above.

Cheers,
Henrik
--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
