[mira_talk] Re: assembly parameters and more

On Sunday 15 March 2009 Davide Sassera wrote:
> I hope I understood correctly what you need and that the file I'm
> sending is the right one.

Hi Davide,

thanks for the files you sent. It took me some time to have a look at them, 
but they were basically what I needed.

Looks like there is a combination of a few things going on. I made some 
tests and discovered that massive coverage, together with highly repetitive 
data and turning on masking of nasty sequences, gave the behaviour you saw.

First: the coverage (and megahubs)

100x is already quite massive for de-novo. If the genome you have contains 
some nasty surprises (say, a repetitive element, longer than a read, is 
contained 15x in the genome), then this explains the 1500x coverages you saw 
in some passes of MIRA. In earlier times one saw this kind of coverage only 
for EST sequencing projects ... in non-normalised libraries.
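
As a rough back-of-the-envelope sketch of that arithmetic: the function name 
below is made up for illustration, and it assumes reads are spread evenly over 
the genome and that all copies of the repeat pile up on one another during 
assembly. It is not something MIRA itself computes.

```python
def expected_repeat_coverage(base_coverage, repeat_copies):
    """Estimate the apparent coverage over a collapsed repeat.

    Assumption (illustrative only): every copy of the repeat receives the
    same average coverage as the rest of the genome, and reads from all
    copies end up stacked on a single location.
    """
    return base_coverage * repeat_copies


# 100x average coverage, repeat present 15 times in the genome
print(expected_repeat_coverage(100, 15))
```

With 100x average coverage and 15 copies of the repeat, this gives the 1500x 
seen in some passes.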

Therefore, these things tend to trigger the megahub detector ... and I cannot 
really blame it for that :-)


Second: the long runtime and insane memory requirements

These were triggered by the massive coverage in conjunction with the masking 
of nasty repeats ... and a "security feature" I built into MIRA that backfired. 
Short story: so as not to lose alignments when parts of them (those with nasty 
sequences) are masked, I told MIRA to take every alignment there. Which in 
your case led to "a lot" of reads having *all* their alignments analysed and 
stored.

> [...]
> I'm currently using version 2.39, I suppose I should update to 2.42

I've removed said "security feature" from the code, have a try at 2.9.43 :-)

Also, if you have time, please try with and without masking nasty repeats. I'm 
curious about how it behaves in your real-world case and which you think is 
better. To get MIRA going when masking is off, increase -SK:mmhr (5 should be 
enough).

Regards,
  Bastien


-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html