[mira_talk] Re: assembly parameters and more
- From: Davide Sassera <davide.sassera@xxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Thu, 19 Mar 2009 14:15:46 +0100
Hi Bastien, all,
Thanks for the answer!
I'm currently running 2.9.43 at 'normal' level with mask on. I started
2.9.43 before you told me to, must be telepathic ;)
The first pass went smoothly (around 33 hours), but in the second pass
it started swapping (now at 6 gigs). I'll keep you informed.
Maybe I should assemble just the half Titanium plate and use the half
normal GS FLX plate afterwards to close the gaps?
thanks
D.
On Sunday 15 March 2009 Davide Sassera wrote:
I hope I understood correctly what you need and that the file I'm
sending is the right one.
Hi Davide,
thanks for the files you sent. Took me some time to have a look at them, but
they were basically what I needed.
Looks like there is a combination of a few things going on. I made some
tests and discovered that massive coverage together with highly repetitive
data and turning on the masking of nasty sequences gave the behaviour you saw.
First: the coverage (and megahubs)
100x is already quite massive for de-novo. If the genome you have contains
some nasty surprises (say, a repetitive element, longer than a read, is
contained 15x in the genome), then this explains the 1500x coverages you saw
in some passes of MIRA. In earlier times one saw this kind of coverage only
for EST sequencing projects ... in non-normalised libraries.
Therefore, these things tend to trigger the megahub detector ... and I cannot
really blame it for that :-)
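
To make the arithmetic above explicit, here is a minimal sketch in Python. It assumes that reads from every copy of a repeat longer than a read pile up on one collapsed region, so the apparent coverage there is roughly the average coverage times the copy number. The function name is made up for illustration and is not part of MIRA.

```python
def apparent_repeat_coverage(average_coverage, repeat_copies):
    """Rough coverage seen where all copies of a repeat collapse together.

    Assumption: the repeat is longer than a read, so reads from the
    different copies cannot be told apart and stack on a single region.
    """
    return average_coverage * repeat_copies


# Numbers from the message: 100x average coverage, element present 15x.
print(apparent_repeat_coverage(100, 15))  # prints 1500
```

This is only a back-of-the-envelope estimate; real pile-ups also depend on read length and on how uniform the sequencing was.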
Second: the long runtime and insane memory requirements
These were triggered by the massive coverage in conjunction with the masking
of nasty repeats ... and a "security feature" I built into MIRA that backfired.
Short story: So as not to lose alignments when parts of them (those with nasty
sequences) are masked, I told MIRA to take every alignment there. Which in
your case led to "a lot" of reads having *all* their alignments analysed and
stored.
[...]
I'm currently using version 2.39, I suppose I should update to 2.42
I've removed said "security feature" from the code, have a try at 2.9.43 :-)
Also, if you have time, please try with and without masking nasty repeats. I'm
curious about how it behaves in your real world case and what you think is
better. To get MIRA going when masking is off, increase -SK:mmhr (5 should be
enough).
Regards,
Bastien
--
Davide Sassera
Sezione di Patologia Generale e Parassitologia
Dipartimento di Patologia Animale,
Igiene e Sanita` Pubblica Veterinaria
Facolta` di Veterinaria
Universita` degli Studi di Milano
Via Celoria 10, 20133, Milano, ITALY
Tel: +39 0250318094
Fax: +39 0250318095