[mira_talk] Re: mira_talk Digest V2 #23
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Wed, 18 Mar 2009 18:47:18 +0100
On Monday 16 March 2009 Alex Washington wrote:
> Yes. I did not have enough memory for the solexa reads (14,771,764
> reads)(ex2), and have switched to a second server with 16 gigs thus my
> second issue arose.
> [...]
> For nearly each read, it repeats the following for example.
Fortunately only for 347k reads and not all 14m.
> : Solexa: Filter out (A hard) sa2|218
> : Solexa: Filter out (A hard) sa2|819
> : Solexa: Filter out (A hard) sa2|894
Background for this filter: Solexa data may contain A-homopolymers (sometimes
also T; G and C are less frequent) at the 3' end of reads. From what I gather
there are different reasons for that, but for an assembler it boils down to:
those stretches have to be clipped away or else bad things will happen.
So, MIRA has a separate filter which works like this: starting at the 3' end,
if a stretch of >= 20 A (or T) appears, the read is clipped completely (i.e.,
removed). This is the "hard" clip. The "soft" clip works similarly, but starts
at 12 A (or T) and additionally needs an A (or T) ratio of >= 80% over the
whole read.
That kills most of the junk; the rest can be handled efficiently by the -CL:pec
clipping.
Of course, there may be problems for eukaryotes and their poly-A tails, but one
can't have everything :-)
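To make the two rules concrete, here is a minimal Python sketch of one reading of them. It is not MIRA's code: the function names, the assumption that reads are given 5' to 3' (so the 3' end is the last character), and the choice to look only at the terminal A or T run are illustrative assumptions. Only the thresholds (20, 12, 80%) come from the description above.

```python
# Hypothetical sketch of the poly-A/poly-T filter described above.
# NOT MIRA's implementation; thresholds from the mail, interpretation assumed.

HARD_MIN_RUN = 20      # ">= 20 A (or T)" at the 3' end -> hard clip (drop read)
SOFT_MIN_RUN = 12      # soft clip starts at 12 A (or T) ...
SOFT_MIN_RATIO = 0.80  # ... and needs >= 80% of that base over the whole read


def trailing_run(seq: str) -> tuple[str, int]:
    """Return (base, length) of the homopolymer run at the 3' end of seq."""
    if not seq:
        return "", 0
    base = seq[-1]
    # rstrip removes the trailing run of `base`; the length difference is the run.
    return base, len(seq) - len(seq.rstrip(base))


def classify(seq: str) -> str:
    """Return 'hard', 'soft' or 'keep' for a read written 5' -> 3'."""
    seq = seq.upper()
    base, run = trailing_run(seq)
    if base not in ("A", "T"):
        return "keep"
    if run >= HARD_MIN_RUN:
        return "hard"
    if run >= SOFT_MIN_RUN and seq.count(base) / len(seq) >= SOFT_MIN_RATIO:
        return "soft"
    return "keep"


if __name__ == "__main__":
    for name, read in [("r1", "ACGTACGT" + "A" * 20),
                       ("r2", "C" + "A" * 12),
                       ("r3", "ACGTACGTACGT" + "A" * 12)]:
        print(name, classify(read))  # r1 hard, r2 soft, r3 keep
```

In this reading, r3 has a 12-base A tail but only 15 of its 24 bases are A, so it survives the soft filter and would be left to the regular clipping.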
> until moving to phase 1 where the system froze.
Hmmm, a system should never completely freeze.
> This is where the system freezes. I am unsure if it is a memory issue or
> hardware issue. Currently running on a...
> Intel(R)Xeon(R) CPU X5355
> 2550.00 MHz
> 4096 KB
> 16 GB RAM
> Ubuntu 8.04 Hardy.
Fair enough. But not enough RAM for the 14m reads you have (MIRA has too much
overhead per read *sigh* and I simply re-used the de-novo assembly structures
to do the mapping, which leads to this unacceptable memory hunger). It should
still work more or less painlessly if you have an additional 8 to 16 GiB of
swap.
On the other hand: you'll have a coverage of >270x, which is total overkill.
From past experience, a coverage of 30x is more than enough to catch
absolutely everything. See also the paper from the Sanger Centre from November
last year; I don't remember the first author, but Durbin was (of course) also
on the author list.
To evaluate whether MIRA may still do what you want or need, start the mapping
with 8m reads. This will still be overkill, but it will give you a good idea of
whether you like the results or not.
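Since coverage = reads × read length / genome size, with a fixed read length and genome it scales linearly with the number of reads. The small Python sketch below uses only the two figures from this thread (14,771,764 reads, about 270x) to estimate the coverage of a random subset and the number of reads needed for a target coverage. It assumes uniform read length and random subsampling, and because ">270x" is a lower bound, the read counts it returns are upper estimates. The function names are illustrative.

```python
import math

# Figures from the thread; ">270x" is used as the full-set coverage even
# though it is really a lower bound, so reads_for() errs on the high side.
TOTAL_READS = 14_771_764
TOTAL_COVERAGE = 270.0


def coverage_for(n_reads: int) -> float:
    """Estimated coverage of a random subset of n_reads (uniform read length)."""
    return TOTAL_COVERAGE * n_reads / TOTAL_READS


def reads_for(target_coverage: float) -> int:
    """Number of reads needed to reach at least target_coverage."""
    return math.ceil(TOTAL_READS * target_coverage / TOTAL_COVERAGE)


if __name__ == "__main__":
    print(f"8m reads -> {coverage_for(8_000_000):.1f}x")  # 146.2x
    print(f"30x      -> {reads_for(30):,} reads")          # 1,641,308 reads
```

So the suggested 8m-read trial run still corresponds to roughly 146x, about five times the 30x mentioned above.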
Ah, please take version 2.9.43 :-)
Regards,
Bastien
--
You have received this mail because you are subscribed to the mira_talk mailing
list. For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html