[mira_talk] Re: 454/Solexa hybrid assembly of a 35Mbp genome?

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 3 Jun 2009 00:08:03 +0200

On Tuesday 02 June 2009 Jan Paces wrote:
> [...]
> I think Mira keeps in memory few bytes about each read, which makes it
> impossible to use it with huge amount of SOLiD or SOLEXA reads.
> [...]

Yep, this is currently a real problem. Unfortunately, it is not only a few 
bytes. On 64 bit machines, I have ~280 bytes of overhead per read just for all 
the empty pointers, strings, vectors, clipping points and lists. Then around 10 
bytes per read base (nucleotide, quality, adjustment positions etc.). And then 
MIRA uses an assembly strategy that keeps copies of reads untouched until the 
contig they were put in has been accepted as good.
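
As a rough illustration of what these figures imply, here is a minimal 
back-of-the-envelope sketch. It only combines the two approximate numbers 
quoted above (~280 bytes per read, ~10 bytes per base). The function names, 
read counts and read lengths are assumptions made up for the example, not 
anything taken from MIRA itself, and the extra untouched copies of reads are 
not counted.

```python
# Back-of-the-envelope estimate from the approximate figures quoted above.
# This is NOT how MIRA accounts for memory internally; it is only arithmetic.
PER_READ_OVERHEAD_BYTES = 280  # ~overhead per read on 64 bit machines
PER_BASE_BYTES = 10            # ~bytes per read base


def estimate_read_memory_bytes(num_reads, mean_read_length):
    """Approximate bytes needed to hold the reads (hypothetical helper)."""
    per_read = PER_READ_OVERHEAD_BYTES + PER_BASE_BYTES * mean_read_length
    return num_reads * per_read


def overhead_fraction(mean_read_length):
    """Share of the per-read footprint that is fixed overhead, not bases."""
    per_read = PER_READ_OVERHEAD_BYTES + PER_BASE_BYTES * mean_read_length
    return PER_READ_OVERHEAD_BYTES / per_read


if __name__ == "__main__":
    # Assumed example data sets: read counts and lengths are illustrative.
    examples = [
        ("Sanger, 1 million reads of 800 bases", 1_000_000, 800),
        ("Solexa, 50 million reads of 36 bases", 50_000_000, 36),
    ]
    for label, num_reads, length in examples:
        total = estimate_read_memory_bytes(num_reads, length)
        print(f"{label}: {total / 1e9:.2f} GB, "
              f"{overhead_fraction(length):.1%} of it fixed overhead")
```

With these assumed inputs the Sanger case comes to about 8.28 GB and the 
Solexa case to about 32 GB. For 36 base reads, roughly 44% of the footprint 
is the fixed per-read overhead, compared with about 3% for 800 base reads.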

This was perfectly reasonable for Sanger projects up to medium eukaryote size 
(say, 80 megabases with 1 million Sanger reads) and is still manageable now for 
454 Titanium, but the huge numbers of short reads break my neck here. I 
sometimes wish I could squeeze in more work in the evening hours and weekends 
to get around this ;-)

Regards,
  Bastien

