[mira_talk] Re: 454/Solexa hybrid assembly of a 35Mbp genome?

  • From: Jan Paces <hpaces@xxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 02 Jun 2009 09:01:20 +0200

Khan, Anar wrote:
> My fungus’ expected genome size is 35Mb. I’m running some simulations on
> a close relative’s genome to choose the best sequencing strategy. I’d
> like to try assembling half or full plate of 454 Titanium reads,
> together with say an eighth/quarter plate of paired end SOLiD 3 reads (2
> x 50bp) (btw I’m just plugging the SOLiD data into MIRA as Solexa data
> for now – i.e. it’s just simulated nucleotides rather than colour space).

Hi, I have a similar task (just a little bigger), but I still have not
found a reasonable solution. I am afraid 16 GB of RAM is too little for
a 35 Mb genome with any technology.
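
For a sense of scale, the number of reads involved can be estimated from
genome size, coverage and read length. The snippet below is only a
back-of-the-envelope sketch: the function name is made up for
illustration, and the 30x coverage and 50 bp read length are example
inputs, not a statement about any particular run.

```python
import math


def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Rough number of reads needed to reach a given average coverage.

    Hypothetical helper: assumes all reads have the same length and
    ignores quality trimming and unmappable reads.
    """
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(genome_size_bp * coverage / read_length_bp)


if __name__ == "__main__":
    # Example inputs only: 35 Mb genome, 30x coverage, 50 bp short reads.
    print(reads_needed(35_000_000, 30, 50))
```

Whatever an assembler stores per read gets multiplied by this count,
which is why short-read data sets are so much heavier than 454 data at
the same coverage.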

I think MIRA keeps a few bytes in memory for each read, which makes it
impossible to use with a huge number of SOLiD or Solexa reads. In MIRA
you can use two approaches:

first run MIRA with only the 454 reads and then map the short reads
onto the resulting assembly

or

if you have enough short-read coverage (more than ~30x), assemble the
short reads (e.g. in Velvet) and add those contigs, together with the
454 reads, into MIRA with a reasonably high default quality. Contigs
longer than 20 kb have to be split.
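
The splitting step in the second approach could be scripted along these
lines. This is only a minimal sketch: the function names, the 1 kb
overlap between neighbouring pieces and the `_partN` naming are
assumptions made for illustration, not anything MIRA or Velvet
prescribes. Only the 20 kb limit comes from the text above.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines.

    Assumes every header line has a non-empty name after '>'.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the contig name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_contig(name, seq, max_len=20000, overlap=1000):
    """Yield (piece_name, piece_seq) with each piece at most max_len long.

    Neighbouring pieces share `overlap` bases (an assumed value) so that
    an assembler has a chance to join them again.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    if len(seq) <= max_len:
        yield name, seq
        return
    step = max_len - overlap
    start, part = 0, 1
    while True:
        yield f"{name}_part{part}", seq[start:start + max_len]
        if start + max_len >= len(seq):
            break
        start += step
        part += 1


if __name__ == "__main__":
    # Toy input: one 45 kb contig and one short contig.
    fasta = [">contig1 velvet", "A" * 45000, ">contig2", "ACGT" * 10]
    for contig_name, contig_seq in parse_fasta(fasta):
        for piece_name, piece_seq in split_contig(contig_name, contig_seq):
            print(piece_name, len(piece_seq))
```

The resulting pieces would then be written back out as FASTA and given
whatever default quality is chosen for the contigs.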

However, neither of these approaches is optimal.

I am using Bambus for scaffolding as well, but for that you need
mate-pair (paired-end) libraries with at least two different insert
sizes, e.g. 5 kb and 20 kb or something like that. Paired ends with
only ~0.5 kb distance do not help; that is roughly the same size as a
454 read.

If you have not had your sequencing done yet, my suggestion is to use
paired-end 454 Titanium reads. For now I do not see any
software/hardware suitable for small eukaryotes and mixed technology.

Hope it helps,

Jan


