[mira_talk] Re: 454/Solexa hybrid assembly of a 35Mbp genome?
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Tue, 2 Jun 2009 23:57:07 +0200
On Tuesday 02 June 2009 Khan, Anar wrote:
> [...]
> My fungus' expected genome size is 35Mb. I'm running some simulations on a
> close relative's genome to choose the best sequencing strategy. I'd like to
> try assembling half or full plate of 454 Titanium reads, together with say
> an eighth/quarter plate of paired end SOLiD 3 reads (2 x 50bp)
Hello Anar,
Jan has already given most of the answers I would have given. In short: going
de novo (or even hybrid) with short reads is pretty risky for eukaryotes.
Paired-end information for microreads (be they SOLiD or Solexa) will not bring
much advantage for small insert sizes (say, 200-500 bases) when you have decent
coverage from longer reads that are about as long as the insert size of the
short reads. Things look better if you have a 2 kb insert size, as is now
possible for Solexa (for SOLiD I don't know).
With one plate of Titanium (1 million reads @ 400 bases) you'd get ~11x-12x
coverage from 454 on a 35 Mb genome. It's a bit low, but certainly a start.
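For anyone who wants to redo this estimate with other plate sizes or read
lengths, here is a minimal sketch of the arithmetic. The function name is made
up for illustration; the numbers are the ones from this thread (1 million
reads, 400 bases, 35 Mb genome), and real plates will yield somewhat different
read counts and lengths.

```python
def expected_coverage(num_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Average coverage = total sequenced bases / genome size."""
    return num_reads * mean_read_length / genome_size


# Figures taken from the thread: one Titanium plate on a 35 Mb genome.
titanium_full = expected_coverage(1_000_000, 400, 35_000_000)
# Half a plate, assuming (hypothetically) that read yield simply halves.
titanium_half = expected_coverage(500_000, 400, 35_000_000)

print(f"full plate: {titanium_full:.1f}x")
print(f"half plate: {titanium_half:.1f}x")
```

This is only the average; actual coverage varies along the genome, so some
regions will end up well below that figure.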
> (btw I'm
> just plugging the SOLiD data into MIRA as Solexa data for now - i.e. it's
> just simulated nucleotides rather than colour space).
Other people have been burned by this strategy in the past, with any assembler,
at least with the first and second generation of SOLiD machines, for which I was
told that a third to half of the reads had at least one sequencing error
somewhere. Because each colour encodes the transition between two adjacent
bases, that single error together with naive conversion of colour space to
nucleotide bases leads to reads which are totally wrong from the sequencing
error site onwards.
This is also the reason why MIRA does not accept SOLiD data as input at the
moment ... and it might very well be that it never will. To work well with
that data you need to work in colour space, and I simply don't have the time
to implement all the routines needed for that. ABI could alleviate the problem a
bit if they provided robust conversion from colour space to nucleotide space
*without* having to use a reference, but I imagine that this is not easy to do
and would also eat up quite some resources.
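To illustrate why a single colour error is so damaging, here is a small sketch
of naive colour-space decoding. It assumes the usual SOLiD two-base encoding,
in which each colour is the XOR of the 2-bit codes of two adjacent bases
(A=0, C=1, G=2, T=3), and a known primer base "T". The function names and the
test sequence are made up for illustration; this is not code from MIRA or from
any ABI tool.

```python
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def encode(seq: str, primer: str = "T") -> list[int]:
    """Turn a nucleotide sequence into colours (one per base transition)."""
    colours = []
    prev = CODE[primer]
    for base in seq:
        cur = CODE[base]
        colours.append(prev ^ cur)
        prev = cur
    return colours


def naive_decode(colours: list[int], primer: str = "T") -> str:
    """Decode colours base by base, trusting every colour call."""
    prev = CODE[primer]
    out = []
    for colour in colours:
        prev ^= colour  # next base depends on ALL previous colours
        out.append(BASES[prev])
    return "".join(out)


true_seq = "ACGTTGCAAGCT"   # hypothetical read
colours = encode(true_seq)

# Simulate one colour miscall in the middle of the read.
error_pos = 4
colours_err = list(colours)
colours_err[error_pos] ^= 1

decoded = naive_decode(colours_err)
mismatches = [i for i, (a, b) in enumerate(zip(true_seq, decoded)) if a != b]

print(true_seq)
print(decoded)
print("wrong positions:", mismatches)
```

With one miscalled colour, every base from the error position to the end of
the read comes out wrong, not just the base at the error site. Working in
colour space, or converting against a reference, is what lets such errors be
spotted and corrected.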
> I've scoured the mailing lists for parameters which
> might control memory usage, but I don't think anything (e.g. -SK:mhpr,
> -SK:mchr) could bring memory usage below 16Gb RAM. Is that correct?
Correct. There is no way 16 GB is enough for a hybrid 454 + Solexa/SOLiD
assembly of a eukaryote.
> If 454/Solexa hybrid assembly isn't possible using MIRA, I'll probably
> assemble the 454 reads using MIRA, and use the likes of Bambus to
> "scaffold" up the MIRA contigs. You may wish to comment on whether this
> sounds like a reasonable alternative approach.
It is reasonable (not only with MIRA, but also with other assemblers), and I
know that quite a few people are going this way. The 16 GB should normally be
enough for one Titanium plate.
Complete hybrid assemblies of prokaryotes have been possible with MIRA for
quite a while, working better in some versions than in others. I don't
advertise it much as I'm still experimenting a bit, and I would not recommend
trying it without asking me which version to use and which parameters to try.
You just need to realise that assembly of a eukaryote will take you quite some
time :-)
Regards,
Bastien
--
You have received this mail because you are subscribed to the mira_talk mailing
list. For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html