[mira_talk] Re: Megahub info

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Fri, 26 Jun 2009 12:11:18 +0200

On Friday 26 June 2009 Björn Nystedt wrote:
> [...]
> Basically this is about integrating data; if I have long pieces of the
> genome that I know are correct, how do I best combine that information with
> the full set of shotgun and paired-end reads to make the most accurate and
> complete assembly?

Hmmm, fragmenting them into longer "fake" reads and mixing them into the lot 
is actually the only way to go with MIRA at the moment.
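
For illustration, the fragmenting step could look roughly like the sketch
below. This is a minimal example, not part of MIRA: the function name
fragment_sequence and the 2 kb overlap are assumptions made for the example,
and only the 20 kb piece length comes from this thread.

```python
def fragment_sequence(seq, piece_len=20000, overlap=2000):
    """Cut a long sequence into overlapping pieces ("fake reads").

    Returns a list of (start_offset, piece) tuples. Consecutive pieces
    share exactly `overlap` bases. The overlap default is an assumption
    for this sketch, not a value recommended by MIRA.
    """
    if piece_len <= 0:
        raise ValueError("piece_len must be positive")
    if not 0 <= overlap < piece_len:
        raise ValueError("overlap must be >= 0 and smaller than piece_len")
    if not seq:
        return []

    step = piece_len - overlap  # distance between consecutive piece starts
    pieces = []
    start = 0
    while True:
        end = min(start + piece_len, len(seq))
        pieces.append((start, seq[start:end]))
        if end == len(seq):  # the last piece reaches the end of the sequence
            break
        start += step
    return pieces
```

With these defaults, a 50 kb sequence yields three pieces starting at offsets
0, 18000 and 36000; the last piece is shorter than the others.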

> Phrap can assemble "reads" of any length in a sensible way. We often have
> long segments of a genome that for various reasons have been proven
> correct;
> [...]

Wait a minute: are you telling me that if I give phrap two "reads" of, say, 50 
megabases each with an overlap of, say, 200 kb, phrap will actually join 
those reads?

Ouch, if that's the case, that won't be available with MIRA anytime soon. I 
use a banded Smith-Waterman that is almost O(n) in time, but O(n^2) in space. 
And these are some of the trickiest parts of MIRA, which I am not too keen to 
touch at the moment.
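
To put rough numbers on the space problem, here is a back-of-the-envelope
sketch. It assumes that memory grows with the product of the two read lengths
and that each cell takes one byte; both are simplifications chosen for the
example and not a description of MIRA's actual data structures. The function
names are hypothetical.

```python
def quadratic_cells(len_a, len_b):
    """Cells in a full alignment matrix for two reads (boundary row/column ignored)."""
    return len_a * len_b


def bytes_needed(len_a, len_b, bytes_per_cell=1):
    """Memory estimate; bytes_per_cell=1 is an assumption for illustration."""
    return quadratic_cells(len_a, len_b) * bytes_per_cell


if __name__ == "__main__":
    # Two 20 kb fake reads versus two 50 Mb "reads".
    for length in (20_000, 50_000_000):
        print(f"{length:>11,} bases: {bytes_needed(length, length):.2e} bytes")
```

Under these assumptions, two 20 kb reads need about 4e8 bytes, while two
50-megabase reads would need about 2.5e15 bytes, which is why quadratic space
rules out very long fake reads.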

> [...]
> As discussed, fake reads of up to 20kb can be fed into MIRA already now,
> but there was the issue with the megahubs, making me a bit unsure that the
> assembly algorithm is really designed for this, although it appears to work
> pretty well (but I have not had time to investigate it too much yet).

That megahub problem is actually something which could be solved; I'll have a 
look.

> However, for longer fake reads (such as for example complete manually
> checked contigs, or manually combined PCR products), we need to cut them
> into 20kb overlapping pieces, which is kind of against the whole idea of
> producing long correct segments.
> If anything can be done in this direction it would be great!

Not directly with very long fake reads, no. But I do have an idea how this 
could be solved efficiently (if not elegantly).

Won't be available exactly tomorrow though.

Regards,
  Bastien



