[mira_talk] Re: Megahub info

  • From: Björn Nystedt <bjorn.nystedt@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Fri, 26 Jun 2009 13:32:14 +0200

On Fri, 26 Jun 2009 12:11:18 +0200
Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> > Phrap can assemble "reads" of any length in a sensible way. We often have
> > long segments of a genome that for various reasons have been proven
> > correct;
> > [...]
> 
> Wait a minute: are you telling me that if you give phrap two "reads" of, say, 
> 50 megabases with an overlap of, say, 200kb, phrap will actually join those 
> reads?

Yep, exactly!
It works quite well, especially if you add a bunch of short reads that span the 
overlap; this is sometimes quite an efficient way to proceed in a genome project.
 
> Ouch, if that's the case, that won't be available with MIRA anytime soon. I 
> use a banded Smith-Waterman that is almost O(n) in time, but O(n^2) in space 
> used. And these are some of the trickiest parts of MIRA, which I am not too 
> keen to touch at the moment.
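
To put the quoted complexity into numbers, here is a rough back-of-the-envelope sketch. It assumes one byte per dynamic-programming cell and a hypothetical band width of 1,000. Both are illustrative choices, not MIRA's actual parameters. The sketch compares a full quadratic matrix for two 50-megabase "reads" with the cells a band actually covers. If an implementation computes only the band (near-linear time) but still allocates the full matrix, the first figure is roughly the memory it would need.

```python
READ_LEN = 50_000_000   # two 50-megabase "reads", as in the quoted example
BAND_WIDTH = 1_000      # hypothetical band width, for illustration only
BYTES_PER_CELL = 1      # assumption: one byte per DP cell (real tools store more)


def full_matrix_cells(n, m):
    # A full Smith-Waterman matrix has one cell per pair of positions: O(n*m).
    return n * m


def banded_cells(n, band_width):
    # A banded DP only fills cells within band_width of the diagonal: O(n*w).
    return n * band_width


full = full_matrix_cells(READ_LEN, READ_LEN) * BYTES_PER_CELL
band = banded_cells(READ_LEN, BAND_WIDTH) * BYTES_PER_CELL
print(f"full matrix: {full / 1e15:.1f} PB")
print(f"band only (w={BAND_WIDTH}): {band / 1e9:.1f} GB")
```

Under these assumptions the full matrix comes to about 2.5 petabytes, while the band alone is about 50 gigabytes. Quadratic space rules out megabase-scale reads long before time becomes the bottleneck.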

Ok, I see.
So what about the idea of an optional final joining step, starting where a 
mapping + de novo assembly ends today? I am no algorithm wizard, but it seems to 
me that a fairly simple joining approach would do better than what most people 
manage by fiddling around manually in gap4/consed. It would require a different 
approach than the core MIRA algorithm, so it is a bit of a hack, but a useful 
one :)
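
A minimal sketch of what such a "fairly simple joining approach" could look like. It assumes error-free contig ends and an exact suffix-prefix match. The idea is to find the longest overlap between the end of one contig and the start of the next, within hypothetical min_overlap/max_overlap bounds, and merge the two if one is found. The function names and parameters are illustrative only and are not part of MIRA. A real joining step would also have to tolerate sequencing errors (e.g. by aligning the candidate overlap region) and try both orientations.

```python
def find_overlap(left, right, min_overlap, max_overlap):
    """Length of the longest suffix of `left` equal to a prefix of `right`,
    searched between min_overlap (>= 1) and max_overlap; 0 if none found."""
    upper = min(len(left), len(right), max_overlap)
    # Try the longest candidate first so the first hit is the best exact overlap.
    for length in range(upper, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def join_contigs(left, right, min_overlap, max_overlap):
    """Merge two contigs on their exact overlap, or return None if none exists."""
    overlap = find_overlap(left, right, min_overlap, max_overlap)
    if overlap == 0:
        return None
    # Keep `left` whole and append only the non-overlapping tail of `right`.
    return left + right[overlap:]
```

For example, join_contigs("ACGTACGGA", "CGGATTT", 3, 100) finds the 4-base overlap CGGA and returns "ACGTACGGATTT". Two contigs with no overlap of at least 3 bases give None.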


> > [...]
> > As discussed, fake reads of up to 20kb can already be fed into MIRA now,
> > but there was the issue with the megahubs, which made me a bit unsure
> > whether the assembly algorithm is really designed for this, although it
> > appears to work pretty well (but I have not had time to investigate it
> > much yet).
> 
> That megahub problem actually is something which could be solved, I'll have a 
> look.
> 
> > However, for longer fake reads (for example, complete manually checked
> > contigs or manually combined PCR products), we need to cut them into
> > overlapping 20kb pieces, which kind of goes against the whole idea of
> > producing long correct segments.
> > If anything can be done in this direction, that would be great!
> 
> Not directly with very long fake reads, no. But I do have an idea of how this 
> could be solved efficiently (if not elegantly).
> 
> Won't be available exactly tomorrow though.

Anything is welcome, whenever possible. 
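
For reference, the current workaround of cutting a long, trusted sequence into overlapping fake reads of at most 20kb could be sketched as follows. The 1kb overlap and the function name chop_into_fake_reads are hypothetical choices for illustration. Each piece is returned with its start coordinate so the tiling can be checked.

```python
def chop_into_fake_reads(seq, piece_len=20_000, overlap=1_000):
    """Split seq into pieces of at most piece_len bases, where consecutive
    pieces share exactly `overlap` bases. Returns (start, piece) tuples."""
    if not 0 <= overlap < piece_len:
        raise ValueError("overlap must be >= 0 and smaller than piece_len")
    if not seq:
        return []
    step = piece_len - overlap  # how far each new piece advances
    pieces = []
    start = 0
    while True:
        end = min(start + piece_len, len(seq))
        pieces.append((start, seq[start:end]))
        if end == len(seq):  # the last piece reaches the end of the sequence
            break
        start += step
    return pieces


if __name__ == "__main__":
    contig = "ACGT" * 12_500  # a toy 50kb "contig"
    for start, piece in chop_into_fake_reads(contig):
        print(start, len(piece))
```

With these defaults, a 50kb sequence becomes three pieces starting at 0, 19000 and 38000, with lengths 20000, 20000 and 12000. Dropping the first 1kb of every piece after the first reconstructs the original exactly.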

Thanks again!
Björn
 
> Regards,
>   Bastien
> 
> 
> 
> -- 
> You have received this mail because you are subscribed to the mira_talk 
> mailing list. For information on how to subscribe or unsubscribe, please 
> visit http://www.chevreux.org/mira_mailinglists.html


-- 
====================================
Björn Nystedt (Sällström)
PhD Student
Molecular Evolution
EBC, Uppsala University
Norbyv. 18C, 752 36  Uppsala
Sweden
phone: +46 (0)18-471 45 88
email: Bjorn.Nystedt@xxxxxxxxx
====================================
