[mira_talk] Re: megahubs

  • From: Chris Hoefler <hoeflerb@xxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Thu, 3 Jul 2014 11:28:39 -0500

> One warning: a few weeks ago I had the unpleasant surprise to discover
> that the HGAP3 pipeline totally fails to give reliable corrected reads for
> a diploid genome. It simply mixed differences from different ploidies into
> single reads. Of course, assemblers which recognise this barf and will
> create many more contigs than you’d expect (MIRA certainly does, and so does
> the one from the HGAP pipeline).


I have the feeling this is due to their method for turning the read
alignments into corrected reads. In HGAP1 they used AMOS tools, which had
an option to exclude partial alignments. In HGAP2/3 they use their own tool
(pbdagcon) to go directly from multiple alignments of the raw reads to
corrected reads. If you try HGAP1 with the option
allowPartialAlignments="false", I would be curious to know whether that
resolves the problem. The downside is that the error correction will be
slower, which might be prohibitive for a large genome.
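To make the suspected mechanism concrete, here is a toy sketch. It assumes a plain column-wise majority vote, which is NOT what pbdagcon (DAG-based consensus) or the AMOS tools actually do, and the haplotypes and reads are made up. It shows how, when coverage from two haplotypes is uneven along a locus (here because of partial alignments), a per-column vote can take one allele from haplotype A and another from haplotype B, giving a "corrected" read that matches neither. The hypothetical helper full_length_only() mimics excluding partial alignments.

```python
from collections import Counter

# Toy diploid locus (hypothetical): two haplotypes differing at index 1 and 6.
HAP_A = "ACGTTGCA"
HAP_B = "AGGTTGTA"

# Reads aligned to the locus; "-" marks columns a read does not cover,
# i.e. a partial alignment.
PILEUP = [
    "ACGTTGCA",  # full-length read from haplotype A
    "AGGTTGTA",  # full-length read from haplotype B
    "ACGTTGCA",  # full-length read from haplotype A
    "ACGT----",  # partial read from A (left end only)
    "ACGTT---",  # partial read from A (left end only)
    "---TTGTA",  # partial read from B (right end only)
    "----TGTA",  # partial read from B (right end only)
]


def majority_consensus(pileup):
    """Call each column by simple majority vote, ignoring uncovered positions."""
    consensus = []
    for col in range(len(pileup[0])):
        counts = Counter(read[col] for read in pileup if read[col] != "-")
        # Emit "N" for a column that no read covers.
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)


def full_length_only(pileup):
    """Drop partial alignments: keep only reads that cover every column."""
    return [read for read in pileup if "-" not in read]


if __name__ == "__main__":
    print(majority_consensus(PILEUP))                    # ACGTTGTA: neither haplotype
    print(majority_consensus(full_length_only(PILEUP)))  # ACGTTGCA: haplotype A
```

In this toy case, dropping the partial reads removes the uneven coverage, so the vote stays on one haplotype. Whether allowPartialAlignments="false" has a comparable effect on real HGAP1 data is exactly the open question; the sketch only shows why it might.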


On Wed, Jun 25, 2014 at 2:08 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On 25 Jun 2014, at 16:34 , Jose Huguet Tapia <jch63@xxxxxxxxxxx> wrote:
> > I am using MIRA 4 to assemble an Oomycete (a eukaryote). I believe that
> the organism has a highly heterozygous genome.
>
> One warning: a few weeks ago I had the unpleasant surprise to discover
> that the HGAP3 pipeline totally fails to give reliable corrected reads for
> a diploid genome. It simply mixed differences from different ploidies into
> single reads. Of course, assemblers which recognise this barf and will
> create many more contigs than you’d expect (MIRA certainly does, and so does
> the one from the HGAP pipeline).
>
> > I ran a "first test" in MIRA with PacBio corrected long reads. The
> assembly was going OK until I got the message about megahubs. From previous
> discussions I learned that megahubs are quite common in eukaryotes.
> > My concern is the total level of megahubs. It is more than 90%.
>
> Errrm, yes. 99% megahubs points to a hefty problem. I know that MIRA 4.0.2
> has some trouble correctly defining megahubs for long read data, but 99% is
> just ridiculous. Something feels fishy, either regarding MIRA or the data.
>
> > In the message it says that I set a maximum allowed ratio of 90. I
> believe that I did not set this parameter, though. Does MIRA 4 have a
> default for this?
>
> Yes, MIRA sets defaults according to the data you feed it. Am I correct in
> assuming you used a pretty standard manifest to launch this assembly?
>
> B.



-- 
Chris Hoefler, PhD
Postdoctoral Research Associate
Straight Lab
Texas A&M University
2128 TAMU
College Station, TX 77843-2128
