[mira_talk] Re: Reference vs. De novo assembly.

  • From: Andrzej N <andrzej.k.n@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 16 Dec 2009 16:53:33 -0600

All right. Some more questions.

 I just ran a BLAST search against the NCBI nucleotide database with the 175
best-quality contigs from my de novo assembly. 170 of these contigs align to
the mitochondrial sequence which I'm using as my reference (in the reference
assembly).

 Now the funny part. Together, ALL 170 contigs add up to a "consensus" of
about 790,000 bp (some of them can be joined, which is nice; after joining I
will have 150 contigs in total), and their query coverage against the
reference sequence ranges from about 64% (only a minor fraction) to 100% (the
majority of the contigs)?! BUT the mitochondrial reference sequence is only
about 460,000 bp long...

 So, where do I get the extra 330,000bp???

 How can all these contigs align to one mitochondrial genome sequence and yet
add up to 1.7X the length of the reference sequence? It's impossible!?
Especially when the query of each contig is 100% the same as the reference?

 I really can NOT understand that. Can anybody help me here?
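
 One way to see where the extra bases sit would be to merge the reference
(subject) intervals of the BLAST hits and compare the summed aligned length
with the number of distinct reference positions that are covered. The
following is only a sketch: the function names and the toy coordinates are
made up for illustration, and each (start, end) pair is assumed to come from
the subject start/end columns of a tabular BLAST report, where start > end
indicates a reverse-strand hit.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    # Normalise reverse-strand hits (start > end) before sorting.
    normalised = sorted((min(s, e), max(s, e)) for s, e in intervals)
    merged = []
    for start, end in normalised:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def summarize(hits):
    """Return (summed aligned length, distinct reference bases, ratio)."""
    total_aligned = sum(abs(e - s) + 1 for s, e in hits)
    unique_covered = sum(e - s + 1 for s, e in merge_intervals(hits))
    return total_aligned, unique_covered, total_aligned / unique_covered


# Hypothetical subject coordinates, NOT taken from the real data set.
toy_hits = [(1, 1000), (501, 1500), (1200, 900), (3000, 3500)]
total, unique, ratio = summarize(toy_hits)
print(f"aligned: {total} bp, distinct reference bases: {unique} bp, "
      f"ratio: {ratio:.2f}")
```

 A ratio well above 1 would mean that several contigs align to the same
stretch of the reference, so their lengths add up to more than the reference
itself.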

Andrzej


On Thu, Dec 10, 2009 at 12:23 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Tuesday 08 December 2009 Andrzej N wrote:
> > As you can see, my biggest contig is now 50000bp, but the "problem"
> > remains. Is there any chance to tell MIRA not to add more sequences
> > above, for example, 100x? Here, as you can see, in some single regions
> > MIRA is putting 899 (I think that is MIRA's limit for putting stuff on
> > top of each other). Can we tell MIRA to stop doing that?
>
> You can't at the moment. MIRA keeps repetitive sequences in 'normal'
> contigs at normal coverage, but for contigs only made of 100% identical
> repeats it stacks everything together (if there was no way to disentangle
> using paired-end data).
>
> I also had MIRA disentangle these 'repeat contigs' at one time, i.e. if 12
> rRNA stretches were present in a bacterium it made between 10 and 13 rRNA
> contigs, but then MIRA lost in those wannabe assembly benchmarks which only
> look at N50 and number of contigs. So I removed it.
>
> I still feel that disentangling them is the right way to go though.
>
> >  This somehow artificially increases my coverage...
> >  I don't know if it is that important, but it doesn't look good.
>
> It does not really increase your average coverage, perhaps by 0.1 or so.
> Not really important.
>
> > Do you have other suggestions before I go and start doing cloning ;).
>
> Not really.
>
> Regards,
>  Bastien
>
