[mira_talk] Re: 454 homopolymers

  • From: Cladonia2 <fermaral1981@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 22 Mar 2011 15:53:47 +0100

Hi, I am in the same situation. I read this paper, where they compare
de novo assemblers, and they say that merging the assemblies from two
different programs (MIRA and Newbler 2.5) with CAP3 and then mapping onto
the new contigs is the best approach:

http://www.biomedcentral.com/1471-2164/11/571

What do you think about it?
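
For comparing contigs from the two assemblers, a minimal Python sketch of one way to check whether two sequences differ only in homopolymer run lengths is given below. It is an illustration only, not part of MIRA, Newbler, CAP3 or the paper; the function names `runs` and `homopolymer_diffs` are hypothetical, and it assumes the two sequences are already in the same orientation and cover the same region.

```python
from itertools import groupby


def runs(seq):
    """Collapse a sequence into (base, run_length) pairs, e.g. AAAC -> [(A, 3), (C, 1)]."""
    return [(base, len(list(group))) for base, group in groupby(seq.upper())]


def homopolymer_diffs(seq_a, seq_b):
    """Return run-length differences between two sequences.

    Returns None if the sequences differ in more than run lengths
    (i.e. their homopolymer-compressed forms are not identical).
    Otherwise returns a list of (run_index, base, length_in_a, length_in_b).
    """
    runs_a, runs_b = runs(seq_a), runs(seq_b)
    if [base for base, _ in runs_a] != [base for base, _ in runs_b]:
        return None
    return [
        (i, base, len_a, len_b)
        for i, ((base, len_a), (_, len_b)) in enumerate(zip(runs_a, runs_b))
        if len_a != len_b
    ]


if __name__ == "__main__":
    # Hypothetical toy sequences: the second has one T fewer in the T run.
    print(homopolymer_diffs("ACGTTTA", "ACGTTA"))
```

An empty list means the two sequences are identical, and None means they differ by something other than homopolymer length, so a real alignment would be needed.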


On Tue, 22-03-2011 at 12:32 +0100, Yvan Wenger wrote:
> Hello everybody,
> 
> I'm in a similar situation, although I'm working on cDNA with a large
> eukaryotic transcriptome (without reference). I get a very high
> representation of sequences I know of with Mira, but frequent 1-base
> insertions/deletions when compared to Newbler 2.5 output.
> 
> In my case, I was considering taking the Newbler sequences whenever
> available to correct the Mira sequences... did anybody try this by any
> chance?
> 
> Finally, about the difference between chromatograms and fasta(+qual), I
> was wondering if there is any tool that can remove adapter/vector
> sequences directly in the sff or xml file used by mira? The problem
> here is that my sff file is correct, but some prior adapters used for
> normalisation are still in the sequences.
> 
> Also, in my experience, Newbler performs slightly better with the
> sff files as input than with fasta+qual, but the difference is not
> dramatic. I still see more "future frameshifts" after in-silico
> translation of mira seqs than of newbler seqs, even when the input
> is the same for both.
> 
> All the best,
> 
> Yvan
> 
> 
> 
> On Tue, Mar 22, 2011 at 11:55 AM, Leonor Palmeira <mlpalmeira@xxxxxxxxx> 
> wrote:
> > Dear All,
> >
> > I am assembling a small 110kb viral genome and comparing the results between
> > MIRA and Newbler. The data I have is a 454 run, and some Sanger reads
> > covering one of my repetitive regions that was very hard to assemble 'de
> > novo'.
> >
> > I am quite happy with the MIRA hybrid assembly (with the -highlyrepetitive
> > flag) which yields a very large contig covering almost my entire genome,
> > including my repeats. However, compared to some previously sequenced Sanger
> > reads and to another strain, there is a significant number of errors in
> > homopolymers. This is particularly annoying in CDSs as it leads to a shift
> > in the reading frame...
> >
> > The Newbler assembly, however, yields much smaller contigs but with fewer
> > homopolymer length differences. I suspect this comes from the usage of the
> > flowgram information in the alignment of the reads?
> >
> > The MIRA assembly is much better at disentangling repeats but these small
> > errors are probably due to the usage of .fasta and .qual files instead of
> > the flowgrams as used in Newbler. I find it very frustrating to be forced to
> > use my Newbler contigs, as the MIRA assembly is much better on several
> > points.
> >
> > I realize the difficulty of the implementation, but would there be a way of
> > integrating flowgrams in the 454 part of the MIRA assembler some time in the
> > future?
> >
> > Best,
> > Leonor.
> > --
> > Leonor Palmeira, PhD
> >
> > Phone: +32 4 366 42 69
> > Email: mlpalmeira AT ulg DOT ac DOT be
> > http://sites.google.com/site/leonorpalmeira
> >
> > Immunology-Vaccinology, Bat. B43b
> > Faculty of Veterinary Medicine
> > Boulevard de Colonster, 20
> > University of Liege, B-4000 Liege (Sart-Tilman)
> > Belgium
> >
> > --
> > You have received this mail because you are subscribed to the mira_talk
> > mailing list. For information on how to subscribe or unsubscribe, please
> > visit http://www.chevreux.org/mira_mailinglists.html
> >
> 



