[mira_talk] Re: 454 homopolymers

  • From: Yvan Wenger <yvan.wenger@xxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 22 Mar 2011 12:32:50 +0100

Hello everybody,

I'm in a similar situation, although I'm working on cDNA from a large
eukaryotic transcriptome (without a reference). With Mira I get a very high
representation of the sequences I know of, but frequent 1-base
insertions/deletions when compared to Newbler 2.5 output.

In my case, I was considering using the Newbler sequences, whenever
available, to correct the Mira sequences... has anybody tried this by any
chance?
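
One possible way to approach such a correction, as a minimal sketch only: collapse both sequences into homopolymer runs and, where the order of bases agrees, report the runs whose lengths differ. The function names are hypothetical, the two inputs are assumed to be a Mira and a Newbler sequence already known to cover the same region in the same orientation, and real substitutions or rearrangements are simply reported as "not comparable" (None).

```python
from itertools import groupby


def run_length_encode(seq):
    """Collapse a sequence into a list of (base, run_length) pairs."""
    return [(base, len(list(group))) for base, group in groupby(seq.upper())]


def homopolymer_differences(mira_seq, newbler_seq):
    """List homopolymer runs whose length differs between the two sequences.

    Returns None when the collapsed base order differs, i.e. when the
    sequences disagree by more than homopolymer lengths.
    Each reported item is (0-based start in mira_seq, base,
    run length in Mira, run length in Newbler).
    """
    mira_runs = run_length_encode(mira_seq)
    newbler_runs = run_length_encode(newbler_seq)
    if [base for base, _ in mira_runs] != [base for base, _ in newbler_runs]:
        return None
    differences = []
    position = 0
    for (base, mira_len), (_, newbler_len) in zip(mira_runs, newbler_runs):
        if mira_len != newbler_len:
            differences.append((position, base, mira_len, newbler_len))
        position += mira_len
    return differences


if __name__ == "__main__":
    # Toy sequences, made up for illustration.
    print(homopolymer_differences("ACGGGTTA", "ACGGTTA"))
```

Whether the Newbler length should win at each reported position is a separate decision; the sketch only locates the candidates.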

Also, about the difference between chromatograms and fasta(+qual): I
was wondering if there is any tool that can remove adapter/vector
sequences directly in the sff or xml file used by mira? The problem
here is that my sff file is correct, but some adapters used earlier for
normalisation are still in the sequences.
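
To illustrate what such a clipping step has to compute, here is a minimal sketch, assuming exact matching and an adapter located near the 5' end of the read. The function name, the max_offset parameter and the adapter string are made up for illustration; a real tool would also need to tolerate sequencing errors in the adapter and would have to write the resulting clip point back into the sff or xml file, which is not shown here.

```python
def left_clip_after_adapter(read, adapter, max_offset=5):
    """Return the 0-based index of the first base after a leading adapter.

    The adapter must match exactly and start within the first
    max_offset + 1 bases of the read. Returns 0 (no clipping) otherwise.
    """
    read_upper = read.upper()
    adapter_upper = adapter.upper()
    # Restrict the search window so only adapters near the 5' end count.
    start = read_upper.find(adapter_upper, 0, max_offset + len(adapter_upper))
    if start == -1:
        return 0
    return start + len(adapter_upper)


if __name__ == "__main__":
    # Hypothetical adapter and read, for illustration only.
    adapter = "AAGCAGTGGT"
    read = "TTAAGCAGTGGTACGTACGT"
    clip = left_clip_after_adapter(read, adapter)
    print(clip, read[clip:])
```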

Finally, in my experience, Newbler performs slightly better with
sff files as input than with fasta+qual, but the difference is not
dramatic. I still see more "future frameshift" after in-silico
translation of mira seqs than of newbler seqs, even when the input
is the same for both.

All the best,

Yvan



On Tue, Mar 22, 2011 at 11:55 AM, Leonor Palmeira <mlpalmeira@xxxxxxxxx> wrote:
> Dear All,
>
> I am assembling a small 110kb viral genome and comparing the results between
> MIRA and Newbler. The data I have is a 454 run, and some Sanger reads
> covering one of my repetitive regions that was very hard to assemble 'de
> novo'.
>
> I am quite happy with the MIRA hybrid assembly (with the -highlyrepetitive
> flag) which yields a very large contig covering almost my entire genome,
> including my repeats. However, compared to some previously sequenced Sanger
> reads and to another strain, there is a significant number of errors in
> homopolymers. This is particularly annoying in CDSs as it leads to a shift
> in the reading frame...
>
> The Newbler assembly, however, yields much smaller contigs but with fewer
> homopolymer length differences. I suspect this comes from the usage of the
> flowgram information in the alignment of the reads?
>
> The MIRA assembly is much better at disentangling repeats but these small
> errors are probably due to the usage of .fasta and .qual files instead of
> the flowgrams as used in Newbler. I find it very frustrating to be forced to
> use my Newbler contigs, as the MIRA assembly is much better on several
> points.
>
> I realize the difficulty of the implementation, but would there be a way of
> integrating flowgrams in the 454 part of the MIRA assembler some time in the
> future?
>
> Best,
> Leonor.
> --
> Leonor Palmeira, PhD
>
> Phone: +32 4 366 42 69
> Email: mlpalmeira AT ulg DOT ac DOT be
> http://sites.google.com/site/leonorpalmeira
>
> Immunology-Vaccinology, Bat. B43b
> Faculty of Veterinary Medicine
> Boulevard de Colonster, 20
> University of Liege, B-4000 Liege (Sart-Tilman)
> Belgium
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>

