[mira_talk] Re: How does Mira determine quality scores?

From: Davide Sassera <davide.sassera@xxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Wed, 29 Jul 2009 15:08:56 +0200

I would like to add something on this topic.

I have often found that, in long homopolymers, the presence of a few reads containing "1 more base" overcomes the presence of many more reads with "1 less base" in the consensus.

Manual correction shows that the majority "1 less base" reads are right, so I have to correct the consensus each time this happens.

Could the problem brought up by David Hesselbom be the reason for this "bug"?
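
As a rough illustration of the majority reasoning described above, here is a minimal Python sketch. It is not how Mira builds its consensus; the function name and the read counts are hypothetical and only show what "the majority of reads wins" would mean for a homopolymer run length.

```python
from collections import Counter

def majority_run_length(run_lengths):
    """Return the most common homopolymer run length and the fraction of reads supporting it.

    run_lengths: one integer per read, the length of the run as seen in that read.
    (Hypothetical helper for illustration; not part of Mira.)
    """
    counts = Counter(run_lengths)
    length, votes = counts.most_common(1)[0]
    return length, votes / len(run_lengths)

# Hypothetical example: 9 reads show a run of 6 bases, 3 reads show 7 ("1 more base").
observed = [6] * 9 + [7] * 3
length, support = majority_run_length(observed)
print(length, support)
```

With these made-up counts the shorter run is supported by three quarters of the reads, which is the situation where one would expect the "1 less base" length to end up in the consensus.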


Thanks
Davide

Bastien,
I've run some tests on 454 assemblies from both Mira and Newbler and have concluded that the quality scores attributed to homopolymers are very different depending on the source, even within the same genome. For example, in a homopolymer in a consensus sequence, Newbler quality scores are nearly always the same in the neighboring bases and throughout the homopolymer itself, except for its last base, which has a very low score compared to the rest of the bases in the homopolymer. Supposedly, this is because the length of the homopolymer is not certain (the reads do not agree), but it is only the last of the bases whose presence is in doubt.
In Mira assemblies, however, all bases in a homopolymer have varying quality scores, none of which are very low, and typically, bases in (at least) long homopolymers have a lower average score than those surrounding the homopolymer, meaning it constitutes a considerable "drop" in the quality scores. To me, the Newbler quality scores in homopolymers seem to make more sense than the Mira ones, since what we're uncertain about is the number of bases in the homopolymer. Since it doesn't matter which base we remove within the homopolymer, the low quality score might as well be attributed to the last one. Mira seems to spread out the quality score penalty over each base in the homopolymer, though I do not believe this is what's actually happening. :)
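
The "put the low score on the last base" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is neither Newbler's nor Mira's actual algorithm, the function names are hypothetical, and the probability that the run length is wrong is simply taken as an input and converted with the standard Phred formula Q = -10 * log10(p).

```python
import math

def phred(p_error, cap=60):
    """Convert an error probability to a Phred-scaled quality, capped at `cap`."""
    if p_error <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_error)))

def last_base_profile(run_length, flank_quality, p_length_error):
    """Assign the flanking quality to every base of the run except the last,
    which carries the whole length uncertainty (hypothetical illustration)."""
    profile = [flank_quality] * run_length
    profile[-1] = min(flank_quality, phred(p_length_error))
    return profile

# Hypothetical run of 6 bases, flanking quality 40, 25% chance the length is wrong.
print(last_base_profile(6, 40, 0.25))
```

Under these assumptions only the final position of the run drops, while the other positions keep the quality of their neighbors.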
I'd like to know why the quality scores are determined so differently by Mira and Newbler, and also the details on how Mira does it. For example, does it take homopolymers into special consideration?
Thanks,

David Hesselbom
Research assistant
Molecular Evolution
EBC, Uppsala University



--
Davide Sassera
Sezione di Patologia Generale e Parassitologia

Dipartimento di Patologia Animale, Igiene e Sanità Pubblica Veterinaria
Facoltà di Veterinaria

Università degli Studi di Milano
Via Celoria 10, 20133, Milano, ITALY
Tel: +39 0250318094
Fax: +39 0250318095

Follow-Ups:
- [mira_talk] Re: How does Mira determine quality scores?
  - From: Bastien Chevreux

References:
- [mira_talk] How does Mira determine quality scores?
  - From: David Hesselbom
