[mira_talk] Re: How does Mira determine quality scores?

  • From: Davide Sassera <davide.sassera@xxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 29 Jul 2009 15:08:56 +0200

I would like to add something on this topic.

I have often found that, in long homopolymers, a few reads containing "1 more base" outweigh many more reads with "1 less base" when the consensus is called.

Manual inspection shows that the majority "1 less base" reads are right, so I have to correct the consensus by hand each time this happens.
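To make the situation concrete, here is a minimal Python sketch of the plain majority vote I would have expected at such a position. It is not how Mira calls the consensus; the function names and the read counts are made up for illustration.

```python
from collections import Counter


def run_length_at(read, start):
    """Length of the homopolymer run beginning at index `start` of `read`."""
    base = read[start]
    end = start
    while end < len(read) and read[end] == base:
        end += 1
    return end - start


def majority_length(run_lengths):
    """Return the most frequent run length and the fraction of reads backing it."""
    votes = Counter(run_lengths)
    length, count = votes.most_common(1)[0]
    return length, count / len(run_lengths)


# Hypothetical column: 9 reads report 7 bases, 3 reads report 8.
lengths = [7] * 9 + [8] * 3
print(majority_length(lengths))  # (7, 0.75)
```

With these invented counts a simple vote picks the shorter run, which is the result the manual check supports.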

Could the problem brought up by David Hesselbom be the reason for this "bug"?

Thanks
Davide


Bastien,

I've run some tests on 454 assemblies from both Mira and Newbler and have concluded that the quality scores attributed to homopolymers are very different depending on the source, even within the same genome. For example, in a homopolymer in a consensus sequence, Newbler quality scores are nearly always the same in the neighboring bases and throughout the homopolymer itself, except for its last base, which has a very low score compared to the rest of the bases in the homopolymer. Supposedly, this is because the length of the homopolymer is not certain (the reads do not agree), and only the last base is in doubt as to whether it should be there at all.

In Mira assemblies, however, all bases in a homopolymer have varying quality scores, none of which is very low. Typically, bases in (at least) long homopolymers have a lower average score than those surrounding the homopolymer, so the homopolymer shows up as a considerable "drop" in the quality scores. To me, the Newbler quality scores in homopolymers seem to make more sense than the Mira ones, since what we're uncertain about is the number of bases in the homopolymer. Since it doesn't matter which base we remove within the homopolymer, the low quality score might as well be attributed to the last one. Mira seems to spread out the quality score penalty over each base in the homopolymer, though I do not believe this is what's actually happening. :)
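One way to check the two patterns on real consensus qualities could look like the Python sketch below. The function name and the quality values are invented for illustration and are not output from either assembler; the sketch also assumes the run has at least one flanking base.

```python
def summarize_run_quality(quals, run_start, run_end):
    """Compare qualities inside a homopolymer, quals[run_start:run_end], with its flanks."""
    run = quals[run_start:run_end]
    flank = quals[:run_start] + quals[run_end:]
    lowest = run.index(min(run))  # position of the first minimum within the run
    return {
        "run_mean": sum(run) / len(run),
        "flank_mean": sum(flank) / len(flank),
        "min_is_last_base": lowest == len(run) - 1,
    }


# Invented scores: one sharp dip on the last base of the run.
newbler_like = [40, 40, 40, 40, 40, 40, 12, 40, 40]
# Invented scores: a moderate drop spread over the whole run.
mira_like = [40, 40, 31, 28, 33, 29, 30, 40, 40]

print(summarize_run_quality(newbler_like, 2, 7))
print(summarize_run_quality(mira_like, 2, 7))
```

On these made-up numbers the first profile puts its minimum on the last base of the run, while the second has a lower run mean than flank mean without any single very low base.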

I'd like to know why the quality scores are determined so differently by Mira and Newbler, and also the details on how Mira does it. For example, does it take homopolymers into special consideration?

Thanks,

David Hesselbom
Research assistant
Molecular Evolution
EBC, Uppsala University


--
Davide Sassera
Sezione di Patologia Generale e Parassitologia
Dipartimento di Patologia Animale, Igiene e Sanità Pubblica Veterinaria Facoltà di Veterinaria
Università degli Studi di Milano
Via Celoria 10, 20133, Milano, ITALY
Tel: +39 0250318094
Fax: +39 0250318095
