[mira_talk] Re: How does Mira determine quality scores?

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 29 Jul 2009 22:39:05 +0200

On Wednesday 29 July 2009 Davide Sassera wrote:
> I would like to add something on this topic.
> I have found that in regions with long homopolymers, a few reads containing
> "1 more base" often outweigh many more reads with "1 less base" in the
> consensus.
> Manual corrections show that the majority, the "1 less base" reads, are
> right, so I have to correct the consensus each time this happens.
> Could the problem brought up by David Hesselbom be the reason for this
> "bug"?

The current consensus algorithms look only at base-specific qualities, and a 
few reads are often enough to reach a quality high enough for the base to be 
considered valid. It is true that in this case MIRA currently prefers the base 
over the gap.
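To make that behaviour concrete, here is a toy model of a quality-only column 
call. It is not MIRA's code: the function name, the summing of Phred qualities 
per base, and the min_quality threshold of 30 are all assumptions chosen for 
illustration. What it shows is that the number of gap reads never enters the 
decision.

```python
from collections import defaultdict

GAP = "-"  # assumed gap symbol


def quality_only_call(column, min_quality=30):
    """Toy consensus call for one alignment column (hypothetical, not MIRA).

    column: list of (char, phred_quality) pairs, one per read, where char
    is a base (A/C/G/T) or GAP. The quality of each base is summed; the
    best base is called if its summed quality reaches min_quality,
    otherwise a gap is called. How many reads carry a gap is never used.
    """
    totals = defaultdict(int)
    for char, qual in column:
        if char != GAP:
            totals[char] += qual
    if totals:
        best = max(totals, key=totals.get)
        if totals[best] >= min_quality:
            return best
    return GAP
```

With two reads carrying an extra A at quality 35 and eight reads carrying a 
gap, the summed quality of A is 70, so this toy call returns "A" even though 
80% of the reads say "gap".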

The question is: how often is a majority of gaps right, and how often is a 
majority of gaps wrong? Do you have any numbers on that? With them I could 
fine-tune the algorithm a bit. I looked at the function, and I think it would 
be possible to build in a simple majority vote (e.g. when at least 2/3 of all 
bases in a column are gaps, take the gap regardless of the base qualities).
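A minimal sketch of the proposed majority-vote override, assuming each column 
is given as a list of per-read characters with "-" as the gap symbol. The 
function name and the convention that None means "fall back to the usual 
quality-based call" are illustrative assumptions, not part of MIRA.

```python
from fractions import Fraction

GAP = "-"  # assumed gap symbol


def gap_majority_call(column, fraction=Fraction(2, 3)):
    """Hypothetical majority-vote override for one alignment column.

    column: list of characters, one per read (bases or GAP).
    Returns GAP when at least `fraction` of the entries are gaps,
    regardless of base qualities. Returns None otherwise, meaning the
    caller should use its normal quality-based decision.
    """
    if not column:
        return None
    gaps = sum(1 for c in column if c == GAP)
    # Exact rational comparison, so a column with exactly 2/3 gaps counts.
    if Fraction(gaps, len(column)) >= fraction:
        return GAP
    return None
```

In the homopolymer case described above, a column where eight of ten reads 
carry a gap would be called as a gap no matter how high the quality of the two 
base-carrying reads is.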

Would you want to try a version with that algorithm and report back whether 
you see improvements?

Regards,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
