[mira_talk] Re: How does Mira determine quality scores?
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Wed, 29 Jul 2009 22:39:05 +0200
On Wednesday 29 July 2009 Davide Sassera wrote:
> I would like to add something on this topic.
> I found that often, in long homopolymers, a few reads containing
> "1 more base" override many more reads with "1 less base" in the
> consensus.
> Manual correction shows that the majority "1 less base" reads are
> right, so I have to correct the consensus each time this happens.
> Could the problem brought up by David Hesselbom be the reason for this
> "bug"?
The current consensus algorithms look at base-specific qualities only, and
often a few reads are enough to reach a quality high enough for the base to be
considered valid. In this case MIRA currently prefers the base over the gap,
that is true.
The question is: how often is a majority of gaps right ... and how often is a
majority of gaps wrong? Do you have any numbers on that? I could fine-tune the
algorithm a bit with that. I looked at the function and I think that building
in a simple majority vote (e.g. when >=2/3 of all bases at a position are gaps,
take the gap regardless of the base qualities) should be doable.
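
As a rough illustration of the proposed rule (a sketch only, not MIRA's actual
code), the vote for a single consensus column could look like the following.
The names `call_column` and `GAP`, the `*` gap symbol, the (symbol, quality)
input layout and the summed-quality fallback are all assumptions made for this
example; only the >=2/3 gap threshold comes from the text above.

```python
from collections import defaultdict
from fractions import Fraction

GAP = "*"  # assumed gap symbol for this sketch


def call_column(column, gap_majority=Fraction(2, 3)):
    """Pick the consensus symbol for one alignment column.

    column: list of (symbol, quality) pairs, one per read covering the
    position. A gap is represented by GAP; its quality is ignored here.
    """
    if not column:
        raise ValueError("empty column")

    gaps = sum(1 for symbol, _ in column if symbol == GAP)

    # Proposed rule: if at least 2/3 of the reads show a gap, take the gap
    # regardless of the base qualities. Fraction keeps the comparison exact.
    if Fraction(gaps, len(column)) >= gap_majority:
        return GAP

    # Hypothetical stand-in for the quality-based decision: choose the base
    # with the highest summed quality. At least one base exists here because
    # the gaps did not reach the majority threshold.
    summed = defaultdict(int)
    for symbol, quality in column:
        if symbol != GAP:
            summed[symbol] += quality
    return max(summed, key=summed.get)
```

With 8 gap reads and 2 high-quality `A` reads the sketch returns the gap; with
6 gap reads and 4 `A` reads the gaps fall below 2/3 and the base is kept.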
Would you want to try a version with that algorithm and report back whether
you see improvements?
Regards,
Bastien
--
You have received this mail because you are subscribed to the mira_talk mailing
list. For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html