On Wednesday 29 July 2009 Davide Sassera wrote:
> I would like to add something on this topic.
> I found that often in situations of long homopolymers the presence of
> few reads containing "1 more base" overcomes the presence of many more
> reads with "1 less base" in the consensus.
> Manual correction shows that the majority "1 less base" reads are
> right, so I have to correct the consensus each time this happens.
> Could the problem brought up by David Hesselbom be the reason for this
> "bug"?

The current consensus algorithms look at base-specific qualities only, and
often a few reads are enough to reach a quality high enough for the base to
be considered valid. In this case MIRA currently prefers the base over the
gap, that is true.

The question is: how often is a majority of gaps right ... and how often is
a majority of gaps wrong? Would you have any numbers on that? I could
fine-tune the algorithm a bit with that.

I looked at the function and I think I could build in a simple majority vote
(e.g. when >= 2/3 of all bases at a position are gaps, take the gap
regardless of the base qualities). Would you want to try a version with that
algorithm and report back whether you see improvements?

Regards,
  Bastien
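
To make the proposed rule concrete, here is a minimal sketch in Python. It is not MIRA's actual code. The function name, the "*" gap symbol, the summed-quality rule and the threshold of 30 are all assumptions made for illustration; only the ">= 2/3 gaps wins" override comes from the text above.

```python
from collections import defaultdict

GAP = "*"  # assumed gap symbol, for illustration only


def call_column(column, min_quality=30, gap_num=2, gap_den=3,
                use_gap_majority=True):
    """Call one consensus position.

    column: list of (symbol, quality) pairs, one per read covering the
    position. All names and thresholds here are hypothetical.
    """
    if not column:
        return GAP

    gap_count = sum(1 for symbol, _ in column if symbol == GAP)

    # Proposed override: if at least gap_num/gap_den of the reads show a
    # gap, take the gap regardless of the base qualities.
    # Integer arithmetic avoids floating-point trouble at exactly 2/3.
    if use_gap_majority and gap_count * gap_den >= gap_num * len(column):
        return GAP

    # Simplified stand-in for a quality-only decision: sum the qualities
    # per base and accept the best base if it clears the threshold.
    quality_by_base = defaultdict(int)
    for symbol, quality in column:
        if symbol != GAP:
            quality_by_base[symbol] += quality
    if not quality_by_base:
        return GAP

    best_base = max(quality_by_base, key=quality_by_base.get)
    return best_base if quality_by_base[best_base] >= min_quality else GAP


# Homopolymer case from the thread: 2 reads carry an extra "A",
# 8 reads have a gap at this position.
column = [("A", 40), ("A", 40)] + [(GAP, 0)] * 8
with_vote = call_column(column)
without_vote = call_column(column, use_gap_majority=False)
```

In this example the two high-quality "A" reads win when only qualities are considered, while the majority vote calls the gap. A column split evenly between bases and gaps stays below the 2/3 threshold and still falls through to the quality-based decision.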