[mira_talk] Re: How does Mira determine quality scores?
- From: Davide Sassera <davide.sassera@xxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Thu, 30 Jul 2009 15:12:37 +0200
Dear Bastien,
thanks for the quick reply.
I think that 2/3 gaps could be a good threshold to start with.
At least we would get rid of those pesky situations where I have 30 gaps
and 3 bases and still get a base.
I would love to try a version with this algorithm, so for once I can
help instead of just using the software.
I cannot guarantee how fast I will be able to give you some results,
as I'm very busy, but I'll do my best.
Reasoning from another point of view, couldn't MIRA also give a quality
score to gaps? It could either be a fixed number or something like the
mean value of the two bases around the gap.
Maybe this does not make sense; what do you think?
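A rough sketch of the gap-quality idea above, in Python. This is not MIRA code: the function name, its parameters, and the fallback of 0 are assumptions made only for illustration.

```python
def gap_quality(left_qual, right_qual, fixed=None):
    """Hypothetical helper: assign a quality value to a gap in a read.

    left_qual / right_qual are the qualities of the bases flanking the gap
    (None if the gap sits at a read end). If `fixed` is given, that constant
    is used instead, which is the "fixed number" variant.
    """
    if fixed is not None:
        return fixed
    neighbours = [q for q in (left_qual, right_qual) if q is not None]
    if not neighbours:
        return 0  # assumption: no flanking information means no confidence
    # Integer mean of the available flanking qualities
    return sum(neighbours) // len(neighbours)
```

With flanking qualities of 30 and 20 the gap would get a quality of 25; with fixed=10 it would always get 10.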
Davide
On Wednesday, 29 July 2009, Davide Sassera wrote:
I would like to add something on this topic.
I found that, in long homopolymers, the presence of a few reads
containing "1 more base" often overrides the presence of many more
reads with "1 less base" in the consensus.
Manual correction shows that the majority "1 less base" reads are
right, so I have to correct the consensus each time this happens.
Could the problem brought up by David Hesselbom be the reason for this
"bug"?
The current consensus algorithms look at base-specific qualities only, and
often a few reads are enough to reach a quality high enough for the base to be
considered valid. In this case MIRA currently prefers the base over the gap,
that is true.
Question is: how many times do you have a majority of gaps which is right ...
and how often do you have a majority of gaps which is wrong? Would you have
any numbers on that? I could fine-tune the algorithm a bit with that. I looked
at the function and I think that building in a simple majority vote would be
feasible (e.g. when >= 2/3 of all bases are gaps, then take the gap regardless
of the base qualities).
Would you want to try a version with that algorithm and report back whether
you see improvements?
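To make the proposed rule concrete, here is a minimal sketch in Python. It is not the MIRA consensus routine: the function name, the '*' gap symbol, the (base, quality) column layout, and the summed-quality fallback are all assumptions chosen for illustration.

```python
from fractions import Fraction

def call_column(column, gap_fraction=Fraction(2, 3), gap_char="*"):
    """Hypothetical consensus call for one alignment column.

    column: list of (base, quality) tuples, one per read covering the position.
    If at least `gap_fraction` of the entries are gaps, the gap wins regardless
    of base qualities. Otherwise fall back to a quality-based choice, here the
    non-gap base with the highest summed quality (a stand-in for the real
    quality model).
    """
    if not column:
        return None
    gaps = sum(1 for base, _ in column if base == gap_char)
    # Exact rational comparison avoids floating-point edge cases at 2/3
    if Fraction(gaps, len(column)) >= gap_fraction:
        return gap_char
    totals = {}
    for base, qual in column:
        if base != gap_char:
            totals[base] = totals.get(base, 0) + qual
    if not totals:
        return gap_char
    return max(totals, key=totals.get)
```

In the "30 gaps and 3 bases" case, 30/33 is above 2/3, so the column would be called as a gap even if the three bases have high qualities.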
Regards,
Bastien
--
Davide Sassera
Sezione di Patologia Generale e Parassitologia
Dipartimento di Patologia Animale,
Igiene e Sanità Pubblica Veterinaria
Facoltà di Veterinaria
Università degli Studi di Milano
Via Celoria 10, 20133, Milano, ITALY
Tel: +39 0250318094
Fax: +39 0250318095