[mira_talk] Re: bug report for -CO:fnicpst
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Thu, 28 May 2009 21:23:44 +0200
On Friday, 22 May 2009, Byron Knoll wrote:
> There appears to be a bug when forcing non IUPAC tags (-CO:fnicpst). When
> examining assembled contigs, there are several cases where the consensus
> base is clearly not set to the majority vote. For example, I have a column
> with 973Gs, 3As, and 2Cs and the consensus base is A. I am running
> mira_2.9.45_dev_linux-gnu_i686_32.
Hello Byron,
Hmm, I'm not sure whether this is a bug per se or rather a case of "something
unexpected".
As the man page describes, -CO:fnicpst does not enforce a majority vote per se;
it is a flag that makes MIRA use a majority vote when a conflict arises.
Apparently MIRA saw no conflict here: it had no problem calling an A at this
position, which overrules the G.
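To illustrate the distinction, here is a minimal Python sketch of that kind of
decision logic. It is not MIRA's code: the function name call_consensus, the
input layout, and the margin threshold of 10 are assumptions made purely for
illustration. The group qualities (11 for 'G', 30 for 'A') are taken from the
first scenario described further down.

```python
def call_consensus(groups, margin=10, force_non_iupac=True):
    """Toy consensus call for a single alignment column.

    groups maps base -> (read_count, group_quality).
    margin is a hypothetical threshold, not a real MIRA parameter.
    """
    # Rank the candidate bases by their group quality, best first.
    ranked = sorted(groups.items(), key=lambda kv: kv[1][1], reverse=True)
    best_base, (_, best_q) = ranked[0]

    # A base stays "viable" only if its quality is close to the best one.
    viable = [b for b, (_, q) in ranked if best_q - q < margin]

    if len(viable) == 1:
        return best_base  # no conflict, so no majority vote takes place
    if force_non_iupac:
        # Conflict: fall back to the majority vote among viable bases.
        return max(viable, key=lambda b: groups[b][0])
    return "N"  # placeholder for a proper IUPAC ambiguity code


# 973 low-quality G reads against 3 high-quality A reads: no conflict is seen.
print(call_consensus({"G": (973, 11), "A": (3, 30)}))  # A
# With comparable qualities a conflict arises and the majority wins.
print(call_consensus({"G": (973, 25), "A": (3, 30)}))  # G
```

In this sketch the read counts only matter once two bases are close enough in
quality to be in conflict, which is why 973 reads can lose against 3.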
There may be several reasons for this. Without seeing the data, I'll just give
a few off the top of my head:
- the reads with 'G' have no quality values and the default quality of '10' has
not been changed: this would lead to a consensus quality of 11 or 12 for 'G'.
Now suppose the reads with 'A' do have qualities a lot higher than 10, say 22,
27 and 29. The 'A' consensus then gets a quality so much higher (around 30 or
31) than the 'G' that MIRA does not even consider the 'G' a viable call.
- all reads have no qualities attached (and therefore all bases have the same
quality), but the reads with 'G' are all in the same direction (either forward
or reverse). If two of the reads with 'A' are in one direction (say, forward)
and one is in the other direction (reverse), then MIRA will give a consensus
quality of ~11 for the 'G', but a consensus quality of ~22 for the 'A'. Here
too, it's clear to MIRA that it must be 'A', and the 'G' is not considered.
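The two scenarios above can be mimicked with a small toy model in Python. This
is only a sketch and not MIRA's actual calculation: the function group_quality
and its rule (best quality per read direction, plus 1 if further reads in that
direction agree, summed over both directions) are assumptions chosen to land
in the same range as the numbers quoted above. For the first scenario the
sketch additionally assumes that all reads run in the same direction.

```python
def group_quality(reads):
    """Toy quality for one base group.

    reads is a list of (quality, strand) tuples, strand being '+' or '-'.
    The scoring rule is hypothetical and only approximates the examples.
    """
    total = 0
    for strand in ("+", "-"):
        quals = [q for q, s in reads if s == strand]
        if quals:
            # Best quality on this strand, plus a small bonus of 1 when
            # at least one more read on the same strand agrees.
            total += max(quals) + (1 if len(quals) > 1 else 0)
    return total


# Scenario 1: G reads carry the default quality 10, A reads have real ones.
g_reads = [(10, "+")] * 973
a_reads = [(22, "+"), (27, "+"), (29, "+")]
print(group_quality(g_reads), group_quality(a_reads))  # 11 30

# Scenario 2: everything has quality 10, but the A reads cover both strands.
a_both = [(10, "+"), (10, "+"), (10, "-")]
print(group_quality(g_reads), group_quality(a_both))  # 11 21
```

In both cases the number of reads barely matters in this model: a group gains
far more from higher qualities or from support on the second strand than from
hundreds of additional reads in one direction.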
Please also have a look at
http://www.freelists.org/post/mira_talk/Quality-Values,4
where I gave a short roundup on how MIRA currently calculates qualities.
Now, the reason MIRA uses this approach is that for every sequencing
technology I've worked with so far (Sanger, 454 and Solexa), there are
sequencing artefacts that can be overcome only when looking at quality values
and read orientation in an alignment. Looking at coverage alone or quality
values alone would not be enough to call the "real" base. This strategy fails
in some cases, the most notable one being sequences that come without quality
values.
Does this answer your question, or do you think that you have a different case?
If so, please let me know :-)
Regards,
Bastien