[mira_talk] Re: bug report for -CO:fnicpst

Thanks Bastien, that explains the results I observed. I am working with
sequences with no quality values.

Cheers,
// Byron

On Thu, May 28, 2009 at 12:23 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Friday 22 May 2009 Byron Knoll wrote:
> > There appears to be a bug when forcing non-IUPAC tags (-CO:fnicpst). When
> > examining assembled contigs, there are several cases where the consensus
> > base is clearly not set to the majority vote. For example, I have a column
> > with 973 Gs, 3 As, and 2 Cs and the consensus base is A. I am running
> > mira_2.9.45_dev_linux-gnu_i686_32.
>
> Hello Byron,
>
> hmmmm, not sure whether it's a bug per se or more a case of "something
> unexpected".
>
> As the man page describes, -CO:fnicpst is not a majority vote per se, but a
> flag for using majority vote when a conflict arises. Now, apparently MIRA saw
> no problem in calling an A at this place, which overturns the G.
>
> There may be several reasons for this. Without seeing the data I'll just give
> a few off the top of my head:
>
> - the reads with 'G' have no quality and the default quality of '10' has not
> been changed: this would lead to a consensus quality of 11 or 12 for 'G'.
> Now suppose the reads with 'A' do have qualities that are a lot higher than
> 10, let's assume 22, 27 and 29. Then the 'A' consensus gets a quality so much
> higher (around 30 or 31) than the 'G' that MIRA does not even consider the
> 'G' a viable call.
>
> - all reads have no qualities attached (and therefore all bases have the same
> quality), but the reads with 'G' are all in the same direction (either
> forward or reverse). If the reads with 'A' now include two in one direction
> (say, forward) and one in the other direction (reverse), then MIRA will give
> a consensus quality of ~11 for the 'G', but a consensus quality of ~22 for
> the 'A'. Here too, it's clear to MIRA that it must be 'A', and the 'G' is not
> considered.
>
> Please also have a look at
>  http://www.freelists.org/post/mira_talk/Quality-Values,4
> where I gave a short roundup on how MIRA currently calculates qualities.
>
> Now, the reason MIRA uses this approach is that for every sequencing
> technology I've worked with so far (Sanger, 454 and Solexa), there are
> sequencing artefacts that can be overcome only by looking at quality values
> and read orientation in an alignment. Looking at coverage alone or quality
> values alone would not be enough to call the "real" base. This strategy fails
> in some cases, the most distinct one being when working with sequences
> without quality values.
>
> Does this answer your question or do you think that you have a different
> case? If yes, please tell :-)
>
> Regards,
>  Bastien

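The two scenarios in the quoted reply can be made concrete with a small sketch. This is not MIRA's actual quality calculation; the scoring rule below (best quality per strand, a bonus of 1 when several reads on that strand agree, then the two strands added together) is a hypothetical stand-in chosen only because it lands close to the numbers quoted in the thread. The function names, the bonus value, and the strand assigned to the 'C' reads are likewise assumptions.

```python
from collections import Counter

DEFAULT_QUALITY = 10  # default for reads without qualities, as quoted in the thread


def group_quality(observations, same_strand_bonus=1):
    """Toy quality for one candidate base (NOT MIRA's real formula).

    observations: list of (strand, quality) pairs, strand being "+" or "-".
    Per strand, take the best quality and add a small bonus when more than
    one read on that strand agrees; then add the two strands together.
    """
    total = 0
    for strand in ("+", "-"):
        quals = [q for s, q in observations if s == strand]
        if quals:
            total += max(quals) + (same_strand_bonus if len(quals) > 1 else 0)
    return total


def call_by_quality(column):
    """Pick the base with the highest toy group quality."""
    scores = {base: group_quality(obs) for base, obs in column.items()}
    return max(scores, key=scores.get), scores


def call_by_majority(column):
    """Pick the base seen in the most reads, ignoring quality and strand."""
    counts = Counter({base: len(obs) for base, obs in column.items()})
    return counts.most_common(1)[0][0]


# Scenario 1: 'G' reads carry the default quality, 'A' reads carry real ones.
mixed_quals = {
    "G": [("+", DEFAULT_QUALITY)] * 973,
    "A": [("+", 22), ("+", 27), ("+", 29)],
    "C": [("+", DEFAULT_QUALITY)] * 2,  # strand is an assumption
}

# Scenario 2: no read has qualities, but only 'A' is seen on both strands.
no_quals = {
    "G": [("+", DEFAULT_QUALITY)] * 973,
    "A": [("+", DEFAULT_QUALITY)] * 2 + [("-", DEFAULT_QUALITY)],
    "C": [("+", DEFAULT_QUALITY)] * 2,  # strand is an assumption
}

for name, column in (("mixed_quals", mixed_quals), ("no_quals", no_quals)):
    winner, scores = call_by_quality(column)
    print(name, "majority:", call_by_majority(column), "quality:", winner, scores)
```

In both columns a plain majority vote picks 'G', while the toy quality rule picks 'A': 'G' scores 11 either way, and 'A' scores 30 in the first scenario and 21 in the second, roughly the ~30 and ~22 mentioned above. It also shows why data without quality values is the weak spot: strand support is then the only thing left to separate the candidates.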