[mira_talk] Re: Quality Values

On Wednesday 18 March 2009 Burkhard Steuernagel wrote:
> I have two questions.
> 1. How does Mira calculate the output quality values? Is that still the
> same algorithm as you describe in your phd-thesis?

Ha, one of the few places in the code that I documented well :-) The principle 
has remained the same, though some details have changed. 

Here's the cut'n'paste from the basic quality computing routine of a base 
group for one sequencing method (please use a fixed font like Courier for 
displaying):

  /* errorrate for this group is computed as follows:
     Best quality for a base in a direction makes basic rate = 100%
     add to this: 10% of next best base quality
     
     Same procedure for other direction, then add both qualities

     In general, the values are almost the same (mostly a tad higher) as
     with the more complicated (and time consuming) old variant.
     
     Cap at 90
     
     e.g.
     + A 30     -> 30       \
     + A 20     ->  2        \
     + A 20                  /+ = 32    \
     + A 20                 /            \
     .                                    > + = 60
     - A 26     -> 26     \              /
     - A 20     ->  2      >  + = 28    /
     - A 15               /
  */

The reason I changed this was that the computation with log values is 
incredibly time consuming and this method gives comparable values.
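
For readers who prefer code to comments, the scheme described in the comment 
above can be sketched roughly as follows. This is a minimal illustration in 
Python, not MIRA's actual code: the function names are made up, and the use of 
integer division for the 10% share is an assumption (the comment only shows 
20 -> 2 and does not say how fractional values are handled).

```python
def direction_quality(quals):
    """Quality contribution of one read direction (hypothetical helper).

    The best base quality counts in full, the next best adds 10% of its value.
    Further bases in the same direction are ignored.
    """
    ordered = sorted(quals, reverse=True)
    if not ordered:
        return 0
    best = ordered[0]
    next_best = ordered[1] if len(ordered) > 1 else 0
    # Assumption: the 10% share is truncated to an integer.
    return best + next_best // 10


def group_quality(forward_quals, reverse_quals, cap=90):
    """Sum the contributions of both directions and cap the result at 90."""
    total = direction_quality(forward_quals) + direction_quality(reverse_quals)
    return min(cap, total)


# The example from the comment: four forward reads, three reverse reads.
print(group_quality([30, 20, 20, 20], [26, 20, 15]))
```

With the values from the comment, the forward direction gives 30 + 2 = 32, the 
reverse direction gives 26 + 2 = 28, and the group quality is 60.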


To account for several sequencing methods, the following routine is used in 
case all sequencing methods agree:

   quality = sum(0.75*quality of each seqtype) 

If this value is less than the best quality of any single sequencing method, 
then that best quality is used instead.
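
As a rough sketch of the agreeing case only (Python, hypothetical function 
name, not taken from the MIRA sources): each sequencing method contributes 75% 
of its quality, and the result never drops below the best single method. 
Whether an additional cap is applied at this stage is not stated here, so none 
is assumed.

```python
def combined_quality(seqtype_quals):
    """Combine per-sequencing-method qualities when all methods agree.

    seqtype_quals: one group quality per sequencing method (illustrative input).
    """
    if not seqtype_quals:
        return 0
    weighted_sum = sum(0.75 * q for q in seqtype_quals)
    # Never report less than the best individual method.
    return max(weighted_sum, max(seqtype_quals))


print(combined_quality([60, 40]))  # two methods: 0.75 * 100 = 75.0
print(combined_quality([60]))      # one method: 45 < 60, so 60 is kept
```

Note that with a single sequencing method the weighted sum is always lower 
than the method's own quality, so the fallback leaves that quality unchanged.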

When the sequencing methods disagree on a given base, MIRA tries to resolve 
the disagreement by deciding whether the base from one method "is probably a 
sequencing error" or whether there is a true disagreement.

In the "probably a sequencing error" case, the quality is that of the base 
from the "chosen right" sequencing method. If there is a "true disagreement", 
the quality is 0 (I think, need to check).

> 2. We have sequenced BACs with 454 FLX, about 20 fold coverage. The
> assemblies with MIRA show very nice results. I have also Sanger BAC End
> Sequences. Is there a good way to also use them for the assembly?

Yep, as Jan already said: works well. Simply include them as Sanger sequences. 
The only problem: these sequences need to be <20kb.

Regards,
  Bastien


-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
