[mira_talk] Re: Quality Values
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Wed, 18 Mar 2009 19:19:50 +0100
On Wednesday 18 March 2009 Burkhard Steuernagel wrote:
> I have two questions.
> 1. How does Mira calculate the output quality values? Is that still the
> same algorithm as you describe in your phd-thesis?
Ha, one of the few places in the code that I documented well :-) The principle
has remained the same, but there have been changes in the details.
Here's the cut'n'paste from the basic quality computing routine of a base
group for one sequencing method (please use a fixed font like Courier for
displaying):
/* errorrate for this group is computed as follows:
Best quality for a base in a direction makes basic rate = 100%
add to this: 10% of next best base quality
Same procedure for other direction, then add both qualities
In general, the values are almost the same (mostly a tad higher) as
with the more complicated (and time consuming) old variant.
Cap at 90
e.g.
  + A 30  -> 30  \
  + A 20  ->  2   \
  + A 20          /+ = 32  \
  + A 20         /          \
                             > + = 60
  - A 26  -> 26  \          /
  - A 20  ->  2   > + = 28 /
  - A 15         /
*/
The reason I changed this was that the computation with log values is
incredibly time consuming and this method gives comparable values.
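For readers who prefer code to ASCII art, the comment above can be sketched roughly as follows. This is only an illustration in Python, not the actual MIRA routine: the function names are made up, the integer division for the 10% term and the choice to apply the cap of 90 to the summed value are assumptions that the comment does not spell out.

```python
def direction_quality(quals):
    """Quality contribution of one read direction (hypothetical helper).

    The best base quality counts fully; 10% of the next best is added.
    Integer division is an assumption; the comment only shows 20 -> 2.
    """
    ranked = sorted(quals, reverse=True)
    if not ranked:
        return 0
    total = ranked[0]
    if len(ranked) > 1:
        total += ranked[1] // 10
    return total


def group_quality(forward, reverse, cap=90):
    """Add both directions, then cap (assumed to apply to the sum)."""
    return min(cap, direction_quality(forward) + direction_quality(reverse))


# The example from the comment: forward 30+2 = 32, reverse 26+2 = 28.
example = group_quality([30, 20, 20, 20], [26, 20, 15])
print(example)
```

With the qualities from the example this gives 32 for the forward direction, 28 for the reverse direction and 60 in total.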
To account for several sequencing methods, the following routine is used in
case all sequencing methods agree:
quality = sum(0.75*quality of each seqtype)
If this value is less than the best quality of any single sequencing method,
then that best quality is used instead.
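A minimal sketch of that combination rule, again in Python and again only an illustration: `combined_quality` is a hypothetical name, the input is assumed to be one already computed group quality per sequencing method, and no rounding or further capping is applied because the text above does not say whether MIRA does either.

```python
def combined_quality(per_seqtype):
    """Combine per-method qualities when all methods agree on the base.

    per_seqtype: one group quality per sequencing method (assumed input).
    Returns 0.75 times the sum, but never less than the best single value.
    """
    if not per_seqtype:
        return 0
    scaled = sum(0.75 * q for q in per_seqtype)
    return max(scaled, max(per_seqtype))


print(combined_quality([60, 40]))  # 0.75 * (60 + 40) = 75.0
print(combined_quality([60, 10]))  # 52.5 is below 60, so 60 is used
```

With a single sequencing method the scaled value is always below the original one, so the fallback simply returns the unchanged quality.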
When the sequencing methods disagree on a given base, MIRA tries to resolve
the disagreement by deciding whether the base from one method "is probably a
sequencing error" or whether there is a true disagreement.
In the "probable sequencing error" case, the quality is that of the base from
the sequencing method that was chosen as right. If there is a "true
disagreement", the quality is 0 (I think, need to check).
> 2. We have sequenced BACs with 454 FLX, about 20 fold coverage. The
> assemblies with MIRA show very nice results. I have also Sanger BAC End
> Sequences. Is there a good way to also use them for the assembly?
Yep, as Jan already said: works well. Simply include them as Sanger sequences.
Only problem: these sequences need to be <20kb.
Regards,
Bastien
--
You have received this mail because you are subscribed to the mira_talk mailing
list. For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html