[mira_talk] Re: Questions regarding the paired-end distance calculation done by MIRA

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 30 Jan 2011 22:43:57 +0100

On Friday 28 January 2011 23:41:11 Bayles, Darrell wrote:
> I've been working on an assembly of 454 data that contains a mix of
> shotgun and paired-end reads.  The paired end reads have been
> empirically mapped and have a calculated distance of 3000 bp with a
> standard deviation of 800 bp.  All that information gets put into the
> traceinfo.xml file in the usual fashion, but the distance range that
> MIRA indicates it's actually using is quite a bit larger than I expect.
> 
> Bastien previously posted
> (//www.freelists.org/post/mira_talk/TF-and-TT-in-MAF-format,3) that
> he has built in a '"lenience" factor' that allows for 20 or 30%
> deviation.  Was the actual value pinned down?

Hello Darrell,

no, it was not. This particular lenience factor (I just looked it up) is 15%.

> For the experiment that I mentioned, the *out.caf file indicates MIRA
> has given the paired reads the allowable minimum distance of 600 bp and
> maximum of 5400 bp.  That still gives a mean of 3000 but with a
> deviation of 2400 bp.  I also looked at a previous experiment where the
> pairs were defined as having a mean of 7200 bp and a standard deviation
> of 1800 bp.  MIRA output indicates that it is using a range of 1800 bp
> to 12600 bp.  That would indicate a mean of 7200 bp and a deviation of
> 5400 bp.  In both cases it appears that MIRA is not using a percentage
> factor, but is actually using three standard deviations.  Does that
> sound correct Bastien?

Ah ... my old friends 'mean' and 'stdev' of a library ... actually I hate 
them. People should be able to say "I want the fragments to be exactly 
within two bounds (upper and lower)" and that should be it. But, no, the 
TRACEINFO standard has 'mean' and 'stdev' *sigh*

So, yes, MIRA has to somehow transform that into an upper and lower bound. And 
as you found out (I know, one more thing I should document), MIRA uses 

  lower/upper = mean -/+ 3*stdev

Agreed, it's quite broad. I did that after getting some projects where the 
stdev was so optimistically narrow that I supposed every lab did that. I think 
I'll add another point on my TODO: let the user choose the multiplier for 
stdev. 

Note: the +/-15% adds to the lower/upper from above and is meant to counter 
additional 'gap' column bases in raw assemblies. I wouldn't really touch it, 
especially not in 454 assemblies.
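
To make the rule concrete, here is a minimal Python sketch of the mean -/+ 3*stdev transformation, checked against the two libraries Darrell describes. The function name and the parameter k are made up for illustration and are not MIRA code; the 15% lenience is deliberately left out, since this mail does not spell out exactly how it is applied.

```python
def insert_size_bounds(mean, stdev, k=3):
    """Return (lower, upper) as mean -/+ k*stdev.

    k=3 is the multiplier described in this mail; the function itself is
    a hypothetical illustration, not MIRA's implementation. The extra
    15% lenience is not modelled here.
    """
    return mean - k * stdev, mean + k * stdev


# Darrell's two libraries: the results match the ranges he saw in the CAF.
print(insert_size_bounds(3000, 800))   # (600, 5400)
print(insert_size_bounds(7200, 1800))  # (1800, 12600)
```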

> I'd like to be able to rein in that value, and can probably do it by
> indicating an artificially smaller standard deviation in the input
> traceinfo.xml file, but does anyone know if that "lenience factor" can
> be modified directly?

No, not the lenience factor. I'm not sure if I should make this one 
customisable, too.

For what you want to do: for now (until there's a parameter for that, maybe in 
the next version), artificially reduce your stdev in the TRACEINFO.
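
As a rough aid for that workaround, the sketch below inverts the mean -/+ 3*stdev rule: given the lower and upper bounds you actually want, it returns the mean and stdev to write into the TRACEINFO. The function name is hypothetical, the multiplier of 3 is the one described in this mail, and the additional 15% lenience is ignored.

```python
def traceinfo_values_for_bounds(lower, upper, k=3):
    """Return (mean, stdev) such that mean -/+ k*stdev == (lower, upper).

    Hypothetical helper, not part of MIRA. Assumes the k=3 multiplier
    described in this mail and ignores the 15% lenience.
    """
    if upper <= lower:
        raise ValueError("upper must be larger than lower")
    mean = (lower + upper) / 2
    stdev = (upper - lower) / (2 * k)
    return mean, stdev


# Wanting fragments between 1500 and 4500 bp means declaring stdev 500,
# not the empirically measured 800.
print(traceinfo_values_for_bounds(1500, 4500))  # (3000.0, 500.0)
```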

Note: if given via the command line (-GE:tismin:tismax), then these boundaries 
are fixed (except for the 15% lenience). But of course, this does not work when 
working with several libraries. Then TRACEINFO is the only way.

Best,
  Bastien
