[mira_talk] Re: Questions regarding the paired-end distance calculation done by MIRA

  • From: Martin Asser Hansen <mail@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 31 Jan 2011 09:38:40 +0100

I agree that using 'mean' and 'stdev' for insert sizes is silly. It would be much
better to use upper and lower bounds as this reflects how the libraries are
prepared when a band is cut from a gel. One might take into account that a
band in a gel does not move linearly through the gel, but rather in a
logarithmic way.

I guess that is what happens when engineers and computer people ignore the
biologists :o)



Cheers,



Martin

On Sun, Jan 30, 2011 at 10:43 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Friday 28 January 2011 23:41:11 Bayles, Darrell wrote:
> > I've been working on an assembly of 454 data that contains a mix of
> > shotgun and paired-end reads.  The paired end reads have been
> > empirically mapped and have a calculated distance of 3000 bp with a
> > standard deviation of 800 bp.  All that information gets put into the
> > traceinfo.xml file in the usual fashion, but the distance range that
> > MIRA indicates it's actually using is quite a bit larger than I expect.
> >
> > Bastien previously posted
> > (//www.freelists.org/post/mira_talk/TF-and-TT-in-MAF-format,3) that
> > he has built in a '"lenience" factor' that allows for 20 or 30%
> > deviation.  Was the actual value pinned down?
>
> Hello Darrell,
>
> no, it was not. This particular lenience factor (I just looked it up) is
> 15%.
>
> > For the experiment that I mentioned, the *out.caf file indicates MIRA
> > has given the paired reads the allowable minimum distance of 600 bp and
> > maximum of 5400 bp.  That still gives a mean of 3000 but with a
> > deviation of 2400 bp.  I also looked at a previous experiment where the
> > pairs were defined as having a mean of 7200 bp and a standard deviation
> > of 1800 bp.  MIRA output indicates that it is using a range of 1800 bp
> > to 12600 bp.  That would indicate a mean of 7200 bp and a deviation of
> > 5400 bp.  In both cases it appears that MIRA is not using a percentage
> > factor, but is actually using three standard deviations.  Does that
> > sound correct Bastien?
>
> Ah ... my old friends 'mean' and 'stdev' of a library ... actually I hate
> them. People should be able to say "I want the fragments to be exactly
> within two bounds (upper and lower)" and that should be it. But, no, the
> TRACEINFO standard has 'mean' and 'stdev' *sigh*
>
> So, yes, MIRA has to somehow transform that into an upper and lower bound.
> And as you found out (I know, one more thing I should document), MIRA uses
>
>  lower/upper = mean -/+ 3*stdev
>
> Agreed, it's quite broad. I did that after getting some projects where the
> stdev was so optimistically narrow that I supposed every lab did that. I
> think I'll add another point on my TODO: let the user choose the multiplier
> for stdev.
>
> Note: the +/-15% adds to the lower/upper from above and is meant to counter
> additional 'gap' column bases in raw assemblies. I wouldn't really touch
> it, especially not in 454 assemblies.
>
> > I'd like to be able to rein in that value, and can probably do it by
> > indicating an artificially smaller standard deviation in the input
> > traceinfo.xml file, but does anyone know if that "lenience factor" can
> > be modified directly?
>
> No, not the lenience factor. I'm not sure if I should make this one
> customisable, too.
>
> For what you want to do: atm (and until there's a parameter for that (next
> version?)): artificially reduce your stdev in the TRACEINFO.
>
> Note: if given via the command line (-GE:tismin:tismax), then these
> boundaries are fixed (except for the 15% lenience). But of course, this does
> not work when working with several libraries. Then TRACEINFO is the only way.
>
> Best,
>  Bastien
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>
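
For anyone who wants to check the numbers in the quoted message, here is a minimal sketch of the rule Bastien describes (lower/upper = mean -/+ 3*stdev), plus the reverse calculation for picking an "artificial" stdev that gives the bounds you actually want. The function names are made up for illustration and are not part of MIRA. The additional 15% lenience is deliberately left out, since the thread does not spell out exactly how it is applied.

```python
def insert_size_bounds(mean, stdev, k=3):
    """Bounds as described in the thread: lower/upper = mean -/+ k*stdev.

    k=3 is the multiplier Bastien mentions; the 15% lenience is NOT modelled.
    """
    return mean - k * stdev, mean + k * stdev


def stdev_for_bounds(lower, upper, k=3):
    """Hypothetical helper: the (mean, stdev) to put into TRACEINFO so that
    the rule above yields the desired lower/upper bounds."""
    mean = (lower + upper) / 2
    stdev = (upper - lower) / (2 * k)
    return mean, stdev


# Darrell's two libraries, as reported from the *out.caf files
print(insert_size_bounds(3000, 800))    # (600, 5400)
print(insert_size_bounds(7200, 1800))   # (1800, 12600)

# Wanting 1800..4200 bp would mean declaring mean 3000, stdev 400
print(stdev_for_bounds(1800, 4200))     # (3000.0, 400.0)
```

Both of Darrell's observed ranges match the 3*stdev rule, which is consistent with Bastien's answer. The reverse helper is just the workaround he suggests (reduce the stdev in TRACEINFO) written out as arithmetic.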
