I agree that using 'mean' and 'stdev' for insert sizes is silly. It would be much better to use upper and lower bounds, as this reflects how the libraries are prepared when a band is cut from a gel. One should also take into account that DNA fragments do not move linearly through the gel: migration distance decreases roughly linearly with the logarithm of fragment size, so a gel slice of fixed width covers a size range that is wider above the band centre than below it.

I guess that is what happens when engineers and computer people ignore the biologists :o)

Cheers,
Martin

On Sun, Jan 30, 2011 at 10:43 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
> On Friday 28 January 2011 23:41:11 Bayles, Darrell wrote:
> > I've been working on an assembly of 454 data that contains a mix of
> > shotgun and paired-end reads. The paired-end reads have been
> > empirically mapped and have a calculated distance of 3000 bp with a
> > standard deviation of 800 bp. All that information gets put into the
> > traceinfo.xml file in the usual fashion, but the distance range that
> > MIRA indicates it's actually using is quite a bit larger than I expect.
> >
> > Bastien previously posted
> > (//www.freelists.org/post/mira_talk/TF-and-TT-in-MAF-format,3) that
> > he has built in a "lenience" factor that allows for 20 or 30%
> > deviation. Was the actual value pinned down?
>
> Hello Darrell,
>
> no, it was not. This particular lenience factor (I just looked it up) is
> 15%.
>
> > For the experiment that I mentioned, the *out.caf file indicates MIRA
> > has given the paired reads an allowable minimum distance of 600 bp and
> > a maximum of 5400 bp. That still gives a mean of 3000 bp, but with a
> > deviation of +/-2400 bp. I also looked at a previous experiment where
> > the pairs were defined as having a mean of 7200 bp and a standard
> > deviation of 1800 bp. MIRA output indicates that it is using a range of
> > 1800 bp to 12600 bp. That would indicate a mean of 7200 bp and a
> > deviation of +/-5400 bp. In both cases it appears that MIRA is not
> > using a percentage factor, but is actually using three standard
> > deviations.
> > Does that sound correct, Bastien?
>
> Ah ... my old friends 'mean' and 'stdev' of a library ... actually I hate
> them. People should be able to say "I want the fragments to be exactly
> within two bounds (upper and lower)" and that should be it. But, no, the
> TRACEINFO standard has 'mean' and 'stdev' *sigh*
>
> So, yes, MIRA has to somehow transform that into an upper and lower bound.
> And as you found out (I know, one more thing I should document), MIRA uses
>
>   lower/upper = mean -/+ 3*stdev
>
> Agreed, it's quite broad. I did that after getting some projects where the
> stdev was so optimistically narrow that I assumed every lab did that. I
> think I'll add another point to my TODO list: let the user choose the
> multiplier for stdev.
>
> Note: the +/-15% is applied on top of the lower/upper bounds from above
> and is meant to counter additional 'gap' column bases in raw assemblies.
> I wouldn't really touch it, especially not in 454 assemblies.
>
> > I'd like to be able to rein in that value, and can probably do it by
> > indicating an artificially smaller standard deviation in the input
> > traceinfo.xml file, but does anyone know if that "lenience factor" can
> > be modified directly?
>
> No, not the lenience factor. I'm not sure if I should make this one
> customisable, too.
>
> For what you want to do: at the moment (and until there's a parameter for
> that, maybe in the next version), artificially reduce your stdev in the
> TRACEINFO.
>
> Note: if given via the command line (-GE:tismin:tismax), then these
> boundaries are fixed (except for the 15% lenience). But of course, this
> does not work when working with several libraries. Then TRACEINFO is the
> only way.
>
> Best,
> Bastien
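A rough Python sketch of the mean/stdev to bounds conversion discussed above, plus the inverse that the suggested workaround needs (which mean and stdev to write into TRACEINFO to get the bounds you actually want). The multiplier k = 3 and the 15% lenience come from this thread; applying the lenience multiplicatively as lower*(1 - 0.15) and upper*(1 + 0.15) is an assumption, since the thread only says it is added on top of the bounds. All function names are hypothetical and are not part of MIRA.

```python
def mira_bounds(mean, stdev, k=3.0):
    """Insert-size bounds as described in the thread: mean -/+ k*stdev.

    k=3 is what MIRA reportedly uses; a user-settable k is only on the TODO.
    """
    # Clamping at 0 is our own choice, not something stated in the thread.
    lower = max(0.0, mean - k * stdev)
    upper = mean + k * stdev
    return lower, upper


def with_lenience(lower, upper, lenience=0.15):
    """ASSUMPTION: the 15% lenience widens both bounds multiplicatively."""
    return lower * (1.0 - lenience), upper * (1.0 + lenience)


def traceinfo_for_bounds(lower, upper, k=3.0):
    """Inverse (the workaround): the mean/stdev to put into TRACEINFO so that
    mean -/+ k*stdev reproduces the desired lower/upper bounds."""
    mean = (lower + upper) / 2.0
    stdev = (upper - lower) / (2.0 * k)
    return mean, stdev


if __name__ == "__main__":
    # The two libraries from the thread.
    print(mira_bounds(3000, 800))    # (600.0, 5400.0)
    print(mira_bounds(7200, 1800))   # (1800.0, 12600.0)
    # Wanting 2400..3600 bp for the first library means an artificially
    # smaller stdev in TRACEINFO.
    print(traceinfo_for_bounds(2400, 3600))  # (3000.0, 200.0)
```

The inverse simply solves the two equations lower = mean - k*stdev and upper = mean + k*stdev, so it only stays valid as long as MIRA keeps k = 3.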
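Martin's point about logarithmic migration can be illustrated with a toy model. Assuming (as a simplification, not from the thread) that migration distance is linear in log10 of fragment size, distance = a - slope*log10(size), a gel slice of +/-h around a band centred at size s spans sizes from s/10^(h/slope) to s*10^(h/slope). That range is symmetric in log space but asymmetric in base pairs, which a single mean/stdev pair cannot describe. The slope and slice width below are made-up numbers, and the function names are hypothetical.

```python
def band_size_range(center_bp, half_width_mm, slope_mm_per_decade):
    """Toy model (assumption): distance = a - slope * log10(size).

    A slice of +/- half_width_mm around the band centre spans a fixed
    size ratio, so the bp range is asymmetric around center_bp.
    """
    factor = 10 ** (half_width_mm / slope_mm_per_decade)
    return center_bp / factor, center_bp * factor


def as_mean_stdev(lower, upper, k=3.0):
    """What a symmetric mean -/+ k*stdev description turns the bounds into."""
    return (lower + upper) / 2.0, (upper - lower) / (2.0 * k)


if __name__ == "__main__":
    # Hypothetical numbers: band centred at 3000 bp, slice 2 mm either side,
    # 20 mm of migration per tenfold change in fragment size.
    lo, hi = band_size_range(3000, 2.0, 20.0)
    print(f"bounds: {lo:.0f}..{hi:.0f} bp")
    print(f"below centre: {3000 - lo:.0f} bp, above centre: {hi - 3000:.0f} bp")
    mean, sd = as_mean_stdev(lo, hi)
    print(f"as mean/stdev: {mean:.0f} +/- 3*{sd:.0f} bp")
```

With these made-up numbers the slice reaches roughly 620 bp below the band centre but roughly 780 bp above it, so the "mean" a symmetric description reports no longer matches the band centre. Giving the two bounds directly avoids that distortion.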