On Friday 28 January 2011 23:41:11 Bayles, Darrell wrote:
> I've been working on an assembly of 454 data that contains a mix of
> shotgun and paired-end reads. The paired end reads have been
> empirically mapped and have a calculated distance of 3000 bp with a
> standard deviation of 800 bp. All that information gets put into the
> traceinfo.xml file in the usual fashion, but the distance range that
> MIRA indicates it's actually using is quite a bit larger than I expect.
>
> Bastien previously posted
> (//www.freelists.org/post/mira_talk/TF-and-TT-in-MAF-format,3) that
> he has built in a '"lenience" factor' that allows for 20 or 30%
> deviation. Was the actual value pinned down?

Hello Darrell,

no, it was not. This particular lenience factor (I just looked it up) is 15%.

> For the experiment that I mentioned, the *out.caf file indicates MIRA
> has given the paired reads the allowable minimum distance of 600 bp and
> maximum of 5400 bp. That still gives a mean of 3000 but with a
> deviation of 2400 bp. I also looked at a previous experiment where the
> pairs were defined as having a mean of 7200 bp and a standard deviation
> of 1800 bp. MIRA output indicates that it is using a range of 1800 bp
> to 12600 bp. That would indicate a mean of 7200 bp and a deviation of
> 5400 bp. In both cases it appears that MIRA is not using a percentage
> factor, but is actually using three standard deviations. Does that
> sound correct Bastien?

Ah ... my old friends 'mean' and 'stdev' of a library ... actually I hate
them. People should be able to say "I want the fragments to be exactly
within two bounds (upper and lower)" and that should be it. But, no, the
TRACEINFO standard has 'mean' and 'stdev' *sigh*

So, yes, MIRA has to somehow transform that into an upper and lower bound.
And as you found out (I know, one more thing I should document), MIRA uses

  lower/upper = mean -/+ 3*stdev

Agreed, it's quite broad.
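
To make the rule above concrete, here is a minimal sketch that applies mean -/+ 3*stdev to the two libraries from the question. It is not MIRA code: the function name and the `factor` parameter are made up for illustration, and the 15% lenience is deliberately left out.

```python
# Illustrative only, not MIRA source code. The function and parameter
# names are hypothetical; the factor 3 is the one given in the reply.
def insert_size_bounds(mean, stdev, factor=3):
    """Return (lower, upper) computed as mean -/+ factor * stdev."""
    return mean - factor * stdev, mean + factor * stdev


# The two libraries from the question:
print(insert_size_bounds(3000, 800))   # (600, 5400)
print(insert_size_bounds(7200, 1800))  # (1800, 12600)
```

Both results match the ranges reported from the *out.caf files, which fits the three-standard-deviations reading.
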
I did that after getting some projects where the stdev was so
optimistically narrow that I supposed every lab did that. I think I'll add
another point on my TODO: let the user choose the multiplier for stdev.

Note: the +/-15% adds to the lower/upper from above and is meant to counter
additional 'gap' column bases in raw assemblies. I wouldn't really touch it,
especially not in 454 assemblies.

> I'd like to be able to rein in that value, and can probably do it by
> indicating an artificially smaller standard deviation in the input
> traceinfo.xml file, but does anyone know if that "lenience factor" can
> be modified directly?

No, not the lenience factor. I'm not sure if I should make this one
customisable, too.

For what you want to do: at the moment (and until there's a parameter for
that, perhaps in the next version), artificially reduce your stdev in the
TRACEINFO.

Note: if the bounds are given via the command line (-GE:tismin:tismax),
then these boundaries are fixed (except for the 15% lenience). But of
course, this does not work when working with several libraries. Then
TRACEINFO is the only way.

Best,
  Bastien
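
A small sketch of the TRACEINFO workaround described above: given the lower and upper bounds you actually want, work backwards to the mean and the artificially reduced stdev to enter, assuming the mean -/+ 3*stdev rule. The function name is hypothetical, this is not part of MIRA, and the extra 15% lenience is not modelled.

```python
# Illustrative only, not MIRA source code. Assumes the bounds are derived
# as mean -/+ 3 * stdev; the 15% lenience is not taken into account.
def traceinfo_values_for_range(lower, upper, factor=3):
    """Return (mean, stdev) so that mean -/+ factor * stdev == (lower, upper)."""
    if upper <= lower:
        raise ValueError("upper must be larger than lower")
    mean = (lower + upper) / 2
    stdev = (upper - lower) / (2 * factor)
    return mean, stdev


# Example: restrict the 3000 bp library to 1400..4600 bp,
# i.e. two of the measured 800 bp standard deviations.
mean, stdev = traceinfo_values_for_range(1400, 4600)
print(mean, round(stdev, 1))
```

For this example the stdev to enter comes out at roughly 533 bp instead of the measured 800 bp.
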