[mira_talk] Re: Getting observed insert size etc for paired end reads

  • From: Peter Cock <p.j.a.cock@xxxxxxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Thu, 13 Feb 2014 09:48:50 +0000

On Wed, Feb 12, 2014 at 9:38 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
> On 12 Feb 2014, at 19:15 , Peter Cock <p.j.a.cock@xxxxxxxxxxxxxx> wrote:
>> For paired end libraries, does MIRA report any of the observed
>> template/insert size, orientation, fraction assembled in same
>> contig, or something like the SAM FLAG for "properly mapped"
>> somewhere?
>
> That's something where proper output did not make it into 4.0. There
> are ways to extract all that info (except fraction) from the output log,
> but they're tedious and the format will change. If you want to give it
> a try anyway, grep for "^ATG" in the log. You'll see reports (mean,
> stdev, skewness, inferred min/max) for every readgroup. Pass 1
> does not count, look in pass 2 and there for the final predictions.
>
> If you are just interested in min/max of the templates, look at the
> header of the MAF files (results or, during assembly, checkpoint).
>
> B.

Great - thanks Bastien.

I'm doing this with grep:

$ grep ^ATG -A 3 assembly.log
ATG PREDICTIONS
rgid: 1    c: 161712    sp: -2    m: 113.6219227891    d: 31.3431858333    s: -0.1626077222    -: 50    +: 176
rgid: 1    c: 620869    sp: -1    m: 244.2576642220    d: 123.1716308278    s: 0.7693070253    -: 22    +: 515
Final prediction: rgid: 1    c: 620869    sp: -1    m: 244.2576642220    d: 123.1716308278    s: 0.7693070253    -: 22    +: 515
--
ATG PREDICTIONS
rgid: 1    c: 161180    sp: -2    m: 111.8964201514    d: 30.8984276809    s: -0.1555124536    -: 50    +: 173
rgid: 1    c: 613365    sp: -1    m: 231.4318040621    d: 114.9710732224    s: 0.7802482977    -: 24    +: 484
Final prediction: rgid: 1    c: 613365    sp: -1    m: 231.4318040621    d: 114.9710732224    s: 0.7802482977    -: 24    +: 484
--
ATG PREDICTIONS
rgid: 1    c: 160397    sp: -2    m: 111.2749847255    d: 30.7681223973    s: -0.1499654587    -: 49    +: 172
rgid: 1    c: 610365    sp: -1    m: 230.1019698574    d: 114.2280587811    s: 0.7776165138    -: 24    +: 481
Final prediction: rgid: 1    c: 610365    sp: -1    m: 230.1019698574    d: 114.2280587811    s: 0.7776165138    -: 24    +: 481
--
ATG PREDICTIONS
rgid: 1    c: 159008    sp: -2    m: 111.2163714004    d: 30.7810822718    s: -0.1497065318    -: 49    +: 172
rgid: 1    c: 604403    sp: -1    m: 229.8911552540    d: 114.0643954342    s: 0.7746355633    -: 24    +: 480
Final prediction: rgid: 1    c: 604403    sp: -1    m: 229.8911552540    d: 114.0643954342    s: 0.7746355633    -: 24    +: 480
--
ATG PREDICTIONS
rgid: 1    c: 159089    sp: -2    m: 111.3048965994    d: 30.6778592767    s: -0.1515380996    -: 49    +: 172
rgid: 1    c: 608715    sp: -1    m: 229.7921947666    d: 113.9986506361    s: 0.7775140932    -: 24    +: 480
Final prediction: rgid: 1    c: 608715    sp: -1    m: 229.7921947666    d: 113.9986506361    s: 0.7775140932    -: 24    +: 480

Those are the predictions for each of the five passes - settling down to a mean
template size of approx 230, standard deviation 114, skew 0.77, min 24, max 480.
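
For anyone wanting these numbers without reading the grep output by eye, here is a minimal Python sketch that pulls out every "Final prediction:" record from the log text. The function name and the dictionary keys are hypothetical, and the meaning of each field (c = count, m = mean, d = standard deviation, s = skew, -/+ = inferred min/max) is inferred from this thread only, not from MIRA documentation. As noted earlier in the thread, the log format is expected to change, so treat the pattern as tied to this particular output.

```python
import re

# Pattern for one "Final prediction:" record as shown in this thread.
# \s+ also matches newlines, so a record that has been wrapped across
# two lines (as mail clients tend to do) is still matched.
FINAL_RE = re.compile(
    r"Final prediction:\s+rgid:\s+(?P<rgid>\d+)"
    r"\s+c:\s+(?P<count>\d+)"
    r"\s+sp:\s+(?P<sp>-?\d+)"
    r"\s+m:\s+(?P<mean>-?[\d.]+)"
    r"\s+d:\s+(?P<stdev>-?[\d.]+)"
    r"\s+s:\s+(?P<skew>-?[\d.]+)"
    r"\s+-:\s+(?P<tmin>-?\d+)"
    r"\s+\+:\s+(?P<tmax>-?\d+)"
)


def final_predictions(log_text):
    """Return one dict per 'Final prediction:' record, in log order.

    Key names are illustrative; field meanings are assumed from the thread.
    """
    records = []
    for match in FINAL_RE.finditer(log_text):
        raw = match.groupdict()
        records.append({
            "rgid": int(raw["rgid"]),
            "count": int(raw["count"]),
            "sp": int(raw["sp"]),
            "mean": float(raw["mean"]),
            "stdev": float(raw["stdev"]),
            "skew": float(raw["skew"]),
            "tmin": int(raw["tmin"]),
            "tmax": int(raw["tmax"]),
        })
    return records
```

Something like `final_predictions(open("assembly.log").read())[-1]` would then give the last record in the log, assuming the whole log fits comfortably in memory.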

However, the MAF header seems to have the min/max from the first pass (22, 515).
Is that an error?

@ReadGroup
@RG    name    MiSeq
@RG    ID    1
@RG    technology    Solexa
@RG    strainname    StrainX
@RG    templatesize    22    515
@RG    segmentplacement    FR
@RG    segmentnaming    solexa
@EndReadGroup
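
To check the header value against the log programmatically, a rough Python sketch for reading the read group blocks could look like the following. It assumes only what the excerpt above shows: blocks delimited by @ReadGroup and @EndReadGroup, with whitespace-separated "@RG key value..." lines in between. The function names are hypothetical, and this is not a full MAF parser.

```python
def read_groups(maf_lines):
    """Collect each @ReadGroup ... @EndReadGroup block as a dict.

    Each '@RG key v1 v2 ...' line becomes key -> [v1, v2, ...] (strings).
    Only the layout visible in the header excerpt is assumed here.
    """
    groups = []
    current = None
    for line in maf_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "@ReadGroup":
            current = {}
        elif fields[0] == "@EndReadGroup":
            if current is not None:
                groups.append(current)
            current = None
        elif fields[0] == "@RG" and current is not None and len(fields) >= 2:
            current[fields[1]] = fields[2:]
    return groups


def template_size(group):
    """Return (min, max) from a group's templatesize entry, or None."""
    values = group.get("templatesize")
    if values is None or len(values) < 2:
        return None
    return int(values[0]), int(values[1])
```

Passing an open file handle as `maf_lines` works since it is iterated line by line, though for a large MAF file it would make sense to stop reading once the header has been passed.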

Thanks,

Peter

