[mira_talk] Re: Question: padded or unpadded outputs

  • From: Peter Cock <p.j.a.cock@xxxxxxxxxxxxxx>
  • To: Bastien Chevreux <bach@xxxxxxxxxxxx>, "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>, lenis vasilis <val1@xxxxxxxxxx>
  • Date: Mon, 23 Mar 2015 16:45:10 +0000

On Thu, Mar 19, 2015 at 6:30 PM, Peter Cock <p.j.a.cock@xxxxxxxxxxxxxx> wrote:
> On Thu, Mar 19, 2015 at 6:20 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
>> On 19 Mar 2015, at 18:45 , Peter Cock <p.j.a.cock@xxxxxxxxxxxxxx> wrote:
>>> Bastien - is there a recommended way to see in the MAF v2 format
>>> if a read is part one or part two of a pair?
>>
>> Some doc update needed I think. Yes, the TS (Template Segment)
>> line: the first gets a “1”, the last a “255”, those in between get 2 to 254.
>> For sequencing technologies with pairs, that makes it have just
>> “1” and “255”.
>>
>> B.
>
> Lovely - I looked at that but the 255 surprised me, so I didn't
> like to guess.
>
> Peter

Thanks Bastien,

That seems to be working now:
https://github.com/peterjc/maf2sam/commit/eb3d798daf3b1e2d445dfc44d7f5474a836808f5
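
For anyone scripting against this, here is a minimal sketch of the TS rule
Bastien describes above (1 = first, 255 = last, 2 to 254 = in between),
mapped onto the SAM first/last segment flag bits 0x40 and 0x80. The function
names and the assumed "TS<whitespace>value" line layout are illustrative
assumptions only, and this is not necessarily how maf2sam.py does it:

```python
SAM_FIRST = 0x40  # SAM flag bit: first segment in the template
SAM_LAST = 0x80   # SAM flag bit: last segment in the template


def segment_flags(ts):
    """Map a MAF v2 TS value to SAM first/last segment flag bits."""
    if ts == 1:
        return SAM_FIRST
    if ts == 255:
        return SAM_LAST
    if 2 <= ts <= 254:
        # The SAM spec marks an inner segment by setting both bits.
        return SAM_FIRST | SAM_LAST
    raise ValueError("Unexpected TS value: %r" % (ts,))


def parse_ts_line(line):
    """Return the integer from a TS line (assumed layout: tag, whitespace, value)."""
    tag, value = line.split(None, 1)
    if tag != "TS":
        raise ValueError("Not a TS line: %r" % (line,))
    return int(value)


print(segment_flags(parse_ts_line("TS\t1")))    # 64
print(segment_flags(parse_ts_line("TS\t255")))  # 128
```

For ordinary paired reads only the first two branches should ever be hit.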

Lenis,

Your example now works for me, reporting a sensible small fraction
of the reads to be orphaned:

$ python maf2fasta.py vasilis-test-S2-mt-lane6.maf
vasilis-test-S2-mt-lane6.padded.fasta
vasilis-test-S2-mt-lane6.unpadded.fasta
chrM_bb
Done

$ ./maf2sam.py vasilis-test-S2-mt-lane6.padded.fasta \
    vasilis-test-S2-mt-lane6.maf > vasilis-test-S2-mt-lane6.padded.sam
[maf2sam] NOTE: Producing SAM using a gapped reference sequence.
[maf2sam] Identified as MIRA v3.9 or later (MAF v2)
[maf2sam] WARNING - Support for this is *still* EXPERIMENTAL!
[maf2sam] Identified 3 read groups
[maf2sam] Starting main pass though the MAF file
[maf2sam] Unpaired read chrM
[maf2sam] Almost done, 1047 orphaned paired reads remain
[maf2sam] Done

$ ./maf2sam.py vasilis-test-S2-mt-lane6.unpadded.fasta \
    vasilis-test-S2-mt-lane6.maf > vasilis-test-S2-mt-lane6.unpadded.sam
[maf2sam] Identified as MIRA v3.9 or later (MAF v2)
[maf2sam] WARNING - Support for this is *still* EXPERIMENTAL!
[maf2sam] Identified 3 read groups
[maf2sam] Starting main pass though the MAF file
[maf2sam] Unpaired read chrM
[maf2sam] Almost done, 1047 orphaned paired reads remain
[maf2sam] Done

$ ./sam2bam.py vasilis-test-S2-mt-lane6.padded.sam \
    vasilis-test-S2-mt-lane6.unpadded.sam
samtools view -b -S vasilis-test-S2-mt-lane6.padded.sam | samtools
sort - vasilis-test-S2-mt-lane6.padded
[samopen] SAM header is present: 1 sequences.
[bam_header_read] EOF marker is absent. The input is probably truncated.
samtools index vasilis-test-S2-mt-lane6.padded.bam
samtools idxstats vasilis-test-S2-mt-lane6.padded.bam
chrM_bb    16747    20486    0
*    0    0    0
samtools view -b -S vasilis-test-S2-mt-lane6.unpadded.sam | samtools
sort - vasilis-test-S2-mt-lane6.unpadded
[bam_header_read] EOF marker is absent. The input is probably truncated.
[samopen] SAM header is present: 1 sequences.
samtools index vasilis-test-S2-mt-lane6.unpadded.bam
samtools idxstats vasilis-test-S2-mt-lane6.unpadded.bam
chrM_bb    16616    20486    0
*    0    0    0

This was with samtools 0.1.19. The "EOF marker is absent" message here
is a false alarm, see https://github.com/samtools/samtools/issues/18
(that bug has since been fixed in samtools).
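
As a quick sanity check on the numbers above, here is a small sketch that
parses the two idxstats outputs (columns: reference name, length, mapped
reads, unmapped reads) and compares them. The helper name is made up for
illustration, and reading the length difference as the number of pad columns
in the gapped reference is my assumption:

```python
def parse_idxstats(text):
    """Parse 'samtools idxstats' output into {name: (length, mapped, unmapped)}."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split()
        stats[name] = (int(length), int(mapped), int(unmapped))
    return stats


# Values copied from the transcript above.
padded = parse_idxstats("chrM_bb\t16747\t20486\t0\n*\t0\t0\t0\n")
unpadded = parse_idxstats("chrM_bb\t16616\t20486\t0\n*\t0\t0\t0\n")

# Same number of mapped reads in both files; only the reference length differs.
assert padded["chrM_bb"][1] == unpadded["chrM_bb"][1]
pad_columns = padded["chrM_bb"][0] - unpadded["chrM_bb"][0]
print(pad_columns)  # 131

# Orphaned paired reads as a share of all mapped reads.
orphan_percent = 100.0 * 1047 / padded["chrM_bb"][1]
print("%.1f%%" % orphan_percent)  # 5.1%
```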

Both the padded and unpadded files loaded fine in Tablet v1.14.11.07,
tested on Mac OS X.

I'm not sure how I missed this back in November 2013 when I updated
maf2sam.py to handle the new MAF v2 format from MIRA 3.9+. I was
right to put the big "EXPERIMENTAL" warning in though ;)

Sorry about this, and thank you for sharing this test file with me.

Peter
