[mira_talk] Bug maf2sam conversion

  • From: "Walter, Mathias" <mathias@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 18 Nov 2013 14:25:36 +0100

Hi,

I've found a bug in miraconvert 4.0rc4 that produces an incorrect
CIGAR string. This prevents samtools view from converting the entire
SAM file into a BAM file.

Here is the error message I got from samtools:

Line 985061, sequence length 151 vs 152 from CIGAR
Parse error at line 985061: CIGAR and sequence length are inconsistent

If I look at line 985061 in the SAM output, I can see that the
read-consuming operations of the CIGAR string (S and M) sum up to 152,
whereas the sequence length is 151.

SRR123456.925346  81 ref_bb   1  255   46S84M1D2M2D14M3D6M  =  44206 0  AACAAGTAAATCGGATGTGTCAAACTCGCTATTTAAATATATATTATAACTAATTGTTGACATTTGTGCCTATTTATATATACTTAATGTGATACAGATTAAGCCATATATTTGTAAGCACTTATGAAAGACAGTCTCATTACCTTAAAAC  ??FDFB.+C:DHFFFHHHFF@>HD=BEDFHHGFGDHGFHHHHHHHHGCACE87,EHFHFGFGFHHHHGFFCGFHHIHHHHFGGDGHHFHFHHHGHHGGHHHFHHHHGGHHHHHEHGGDBAGFFHHHHHHFCFFFFBBDDB?<DBBB?????  RG:Z:2   PT:Z:15
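As a quick sketch of the check behind the samtools error: per the SAM specification, only the operations M, I, S, = and X consume bases of the read, so their lengths must add up to the length of SEQ. The helper names below are hypothetical and are not part of samtools or MIRA.

```python
import re

# CIGAR operations that consume bases of the read (SEQ), per the SAM spec.
QUERY_CONSUMING = set("MIS=X")


def cigar_query_length(cigar: str) -> int:
    """Sum the lengths of all read-consuming operations in a CIGAR string."""
    return sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in QUERY_CONSUMING
    )


def check_record(cigar: str, seq: str) -> bool:
    """Hypothetical consistency check: CIGAR query length must equal len(SEQ)."""
    return cigar_query_length(cigar) == len(seq)
```

For the record above, cigar_query_length("46S84M1D2M2D14M3D6M") returns 152 (46 + 84 + 2 + 14 + 6), one more than the 151 bases in SEQ; with 45S instead of 46S it returns 151.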

If I look at the corresponding read in the MAF file, I can see the
following entry (reverse complement of the SAM sequence):

AT      1       113     45      157
RD      SRR123456.925346/1
RG      2
RS      GTTTTA***AGGTAATGAGACTG**TC*TTTCATAAGTGCTTACAAATATATGGCTTAATCTGTATCACATTAAGTATATATAAATAGGCACAAATGTCAACAATTAGTTATAA*TATATATTTAAATAGCGAGTTTGACACATCCGATTTACTTGTT
RQ      ?????BBBBBBD<?BDDBBFFFFDDCFGHHHHHHFFGABDGGHEHHHHHGGHHHHFHHHGGHHGHHHFHFHHGDGGFHHHHIHHFGCFFGHHHHFGFGFHFHE,78ECACGHHHHHHHHHFGHDGFGHHFDEB=DH>@FFHHHFFFHD:C+.BFDF??
TN      SRR123456.925346
TS      1
QR      112
RT      HAF3    1       1       =       MIRA    .
RT      HAF5    2       18      =       MIRA    .
RT      HAF4    19      19      =       MIRA    .
RT      HAF5    20      36      =       MIRA    .
RT      HAF4    37      38      =       MIRA    .
RT      HAF3    39      42      =       MIRA    .
RT      HAF5    43      61      =       MIRA    .
RT      HAF3    62      64      =       MIRA    .
RT      HAF5    65      74      =       MIRA    .
RT      HAF3    75      78      =       MIRA    .
RT      HAF5    79      88      =       MIRA    .
RT      HAF4    89      95      =       MIRA    .
RT      HAF7    96      105     =       MIRA    .
RT      HAF5    106     112     =       MIRA    .
RT      HAF3    113     116     =       MIRA    .
RT      HAF4    117     117     =       MIRA    .
RT      HAF5    118     138     =       MIRA    .
RT      HAF3    139     142     =       MIRA    .
RT      HAF5    143     152     =       MIRA    .
RT      HAF4    153     158     =       MIRA    .

The last * falls inside the soft-clipped region and should not be
counted. Hence the CIGAR string should start with "45S84M...".
Fixing this manually resolves the conversion problem.
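To illustrate where the extra base comes from, here is a minimal sketch that rebuilds the CIGAR from the padded MAF read. It rests on assumptions made for illustration, not on MIRA's documented behaviour: padded positions after QR (112) form the right clip, there is no left clip, every * inside the aligned part faces a real contig base and therefore becomes a D, and the read is reversed for SAM because flag 81 has the 0x10 (reverse strand) bit set. The function name cigar_from_padded_read is hypothetical. Pads inside the clipped region are skipped rather than counted as S.

```python
from itertools import groupby

PAD = "*"


def cigar_from_padded_read(padded: str, right_clip: int, reverse: bool,
                           left_clip: int = 0) -> str:
    """Hypothetical helper: derive a SAM CIGAR from a padded MAF read.

    Assumes padded positions 1..left_clip and right_clip+1..len(padded)
    are clipped, and that every pad in the aligned part is a deletion.
    """
    ops = []
    for pos, base in enumerate(padded, start=1):
        clipped = pos <= left_clip or pos > right_clip
        if clipped:
            # A pad in a clipped region consumes no read base: emit nothing.
            if base != PAD:
                ops.append("S")
        else:
            # Aligned region: a real base is M, a pad in the read is D.
            ops.append("D" if base == PAD else "M")
    if reverse:
        # SAM stores reverse-strand reads in forward-strand orientation.
        ops.reverse()
    # Run-length encode consecutive identical operations.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))
```

Applied to the RS line above with right_clip=112 and reverse=True, this yields 45S84M1D2M2D14M3D6M. Counting the pad at padded position 115 as a clipped base would instead give the 46S seen in the miraconvert output.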

--
Kind regards,
Mathias

-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
