Hi Bastien, I think there is a problem in the MIRA 4.0 SAM output with FLAG bit fields 0x10 and 0x20 (partner read's strand) sometimes being wrong, giving bad FLAG value pairs like 83 and 131, or 99 and 179. $ /home/pc40583/downloads/mira_4.0_linux-gnu_x86_64_static/bin/miraconvert -f maf -t sam LCN4_LargeContigs_out.maf LCN4_LargeContigs_out.padded.sam Loading from maf, saving to: sam Collecting basic SAM info from MAF file Sorting read info ... done ... Data conversion process finished, no obvious errors encountered. Here's a snippet from a recent MAF file, a MiSeq and 454 hybrid de novo assembly: $ grep "M01157:2:000000000-A27AN:1:1102:8685:23197" LCN4_out.maf -A 8 or, $ grep "M01157:2:000000000-A27AN:1:1102:8685:23197" LCN4_LargeContigs_out.maf -A 8 RD M01157:2:000000000-A27AN:1:1102:8685:23197/2 RG 1 RS TCTCAGATGTTGGTTTTTATCAAATATCAACAACTATAGGTAATAGTTCAGATATGGGATATGTATCAAATCGTTTCCCTATAAATGAAAGCCTTCGATGGTTTAAGCCAAACAGTGGTGCAAAAATGATTTTCGGCGATACAGACCTGAT RQ ?A?A?BBBDDDDDADDFFFFFFHHIHIIIIIIIIHHHIHICFFGGGHIHIHHIIHGHHFHHIHHH=EGGGHHHHHHIIIHHHHIHIIIIHHIFFGGFDFHHHIIIFHHIHHHFFHHGF?DFDFHHHFFHDFFFFAEFEEEEECEBE<=BB= TN M01157:2:000000000-A27AN:1:1102:8685:23197 TS 255 RT HAF5 1 41 = MIRA . RT HAF4 42 46 = MIRA . RT HAF3 47 52 = MIRA . RT HAF4 53 70 = MIRA . RT HAF5 71 151 = MIRA . ER AT 190 340 1 151 -- RD M01157:2:000000000-A27AN:1:1102:8685:23197/1 RG 1 RS AAATACATCTCTATAACCACTTTCAACAGGCATATCACTACGTGTTCTCGCCATTCTCCCGGCATTCACAGTCATCAGGTCTGTATCGCCGAAAATCATTTTTGCACCACTGTTTGGCTTAAACCATCGAAGGCTTTCATTTATAGGGAAA RQ ????ABBBDDDDDDDDGGGGGGIHFHHHHHIIIHHHIGHHIHHHHIIIIHIHHHHHHIIIHHHHHIHHGDFGHIIIIHIEGHHGHIIHHHHHHHHHHHHHHHHHHHHHHHFGFGGGGGGGGGGGGGGGGGGHGGGGGGGGGGGGHGGGGGG TN M01157:2:000000000-A27AN:1:1102:8685:23197 TS 1 RT HAF3 1 8 = MIRA . RT HAF4 9 30 = MIRA . RT HAF5 31 151 = MIRA . ER AT 413 263 1 151 ... From the AT line, read/2 is forwards while read/1 is reversed. Here's the resulting SAM output from miraconvert, note FLAG 131 vs 83: $ grep "M01157:2:000000000-A27AN:1:1102:8685:23197" LCN4_LargeContigs_out.padded.sam M01157:2:000000000-A27AN:1:1102:8685:23197 131 LCN4_c1 190 255 151M = 263 223 TCTCAGATGTTGGTTTTTATCAAATATCAACAACTATAGGTAATAGTTCAGATATGGGATATGTATCAAATCGTTTCCCTATAAATGAAAGCCTTCGATGGTTTAAGCCAAACAGTGGTGCAAAAATGATTTTCGGCGATACAGACCTGAT ?A?A?BBBDDDDDADDFFFFFFHHIHIIIIIIIIHHHIHICFFGGGHIHIHHIIHGHHFHHIHHH=EGGGHHHHHHIIIHHHHIHIIIIHHIFFGGFDFHHHIIIFHHIHHHFFHHGF?DFDFHHHFFHDFFFFAEFEEEEECEBE<=BB= RG:Z:1 PT:Z:1;41;.;HAF5;|42;46;.;HAF4;|47;52;.;HAF3;|53;70;.;HAF4;|71;151;.;HAF5; M01157:2:000000000-A27AN:1:1102:8685:23197 83 LCN4_c1 263 255 151M = 190 -223 TTTCCCTATAAATGAAAGCCTTCGATGGTTTAAGCCAAACAGTGGTGCAAAAATGATTTTCGGCGATACAGACCTGATGACTGTGAATGCCGGGAGAATGGCGAGAACACGTAGTGATATGCCTGTTGAAAGTGGTTATAGAGATGTATTT GGGGGGHGGGGGGGGGGGGHGGGGGGGGGGGGGGGGGGFGFHHHHHHHHHHHHHHHHHHHHHHHIIHGHHGEIHIIIIHGFDGHHIHHHHHIIIHHHHHHIHIIIIHHHHIHHGIHHHIIIHHHHHFHIGGGGGGDDDDDDDDBBBA???? RG:Z:1 PT:Z:144;151;.;HAF3;|122;143;.;HAF4;|1;121;.;HAF5 The second line is for read one, FLAG 83 = 0x53 = 0x1 + 0x2 + 0x10 + 0x40 0x1 = read paired 0x2 = read mapped in proper pair 0x10 = read reverse strand 0x40 = first in pair The first line is for read two, FLAG 131 = 0x83 = 0x1 + 0x2 + 0x80 0x1 = read paired 0x2 = read mapped in proper pair 0x80 = second in pair This is missing bit 0x20 indicating the partner (read one) is on the reverse strand, i.e. I think it should be: FLAG 161 = 0xA3 = 0x1 + 0x2 + 0x20 + 0x80 0x1 = read paired 0x2 = read mapped in proper pair 0x20 = mate reverse strand 0x80 = second in pair (It seems using samtools fixmate drops the 0x2 bit for some reason...) -------------------------------------- Here's another example from the same assembly: $ grep "M01157:2:000000000-A27AN:1:1113:12717:15930" LCN4_out.maf -A A or $ grep "M01157:2:000000000-A27AN:1:1113:12717:15930" LCN4_LargeContigs_out.maf -A 4 RD M01157:2:000000000-A27AN:1:1113:12717:15930/1 RG 1 RS CCTTCATTTCAGATTGCGCATCGCGCGTTGGCAAATACGGTGGGTCGCGGCAGAGGGGGATTCTTGCAACCATTGCTGACACAAATCCATTACCC RQ ?????<?BBBBBB?<BFECAEC>ECCEHHHHHHHFHHHHHH;@EDFEHEHHHC@FHHH73=DFDFFFFFFFDDEDEDEBEEEEDEECEEEE;,3? TN M01157:2:000000000-A27AN:1:1113:12717:15930 TS 1 RT HAF3 1 95 = MIRA . ER AT 178472 178566 1 95 RD M01157:2:000000000-A27AN:1:1113:12717:15930/2 RG 1 RS GGGTAATGGATTTGTGTCAGCAATGGTTGCAAGAATCCCCCTCTGCCGCGACCCACCGTATTTGCCAACGCGCGATGCGCAATCTGAAATGAAGG RQ ?????BBBD?DDDBBBFFFFFFFFFHFHHHIIIHIHHHIHHHHIIHGHH@CCHHFHHH@FHIHDFHHHHEHHHDBEHH:DAEEFBDEEFFFEFFF TN M01157:2:000000000-A27AN:1:1113:12717:15930 TS 255 RT HAF3 1 95 = MIRA . ER AT 178566 178472 1 95 The AT lines in the MAF file tell me read/2 is forward, while read/1 is reversed. Note that by chance the two reads are the reverse complement of each other, and therefore map to the exact same place! $ grep "M01157:2:000000000-A27AN:1:1113:12717:15930" LCN4_LargeContigs_out.padded.sam M01157:2:000000000-A27AN:1:1113:12717:15930 179 LCN4_c1 178472 255 95M = 178472 -94 CCTTCATTTCAGATTGCGCATCGCGCGTTGGCAAATACGGTGGGTCGCGGCAGAGGGGGATTCTTGCAACCATTGCTGACACAAATCCATTACCC FFFEFFFEEDBFEEAD:HHEBDHHHEHHHHFDHIHF@HHHFHHCC@HHGHIIHHHHIHHHIHIIIHHHFHFFFFFFFFFBBBDDD?DBBB????? RG:Z:1 PT:Z:1;95;.;HAF3; M01157:2:000000000-A27AN:1:1113:12717:15930 99 LCN4_c1 178472 255 95M = 178472 94 CCTTCATTTCAGATTGCGCATCGCGCGTTGGCAAATACGGTGGGTCGCGGCAGAGGGGGATTCTTGCAACCATTGCTGACACAAATCCATTACCC ?????<?BBBBBB?<BFECAEC>ECCEHHHHHHHFHHHHHH;@EDFEHEHHHC@FHHH73=DFDFFFFFFFDDEDEDEBEEEEDEECEEEE;,3? RG:Z:1 PT:Z:1;95;.;HAF3; The second line is for read one, on the forward strand: FLAG 99 = 0x63 = 0x1 + 0x2 + 0x20 + 0x40 0x1 = read paired 0x2 = read mapped in proper pair 0x20 = mate reverse strand 0x40 = first in pair The first line is for read two, on the reverse strand: FLAG 179 = 0xB3 = 0x1 + 0x2 + 0x10 + 0x20 + 0x80 0x1 = read paired 0x2 = read mapped in proper pair 0x10 = read reverse strand 0x20 = mate reverse strand 0x80 = second in pair This wrongly includes 0x20 in the FLAG, it should I think be 147 = 0x1 + 0x2 + 0x10 + 0x80 (as used via samtools fixmate). I've read over mira/sam_collect.C and can see where the MAF AT line is parsed, and the read direction set to +1 (forward) or -1 (reverse), and I can see where this information is used latter to build the FLAG value - but the nature of the error escapes me. Peter -- You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html