Another option. SAM --> BAM --> Sorted BAM samtools depth sorted.bam | awk '($2!=p+1){print p+1 "-" $2-1} {p=$2}' The awk thing finds gaps in sequential numbers. It's not my doing. Posted a question to a computer forum and this was the neatest approach. Still don't quite understand it all but it does seem to work. I've used it to do exactly what is being asked - find regions that are missing in comparison to the reference sequence. Shaun From: John Nash <john.he.nash@xxxxxxxxx> To: mira_talk@xxxxxxxxxxxxx Date: 2014-02-04 08:25 AM Subject: [mira_talk] Re: Help with MCVc (missing Co Verage in consenus) after mapping assembly Sent by: mira_talk-bounce@xxxxxxxxxxxxx On Feb 4, 2014, at 8:22 AM, Austen Chen <cyausten@xxxxxxxxx> wrote: Firstly I apologise for invading your privacy which was done unintentionally. After sending the email I couldn't see it on the list, so i wasn't sure if it's being sent or not, and with your name and photos popping out everywhere on the web page, looks like you were happy for people to contact you and that's why i thought i could just ask you if i had sent my mail successfully, and i didn't mean to demand a response from you straightaway, and i know you are very helpful and quick in response, and there is no need to do so, so merely misunderstanding and sorry to have upset you. Still thank you very much for your help, and i don't think I had explained my problem clearly. you are right that I made a mapping, imported that into gap4, made some edits/finishing there and then exported the tag positions from within gap4. this tag position has been illustrated in your mira manual page 17 (version 4.0rc5): Figure 1.18 "MCVc" tag showing a genome deletion in Solexa mapping assembly. as you can see in the figure the top line is a bacterial reference sequence, and the missing dark red stretch in the consensus is given at a position of 554830 in reference to the reference sequence. what i have done is that i have made two separate mapping assemblies, one is A1 against an annotated GenBank bacterial reference sequence, and another is A3 against the same reference sequence, and I want to find if A1 and A3 have any common missing regions, and that's where i had my problem. For example, as you can see in Gap4 A1 has no coverage at a position of 165410 with a missing length of 404bp (165410-165814) position 165410 length 404 type MCVc comment 'gff3str=.' <image.png> and as for A3, it has no coverage at a position of 165209 with a missing length of 827bp (165209-166036) position 165209 length 827 type MCVc comment 'gff3str=.' <image.png> Presumably A1 and A3 would have a common missing region from 165410-165814, but this is not the case as each missing position in A1 and A3 is not referring to the same position in relation to the reference sequence, and hope i have explained it here clearly. it looks like i am able to find missing coverage in each strain but unable to find common missing coverage between these 2 strains, and that's why I was asking you for help if my interpretation was correct and you could give me some suggestions on how to tackle this problem. If both of the genomes are mapped to the same reference sequence, you could make sam and then bam files from each assembly, merge them with picard-tools' bam merge tool, and then view the merged assembly in Tablet. That way you can see each of the read groups, and their relationship to the reference sequence. J