[mira_talk] Re: Help with MCVc (missing Co Verage in consenus) after mapping assembly

  • From: Shaun Tyler <Shaun.Tyler@xxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 4 Feb 2014 09:06:38 -0600

Another option.

SAM --> BAM --> Sorted BAM

samtools depth sorted.bam | awk '($2!=p+1){print p+1 "-" $2-1} {p=$2}'

The awk thing finds gaps in sequential numbers.  It's not my doing.  Posted
a question to a computer forum and this was the neatest approach.  Still
don't quite understand it all but it does seem to work.  I've used it to do
exactly what is being asked - find regions that are missing in comparison
to the reference sequence.

Shaun



From:   John Nash <john.he.nash@xxxxxxxxx>
To:     mira_talk@xxxxxxxxxxxxx
Date:   2014-02-04 08:25 AM
Subject:        [mira_talk] Re: Help with MCVc (missing Co Verage in consenus)
            after mapping assembly
Sent by:        mira_talk-bounce@xxxxxxxxxxxxx




On Feb 4, 2014, at 8:22 AM, Austen Chen <cyausten@xxxxxxxxx> wrote:

      Firstly I apologise for invading your privacy which was done
      unintentionally. After sending the email I couldn't see it on the
      list, so i wasn't sure if it's being sent or not, and with your name
      and photos popping out everywhere on the web page, looks like you
      were happy for people to contact you and that's why i thought i could
      just ask you if i had sent my mail successfully, and i didn't mean to
      demand a response from you straightaway, and i know you are very
      helpful and quick in response, and there is no need to do so, so
      merely misunderstanding and sorry to have upset you.

      Still thank you very much for your help, and i don't think I had
      explained my problem clearly. you are right that I made a mapping,
      imported that into gap4, made some edits/finishing there and then
      exported the tag positions from within gap4. this tag position has
      been illustrated in your mira manual page 17 (version 4.0rc5): Figure
      1.18  "MCVc" tag showing a genome deletion in Solexa mapping
      assembly. as you can see in the figure the top line is a bacterial
      reference sequence, and the missing dark red stretch in the consensus
      is given at a position of 554830 in reference to the reference
      sequence. what i have done is that i have made two separate mapping
      assemblies, one is A1 against an annotated GenBank bacterial
      reference sequence, and another is A3 against the same reference
      sequence, and I want to find if A1 and A3 have any common missing
      regions, and that's where i had my problem. For example, as you can
      see in Gap4 A1 has no coverage at a position of 165410 with a missing
      length of 404bp (165410-165814)

      position 165410
      length 404
      type MCVc
      comment 'gff3str=.'

      <image.png>

      and as for A3, it has no coverage at a position of 165209 with a
      missing length of 827bp (165209-166036)

      position 165209
      length 827
      type MCVc
      comment 'gff3str=.'

      <image.png>

      Presumably A1 and A3 would have a common missing region from
      165410-165814, but this is not the case as each missing position in
      A1 and A3 is not referring to the same position in relation to the
      reference sequence, and hope i have explained it here clearly. it
      looks like i am able to find missing coverage in each strain but
      unable to find common missing coverage between these 2 strains, and
      that's why I was asking you for help if my interpretation was correct
      and you could give me some suggestions on how to tackle this problem.


If both of the genomes are mapped to the same reference sequence, you could
make sam and then bam files from each assembly, merge them with
picard-tools' bam merge tool, and then view the merged assembly in Tablet.
That way you can see each of the read groups, and their relationship to the
reference sequence.

J

GIF image

Other related posts: