[mira_talk] Re: Gap closure

  • From: Shaun Tyler <Shaun.Tyler@xxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 6 May 2012 18:16:15 -0500

There is no “this is how you do it” when it comes to genome closure.  Every
project is different and will have different challenges which require
different approaches.  However, I can offer a few tips.
First, and most importantly, know what it is you want to
accomplish.  This is a question that should have been addressed before you
even started sequencing, as it ultimately affects the route you
need to take.  Characterizing a novel organism is different from doing a
comparative project looking at differences in gene content, SNPs,
phylogenetic relatedness, etc.  Too often people embark on these projects
without a clear idea of the questions they want answered, only to
find out that the track they’ve taken is not really appropriate.  But your
question was on closing gaps.


Basically you’ll do this by PCR and conventional sequencing.  Typically the
genome coverage will be close to 100 % so the gaps you are left with are
generally fairly short or are due to repeats like rRNA sequences, IS
elements and things like that.  The first thing to do is to get to know
your data.  You mentioned 14 repeats.  What are they?  Are they different
copies of the same thing?   How big are they and what coverage do they have
compared to the rest of the genome?  They may be separate contigs but could
still represent multiple copies of the same thing.  Do you care about these
sequences and closing the gaps they create?
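
As a rough illustration of the coverage comparison above, here is a minimal
Python sketch.  It assumes you have already pulled an average coverage value
per contig out of the assembly statistics; the contig names and numbers are
made up, and estimate_copy_numbers is a hypothetical helper, not part of MIRA.
The ratio to the median coverage of the non-repeat contigs is only a crude
guess at copy number.

```python
from statistics import median

def estimate_copy_numbers(coverage, repeat_names):
    """Guess copy number of repeat contigs from relative coverage.

    coverage: dict mapping contig name -> average coverage
    repeat_names: set of contig names flagged as repeats
    """
    # Baseline is the median coverage of contigs NOT flagged as repeats.
    unique = [cov for name, cov in coverage.items() if name not in repeat_names]
    baseline = median(unique)
    # A repeat present N times in the genome should show roughly N x baseline.
    return {name: round(coverage[name] / baseline) for name in repeat_names}

# Hypothetical values for illustration only.
coverage = {"c1": 30.0, "c2": 32.0, "c3": 28.0, "rep_a": 150.0, "rep_b": 61.0}
copies = estimate_copy_numbers(coverage, {"rep_a", "rep_b"})
print(copies)
```

Treat the result as a hint about which repeat contigs are probably copies of
the same element, not as a measurement.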


Start with doing some simple BLAST searches and looking over the ancillary
data files created by MIRA.  The better you understand the data the better
off you will be in planning a route of attack.  You should also use
something like GAP5, Tablet or another pileup viewer to assess the contigs
created.  Inconsistencies in depth of coverage or paired-end distribution
could indicate misassemblies.  MIRA is good but nothing is perfect.
Contigs flanking repeats will typically have a higher depth of coverage at
the terminal ends.  Interrogate these regions to find out what repeat you
are dealing with.  It will also aid you in primer design.  No point putting
a primer at the end of a contig if it is likely to prime in multiple
locations in the genome and give you multiple products.
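
A quick way to screen a candidate primer for multiple priming sites is to
count how often it occurs in the contigs on either strand.  The sketch below
is plain Python with made-up sequences, and the function names are
hypothetical.  It only counts exact matches, so it will miss sites where the
primer could anneal with a mismatch or two; a BLAST search of the primer
against the contigs is the more sensitive check.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def count_overlapping(seq, pattern):
    """Count occurrences of pattern in seq, including overlapping ones."""
    count = 0
    start = seq.find(pattern)
    while start != -1:
        count += 1
        start = seq.find(pattern, start + 1)
    return count

def count_exact_sites(primer, contigs):
    """Count exact primer matches on both strands across all contigs."""
    primer = primer.upper()
    rc = reverse_complement(primer)
    total = 0
    for seq in contigs.values():
        seq = seq.upper()
        total += count_overlapping(seq, primer)
        if rc != primer:  # avoid double counting palindromic primers
            total += count_overlapping(seq, rc)
    return total

# Made-up sequences for illustration only.
contigs = {"c1": "AAACGTTGCATTT", "c2": "GGGATGCAACGTCCC"}
sites = count_exact_sites("ACGTTGCA", contigs)
print(sites)  # more than 1 means the primer is not unique in the assembly
```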


Scaffolding is also going to be required in order to orient your contigs
and plan your PCR experiments.  If you don’t have paired-end data to
facilitate this you can use programs like Mauve, MUMmer/NUCmer, ABACAS,
Projector2, etc. to order your contigs based on a reference sequence.  Just
keep in mind that the resulting contig order is only as good as the
reference sequence (and program) you use, so trying different reference
sequences and looking for consistency is usually recommended.  In the end
it probably won’t be perfect, so expect some predictions to be wrong.
ABACAS and Projector2 are nice because they will provide you with a primer
list for closing gaps, but I sometimes question the ordering that they come
up with, and the primers designed are not always the best (e.g. they can
target obvious repeat regions).
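
To make the idea of reference-based ordering concrete, here is a small
Python sketch.  It assumes you have already reduced the output of whatever
aligner you used to one best hit per contig, as (contig, ref_start, ref_end,
strand) tuples; that tuple layout, the function name and the coordinates are
all assumptions for illustration, not the output format of any of the
programs named above.

```python
def order_contigs(hits):
    """Order contigs along a reference and estimate the gap between neighbours.

    hits: list of (contig, ref_start, ref_end, strand), one best hit per contig.
    Returns (ordered_hits, plan), where plan lists (left, right, gap_bp).
    A negative gap means the two hits overlap on the reference.
    """
    # Sort by leftmost reference coordinate, whatever the strand.
    ordered = sorted(hits, key=lambda h: min(h[1], h[2]))
    plan = []
    for left, right in zip(ordered, ordered[1:]):
        gap = min(right[1], right[2]) - max(left[1], left[2]) - 1
        plan.append((left[0], right[0], gap))
    return ordered, plan

# Made-up coordinates for illustration only.
hits = [
    ("c3", 9000, 12000, "+"),
    ("c1", 1, 4000, "+"),
    ("c2", 7500, 4200, "-"),  # reverse-strand hit, coordinates given high to low
]
ordered, plan = order_contigs(hits)
for left, right, gap in plan:
    print(f"{left} -> {right}: about {gap} bp expected between them")
```

The predicted gap sizes are only as good as the reference, so use them as a
rough guide to the expected PCR product size rather than as a fact.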


Another tip has to do with the repeats.  In all likelihood you will have
multiple copies of rRNA sequences or other large regions that will require
primer walking.  If you are looking to close these gaps make use of the
data you have to make things easier.  The rRNA regions, for example, will
typically be 5 kb or so and would require multiple rounds of
sequencing, designing a primer, sequencing again, designing the next
primer, etc.  But you can predesign the primers to cover these repetitive
regions based on the data you have, so that you can use the same
sequencing primers for all of the different copies.
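
As a sketch of what predesigning the walking primers could look like, the
Python below spaces primer start positions along a repeat consensus so that
consecutive reads overlap.  The usable read length (700 bp) and the overlap
(100 bp) are assumed values, not figures from this thread, and the function
name is hypothetical.  It covers one strand only and says nothing about
primer quality; you would still pick the actual primer sequence near each
position with your usual design tool.

```python
def walking_primer_positions(region_length, read_length=700, overlap=100):
    """Return 0-based start positions for primers tiling a region on one strand.

    read_length: assumed usable read length per sequencing reaction
    overlap: assumed overlap wanted between consecutive reads
    """
    step = read_length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than read_length")
    positions = []
    pos = 0
    while pos < region_length:
        positions.append(pos)
        if pos + read_length >= region_length:
            break  # this read already reaches the end of the region
        pos += step
    return positions

# A repeat of about 5 kb, as in the rRNA example above.
positions = walking_primer_positions(5000)
print(len(positions), "primers at", positions)
```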


I could go on and on about what to do in different situations but I think
this covers the main areas.


Good Luck and Have Fun. Oh and be prepared for frustration and
disappointment ;-)


Shaun






From:   Shankar Manoharan <shankarmanostar@xxxxxxxxx>
To:     mira_talk@xxxxxxxxxxxxx
Date:   2012-05-06 01:05 PM
Subject:        [mira_talk] Gap closure
Sent by:        mira_talk-bounce@xxxxxxxxxxxxx



Dear all...
     Firstly thanks a LOT for your earlier support. I have now managed to
assemble a ~4.5 Mb genome and obtain 48 contigs of which 14 are repeats. If
I have to close gaps, what is the best possible way? Any input
would be greatly appreciated. Many thanks in advance.

Best regards,

Shankar Manoharan
Graduate Student
Department of Genetics
Madurai Kamaraj University
Ph. +919790167534

I strongly believe in doing my best and leaving the rest to God
