[mira_talk] Re: Gap closure

*Dear Shaun...*

*Thanks a LOT for that detailed and very helpful walkthrough. I work on
whole-genome sequencing of bacterial isolates. We have roughly 3-4
references for my species. The objective of my work is to establish
gene-function relationships for certain characteristics of my isolate. Of
the 14 repeats, two code for proteins; the remaining 12 are 23S, 16S and 5S
rRNA encoding sequences (yes, multiple copies of the same thing). It is
essential for me to close the genomic gaps by whatever means are
available. I'll first try to order the contigs before proceeding further.
Thanks for suggesting the tools. You mentioned something about ancillary
data generated by MIRA. Would you mind explaining what it is and how it
will help in gap closure?*

*Best regards,*

*Shankar Manoharan
Graduate Student
Department of Genetics
Madurai Kamaraj University*
*Ph. +919790167534*

*I strongly believe in doing my best and leaving the rest to God*



On Mon, May 7, 2012 at 4:46 AM, Shaun Tyler <Shaun.Tyler@xxxxxxxxxxxxxxx> wrote:

>  There is no “this is how you do it” when it comes to genome closure.
>  Every project is different and will have different challenges which
> require different approaches.  However, I can offer a few tips.
>
> First and most important is to know what it is you want to
> accomplish.  This is a question that should have been addressed before you
> even started sequencing, as it ultimately affects the route you
> need to take.  Characterizing a novel organism is different from doing a
> comparative project looking at differences in gene content, SNPs,
> phylogenetic relatedness, etc.  Too often people embark on these projects
> without having a clear idea of the questions they want answered only to
> find out that the track they’ve taken is not really appropriate.  But your
> question was on closing gaps.
>
> Basically you’ll do this by PCR and conventional sequencing.  Typically
> the genome coverage will be close to 100% so the gaps you are left with
> are generally fairly short or are due to repeats like rRNA sequences, IS
> elements and things like that.  The first thing to do is to get to know
> your data.  You mentioned 14 repeats.  What are they?  Are they different
> copies of the same thing?   How big are they and what coverage do they have
> compared to the rest of the genome?  They may be separate contigs but could
> still represent multiple copies of the same thing.  Do you care about these
> sequences and closing the gaps they create?
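
One way to put numbers on the coverage question above is to compare each
contig's average depth with the median across all contigs. The Python sketch
below is only an illustration: the contig names, the depth values and the
1.5x threshold are assumptions, not MIRA output or values from this thread.

```python
from statistics import median


def estimate_copy_number(contig_coverage, min_ratio=1.5):
    """Return {contig: estimated copies} for contigs well above median depth.

    contig_coverage maps a contig name to its average read depth
    (hypothetical input; any source of per-contig depth would do).
    """
    baseline = median(contig_coverage.values())
    repeats = {}
    for name, depth in contig_coverage.items():
        ratio = depth / baseline
        if ratio >= min_ratio:
            # Round the depth ratio to the nearest whole copy number.
            repeats[name] = round(ratio)
    return repeats


# Made-up depths: four single-copy contigs and two candidate repeats.
coverage = {"c1": 30.0, "c2": 32.0, "c3": 29.0, "c4": 31.0,
            "rep1": 152.0, "rep2": 61.0}
print(estimate_copy_number(coverage))
```

With 48 contigs of which 14 are repeats, the plain median still lands on a
single-copy contig. A length-weighted baseline would be more robust if
repeat contigs were in the majority.
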
>
> Start with doing some simple BLAST searches and looking over the ancillary
> data files created by MIRA.  The better you understand the data the better
> off you will be in planning a route of attack.  You should also use
> something like Gap5, Tablet or other pileup viewers to assess the contigs
> created.  Inconsistencies in depth of coverage or paired-end distribution
> could indicate misassemblies.  MIRA is good but nothing is perfect.
>  Contigs flanking repeats will typically have a higher depth of coverage at
> the terminal ends.  Interrogate these regions to find out what repeat you
> are dealing with.  It will also aid you in primer design.  No point putting
> a primer at the end of a contig if it is likely to prime in multiple
> locations in the genome and give you multiple products.
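
The primer point above can be checked crudely before ordering oligos by
counting how often a candidate primer occurs, on either strand, across all
contigs. This is a minimal Python sketch with made-up sequences. It finds
exact matches only, so it cannot rule out mispriming at sites with
mismatches; a BLAST search of the primer against the assembly would be the
more thorough check.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_overlapping(seq, pattern):
    """Count occurrences of pattern in seq, including overlapping ones."""
    count = 0
    start = seq.find(pattern)
    while start != -1:
        count += 1
        start = seq.find(pattern, start + 1)
    return count


def count_exact_sites(primer, contigs):
    """Count exact primer binding sites on both strands of every contig."""
    primer = primer.upper()
    rc = reverse_complement(primer)
    total = 0
    for seq in contigs.values():
        seq = seq.upper()
        total += count_overlapping(seq, primer)
        if rc != primer:  # avoid double counting palindromic primers
            total += count_overlapping(seq, rc)
    return total


# Toy contigs (hypothetical): c2 carries the reverse complement of a
# stretch near the end of c1, as a repeat in the other orientation would.
contigs = {
    "c1": "ACGTTGCAAGGCTTAACCGGTATCG",
    "c2": "TTGACCGGTTAAGCCTTGCAGG",
}
for primer in ("ACGTTGCAAGG", "GGCTTAACCGG"):
    print(primer, count_exact_sites(primer, contigs))
```

A primer with more than one site is a candidate for moving further into the
unique part of the contig.
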
>
> Scaffolding is also going to be required in order to orient your contigs
> and plan your PCR experiments.  If you don’t have paired-end data to
> facilitate this you can use programs like Mauve, MUMmer/NUCmer, ABACAS,
> Projector2, etc. to order your contigs based on a reference sequence.  Just
> keep in mind that the resulting contig order is only as good as the
> reference sequence (and program) you use, so trying different reference
> sequences and looking for consistency is usually recommended.  In the end
> it probably won’t be perfect so expect some predictions to be wrong.
>  ABACAS and Projector2 are nice because they will provide you with a primer
> list for closing gaps but I sometimes question the ordering that they come
> up with and the primers designed are not always the best (e.g. they can
> target obvious repeat regions).
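
"Looking for consistency" between orderings from different references can be
made concrete by counting how many orderings place each pair of contigs next
to each other. The Python sketch below assumes each ordering has already
been reduced to a plain list of contig names (the lists shown are invented).
It ignores contig orientation and treats each ordering as linear, so a join
that wraps around a circular chromosome is not counted.

```python
from collections import Counter


def adjacency_support(orderings):
    """Count, per contig pair, how many orderings place the pair side by side."""
    support = Counter()
    for order in orderings:
        # A set so each pair is counted at most once per ordering;
        # frozenset makes (a, b) and (b, a) the same pair.
        pairs = {frozenset(pair) for pair in zip(order, order[1:])}
        support.update(pairs)
    return support


# Hypothetical orderings of four contigs against three references.
orderings = [
    ["c1", "c2", "c3", "c4"],
    ["c1", "c2", "c4", "c3"],
    ["c2", "c1", "c3", "c4"],
]
support = adjacency_support(orderings)
for pair, votes in support.most_common():
    print(sorted(pair), f"{votes}/{len(orderings)}")
```

Joins supported by every reference are the safer ones to design PCR primers
across first; joins seen only once deserve a closer look.
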
>
> Another tip has to do with the repeats.  In all likelihood you will have
> multiple copies of rRNA sequences or other large regions that will require
> primer walking.  If you are looking to close these gaps make use of the
> data you have to make things easier.  The rRNA regions for example will
> typically be 5 kb or so and would require multiple rounds of
> sequence, design primer, sequence, design primer, etc.  But you can predesign
> the primers to cover these repetitive regions based on the data you have so
> that you can use the same sequencing primers for all of the different
> copies.
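
To illustrate predesigning the walking primers, the Python sketch below
spaces sequencing-primer start positions so that consecutive reads overlap
across a repeat of known length. The 700-base usable read length and the
100-base overlap are assumptions chosen for the example, not figures from
this thread, and the positions are for one strand only.

```python
def walking_primer_positions(region_length, read_length=700, overlap=100):
    """Return read start positions (0-based) that tile a region end to end.

    read_length and overlap are illustrative defaults; adjust them to the
    usable read length of the sequencing chemistry actually in use.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    if not 0 <= overlap < read_length:
        raise ValueError("overlap must be non-negative and shorter than a read")
    step = read_length - overlap
    positions = []
    pos = 0
    # Add reads until one of them would reach the end of the region.
    while pos + read_length < region_length:
        positions.append(pos)
        pos += step
    positions.append(pos)  # final read covers the end of the region
    return positions


# A 5 kb rRNA region, as in the example above.
starts = walking_primer_positions(5000)
print(len(starts), starts)
```

Because all copies of the repeat share the same internal sequence, one such
primer set can be reused for every copy once each copy has been amplified
with primers anchored in its unique flanking sequence.
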
>
> I could go on and on about what to do in different situations but I think
> this covers the main areas.
>
> Good Luck and Have Fun. Oh and be prepared for frustration and
> disappointment ;-)
>
> Shaun
>
>
>
>
> From: Shankar Manoharan <shankarmanostar@xxxxxxxxx>
> To: mira_talk@xxxxxxxxxxxxx
> Date: 2012-05-06 01:05 PM
> Subject: [mira_talk] Gap closure
> Sent by: mira_talk-bounce@xxxxxxxxxxxxx
> ------------------------------
>
>
>
> *Dear all...*
> *     Firstly thanks a LOT for your earlier support. I have now managed
> to assemble a ~4.5 Mb genome and obtain 48 contigs of which 14 are repeats.
> If I have to close gaps, what is the best possible way? Any input
> would be greatly appreciated. Many thanks in advance.*
>
> *Best regards,*
>
> *Shankar Manoharan
> Graduate Student
> Department of Genetics
> Madurai Kamaraj University*
> *Ph. +919790167534*
>
> *I strongly believe in doing my best and leaving the rest to God*
>
>
>
