[mira_talk] Re: all my 16S in one contig

From: Shaun Tyler <Shaun.Tyler@xxxxxxxxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Mon, 5 Mar 2012 15:55:44 -0600
Yes I was referring to the high depth of coverage contigs.  Typically after
the assembly I like to assess the depth of coverage and flag those that are
likely repeats.  For any that are larger than about 1 kb (and so unlikely to
get closed with just a forward and a reverse ABI read) I design internal
sequencing primers so I have them on hand when needed.  I don't like to
include the repeat contigs in the final version of the assembly because
they are really a consensus sequence of multiple copies, so SNP-level
differences between copies could be missed.  But it is handy to have them
kicking around in the project (e.g. Gap4) so that when they are pulled into
the assembly via the ABI data you know right away that you are sequencing a
repeat region, and if you already have the sequencing primers on hand,
closing that gap is only one more ABI run away.
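
As a rough illustration of the flagging step, here is a minimal Python
sketch. It assumes you have already pulled each contig's length and average
coverage out of your assembler's contig statistics. The contig names, the
1.5x threshold and the function name are invented for the example and are not
anything MIRA or Gap4 provides; only the 1 kb cutoff comes from the text
above. Using the plain median across contigs as the baseline is a
simplification.

```python
from statistics import median


def flag_repeat_contigs(contigs, ratio=1.5, min_len=1000):
    """Flag contigs whose coverage suggests a collapsed repeat.

    contigs: dict mapping contig name -> (length_bp, mean_coverage).
    ratio:   hypothetical threshold, as a multiple of the median coverage.
    min_len: contigs longer than this probably need internal primers.
    Returns a list of (name, needs_internal_primers) tuples.
    """
    # Baseline = median coverage over all contigs (not length-weighted).
    baseline = median(cov for _, cov in contigs.values())
    flagged = []
    for name, (length, cov) in sorted(contigs.items()):
        if cov >= ratio * baseline:
            flagged.append((name, length > min_len))
    return flagged


# Toy numbers, purely illustrative.
contigs = {
    "c1": (250000, 52.0),
    "c2": (180000, 48.0),
    "c3": (5200, 410.0),   # long, high coverage: likely an rRNA operon
    "c4": (700, 105.0),    # short repeat, closable with two ABI reads
    "c5": (90000, 50.0),
}

for name, needs_primers in flag_repeat_contigs(contigs):
    print(name, "design internal primers" if needs_primers else "end reads only")
```

With these toy values, c3 and c4 are flagged, and only c3 is long enough to
need internal sequencing primers.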

Even if you have ordering information from mate-pair, paired end, OpGen or
whatever you might also consider going old school and including paired end
data from a fosmid library.  The reason I suggest this is that many of the
gaps you will be closing are going to be gaps because they are repeats.
Amplifying these regions from the genomic DNA can often be a bit tricky (or
really messy) but if you have the region isolated on a fosmid clone you
have the option of sequencing directly off the clone or using it as
template for PCR.  This generally gives much cleaner results than working
from the genomic DNA.  It also gives you one more layer of data confirming
the contig ordering.

Shaun



From:   Lionel Guy <guy.lionel@xxxxxxxxx>
To:     mira_talk@xxxxxxxxxxxxx
Date:   2012-03-05 03:13 PM
Subject:        [mira_talk] Re: all my 16S in one contig
Sent by:        mira_talk-bounce@xxxxxxxxxxxxx



I guess Shaun was referring to the contig with 8 times (or more, or less,
depending on how many rRNA operons you have) more coverage than the rest of
your genome.
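
The arithmetic behind that estimate is just a coverage ratio. A minimal
Python sketch, assuming you know the average coverage of the collapsed
contig and of the unique part of the genome (the function name and the 1600x
figure are hypothetical; 200x is the coverage Davide reported):

```python
def estimate_copy_number(repeat_cov, unique_cov):
    """Estimate how many copies were collapsed into one contig.

    repeat_cov: average coverage of the suspected repeat contig.
    unique_cov: average coverage of unique (single-copy) sequence.
    """
    if unique_cov <= 0:
        raise ValueError("unique_cov must be positive")
    # A collapsed contig of N copies collects roughly N times the reads.
    return max(1, round(repeat_cov / unique_cov))


# Hypothetical example: rRNA contig at 1600x in a genome at 200x.
print(estimate_copy_number(1600, 200))
```

Coverage is noisy, so treat the result as a hint to check against a
reference or against PCR results, not as a count.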

To generate ordered contigs, you can either:
- make hypotheses based on a reference genome, provided that you have
one...
- get one or more long-insert (aka mate-pair) libraries (low coverage is
enough) to scaffold your contigs. The ideal is 3 kb and 8 kb 454 libraries,
but other solutions are good too (Illumina...)
- if you have Illumina data already, I've heard good things about getting
some PacBio reads and correcting them with your Illumina reads
- have a look at optical mapping (OpGen)
- wait until autumn and get some Oxford Nanopore reads ;)

Lionel

On 5 Mar 2012, at 21:47 , Clancy, Kevin wrote:

> I’m doing de novo sequencing as well, so this is a very interesting topic
for me. What is the pile up contig you are referring to?
>
> How do you generate ordered contigs? Any recommended strategies for this,
apart from the fosmid and long PCR approaches outlined below?
> Thanks!
> kevin
>
> From: mira_talk-bounce@xxxxxxxxxxxxx [
mailto:mira_talk-bounce@xxxxxxxxxxxxx] On Behalf Of Shaun Tyler
> Sent: Monday, March 05, 2012 12:13 PM
> To: mira_talk@xxxxxxxxxxxxx
> Subject: [mira_talk] Re: all my 16S in one contig
>
> The long PCR option would also be my recommendation but we just sequence
directly via primer walking.  Use the pile up contig to design the internal
primers and then use them to sequence all of the PCR products.  Additional
primers might be required depending on the nature of the different ITS
regions.
>
> Also be careful of the PCR if you don't have the contigs ordered.  I have
seen cases where products are obtained from unrelated rRNA copies.
Essentially the PCR is only producing single strand products from the
different copies but then these can subsequently anneal and ultimately
generate a product.  The end result is you join 2 contigs that shouldn't be
joined.
>
> Genome closure isn't an easy process :-(
>
> Shaun
>
>
>
> From: Lionel Guy <guy.lionel@xxxxxxxxx>
> To: mira_talk@xxxxxxxxxxxxx
> Date: 2012-03-05 01:35 PM
> Subject: [mira_talk] Re: all my 16S in one contig
> Sent by: mira_talk-bounce@xxxxxxxxxxxxx
>
>
>
> Ciao Davide,
>
> I agree with John, you might want to try to sub-sample your data down to
around 50-80X coverage - you can use the remaining reads at a later stage by
mapping them to the assembled contig(s).
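
For what it's worth, the sub-sampling arithmetic can be sketched in a few
lines of Python. This assumes the reads are already loaded as a list of
(forward, reverse) pairs; the function names are invented for the example,
and a real run would stream the FASTQ files instead of holding everything in
memory. The point is to draw whole pairs so that mates stay together.

```python
import random


def keep_fraction(current_cov, target_cov):
    """Fraction of read pairs to keep to go from current to target coverage."""
    if current_cov <= 0:
        raise ValueError("current_cov must be positive")
    return min(1.0, target_cov / current_cov)


def subsample_pairs(pairs, fraction, seed=0):
    """Keep each read pair with probability `fraction`.

    Sampling is done per pair, never per read, so mates are never split.
    A fixed seed makes the selection reproducible.
    """
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]


# Going from 200x down to 80x means keeping about 40% of the pairs.
fraction = keep_fraction(200, 80)
pairs = [(f"read{i}/1", f"read{i}/2") for i in range(1000)]  # dummy pairs
kept = subsample_pairs(pairs, fraction)
print(fraction, len(kept))
```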
>
> Even with mate-pair reads from 454 (8kb library), I rarely get a correct
assembly of all my ribosomal operons. If the intergenic region between the
16S and the 23S is variable enough, you might get one or two assembled, but
I'm not sure at all... If you feel daring (I haven't tried it, though I've
considered doing it), you might make 8 copies of your contig and assemble
them into the gaps, but that might be wrong.
>
> The only foolproof solution I know of is to design long-range PCRs with
primers on the edge of the operon in the non-repeated sequences, shotgun
the PCR fragments, clone them into E. coli and sequence each one
separately. Alternatively, a fosmid library is a decent solution, but a tad
more complicated.
>
> Mapping to a reference genome won't tell you if your contigs are
organized correctly, because rRNA operons are hotspots for genome
rearrangements and you have no guarantee that your genome is the same as
the reference.
>
> Cheers,
>
> Lionel
>
> On 5 Mar 2012, at 19:03 , John Nash wrote:
>
> > On 2012-03-05, at 12:55 PM, Davide Sassera (davide.sassera) wrote:
> >
> >> Dear Bastien and Mira ppl,
> >>
> >> I'm assembling a 5.6 Mb genome with Solexa reads (100bp, paired), at 200x
coverage.
> >>
> >> My problem is that all the copies of the ribosomal genes (16S, 23S,
5S) get assembled together in one single contig.
> >>
> >> Based on reference I think I should have 8 ribosomal operons, which
agrees with the 8fold coverage of the "all the ribosomal sequences mashed
together" contig.
> >>
> >> I have been thinking about possible solutions to this, but I then
realized other people must have had the same issue, so why lose my mind
when I can stand on the shoulders of giants?
> >
> > Welcome… it's good to have new blood.
> >
> > In my opinion, I don't think that you can assemble a whole genome de
novo with just illumina reads, no matter what the coverage.  There is not
enough genetic diversity in the stretch between the boundary of a repeat to
the region of unique coverage with illumina alone, even with standard
paired reads - where I believe the fragment sizes are 250-500 bp. I would
recommend either mapping this to a reference genome or getting 40-fold 454
coverage.
> >
> > Speaking of coverage, I think 200x is overkill, and would also lead to
misassemblies - try 80x.
> >
> > HTH,
> > John
> >
>
>


--
You have received this mail because you are subscribed to the mira_talk
mailing list. For information on how to subscribe or unsubscribe, please
visit http://www.chevreux.org/mira_mailinglists.html