[mira_talk] Re: all my 16S in one contig

  • From: Lionel Guy <guy.lionel@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 5 Mar 2012 20:34:37 +0100

Ciao Davide,

I agree with John: you might want to sub-sample your data down to around 
50-80X coverage. You can use the remaining reads at a later stage by mapping 
them to the assembled contig(s).
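
The sub-sampling step above can be sketched in a few lines of Python. This is 
only a minimal illustration, not a MIRA feature: the function names 
(subsample_fraction, read_fastq, subsample_pairs) and the fixed seed are 
assumptions made for the example. It assumes two FASTQ inputs with mates in 
the same order, and it keeps or drops both mates of a pair together so the 
pairing information survives.

```python
import random


def subsample_fraction(current_coverage, target_coverage):
    """Fraction of read pairs to keep to go from current to target coverage."""
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverage values must be positive")
    # Never "keep" more than everything if the data is already below target.
    return min(1.0, target_coverage / current_coverage)


def read_fastq(lines):
    """Yield 4-line FASTQ records (header, sequence, '+', qualities)."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def subsample_pairs(fwd_lines, rev_lines, fraction, seed=42):
    """Randomly keep about `fraction` of the pairs; mates stay together.

    Assumes (hypothetically) that both inputs list the mates in the same
    order, as is usual for paired Illumina FASTQ files.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept_fwd, kept_rev = [], []
    for fwd, rev in zip(read_fastq(fwd_lines), read_fastq(rev_lines)):
        if rng.random() < fraction:
            kept_fwd.append(fwd)
            kept_rev.append(rev)
    return kept_fwd, kept_rev
```

For example, subsample_fraction(200, 80) gives the fraction of pairs to keep 
to go from 200x down to 80x. The pairs that are not selected are the ones 
that can be mapped back onto the contigs later.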

Even with mate-pair reads from 454 (8 kb library), I rarely get a correct 
assembly of all my ribosomal operons. If the intergenic region between the 16S 
and the 23S is variable enough, you might get one or two of them assembled, 
but that is far from certain. If you feel daring (I haven't tried this myself, 
although I've considered it), you might make 8 copies of your contig and 
assemble them into the gaps, but the result might be wrong. 
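
The "8 copies" figure comes from comparing the coverage of the collapsed rRNA 
contig with the average coverage of the rest of the genome. A minimal sketch 
of that arithmetic follows; the function name and the example numbers (1600x 
on the collapsed contig against a 200x genome average) are illustrative 
assumptions, not measured values.

```python
def estimate_copy_number(contig_coverage, genome_coverage):
    """Estimate how many repeat copies were collapsed into one contig.

    Returns (rounded_copies, raw_ratio). The ratio is only a rough guide:
    coverage is uneven in practice, so treat the result as an estimate.
    """
    if genome_coverage <= 0:
        raise ValueError("genome_coverage must be positive")
    ratio = contig_coverage / genome_coverage
    return round(ratio), ratio


# Illustrative numbers only: an 8-fold excess over a 200x genome average.
copies, ratio = estimate_copy_number(1600, 200)
```

A ratio close to a whole number that also matches the operon count of a 
related reference is consistent with all copies having collapsed into a 
single contig.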

The only foolproof solution I know of is to design long-range PCRs with primers 
on the edge of the operon in the non-repeated sequences, shotgun the PCR 
fragments, clone them into E. coli and sequence each one separately. 
Alternatively, a fosmid library is a decent solution, but a tad more 
complicated. 

Mapping to a reference genome won't tell you if your contigs are organized 
correctly, because rRNA operons are hotspots for genome rearrangements and you 
have no guarantee that your genome is the same as the reference.

Cheers,

Lionel

On 5 Mar 2012, at 19:03 , John Nash wrote:

> On 2012-03-05, at 12:55 PM, Davide Sassera (davide.sassera) wrote:
> 
>> Dear Bastien and Mira ppl,
>> 
>> I'm assembling a 5.6 Mb genome with Solexa reads (100 bp, paired), at 200x 
>> coverage.
>> 
>> My problem is that all the copies of the ribosomal genes (16S, 23S, 5S) get 
>> assembled together in one single contig.
>> 
>> Based on a reference, I think I should have 8 ribosomal operons, which 
>> agrees with the 8-fold coverage of the "all the ribosomal sequences mashed 
>> together" contig.
>> 
>> I have been thinking about possible solutions to this, but then I realized 
>> other people must have had the same issue, so why lose my mind when I can 
>> stand on the shoulders of giants?
> 
> Welcome… it's good to have new blood.
> 
> I don't think that you can assemble a whole genome de novo with just 
> Illumina reads, no matter what the coverage. There is not enough genetic 
> diversity in the stretch between the boundary of a repeat and the region of 
> unique coverage with Illumina alone, even with standard paired reads, where 
> I believe the fragment sizes are 250-500 bp. I would recommend either 
> mapping this to a reference genome or getting 40-fold 454 coverage.
> 
> Speaking of coverage, I think 200x is overkill and would also lead to 
> misassemblies - try 80x.
> 
> HTH,
> John
> 


--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
