Ciao Davide,

I agree with John: you might want to sub-sample your data down to around 50-80X coverage. You can use the remaining reads at a later stage by mapping them to the assembled contig(s).

Even with mate-pair reads from 454 (8 kb library), I rarely get a correct assembly of all my ribosomal operons. If the intergenic region between the 16S and the 23S is variable enough, you might get one or two assembled, but I'm not sure at all...

If you feel daring (I haven't tried this myself, though I've considered doing it), you might copy your contig 8x and assemble in the gaps, but that might be wrong.

The only foolproof solution I know of is to design long-range PCRs with primers on the edge of the operon in the non-repeated sequences, shotgun the PCR fragments, clone them into E. coli and sequence each one separately. Alternatively, a fosmid library is a decent solution, but a tad more complicated.

Mapping to a reference genome won't tell you whether your contigs are organized correctly, because rRNA operons are hotspots for genome rearrangements and you have no guarantee that your genome is organized like the reference.

Cheers,
Lionel

On 5 Mar 2012, at 19:03, John Nash wrote:

> On 2012-03-05, at 12:55 PM, Davide Sassera (davide.sassera) wrote:
>
>> Dear Bastien and Mira ppl,
>>
>> I'm assembling with Solexa (100 bp, paired) a 5.6 Mb genome, with 200x
>> coverage.
>>
>> My problem is that all the copies of the ribosomal genes (16S, 23S, 5S) get
>> assembled together in one single contig.
>>
>> Based on the reference I think I should have 8 ribosomal operons, which agrees
>> with the 8-fold coverage of the "all the ribosomal sequences mashed together"
>> contig.
>>
>> I have been thinking about possible solutions to this, but I then realized
>> other people must have had the same issue, so why lose my mind when I can
>> stand on the shoulders of giants?
>
> Welcome… it's good to have new blood.
>
> In my opinion, you can't assemble a whole genome de novo with just Illumina
> reads, no matter what the coverage. There is not enough genetic diversity in
> the stretch from the boundary of a repeat to the region of unique coverage
> with Illumina alone, even with standard paired reads, where I believe the
> fragment sizes are 250-500 bp. I would recommend either mapping this to a
> reference genome or getting 40-fold 454 coverage.
>
> Speaking of coverage, I think 200x is overkill, and would also lead to
> misassemblies - try 80x.
>
> HTH,
> John
>
-- 
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
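
For the sub-sampling step suggested in this thread (200x down to roughly 80x), here is a minimal Python sketch. It assumes the paired reads sit in two plain 4-line-per-record FASTQ files with mates in the same order; the function names and the file names in the comments are hypothetical, not part of MIRA. With the figures quoted above (5.6 Mb genome, 100 bp reads, 200x), keeping a fraction of 80/200 = 0.4 of the pairs should give about 80x. Each pair is tagged rather than discarded, so the reads left out can be written to separate files and mapped back to the contigs later.

```python
import random
from itertools import islice


def keep_fraction(current_cov, target_cov):
    """Fraction of read pairs to keep to go from current_cov to target_cov."""
    if current_cov <= 0 or target_cov <= 0:
        raise ValueError("coverage values must be positive")
    return min(1.0, target_cov / current_cov)


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped FASTQ)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def tag_pairs(fwd, rev, fraction, seed=0):
    """Yield (keep, rec1, rec2) for each pair; mates always share one decision.

    Assumes both inputs list the mates in the same order; zip() stops
    silently at the shorter input if they do not have the same length.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    for rec1, rec2 in zip(read_fastq(fwd), read_fastq(rev)):
        yield rng.random() < fraction, rec1, rec2


# Hypothetical usage (file names are placeholders):
#
# frac = keep_fraction(200, 80)
# with open("reads_1.fastq") as f1, open("reads_2.fastq") as f2, \
#         open("sub_1.fastq", "w") as k1, open("sub_2.fastq", "w") as k2, \
#         open("rest_1.fastq", "w") as r1, open("rest_2.fastq", "w") as r2:
#     for keep, rec1, rec2 in tag_pairs(f1, f2, frac):
#         (k1 if keep else r1).writelines(rec1)
#         (k2 if keep else r2).writelines(rec2)
```

Random selection per pair gives approximately, not exactly, the requested fraction, which should be good enough for a coverage target as loose as 50-80X.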