On 2012-03-05, at 12:55 PM, Davide Sassera (davide.sassera) wrote:

> Dear Bastien and MIRA ppl,
>
> I'm assembling with Solexa (100bp, paired) a 5.6 Mb genome, with 200x
> coverage.
>
> My problem is that all the copies of the ribosomal genes (16S, 23S, 5S) get
> assembled together in one single contig.
>
> Based on reference I think I should have 8 ribosomal operons, which agrees
> with the 8-fold coverage of the "all the ribosomal sequences mashed together"
> contig.
>
> I have been thinking about possible solutions to this, but I then realized
> other people must have had the same issue, so why lose my mind when I can
> stand on the shoulders of giants?

Welcome... it's good to have new blood.

I don't think that you can assemble a whole genome de novo with just Illumina reads, no matter what the coverage. With Illumina reads alone there is not enough distinguishing sequence between the boundary of a repeat and the flanking region of unique coverage to place each copy, even with standard paired reads (where I believe the fragment sizes are 250-500 bp).

I would recommend either mapping the reads to a reference genome or getting 40-fold 454 coverage.

Speaking of coverage, I think 200x is overkill and would also lead to misassemblies. Try 80x.

HTH,
John

--
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html