On Monday 15 August 2011 21:12:40 Iddo Friedberg wrote:
> Many repeats in Mycoplasma, unfortunately. Also, many HGTs, so k-mer
> frequencies may not be as uniform as in other bugs. But thanks for turning
> my attention to this, I will try running the assembly with some nasty
> repeat masking.

Nope, repeats are not your primary concern. The histogram is pretty healthy in that regard ... I'd expect the usual rRNA repeats and maybe a couple of highly repetitive short stretches which will probably break contig building in a couple of places, but nothing really too wild, I would say.

Run the first assembly on the reduced data set without repeat masking, then decide what to do next based on the resulting histogram and the repeat info file.

> @Bastien "Heavens! Is there any valid reason you set up an assembly with a
> coverage >= 1000x?"
>
> Not one which I would like to put down in writing ;)

He he :-)

Fortunately the remedy is simple: "head -somenumber input.fastq >output.fastq" will be your friend here.

B.
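
For picking "somenumber" in the head command, here is a rough sketch rather than a definitive recipe. It assumes a FASTQ file with exactly four lines per read (no wrapped sequence lines). The function names reads_for_coverage and subsample_fastq, and all numbers in the usage lines, are made up for illustration and should be replaced with values for your own data. Note that head keeps the first reads of the run, not a random sample.

```shell
# reads_for_coverage TARGET_COVERAGE GENOME_SIZE_BP READ_LENGTH_BP
# Prints the approximate number of reads that gives the target coverage
# (integer arithmetic: coverage * genome size / read length).
reads_for_coverage() {
    echo $(( $1 * $2 / $3 ))
}

# subsample_fastq N_READS INPUT OUTPUT
# Keeps the first N_READS records of INPUT.
# Assumption: every record is exactly 4 lines, so the line count
# handed to head must be a multiple of 4.
subsample_fastq() {
    head -n "$(( $1 * 4 ))" "$2" > "$3"
}
```

With hypothetical values (50x target coverage, a 1 Mb genome, 100 bp reads), usage could look like this:

    n=$(reads_for_coverage 50 1000000 100)    # 500000 reads
    subsample_fastq "$n" input.fastq output.fastq

For paired-end data in two files, the same read count would be applied to both files so the pairs stay in sync.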