On Apr 22, 2011, at 0:34, Goldman, Thomas wrote:

> Thanks Egon. Yes, simple enough. Just thought I would see if there was
> something better to do than just random removal of reads.

Well, one might indeed think of selecting some reads and not others. I've made the observation (and I'm not the first) that reads coming from the borders of a lane have problems more often than others.

I think the read names contain a coordinate system, and one could write elaborate algorithms to select or not select reads, but in practice I make my life very easy:

1. Do not select the first and the last couple of hundred reads.
2. From the rest, select as many as I need ... be it as a block, chosen at equal intervals, or randomly chosen; I never saw a really striking difference there.

What should never be done (at least it gave distinctly worse results for me): choosing reads by some "quality" or "length" measure. Some sequence motifs with Illumina lead to weird effects in the quality measures, so reads containing them would be more prone to be left out completely ... leading to coverage holes in otherwise nice projects.

Bastien

PS: as always, your mileage may vary
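The two-step selection described in the message (skip reads at both ends, then take a block, an evenly spaced subset, or a random subset of the rest) can be sketched in Python. This is a minimal illustration under stated assumptions, not the poster's actual tooling: it assumes plain, unwrapped four-line FASTQ held in memory, and the function names `read_fastq` and `subsample`, the default `skip=200` (standing in for "a couple of hundred"), and the `mode` values are all hypothetical. It deliberately does not filter on quality or length.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record: (header, sequence, separator, quality).
# Assumption: plain 4-line FASTQ with no line wrapping.
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if len(chunk) < 4:  # end of input (or a truncated last record)
            return
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def subsample(records: List[Record], n: int, skip: int = 200,
              mode: str = "random", seed: int = 0) -> List[Record]:
    """Drop `skip` reads at each end of the file, then pick `n` of the rest.

    mode="block"    -> the first n remaining reads
    mode="interval" -> n reads spaced at (roughly) equal intervals
    mode="random"   -> n reads drawn uniformly without replacement
    No quality- or length-based filtering is done, on purpose.
    """
    if n <= 0:
        return []
    core = records[skip:len(records) - skip]  # trim both ends of the file
    if n >= len(core):
        return list(core)
    if mode == "block":
        return core[:n]
    if mode == "interval":
        step = len(core) / n  # step > 1 here, so indices are distinct
        return [core[int(i * step)] for i in range(n)]
    if mode == "random":
        rng = random.Random(seed)  # seeded for reproducibility
        idx = sorted(rng.sample(range(len(core)), n))  # keep file order
        return [core[i] for i in idx]
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `subsample(list(read_fastq(open("reads.fastq"))), 100000, mode="interval")` (with a hypothetical file name) would keep 100,000 evenly spaced reads from the trimmed middle of the file. Loading all records into a list keeps the sketch short; for very large files a streaming variant that first counts records would avoid holding everything in memory.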