[mira_talk] Re: reducing number of Illumina reads

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sat, 23 Apr 2011 18:01:15 +0200

On Apr 22, 2011, at 0:34 , Goldman, Thomas wrote:
> Thanks Egon. Yes, simple enough. Just thought I would see if there was 
> something better to do than just random removal of reads.

Well, one might indeed think of selecting some reads and not others. I've made 
the observation (and I'm not the first) that reads coming from the borders of a 
lane have problems more often than others.

I think that the read naming contains a coordinate system and one could write 
elaborate algorithms to select / not select reads, but in practice I make my 
life very easy:

1. do not select the first and the last couple of hundred reads
2. from the rest, select as many as I need ... be it as a block, at equal 
intervals or randomly chosen, I never saw a really striking difference there.
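
The two steps above can be sketched in a few lines of Python. This is only a 
minimal illustration, not a MIRA feature: the function names, the default of 
skipping 200 records at each end, and the assumption of plain 4-line FASTQ 
records (no wrapped sequence lines) are all mine. For paired-end data one 
would presumably apply the same indices to both mate files so the pairs stay 
in sync.

```python
import random


def pick_indices(total, n_wanted, skip=200, mode="interval", seed=0):
    """Return sorted record indices to keep, ignoring `skip` records at each end.

    `skip=200` is an illustrative stand-in for "a couple of hundred".
    """
    first, last = skip, total - skip      # usable records are [first, last)
    usable = max(0, last - first)
    n = min(n_wanted, usable)
    if n == 0:
        return []
    if mode == "block":                   # one contiguous block
        return list(range(first, first + n))
    if mode == "interval":                # evenly spaced over the usable range
        return [first + i * usable // n for i in range(n)]
    if mode == "random":                  # random choice, reproducible via seed
        return sorted(random.Random(seed).sample(range(first, last), n))
    raise ValueError(f"unknown mode: {mode}")


def subsample_fastq(in_path, out_path, n_wanted, **kwargs):
    """Copy the selected records from in_path to out_path (two passes)."""
    # Pass 1: count records, assuming exactly 4 lines per record.
    with open(in_path) as fh:
        total = sum(1 for _ in fh) // 4
    keep = set(pick_indices(total, n_wanted, **kwargs))
    # Pass 2: write only the lines belonging to the chosen records.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line_no, line in enumerate(src):
            if line_no // 4 in keep:
                dst.write(line)
```

Note that the selection depends only on a read's position in the file, never 
on its quality values or length.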

What should never be done (at least it gave distinctly worse results for 
me): choose reads by some "quality" or "length" measure ... some sequence 
motifs with Illumina lead to weird effects regarding the quality measures and 
would therefore be more prone to be left out completely ... leading to coverage 
holes in otherwise nice projects.

Bastien

PS: as always, your mileage may vary
