On Tuesday 22 March 2011 13:50:26 Davide Sassera wrote:
> Dear Bastien,
>
> unfortunately the 454 data is only 20x, so I guess I should really cut
> most of the solexa (70bp by the way).
>
> I have thus another question: how should I choose the solexa reads?
> Random?
> Choose the longer ones?

I'd say: random.

> We wrote a script that selects the ones with the highest total quality,
> which should allow to obtain long reads with high quality. I was
> wondering if this would introduce a bias towards "easier" regions, as
> difficult regions may not be covered by long high quality reads.
> Any thoughts on this?

I've been pondering this issue for quite some time and have run a couple of
tests. In the end (having just hints and some logic, but no real proof), I
have the feeling that selecting reads by length, by quality, or by a
combination of both *will* introduce a bias.

If anyone else has run experiments on this or wants to share thoughts,
please do so.

B.
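
For reference, the random selection suggested above could be sketched roughly as follows. This is only a minimal illustration, not a tested pipeline: it assumes the Solexa reads are in a FASTQ file with exactly four lines per record (no wrapped sequence lines), and the function names `read_fastq` and `reservoir_sample` are made up for this example. It uses reservoir sampling so the file is read once and only the chosen reads are held in memory.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples.

    Assumption: every record is exactly four lines (unwrapped FASTQ).
    """
    while True:
        record = tuple(islice(handle, 4))
        if len(record) < 4:
            return  # end of file (or a truncated trailing record)
        yield record


def reservoir_sample(records, k, seed=None):
    """Return up to k records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(rec)
        else:
            # Keep record i with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample
```

It could be used along the lines of `reservoir_sample(read_fastq(open("solexa.fastq")), k=500000, seed=1)`, where the file name and the value of `k` are placeholders to be set from the target coverage. Because each read is kept with the same probability regardless of its length or quality, this avoids the selection bias discussed above. If the reads are paired, the mates would need to be sampled together, which this sketch does not handle.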