[mira_talk] Re: mira bambus mira

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 22 Mar 2011 20:36:46 +0100

On Tuesday 22 March 2011 13:50:26 Davide Sassera wrote:
> Dear Bastien,
> 
> unfortunately the 454 data is only 20x, so I guess I should really cut
> most of the solexa (70bp by the way).
> 
> I have thus another question: how should I choose the solexa reads?
> Random?
> Choose the longer ones?

I'd say: random.
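
For what it's worth, a random pick like this can be done in one pass
over the file with reservoir sampling. The sketch below is only an
illustration and not part of MIRA: the function names are made up, it
assumes plain 4-line FASTQ records (no wrapped sequence lines), and the
number of reads to keep (k) is something you have to decide yourself.

```python
import random
from itertools import islice


def read_fastq_records(lines):
    """Yield 4-line FASTQ records (header, sequence, '+', qualities).

    Assumption: every record occupies exactly four lines.
    """
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def reservoir_sample(records, k, seed=None):
    """Return k records chosen uniformly at random (all if fewer than k).

    Classic reservoir sampling: length and quality play no role in
    whether a read is kept.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir
```

You would feed it an open file handle, e.g.
reservoir_sample(read_fastq_records(fh), k=1000000, seed=42), and write
the returned records back out four lines each. Fixing the seed makes the
selection reproducible.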

> We wrote a script that selects the ones with the highest total quality,
> which should allow to obtain long reads with high quality. I was
> wondering if this would introduce a bias towards "easier" regions, as
> difficult regions may not be covered by long high quality reads.
> Any thoughts on this?

I've been pondering this issue for quite some time and ran a couple of
tests. In the end (having just hints and some logic, but no real proof),
I have the feeling that selecting reads by length, by quality, or by a
combination of both *will* introduce a bias.

If anyone else has run experiments on this or wants to share thoughts,
please do so.

B.
