[mira_talk] Re: problem with loading reads

  • From: Peter Cock <p.j.a.cock@xxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 6 Oct 2011 09:35:22 +0100

On Thu, Oct 6, 2011 at 7:16 AM, Peter Menzel <ptr@xxxxxxxxxx> wrote:
> On 6 October 2011 00:12, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
>> Damn. Never tested Illumina 1.0 format in FASTQ, too old. My bad.
>>
>> Your only option at the moment: convert the FASTQ to standard FASTQ. I think
>> one can use the FASTX toolkit for that, but I forgot how.
>>
>
> Ah ok, but why does mira guess an offset of 59, which seems to be the
> problem? I know there are 3 different Solexa formats, but don't they
> all use an ASCII offset of either 33 or 64?
>
> best, Peter

While PHRED scores are from 0 upwards, the old Solexa scores
could go down to -5. So, with an old Solexa FASTQ, while the
ASCII offset was 64, the lowest score had ASCII 64 - 5 = 59.
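
As a small Python sketch of that arithmetic (the function name is
just for illustration, it is not anything from MIRA):

```python
SOLEXA_OFFSET = 64  # ASCII offset of the old Solexa FASTQ variant


def decode_solexa(quality_string):
    """Return the Solexa scores encoded in an old Solexa quality string."""
    return [ord(ch) - SOLEXA_OFFSET for ch in quality_string]


# ';' is ASCII 59, the lowest character an old Solexa file can contain,
# '@' is ASCII 64 and 'h' is ASCII 104.
print(decode_solexa(";@h"))  # [-5, 0, 40]
```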

My guess is Bastien used an offset of 59 to shift old Solexa
scores so that they start at zero. Not ideal, because that
inflates the reasonable/good scores, which are otherwise almost
equal to the PHRED scores, by 5. Or this is a bug - Bastien did
say he hadn't tested this for a while.

However, the old Solexa FASTQ format is practically dead.
You might as well convert it to Sanger FASTQ (which will
rescale the Solexa scores, from -5 to about 40, to PHRED
scores, from 0 to about 40). See: http://dx.doi.org/10.1093/nar/gkp1137
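
For the curious, a rough Python sketch of the rescaling. With error
probability p, PHRED is -10*log10(p) and Solexa is -10*log10(p/(1-p)),
which gives Q_phred = 10*log10(10^(Q_solexa/10) + 1). The function
names are made up for this example, and rounding to the nearest
integer is my assumption about how a converter would do it:

```python
import math


def solexa_to_phred(q_solexa):
    """Map one Solexa score to the nearest integer PHRED score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10.0) + 1)))


def solexa_to_sanger(quality_string):
    """Re-encode an old Solexa quality string (offset 64) as Sanger (offset 33)."""
    scores = [ord(ch) - 64 for ch in quality_string]
    return "".join(chr(solexa_to_phred(q) + 33) for q in scores)


# Low scores change a lot, good scores hardly at all.
print([solexa_to_phred(q) for q in (-5, 0, 10, 40)])  # [1, 3, 10, 40]
```

In practice you would not write this yourself: Biopython's
Bio.SeqIO.convert() with the format names "fastq-solexa" (input)
and "fastq" (output) handles whole files.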

Peter
