[mira_talk] Re: Filtering out crappy sequence

On Wed, Apr 4, 2012 at 8:45 PM, John Nash <john.he.nash@xxxxxxxxx> wrote:
> On 2012-04-04, at 3:27 PM, Peter Cock wrote:
>
>> What were the failing sequences? Perhaps there is something we
>> can suggest after seeing them and in what way they are bad.
>>
>
> Hi Peter,
>
> I thought the same thing but the sequence looks like this (in fastq format):
>
> @HK6K99I01A94O4
>
>
> +
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
> I thought that 454 was configured not to output this sort of thing…

That is very strange - did you remove the sequence or was it
missing? If it was missing, the record is not valid FASTQ, since
the sequence and quality strings must be the same length.

Also the quality string is odd: an exclamation mark is ASCII
33, which is PHRED 0 in the Sanger encoding (offset 33). This
suggests it is either a very bad read which should have
failed QC and never made it to your SFF file, or something
has gone very wrong in the SFF to FASTQ conversion.
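
For reference, the Sanger FASTQ encoding stores each score as the
character with ASCII code (score + 33), so decoding is just
ord(ch) - 33, and under that rule '!' decodes to 0. A minimal
sketch in plain Python (the function name sanger_phred is made up
for illustration, it is not part of any tool mentioned here):

```python
def sanger_phred(qual):
    """Decode a Sanger (offset 33) FASTQ quality string to PHRED scores."""
    return [ord(ch) - 33 for ch in qual]


print(sanger_phred("!!!!"))  # [0, 0, 0, 0]
print(sanger_phred("I"))     # [40]
```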

Which version of sff_extract did you use?

It would be worth testing the SFF file in other tools (e.g. the
Roche applications or Biopython) to isolate the problem.
If all the problem reads show this pattern (all their read
qualities are PHRED 0) that should be easy to filter out.
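
Something along these lines might do as a filter. It is only a
sketch in plain Python with no dependencies, and it assumes strict
four-line FASTQ records (no wrapped sequence or quality lines) in
the Sanger encoding. The helper names read_fastq, is_junk and
filter_fastq are invented for this example. It drops any read whose
sequence is empty or whose quality string is nothing but '!'.

```python
import io


def read_fastq(handle):
    """Yield (name, seq, qual) from a strict four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def is_junk(seq, qual):
    """True for empty reads or reads whose every quality character is '!'."""
    return not seq or set(qual) == {"!"}


def filter_fastq(in_handle, out_handle):
    """Copy good records to out_handle; return the names of dropped reads."""
    dropped = []
    for name, seq, qual in read_fastq(in_handle):
        if is_junk(seq, qual):
            dropped.append(name)
        else:
            out_handle.write(f"@{name}\n{seq}\n+\n{qual}\n")
    return dropped


# Small in-memory demonstration with made-up records.
demo_in = io.StringIO(
    "@good\nACGT\n+\nIIII\n"
    "@bad\nACGT\n+\n!!!!\n"
    "@empty\n\n+\n!!!!\n"
)
demo_out = io.StringIO()
demo_dropped = filter_fastq(demo_in, demo_out)
print(demo_dropped)
```

For real files you would pass open file handles instead of the
StringIO objects. Keeping the list of dropped names makes it easy to
check those reads against the SFF file in another tool.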

Peter
