[mira_talk] Re: bug

No, I did not use the sff_extract script. I just received my FASTA file and the associated quality file from the 454 sequencing center, and the quality file is corrupted (badly formatted). So I'm looking for a script that analyses the FASTA file and the quality file to find which records are badly formatted. For example, this is what I received; it is just an extract from the FASTA file and the quality file:

>FUJ9VHQ01DRU97 length=242 xy=1430_1933 region=1 run=R_2009_04_21_09_27_07_
GGTACTGCTTGCTCAGGGCGGAGATGATGACAGCAATCATCTTCTGCCATGCAGCATGGA
CATCTGGGGTATAGGCAGTGCCAAAAGTGGCAGCCAGACAATGGACAGGCAATCACCAAA
GTGCTTGAAGTTATCGGGGTCCACGTGGAGTGTCTCAGAGTGGAACTTACTCAGCTTGGT
AAAANCGTGACTCACGTCATCCATGTGAGCTATAGCGTCACCGACAGGATTTTAAGCACC
GT

and the corresponding phred scores:

>FUJ9VHQ01DRU97 length = 242
37 37 37 37 35 35 36 36 37 39 39 39 39 37 37 37 37 37 37 37
37 39 39 39 39 39 39 39 39 39 37 39 39 39 37 37 39 37 37 37
36 35 35 37 37 37 37 35 33 33 33 33 33 33 33 25 19 19 19 19
27 31 30 30 30 21 21 24 24 32 35 37 34 34 35 37 37 37 37 37
37 37 35 35 33 33 37 37 37 37 37 37 37 37 37 37 35 35 35 37
37 37 37 37 37 35 30 25 25 24 29 29 30 31 32 23 23 16 16 16
24 30 32 32 29 29 28 17 17 18 21 26 23 14 14 13 13 13 13 18
21 27 25 23 25 27 27 29 19 19 19 23 23 27 27 30 30 24 23 23
19 19 20 33 33 33 32 32 24 25 25 35 37 34 34 32 35 33 33 33
26 25 20 20 0 24 24 24 27 30 27 27 27 27 19 19 19 23 32 32 2
7 27 27 27 19 19 20 25 25 27 33 32 31 27 27 27 27 27 27 27 2
9 26 25 20 16 17 16 16 17 11 11 11 11 12 12 13 18 18 16 16 2
2 22

You can see the bad format of the quality record: there are 245 values here, not 242!
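A check of this kind can be sketched in a few lines of Python. This is only a minimal sketch, assuming both files are FASTA-style (a `>` header whose first word is the read name) and that quality values are separated by whitespace; the names `read_records`, `read_name` and `find_mismatches` are made up for illustration and are not part of MIRA or the 454 software. Counting whitespace-separated tokens gives 245 for the record above, which suggests the extra values come from numbers split across line breaks (for example `2` at the end of one line and `7` at the start of the next).

```python
import sys


def read_records(lines):
    """Yield (header, body_lines) for each '>' record in a FASTA-style stream."""
    header, body = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, body
            header, body = line[1:], []
        elif header is not None and line:
            body.append(line)
    if header is not None:
        yield header, body


def read_name(header):
    """Return the first word of a header line, or '' if the header is empty."""
    words = header.split()
    return words[0] if words else ""


def find_mismatches(fasta_lines, qual_lines):
    """Return (name, n_bases, n_quals) for every read whose counts differ.

    n_quals is None when the read has no record in the quality file.
    """
    # Count whitespace-separated quality tokens per read.
    n_quals = {}
    for header, body in read_records(qual_lines):
        n_quals[read_name(header)] = sum(len(row.split()) for row in body)

    # Compare with the number of bases per read in the FASTA file.
    bad = []
    for header, body in read_records(fasta_lines):
        name = read_name(header)
        n_bases = sum(len(row) for row in body)
        if n_quals.get(name) != n_bases:
            bad.append((name, n_bases, n_quals.get(name)))
    return bad


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python check_qual.py reads.fasta reads.qual
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as qual:
        for name, n_bases, n in find_mismatches(fasta, qual):
            print(f"{name}\tbases={n_bases}\tquals={n}")
```

The script only reports the suspicious reads; it does not try to repair them, since a value split across a line break cannot always be rejoined unambiguously.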


Laurent --



Burkhard Steuernagel wrote:
Hi Laurent,
did you use the third-party sff_extract script? I had exactly the same error for some of my SFFs, and it turned out that the sff_extract script produced wrong data.
Once upon a time there was the command
sffinfo mysff.sff |sffinfo2mirafiles.tcl -project mysff
and applying this to the same SFF worked well.

sffinfo comes with all the software from 454.

cheers
Burkhard


Laurent MANCHON wrote:
Bastien Chevreux wrote:
On Monday, 4 May 2009, Laurent MANCHON wrote:
it returns:
Warning: "Read FUJ9VHQ01DRU97: tried to set 245 qualities although the
read has 242 bases.
"
->Thrown: void Read::setQuality(vector<base_quality_t> & quals)
->Caught: void ReadPool::loadDataFromFASTA(const string & filename,
const string & qualfilename, const bool generatefilenames, const uint8
seqtype)
..../....
Loaded 434802 reads, 345575 of which have quality accounted for.

Hello Laurent,

as Lionel already pointed out: that's not a bug, but one of MIRA's safeguards, which try to prevent you from shooting yourself in the foot. There is no option to ignore bad input, and there never will be (sorry for that). MIRA is ... well, I am pretty strict with these kinds of things: either the input meets the expectations and can be assembled, or it does not. In such cases, one should really, really have a look at why the input is bad. In your case, ~90k reads (1/5th of the data) are corrupt, and that's too much to simply ignore.

I cannot even recommend assembling without a quality file and faking the default quality (-AS:bdq): what if some parts that should have been clipped were in fact not clipped, and consist of adaptors? In my eyes that's too high a risk.

Again, I'm sorry to say: please check the upstream data generation, eliminate the bugs there and then use good data to assemble :-)

Regards,
  Bastien


Thanks for the help, Bastien.





