[mira_talk] Re: bug
- From: Laurent MANCHON <lmanchon@xxxxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Tue, 05 May 2009 13:17:52 +0200
No, I don't use the sff_extract script. I just received my FASTA file
and the associated quality file from the 454 sequencing center, and the
quality file is corrupted (badly formatted).
So I'm looking for a script that analyses the FASTA file and the quality
file to see which records are badly formatted.
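A check of this kind could be sketched roughly as below. It is only a
minimal sketch: the function names read_records and find_mismatches are
made up for illustration and are not part of MIRA or the 454 software,
and it assumes plain FASTA plus a space-separated quality file that use
the same read names.

```python
import sys


def read_records(lines):
    """Yield (read_name, body_lines) for each '>' record in FASTA-style text."""
    name, body = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, body
            # The read name is taken to be the first word after '>'.
            parts = line[1:].split()
            name, body = (parts[0] if parts else ""), []
        elif line.strip() and name is not None:
            body.append(line)
    if name is not None:
        yield name, body


def find_mismatches(fasta_lines, qual_lines):
    """Return (name, n_bases, n_quals) for every read whose counts differ.

    n_bases is None if the read is missing from the FASTA file,
    n_quals is None if it is missing from the quality file.
    """
    bases = {name: sum(len(l.strip()) for l in body)
             for name, body in read_records(fasta_lines)}
    bad, seen = [], set()
    for name, body in read_records(qual_lines):
        seen.add(name)
        n_quals = sum(len(l.split()) for l in body)
        n_bases = bases.get(name)
        if n_bases != n_quals:
            bad.append((name, n_bases, n_quals))
    # Reads that have bases but no quality record at all.
    for name, n_bases in bases.items():
        if name not in seen:
            bad.append((name, n_bases, None))
    return bad


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_qual.py reads.fasta reads.fasta.qual
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as qual:
        for name, n_bases, n_quals in find_mismatches(fasta, qual):
            print(name, n_bases, n_quals)
```

It prints one line per suspicious read: the name, the number of bases
and the number of quality values found.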
For example, this is what I received; it's just an extract from the
FASTA file and the quality file:
>FUJ9VHQ01DRU97 length=242 xy=1430_1933 region=1 run=R_2009_04_21_09_27_07_
GGTACTGCTTGCTCAGGGCGGAGATGATGACAGCAATCATCTTCTGCCATGCAGCATGGA
CATCTGGGGTATAGGCAGTGCCAAAAGTGGCAGCCAGACAATGGACAGGCAATCACCAAA
GTGCTTGAAGTTATCGGGGTCCACGTGGAGTGTCTCAGAGTGGAACTTACTCAGCTTGGT
AAAANCGTGACTCACGTCATCCATGTGAGCTATAGCGTCACCGACAGGATTTTAAGCACC
GT
and the corresponding phred scores:
>FUJ9VHQ01DRU97 length = 242
37 37 37 37 35 35 36 36 37 39 39 39 39 37 37 37 37 37 37 37
37 39 39 39 39 39 39 39 39 39 37 39 39 39 37 37 39 37 37 37
36 35 35 37 37 37 37 35 33 33 33 33 33 33 33 25 19 19 19 19
27 31 30 30 30 21 21 24 24 32 35 37 34 34 35 37 37 37 37 37
37 37 35 35 33 33 37 37 37 37 37 37 37 37 37 37 35 35 35 37
37 37 37 37 37 35 30 25 25 24 29 29 30 31 32 23 23 16 16 16
24 30 32 32 29 29 28 17 17 18 21 26 23 14 14 13 13 13 13 18
21 27 25 23 25 27 27 29 19 19 19 23 23 27 27 30 30 24 23 23
19 19 20 33 33 33 32 32 24 25 25 35 37 34 34 32 35 33 33 33
26 25 20 20 0 24 24 24 27 30 27 27 27 27 19 19 19 23 32 32 2
7 27 27 27 19 19 20 25 25 27 33 32 31 27 27 27 27 27 27 27 2
9 26 25 20 16 17 16 16 17 11 11 11 11 12 12 13 18 18 16 16 2
2 22
You can see the bad format of the quality record: here there are 245
values, not 242!
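One possible explanation, offered only as a guess from this extract: the
last full lines are exactly 60 characters wide and end in a lone digit,
which would fit a writer that cuts lines at a fixed width, so that the
single-digit quality 0 shifts every later value by one column and three
two-digit values get split. The sketch below tests that guess on the
record above. The helper name rejoin_hard_wrapped and the width of 60
are assumptions, not anything confirmed by the sequencing center.

```python
QUAL_LINES = """\
37 37 37 37 35 35 36 36 37 39 39 39 39 37 37 37 37 37 37 37
37 39 39 39 39 39 39 39 39 39 37 39 39 39 37 37 39 37 37 37
36 35 35 37 37 37 37 35 33 33 33 33 33 33 33 25 19 19 19 19
27 31 30 30 30 21 21 24 24 32 35 37 34 34 35 37 37 37 37 37
37 37 35 35 33 33 37 37 37 37 37 37 37 37 37 37 35 35 35 37
37 37 37 37 37 35 30 25 25 24 29 29 30 31 32 23 23 16 16 16
24 30 32 32 29 29 28 17 17 18 21 26 23 14 14 13 13 13 13 18
21 27 25 23 25 27 27 29 19 19 19 23 23 27 27 30 30 24 23 23
19 19 20 33 33 33 32 32 24 25 25 35 37 34 34 32 35 33 33 33
26 25 20 20 0 24 24 24 27 30 27 27 27 27 19 19 19 23 32 32 2
7 27 27 27 19 19 20 25 25 27 33 32 31 27 27 27 27 27 27 27 2
9 26 25 20 16 17 16 16 17 11 11 11 11 12 12 13 18 18 16 16 2
2 22""".splitlines()


def rejoin_hard_wrapped(lines, width=60):
    """Rebuild the value stream, assuming lines were cut at `width` columns.

    Assumption: a line that fills all `width` columns was cut in the
    middle of the stream, so it is glued to the next line without a
    separator; shorter lines are joined with a space as usual.
    """
    text = ""
    for line in lines:
        line = line.rstrip()
        text += line if len(line) >= width else line + " "
    return text.split()


# Naive reading: every whitespace-separated token counts as one value.
naive = " ".join(QUAL_LINES).split()
# Reading under the fixed-width assumption.
repaired = rejoin_hard_wrapped(QUAL_LINES)
print(len(naive), len(repaired))  # prints: 245 242
```

If the guess holds, the values themselves are intact and only the line
breaks are wrong, which would be worth reporting to whoever generated
the file.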
Laurent --
Burkhard Steuernagel wrote:
Hi Laurent,
did you use the third-party sff_extract script? I had exactly the same
error for some of my SFFs and it turned out that the sff_extract
script produced wrong data.
Once upon a time there was the command
sffinfo mysff.sff |sffinfo2mirafiles.tcl -project mysff
and applying this to the same SFF worked well.
sffinfo comes with all the software from 454.
cheers
Burkhard
Laurent MANCHON wrote:
Bastien Chevreux wrote:
On Monday 04 May 2009 Laurent MANCHON wrote:
it returns:
Warning: "Read FUJ9VHQ01DRU97: tried to set 245 qualities although the
read has 242 bases.
"
->Thrown: void Read::setQuality(vector<base_quality_t> & quals)
->Caught: void ReadPool::loadDataFromFASTA(const string & filename,
const string & qualfilename, const bool generatefilenames, const uint8
seqtype)
..../....
Loaded 434802 reads, 345575 of which have quality accounted for.
Hello Laurent,
as Lionel already pointed out: that's no bug, but one of MIRA's
safeguards, which try to prevent you from shooting yourself in the foot.
There is no option to ignore bad input, and there never will be
(sorry for that). MIRA is ... well, I am pretty strict about these
kinds of things: either the input meets the expectations and can be
assembled, or it does not. In such cases, one should really, really
have a look at why the input is bad. In your case, ~90k reads (1/5th
of the data) are corrupt, and that's too much to simply ignore.
I cannot even recommend assembling without the quality file and faking
the default quality (-AS:bdq): what if some parts that should have
been clipped were in effect not clipped, and consist of adaptors? In
my eyes that's too high a risk.
Again, I'm sorry to say: please check the upstream data generation,
eliminate the bugs there and then use good data to assemble :-)
Regards,
Bastien
Thanks for this help, Bastien.