[mira_talk] RE reading fasta files
- From: Jorge.DUARTE@xxxxxxxxxxxx
- To: mira_talk@xxxxxxxxxxxxx
- Date: Thu, 2 Apr 2009 10:04:04 +0200
Hi Jan,
I had a similar problem myself, and changing the default value for option
-AS:bdq solved it.
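I think MIRA sets the default base quality value (-AS:bdq) to 10 for Sanger
reads, and the minimum quality for quality clipping to 20 (option -CL:qcmq).
Since your FASTA files come without a quality file, every base gets the
default quality of 10. That is below the clipping threshold, so each read is
clipped down to 0 good bases and rejected. This is probably why none of your
sequences are kept.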
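To make that reasoning concrete, here is a deliberately simplified Python
model. It is not MIRA's actual clipping code. Assumptions: a base counts as
"good" if its quality is >= the clip threshold (-CL:qcmq); reads without a
.qual file get the default quality (-AS:bdq) on every base; and a read is kept
only if it has at least 80 good bases (the "need: 80" from your log). MIRA's
real quality clip works over a window, but when every base has the same
quality, a window average equals that quality, so the outcome is the same.

```python
# Simplified, hypothetical model of why all the Sanger reads were rejected.
# NOT MIRA's implementation: per-base threshold instead of a windowed clip,
# and the ">=" comparison is an assumption.

def good_bases(quals, min_qual):
    """Count bases whose quality reaches the clipping threshold (-CL:qcmq)."""
    return sum(1 for q in quals if q >= min_qual)

def read_is_kept(seq, default_qual, min_qual, min_len=80):
    """No .qual file: every base gets the default quality (-AS:bdq)."""
    quals = [default_qual] * len(seq)
    return good_bases(quals, min_qual) >= min_len

seq = "ACGT" * 50  # hypothetical 200 bp GenBank record
print(read_is_kept(seq, default_qual=10, min_qual=20))  # False: 0 good bases
print(read_is_kept(seq, default_qual=40, min_qual=20))  # True: bdq raised
print(read_is_kept(seq, default_qual=10, min_qual=5))   # True: qcmq lowered
```

Either lever works in the model: push the default quality above the clip
threshold, or pull the threshold below the default quality.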
So if you set option -CL:qcmq to less than 10, or option -AS:bdq to more
than 20, your reads should be used, and MIRA will keep rolling!
For example, if you trust your GenBank sequences, you can raise their
default quality even a bit more by adding this to your mira command line:
SANGER_SETTINGS -AS:bdq=40
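Another route, untested here: your loading log shows MIRA looked for
mira_v3_in.sanger.fasta.qual, so supplying that file should let you set the
qualities yourself instead of relying on -AS:bdq. Below is a minimal Python
sketch, assuming the usual FASTA quality format (same read names in the same
order as the FASTA file, each followed by whitespace-separated integer
qualities) and that MIRA matches reads by the first word of the header. The
helpers read_fasta and write_uniform_qual are hypothetical, not part of MIRA.

```python
# Hypothetical helper: write <file>.fasta.qual giving every base one quality.
# Assumes the standard FASTA quality format and name matching on the first
# header word; check against the MIRA documentation before relying on it.

def read_fasta(path):
    """Return a list of (name, sequence) tuples from a FASTA file."""
    records = []
    name, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                parts = line[1:].split()
                name, chunks = (parts[0] if parts else ""), []
            else:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def write_uniform_qual(fasta_path, quality=40, per_line=25):
    """Write fasta_path + '.qual' with `quality` for every base."""
    qual_path = fasta_path + ".qual"
    with open(qual_path, "w") as out:
        for name, seq in read_fasta(fasta_path):
            out.write(f">{name}\n")
            values = [str(quality)] * len(seq)
            for i in range(0, len(values), per_line):
                out.write(" ".join(values[i:i + per_line]) + "\n")
    return qual_path
```

Usage, with the file name from your log:
write_uniform_qual("mira_v3_in.sanger.fasta", quality=40)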
Regards
Jorge.
---
Jorge Duarte
Bioinformatics Research Engineer
BIOGEMMA - Upstream Genomics Group
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte@xxxxxxxxxxxx
mira_talk-bounce@xxxxxxxxxxxxx wrote on 02/04/2009 09:19:24:
> Hi Bastien and all,
>
> I assembled a bacterial genome (~7 Mb, 30x coverage with unpaired 454
> reads). Today, just out of curiosity, I added FASTA files from that
> particular bug which are already known from GenBank (~200 short records,
> a marginal part of the genome, not suitable for mapping), but without
> success. It looks like I have a problem with reading Sanger FASTA files,
> but I can't figure out how to overcome it.
>
> Details follow:
>
> Shortly after starting mira:
>
> mira -project=mira_v3 -job=denovo,genome,sanger,454,accurate
> COMMON_SETTINGS -GE:not=8
>
> I got:
>
> <code>
> Short length: FS4OOG301DEWUX (454): only 39 good bases, need: 40. No paired end partner, rejected.
> Short length: FS4OOG301BF1EN (454): only 38 good bases, need: 40. No paired end partner, rejected.
> Short length: FS4OOG301C692T (454): only 38 good bases, need: 40. No paired end partner, rejected.
> Short length: FS4OOG301CAS2J (454): only 38 good bases, need: 40. No paired end partner, rejected.
> Short length: ^C
> [1]+ Floating point exception
> </code>
>
> However, reading the FASTA files seems to work fine:
>
> <code>
> Loading data normal (probably Sanger type) from FASTA files,
> Counting sequences in FASTA file:
> Loading sequence data from FASTA file mira_v3_in.sanger.fasta:
> Could not find FASTA quality file mira_v3_in.sanger.fasta.qual, using default qualities for all reads.
> Done.
> Loaded 207 reads, 0 of which have quality accounted for.
> </code>
>
> but clipping went wrong for all Sanger reads:
>
> <code>
> Short length: a001 (san): only 0 good bases, need: 80. No paired end partner, rejected.
> </code>
>
> Thanks for any suggestions,
>
> Jan
>
> --
> You have received this mail because you are subscribed to the
> mira_talk mailing list. For information on how to subscribe or
> unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html