[mira_talk] RE reading fasta files

  • From: Jorge.DUARTE@xxxxxxxxxxxx
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 2 Apr 2009 10:04:04 +0200

Hi Jan,

I had a similar problem myself, and changing the default value for option 
-AS:bdq solved it.

I think mira sets the default base quality value to 10 for sanger reads, 
and the clipping quality to 20 (option -CL:qcmq).
This is probably why none of your sequences are kept.

So if you set option -CL:qcmq to less than 10 or option -AS:bdq to more than 
20, your reads should be used, and mira will keep rolling!

E.g., if you trust your GenBank sequences, you can even raise their default 
quality a bit more:

SANGER_SETTINGS -AS:bdq=40
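
To illustrate the reasoning above, here is a toy model in Python. It is only a sketch, not mira's actual clipping code: the function names and the 200-base read are hypothetical, and the two defaults (10 and 20) are the values I believe mira uses, as stated above.

```python
# Toy model only: mira's real quality clipping is more involved than this.
DEFAULT_BASE_QUALITY = 10  # assumed default for -AS:bdq (Sanger), per the text above
CLIP_MIN_QUALITY = 20      # assumed default for -CL:qcmq, per the text above


def assign_default_qualities(sequence, default_quality):
    """Give every base the same quality, as when no .qual file is found."""
    return [default_quality] * len(sequence)


def good_bases(qualities, min_quality):
    """Count bases whose quality reaches the clipping threshold."""
    return sum(1 for q in qualities if q >= min_quality)


read = "ACGT" * 50  # hypothetical 200-base read without a quality file

for bdq in (DEFAULT_BASE_QUALITY, 40):
    quals = assign_default_qualities(read, bdq)
    print(f"bdq={bdq}: {good_bases(quals, CLIP_MIN_QUALITY)} good bases")
```

With the default quality of 10, no base passes the threshold of 20, which matches the "only 0 good bases" message; with 40, the whole read is kept.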

Regards

Jorge.

--- 
Jorge Duarte
Bioinformatics Research Engineer
BIOGEMMA - Upstream Genomics Group
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte@xxxxxxxxxxxx

mira_talk-bounce@xxxxxxxxxxxxx wrote on 02/04/2009 09:19:24:

> Hi Bastien and all,
> 
> I assembled a bacterial genome (~7M, 30x coverage on 454, not paired). Today,
> just out of curiosity, I added FASTA files for that particular bug that
> are already known from GenBank (~200 short records, a marginal part of the
> genome, not suitable for mapping), but without success. It looks like I have
> a problem with reading Sanger FASTA files, but I can't figure out how to
> overcome it.
> 
> Details follow:
> 
> Shortly after starting mira:
> 
> mira -project=mira_v3 -job=denovo,genome,sanger,454,accurate
> COMMON_SETTINGS -GE:not=8
> 
> I got:
> 
> <code>
> Short length: FS4OOG301DEWUX (454): only 39 good bases, need: 40. No
> paired end partner, rejected.
> Short length: FS4OOG301BF1EN (454): only 38 good bases, need: 40. No
> paired end partner, rejected.
> Short length: FS4OOG301C692T (454): only 38 good bases, need: 40. No
> paired end partner, rejected.
> Short length: FS4OOG301CAS2J (454): only 38 good bases, need: 40. No
> paired end partner, rejected.
> Short length: ^C
> [1]+  Floating point exception
> </code>
> 
> However, reading the FASTA files seems to work fine:
> 
> <code>
> Loading data normal (probably Sanger type) from FASTA files,
> Counting sequences in FASTA file:
> Loading sequence data from FASTA file mira_v3_in.sanger.fasta:
> Could not find FASTA quality file mira_v3_in.sanger.fasta.qual, using
> default qualities for all reads.
> Done.
> Loaded 207 reads, 0 of which have quality accounted for.
> </code>
> 
> But clipping went wrong for all Sanger reads:
> 
> <code>
> Short length: a001 (san): only 0 good bases, need: 80. No paired end
> partner, rejected.
> </code>
> 
> Thanks for any suggestions,
> 
> Jan
> 
