[mira_talk] RE Re: Inconsistent unpadded reads length in ACE file

From: Jorge.DUARTE@xxxxxxxxxxxx
To: mira_talk@xxxxxxxxxxxxx
Date: Tue, 1 Jun 2010 10:04:43 +0200
Hi,

I faced the same problem and solved it using some of the caftools.

Before running gigaBuild, you should run these commands:

caf_depad < mira_out.caf > mira_out_unpadded.caf

and

caf2phrap -caf mira_out_unpadded.caf -fasta mira_out_reads.fasta -clip

This command will generate 3 files:
mira_out_reads.fasta
mira_out_reads.fasta.qual
mira_out_reads.fasta.ace

(I'm not sure about the -clip option, but I had some problems when not
using it.)

Then you just have to run gigaBuild like this:

gigaBuild --ace mira_out.ace --gig mira_out.gig --fd mira_out_reads.fasta --fq mira_out_reads.fasta.qual

Now you are ready to run gigaBayes; it should work properly with this gig
file.
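
For convenience, the three steps above can be wrapped in one shell function. This is only a sketch: it uses exactly the commands and options quoted in this mail, while the function name build_gig and the prefix argument are my own hypothetical additions. It stops at the first command that fails.

```shell
# Sketch: chain caf_depad, caf2phrap and gigaBuild for one MIRA output prefix.
# "build_gig" and its prefix argument are hypothetical helpers, not part of
# caftools or gigaBayes; the commands and flags are the ones quoted above.
build_gig() {
    p=$1    # e.g. "mira_out" when the files are mira_out.caf and mira_out.ace

    # 1. Remove the pads from the CAF file.
    caf_depad < "$p.caf" > "${p}_unpadded.caf" &&
    # 2. Export the reads (FASTA plus .qual and .ace) from the unpadded CAF.
    caf2phrap -caf "${p}_unpadded.caf" -fasta "${p}_reads.fasta" -clip &&
    # 3. Build the gig file from the ACE file and the exported reads.
    gigaBuild --ace "$p.ace" --gig "$p.gig" \
              --fd "${p}_reads.fasta" --fq "${p}_reads.fasta.qual"
}
```

Calling `build_gig mira_out` in the directory holding mira_out.caf and mira_out.ace would then run the whole sequence.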

However, in some rare cases there is still a one-base difference between
the SNP positions reported by gigaBayes and those reported by MIRA, and I
have not found a workaround yet.

I do agree with Bastien that it would be much more efficient to start
from a file format other than ACE, which does not include read
qualities,
but I do understand the authors of gigaBayes, since the ACE format is
widely used.

I wish everyone would agree on a common format some day; I'm tired of
looking for the right converter,
but I'm not very optimistic about that ;-)

Thankfully, people like the authors of caftools and Bastien have released
and try to maintain some of these converters...

jorge.

--- 
Jorge Duarte
Bioinformatics Research Engineer
BIOGEMMA - Upstream Genomics Group
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 42 79 81
E-mail : jorge.duarte@xxxxxxxxxxxx


mira_talk-bounce@xxxxxxxxxxxxx wrote on 31/05/2010 19:50:12:

> On Monday 31 May 2010 Davide Scaglione wrote:
> > Hi Bastien,
> > thanks again for your support about heterozygous SNPs assembly.
> > Now I'm facing another problem, described below.
> > I'm using gigaBayes to mine SNPs from the ACE file.
> > While using a small dataset of Sanger sequences (the same one you used to
> >  look at my SNPs problem) everything ran fine. Now, using 450000 454
> >  sequences, the first module of gigaBayes (I don't know if you ever played
> >  with it) aborts, returning an error because there is an inconsistency
> >  between the read length in the input file and the unpadded read length in
> >  the ACE file (so far the difference I found is one base).
> 
> You might want to read this post:
> 
> //www.freelists.org/post/mira_talk/Some-nt-missing-in-the-assembly-files,1
> 
> > Thus, I'm stuck; I have no idea how to work around this and fix this
> >  inconsistency (I didn't see any parameters which might be related). Am I
> >  doing something wrong?
> > [...]
> > 454_SETTINGS
> > -ED:ace=yes
> 
> Not turning on the automatic contig editor is your only chance.
> 
> Of course, you might as well ask the authors of gigaBayes to write an import
> filter for an assembly format which does not have the flaws of ACE. Did I
> already mention ACE is a horrible format? No?
> 
> Bastien
> 
> "Ceterum censeo, ACE esse delendam."
> 
> 
> Oh, did I already mention that ACE is a horrible format?
> 