[mira_talk] Re: Removing a duplicate entry from a CAF file

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 13 Oct 2011 20:58:46 +0200

On Oct 13, 2011, at 16:56 , John Nash wrote:

> *sigh*  I just got the following message from convert_project... and from 
> mira when I tried using the same CAF file (which came from a de novo 
> assembly) in a mapping assembly:
> 
> Fatal error (may be due to problems of the input data or parameters):
> 
> "Duplicate readname in CAF file: HWI-ST813_B0202ABXX:2:1108:6120:28342#0/"
> 
> ->Thrown: Read & CAF::createCafRead()
> ->Caught: Error while creating CAF-Object.
> 
> 
> True enough, a grep of the fastq file from my sequencing provider showed that 
> the sequence was duplicated in the results.  The assembly took two days on a 
> 64 cpu computer and I really do not want to do it again.  Is there any way to 
> remove the offending entry from the CAF file?

Oh f*ck! (sorry)

Normally MIRA should have caught that on loading already!? Which version ... and 
what parameters did you use?

Back to your problem: if it is just one read, chances are good that it can be 
fixed. The easiest way is to edit the MAF file, because there all the data 
needed for a read is together in one blob. Have a look at 

  
http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect2_contigs

In the small example there, deleting the read U13a05e07.t1 is done by deleting 
everything from 

RD      U13a05e07.t1

up to and including the next line with

AT

That should do the trick. Make sure to delete only one copy of that read, or 
else you might end up with an illegal contig if the coverage drops to zero.
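
If you would rather not do this by hand in an editor, the same deletion can be 
sketched as a small Python script. This is only an illustration of the rule 
described above, not a MIRA tool: the function name drop_one_read is made up, 
and it assumes that a read block starts at a line whose first two fields are 
"RD" and the read name, and ends at the next line whose first field is "AT". 
It removes the first matching copy only and leaves later copies alone.

```python
def drop_one_read(lines, read_name):
    """Return `lines` without the first RD..AT block belonging to `read_name`.

    Assumption (taken from the description above, not from a format spec):
    the block runs from the line "RD <read_name>" up to and including the
    next line whose first field is "AT".
    """
    kept = []
    skipping = False  # True while we are inside the block being deleted
    removed = False   # True once one copy has been deleted completely

    for line in lines:
        fields = line.split()
        if skipping:
            # Drop everything up to and including the closing AT line.
            if fields and fields[0] == "AT":
                skipping = False
                removed = True
            continue
        if not removed and fields[:2] == ["RD", read_name]:
            skipping = True
            continue
        kept.append(line)

    if skipping:
        # The block never closed; refuse to return a truncated file.
        raise ValueError("no AT line found after RD %s" % read_name)
    return kept
```

To use it, read the MAF file with readlines(), pass the lines and the exact 
read name from the error message, and write the returned lines to a new file. 
Work on a copy, and check afterwards that the read name still occurs exactly 
once.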

B.

PS: Sorry for the inconvenience.
--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
