[mira_talk] Re: Removing a duplicate entry from a CAF file

  • From: John Nash <john.he.nash@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 13 Oct 2011 15:46:08 -0400

On 2011-10-13, at 2:58 PM, Bastien Chevreux wrote:

> On Oct 13, 2011, at 16:56 , John Nash wrote:
> 
>> *sigh*  I just got the following message from convert_project... and from 
>> mira when I tried using the same CAF file (which came from a de novo 
>> assembly) in a mapping assembly:
>> 
>> Fatal error (may be due to problems of the input data or parameters):
>> 
>> "Duplicate readname in CAF file: HWI-ST813_B0202ABXX:2:1108:6120:28342#0/"
>> 
>> ->Thrown: Read & CAF::createCafRead()
>> ->Caught: Error while creating CAF-Object.
>> 
>> 
>> True enough, a grep of the fastq file from my sequencing provider showed 
>> that the sequence was duplicated in the results.  The assembly took two days 
>> on a 64 cpu computer and I really do not want to do it again.  Is there any 
>> way to remove the offending entry from the CAF file?
> 
> Oh f*ck! (sorry)
> 
> Normally MIRA should have caught that on loading already!? Which version ... 
> and what parameters did you use?

My turn to say "Oh fsck!" (sorry).

After sending me SIX genomes in CASAVA 1.8 format, my provider apparently sent
the SEVENTH genome in the OLD format (as evidenced by the "/1" and "/2" at the
ends of the header lines).  Of course, I didn't check and just popped the new
sequence into the pipeline.  My converter happily but incorrectly converted the
headers, stripping the "/1" and "/2" from the ends, so both reads of a pair
ended up with the same name.  That caused the error that convert_project threw
when I was trimming the CAF file down to decent-sized contigs.  The sequence
assembly looks really weird!

Moral: Check your FASTQ-formatted sequences EVERY time after downloading them
from the sequence provider.
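
A check like this can be scripted.  What follows is only a minimal sketch in
Python, not anything shipped with MIRA: the function names and the two regular
expressions are assumptions made for illustration.  It assumes strict 4-line
FASTQ records, treats a header ending in "/1" or "/2" with no spaces as the old
style, and treats a header with a " 1:N:0:INDEX"-like second field as CASAVA
1.8.  It counts each header style and reports read names that occur more than
once.

```python
import re
from collections import Counter

# Assumed pattern for old-style headers, e.g.
#   @HWI-ST813_B0202ABXX:2:1108:6120:28342#0/1
OLD_STYLE = re.compile(r"^@\S+/[12]$")
# Assumed pattern for CASAVA 1.8 headers, e.g.
#   @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
CASAVA_18 = re.compile(r"^@\S+ [12]:[YN]:\d+:\S*$")


def classify_header(header):
    """Return 'casava1.8', 'old', or 'unknown' for one FASTQ header line."""
    if CASAVA_18.match(header):
        return "casava1.8"
    if OLD_STYLE.match(header):
        return "old"
    return "unknown"


def check_fastq_headers(lines):
    """Count header styles and collect read names seen more than once.

    `lines` is any iterable of text lines (an open file works).
    Assumes every record is exactly 4 lines, header first.
    """
    styles = Counter()
    names = Counter()
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # skip sequence, '+' and quality lines
        header = line.rstrip("\r\n")
        styles[classify_header(header)] += 1
        names[header[1:]] += 1  # drop the leading '@'
    duplicates = sorted(name for name, n in names.items() if n > 1)
    return styles, duplicates
```

To use it, open the FASTQ file and pass the file handle to
check_fastq_headers().  A file that mixes styles, or that has any "unknown"
headers or duplicate names, deserves a closer look before it goes into an
assembly.  Note that the sketch keeps every read name in memory, which may be
too much for very large runs.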

Sorry to upset you, Bastien.

John




--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
