On 2011-10-13, at 2:58 PM, Bastien Chevreux wrote:

> On Oct 13, 2011, at 16:56 , John Nash wrote:
>
>> *sigh* I just got the following message from convert_project... and from
>> mira when I tried using the same CAF file (which came from a de novo
>> assembly) in a mapping assembly:
>>
>> Fatal error (may be due to problems of the input data or parameters):
>>
>> "Duplicate readname in CAF file: HWI-ST813_B0202ABXX:2:1108:6120:28342#0/"
>>
>> ->Thrown: Read & CAF::createCafRead()
>> ->Caught: Error while creating CAF-Object.
>>
>>
>> True enough, a grep of the fastq file from my sequencing provider showed
>> that the sequence was duplicated in the results. The assembly took two days
>> on a 64 cpu computer and I really do not want to do it again. Is there any
>> way to remove the offending entry from the CAF file?
>
> Oh f*ck! (sorry)
>
> Normally MIRA should have caught that on loading already!? Which version ...
> and what parameters did you use?

My turn to say "Oh fsck!" (sorry). After sending me SIX genomes in CASAVA 1.8 format, my sequencing provider apparently sent the SEVENTH genome in the OLD format (as evidenced by the "/1" and "/2" at the ends of the header lines). Of course, I didn't check and just popped the new sequence into the pipeline. My converter happily but incorrectly converted the headers, removing the "/1" and "/2" at the ends. With those suffixes gone, the two reads of each pair ended up with the same name, which is what triggered the duplicate readname error from convert_project when I was trimming the CAF file down to decent-sized contigs. No wonder the sequence assembly looks really weird!

Moral: Check your FASTQ-formatted sequences EVERY time after downloading them from the sequencing provider.

Sorry to upset you, Bastien.

John

--
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html