[mira_talk] caf2phdball

  • From: Lionel Guy <guy.lionel@xxxxxxxxx>
  • To: mira_talk <mira_talk@xxxxxxxxxxxxx>
  • Date: Fri, 23 Oct 2009 16:33:44 +0200

Hi there,

Following up on yesterday's message: I changed my original plan and ended
up parsing the mira-produced caf file to obtain a phd.ball file for use
with consed. The idea is to have qualities associated with the reads when
editing mira assemblies within consed. This is very important, for
example, when merging or tearing contigs, because the consensus is
recalculated very badly if you don't have qualities (especially because
mira doesn't physically trim the vector sequences from the reads...).

The result is a small perl script that works for my data, but I would be
glad if others could test it to see if it works with other types of
data. All comments are welcome!

CAVEAT: this script produces huge files, because it writes one line per
base, plus headers. For example, with 350,000 reads, some of them long
Sanger reads, I get a file of 1.4 GB...
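
For anyone curious what such a conversion looks like, here is a minimal
Python sketch. It is not the Perl script mentioned above, only an
illustration of the "one line per base" idea. Assumptions: CAF blocks are
separated by blank lines and introduced by "DNA : name", "BaseQuality :
name" and "Sequence : name"; reads are marked "Is_read"; pads are "*" or
"-" and carry a quality value that is dropped together with the pad. The
function names are hypothetical. The records are simplified: the
BEGIN_COMMENT block (for example the TIME field) is omitted, and the third
column is just a running base index instead of a real peak position, so
check the consed documentation before feeding the output to consed.

```python
import io

PAD_CHARS = {"*", "-"}  # assumption: pad symbols in padded CAF reads
HEADERS = ("DNA", "BaseQuality", "Sequence")


def parse_caf(handle):
    """Collect DNA lines, quality values and the set of read names."""
    dna, qual, reads = {}, {}, set()
    block = name = None
    for raw in handle:
        line = raw.strip()
        if not line:
            block = None  # a blank line closes the current block
            continue
        kind, sep, rest = line.partition(":")
        if sep and kind.strip() in HEADERS:
            block, name = kind.strip(), rest.strip()
            continue
        if block == "DNA":
            dna.setdefault(name, []).append(line)
        elif block == "BaseQuality":
            qual.setdefault(name, []).extend(int(q) for q in line.split())
        elif block == "Sequence" and line == "Is_read":
            reads.add(name)  # contigs are skipped later on
    return dna, qual, reads


def caf_to_phd_ball(caf_handle, out_handle):
    """Write one simplified PHD-style record per read; return the count."""
    dna, qual, reads = parse_caf(caf_handle)
    written = 0
    for name in sorted(reads):
        seq = "".join(dna.get(name, []))
        scores = qual.get(name, [])
        if not seq or len(seq) != len(scores):
            continue  # incomplete or inconsistent read: skip it
        out_handle.write(f"BEGIN_SEQUENCE {name}\n\nBEGIN_DNA\n")
        position = 0
        for base, score in zip(seq, scores):
            if base in PAD_CHARS:
                continue  # pads do not exist in the unpadded read
            position += 1
            # one line per base: base, quality, placeholder position
            out_handle.write(f"{base.lower()} {score} {position}\n")
        out_handle.write("END_DNA\n\nEND_SEQUENCE\n\n")
        written += 1
    return written


if __name__ == "__main__":
    # Tiny made-up example, not real assembly data.
    demo = (
        "DNA : read1\nACG*T\n\n"
        "BaseQuality : read1\n30 31 32 0 33\n\n"
        "Sequence : read1\nIs_read\nPadded\n"
    )
    out = io.StringIO()
    caf_to_phd_ball(io.StringIO(demo), out)
    print(out.getvalue(), end="")
```

On real data one would pass open file handles instead of the in-memory
strings. Note that this sketch keeps the whole caf file in memory, which is
probably not what you want for several hundred thousand reads; a streaming
version would be preferable there.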

Cheers,

Lionel
