[mira_talk] Re: phd.ball

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 27 Sep 2009 14:45:15 +0200

On Friday 25 September 2009 Sven Klages wrote:
> as there are people who want to use the ace output of MIRA3 for further
> editing in consed, wouldn't it be a good idea to (optionally) write
> distinct phd.balls for each chemistry used in assembly? I mean you got
> everything together during assembly ... doing this afterwards is always
> some kind of hassle (for some people simply not possible).
> What do you think?

Hi Sven,

I'm pretty much for it, but there's one big problem which has stopped me from 
doing so in the past: the ACE format.

In fact, besides missing base quality values for reads, it also misses another
vital part: information regarding inserted or deleted bases. In the
documentation for consed I have seen no way to describe in ACE the fact that a
base has been deleted (be it automatically by the assembler or manually in a
finishing program).

Take this simple example of a read with three bases.

  ATA

When deleting the "T", the read is stored as

  AA

in the ACE ... and there's no information that there ever was a base between
the two As. In other formats (for example CAF), there is adjustment information
pertaining to the read which shows that there was "something" between the As.
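
To make that concrete, here is a minimal Python sketch of what such adjustment
information buys you. The tuple layout and the function name
edited_to_original are hypothetical and chosen only for illustration; this
is not actual CAF syntax, just the idea behind it.

```python
# Hypothetical adjustment records: (edited_start, edited_end, orig_start, orig_end),
# 1-based and inclusive. Illustration only, not real CAF syntax.
original = "ATA"
edited = "AA"  # the "T" at original position 2 was deleted

adjustments = [
    (1, 1, 1, 1),  # edited base 1 comes from original base 1
    (2, 2, 3, 3),  # edited base 2 comes from original base 3
]

def edited_to_original(pos, segments):
    """Map an edited position to its original position.

    Returns None when no segment covers the position (e.g. an inserted base).
    """
    for e_start, e_end, o_start, o_end in segments:
        if e_start <= pos <= e_end:
            return o_start + (pos - e_start)
    return None

for pos in range(1, len(edited) + 1):
    orig_pos = edited_to_original(pos, adjustments)
    print(pos, edited[pos - 1], "<-", orig_pos, original[orig_pos - 1])
```

The gap between original positions 1 and 3 is exactly the "something" that the
ACE representation loses.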

If you now combine the above facts (no quality values in ACE and no adjustment 
information) with the MIRA editors for Sanger, 454 and Solexa data, you 
certainly see the problem: as soon as a read is edited in a way that bases are 
inserted or deleted, the mapping between the sequence in the ACE and the 
quality values in the PHD-ball will be completely bogus. That is, unless 
consed alters the PHD files or does some other funny things.
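
A small Python sketch of that mismatch, using the three-base read from above.
The quality values are made up, and this only illustrates the problem; it says
nothing about how consed or MIRA actually handle PHD data.

```python
# Made-up quality values for the original read "ATA", as a PHD file might hold them.
phd_bases = "ATA"
phd_quals = [40, 10, 35]

ace_bases = "AA"  # the same read in the ACE after the "T" was deleted

# Naive pairing by position: all one can do without adjustment information.
naive = list(zip(ace_bases, phd_quals))

# What the pairing should be: skip the quality of the deleted base (index 1).
deleted_index = 1
kept_quals = phd_quals[:deleted_index] + phd_quals[deleted_index + 1:]
correct = list(zip(ace_bases, kept_quals))

print("naive:  ", naive)
print("correct:", correct)
```

With the naive pairing the second A inherits the quality of the deleted T, and
every base after an edit is shifted in the same way.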

If you have an idea how this should be handled ... I'm all ears :-)

> Hopefully alignment format will change in the future ... :-)

In my despair (ACE is no good, CAF too complicated/slow to parse, BAF not ready
yet, ASM also not really ideal), I got MIRA to write its own format which
should be easily parsable ... but whether it was a good idea only the future
will tell.

Regards,
  Bastien

PS: "Ceterum censeo: .ace esse delendam."


-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
