[mira_talk] Re: phd.ball

  • From: Björn Nystedt <bjorn.nystedt@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 29 Sep 2009 09:16:00 +0200

>as soon as a read is edited in a way that bases are 
> inserted or deleted, the mapping between the sequence in the ACE and the 
> quality values in the PHD-ball will be completely bogus. That is, unless 
> consed alters the PHD files or does some other funny things.
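
To make the quoted concern concrete, here is a minimal sketch using the ATA
example from Bastien's mail further down. The quality values and variable names
are invented for illustration; this shows only the general indexing problem,
not what consed or MIRA actually do.

```python
# Hypothetical read with one PHD-style quality value per original base.
original_bases = "ATA"
original_quals = [30, 10, 40]  # made-up values, for illustration only

# Edit: delete the middle base (index 1), turning ATA into AA.
deleted_index = 1
edited_bases = original_bases[:deleted_index] + original_bases[deleted_index + 1:]

# Naive pairing by position, which is all the edited sequence alone allows.
naive = list(zip(edited_bases, original_quals))

# Pairing when the edit is recorded: kept[j] is the original position
# of base j in the edited read.
kept = [i for i in range(len(original_bases)) if i != deleted_index]
adjusted = [(edited_bases[j], original_quals[i]) for j, i in enumerate(kept)]

print(naive)     # [('A', 30), ('A', 10)]  second A gets the deleted T's quality
print(adjusted)  # [('A', 30), ('A', 40)]
```

Without some record of which positions were removed, every base after the
edit is paired with the quality of its left neighbour from the original read.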

When a read is edited in consed, an edited copy of its .phd file is created: 
the original is *.phd.1 and subsequent versions are *.phd.2, *.phd.3 and so 
on, so that each version of the ace file always relates back to the right phd 
versions. I am not entirely sure at the moment what happens to the phd.ball 
file, but I would be very surprised if consed ended up with bogus quality 
values. (But yes, the whole idea of using .phd files instead of a decent, 
complete alignment file is quite bad.)
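
As a rough sketch of the versioning scheme, the snippet below picks the
highest-numbered phd version for each read from a list of file names. It
assumes the names follow exactly the <read>.phd.<n> pattern described above;
the function name and the example file names are hypothetical and not part of
consed.

```python
import re
from typing import Dict, Iterable

# Assumed naming pattern: <read name>.phd.<version number>
PHD_NAME = re.compile(r"^(?P<read>.+)\.phd\.(?P<version>\d+)$")


def latest_phd_versions(filenames: Iterable[str]) -> Dict[str, int]:
    """Return the highest phd version number seen for each read name."""
    latest: Dict[str, int] = {}
    for name in filenames:
        match = PHD_NAME.match(name)
        if match is None:
            continue  # not a versioned phd file, e.g. a phd.ball
        read = match.group("read")
        version = int(match.group("version"))
        if version > latest.get(read, 0):
            latest[read] = version
    return latest


# Hypothetical directory listing.
files = ["readA.phd.1", "readA.phd.2", "readB.phd.1", "phd.ball", "notes.txt"]
print(latest_phd_versions(files))  # {'readA': 2, 'readB': 1}
```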

Björn



On Sun, 27 Sep 2009 14:45:15 +0200
Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Friday 25 September 2009 Sven Klages wrote:
> > as there are people who want to use the ace output of MIRA3 for further
> > editing in consed, wouldn't it be a good idea to (optionally) write
> > distinct phd.balls for each chemistry used in assembly? I mean you got
> > everything together during assembly ... doing this afterwards is always
> > some kind of hassle (for some people simply not possible).
> > What do you think?
> 
> Hi Sven,
> 
> I'm pretty much for it, but there's one big problem which has stopped me from 
> doing so in the past: the ACE format.
> 
> In fact, besides lacking base quality values for reads, it also lacks 
> another vital part: information regarding inserted or deleted bases. In the 
> documentation to consed I have seen no way to describe in ACE the fact that a 
> base has been deleted (be it automatically by the assembler or manually in a 
> finishing program).
> 
> Take this simple example of a read with three bases.
> 
>   ATA
> 
> When deleting the "T", the read is stored as 
> 
>   AA
> 
> in the ACE ... and there's no information that there ever was a base between 
> the two As. In other formats (for example CAF), there is adjustment 
> information pertaining to the read which shows that there was "something" 
> between the As.
> 
> If you now combine the above facts (no quality values in ACE and no 
> adjustment information) with the MIRA editors for Sanger, 454 and Solexa 
> data, you certainly see the problem: as soon as a read is edited in a way 
> that bases are inserted or deleted, the mapping between the sequence in the 
> ACE and the quality values in the PHD-ball will be completely bogus. That 
> is, unless consed alters the PHD files or does some other funny things.
> 
> If you have an idea how this should be handled ... I'm all ears :-)
> 
> > Hopefully alignment format will change in the future ... :-)
> 
> In my despair (ACE is no good, CAF too complicated/slow to parse, BAF not 
> ready yet, ASM also not really ideal), I got MIRA to write a format of its 
> own which should be easily parsable ... but whether it was a good idea only 
> the future will tell.
> 
> Regards,
>   Bastien
> 
> PS: "Ceterum censeo: .ace esse delendam."
> 
> 
> -- 
> You have received this mail because you are subscribed to the mira_talk 
> mailing list. For information on how to subscribe or unsubscribe, please 
> visit http://www.chevreux.org/mira_mailinglists.html


-- 
====================================
Björn Nystedt
PhD Student
Molecular Evolution
EBC, Uppsala University
Norbyv. 18C, 752 36  Uppsala
Sweden
phone: +46 (0)18-471 45 88
email: Bjorn.Nystedt@xxxxxxxxx
====================================

