On Thursday, 22 October 2009, Lionel Guy wrote:
> Well, to produce the phdball I guess I have to start from the caf
> file...

Hi Guy,

let me start with an old citation: "Ceterum censeo, ACE esse delendam."

ACE is the worst of all possibilities to represent an assembly, as some
vital information is missing: the alignment of reads to the original
sequence. Splitting the qualities off into other files (phdballs)
doesn't make things any easier.

Indeed, the only format that MIRA currently supports and which contains
everything needed is CAF. In a short while, MAF may also be used, but
I'm not sure whether I want to keep the format private to MIRA and wait
for larger sequencing centers to come up with something workable.

> I did a bit of digging into the files, and indeed,
> some reads are edited during the assembly (I checked both ace and caf
> files). Same thing for the quality values, they change between the input
> files and the caf files...

I hope that by "change" you mean that some are deleted. Other changes
shouldn't happen.

> I also checked the tags, and it seems that the R454 tags correspond to
> such deletions (marked between underscores in the sequences below.

They are only hints. The only viable way to determine the correct
alignment of an assembled read to the original sequence is to use the
Align_to_SCF info lines from the CAF.

> What are the criteria that MIRA uses to decide to delete a nucleotide in
> a read?

Depending on the read type: for Sanger reads, Thomas wrote a pretty
nifty automatic editor (EdIt) back in 1999, with bells and whistles like
trace analysis using neural networks, insert/delete/basechange edits,
etc. That's the integrated "EdIt" editor (SANGER_SETTINGS -ED:ace=yes).

For 454 and Solexa reads, I wrote a much simpler editor which looks for
common overcall problems that it can safely delete.
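
A quick aside on the Align_to_SCF lines mentioned further up: here is a
minimal Python sketch of how one might read them. This is not MIRA code.
It assumes each such line in a read's CAF record has the form
"Align_to_SCF r1 r2 s1 s2", meaning that bases r1..r2 of the edited read
correspond to bases s1..s2 of the original sequence; please check that
against the CAF specification. The read record in EXAMPLE and both
function names are made up for illustration.

```python
# Hypothetical CAF fragment for a single read (made-up data).
EXAMPLE = """\
Sequence : read_x
Is_read
Align_to_SCF 1 10 1 10
Align_to_SCF 11 20 12 21
"""


def parse_align_to_scf(caf_text):
    """Return (r1, r2, s1, s2) tuples for every Align_to_SCF line."""
    blocks = []
    for line in caf_text.splitlines():
        fields = line.split()
        # Assumed layout: keyword followed by exactly four integers.
        if len(fields) == 5 and fields[0] == "Align_to_SCF":
            blocks.append(tuple(int(f) for f in fields[1:]))
    return blocks


def deleted_original_positions(blocks):
    """List original positions skipped between consecutive blocks."""
    # Order the blocks by where they start in the original sequence.
    ordered = sorted(blocks, key=lambda b: min(b[2], b[3]))
    deleted = []
    for prev, nxt in zip(ordered, ordered[1:]):
        prev_end = max(prev[2], prev[3])
        next_start = min(nxt[2], nxt[3])
        # Anything strictly between two blocks has no base in the read.
        deleted.extend(range(prev_end + 1, next_start))
    return deleted


if __name__ == "__main__":
    blocks = parse_align_to_scf(EXAMPLE)
    print(deleted_original_positions(blocks))
```

For the made-up record, original position 11 falls between the two
blocks, so it is reported as edited away. The sketch only looks at gaps
between blocks; bases clipped off at either end of a read are not
covered.
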
There's a whole set of rules behind that 454/Solexa editor, but
basically it wants a certain base-to-gap ratio, forward and reverse
reads, and a few more things before it decides to edit away a base in a
read.

> Wouldn't it be more appropriate to make gaps in the other sequences?

This is actually done. But as MIRA works in multiple passes, deleting
bases improves the overlap graph in subsequent passes, which leads to
better overlap alignments and - as a side effect - a slight speed
increase.

Regards,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk
mailing list. For information on how to subscribe or unsubscribe, please
visit http://www.chevreux.org/mira_mailinglists.html