[mira_talk] Re: MIRA edits reads during assembly?
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Sun, 25 Oct 2009 11:15:30 +0100
On Thursday 22 October 2009 Lionel Guy wrote:
> Well, to produce the phdball I guess I have to start from the caf
> file...
Hi Guy,
let me start with an old citation: "Ceterum censeo, ACE esse delendam."
ACE is the worst of all possible formats for representing an assembly, as some
vital information is missing: the alignment of reads to the original sequence.
Splitting the qualities off into other files (phdballs) doesn't make things any
easier. Indeed, the only format that MIRA currently supports and which contains
everything needed is CAF. In a short while, MAF may also be used, but I'm not
sure whether I want to keep the format private to MIRA and wait for larger
sequencing centers to come up with something workable.
> I did a bit a digging into the files, and indeed,
> some reads are edited during the assembly (I checked both ace and caf
> files). Same thing for the quality values, they change between the input
> files and the caf files...
I hope that with "change" you mean: some are deleted. Other changes shouldn't
happen.
> I also checked the tags, and it seems that the R454 tags correspond to
> such deletions (marked between underscores in the sequences below).
They are only hints. The only viable way to detect correct alignment of an
assembled read to the original sequence is to use the Align_to_SCF info lines
from the CAF.
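
A minimal sketch of how the Align_to_SCF lines could be used to find edited
positions. It assumes each line has the form "Align_to_SCF r1 r2 o1 o2",
meaning read positions r1..r2 align to positions o1..o2 of the original
sequence, and that the read is in forward orientation. The function names are
hypothetical and not part of MIRA.

```python
def parse_align_to_scf(lines):
    """Collect (read_start, read_end, orig_start, orig_end) tuples
    from the Align_to_SCF lines of one CAF Sequence block."""
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) == 5 and parts[0] == "Align_to_SCF":
            segments.append(tuple(int(p) for p in parts[1:]))
    # Sort by read start so neighbouring segments can be compared.
    return sorted(segments)


def find_edits(segments):
    """Return (deleted, inserted) position lists.

    deleted:  positions of the original sequence that no longer
              appear in the assembled read
    inserted: positions of the assembled read that have no
              counterpart in the original sequence
    Assumption: coordinates increase along both read and original.
    """
    deleted, inserted = [], []
    for prev, nxt in zip(segments, segments[1:]):
        _, prev_read_end, _, prev_orig_end = prev
        next_read_start, _, next_orig_start, _ = nxt
        deleted.extend(range(prev_orig_end + 1, next_orig_start))
        inserted.extend(range(prev_read_end + 1, next_read_start))
    return deleted, inserted


example = [
    "Align_to_SCF 1 50 1 50",
    "Align_to_SCF 51 100 52 101",
]
deleted, inserted = find_edits(parse_align_to_scf(example))
print("deleted original positions:", deleted)
print("inserted read positions:", inserted)
```

In the example, the jump from 50 to 52 in the original coordinates indicates
that one base of the original sequence was edited away from the read.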
> What are the criteria that MIRA uses to decide to delete a nucleotide in
> a read?
Depending on the read type: for Sanger reads, Thomas wrote a pretty nifty
automatic editor (EdIt) back in 1999, with bells and whistles like trace
analysis using neural networks, insert/delete/base-change edits, etc. That's
the integrated "EdIt" editor (SANGER_SETTINGS -ED:ace=yes).
For 454 and Solexa reads, I wrote a much simpler editor which looks for common
overcall problems which it can safely delete. There's a whole set of rules
behind it, but basically this editor wants a certain base to gap ratio,
forward/reverse reads and a few more things before it decides to edit away a
base in a read.
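
To illustrate the kind of rule described above, here is a purely hypothetical
sketch of a column check. It is not MIRA's actual rule set: the gap symbol, the
thresholds and the function name are all assumptions chosen for the example.
A column is treated as an overcall candidate only if few reads carry a base
compared to those carrying a gap, and the gaps are seen in both forward and
reverse reads.

```python
GAP = "*"  # assumed gap symbol for this sketch


def is_overcall_column(column, max_base_to_gap_ratio=0.25, min_reads=4):
    """Decide whether the bases in one alignment column look like overcalls.

    column: list of (symbol, strand) pairs, strand being "+" or "-".
    The thresholds are illustrative values, not MIRA defaults.
    """
    gap_strands = [strand for symbol, strand in column if symbol == GAP]
    base_strands = [strand for symbol, strand in column if symbol != GAP]

    # Need enough coverage, at least one gap and at least one base to remove.
    if len(column) < min_reads or not gap_strands or not base_strands:
        return False

    # Require the gap to be supported by both forward and reverse reads.
    if set(gap_strands) != {"+", "-"}:
        return False

    # Few bases relative to gaps suggests the bases are overcalls.
    return len(base_strands) / len(gap_strands) <= max_base_to_gap_ratio


column = [(GAP, "+"), (GAP, "-"), (GAP, "+"), (GAP, "-"), ("A", "+")]
print(is_overcall_column(column))
```

With four gaps from both strands against a single base, the example column
passes the check, so the lone "A" would be a candidate for deletion under
these assumed thresholds.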
> Wouldn't it be more appropriate to make gaps in the other sequences?
This is actually done. But as MIRA works in multiple passes, deleting bases
improves the overlap graph in subsequent passes, which leads to better overlap
alignments and - as a side effect - a slight speed increase.
Regards,
Bastien