Hello Bastien,

Sorry if I didn't make myself clear. By "events" I mean all the tags MIRA
adds to the sequence: gaps, SNPs, repeats.

I have tried the AO process but, after placing the gaps, there are still
discrepancies. For one chromosome the sequence lengths are:

  1,462,416 bp  FASTA reference
  1,462,514 bp  "PADDED" generated sequence (with gaps)
  1,462,431 bp  "UNPADDED" generated sequence

There is a difference of +15 bp between the original and the unpadded
sequence that I can neither explain nor map back. If I want to know where in
the reference sequence a SNP was found, I can't do it with a 15 bp
discrepancy, and for other chromosomes the difference gets even bigger.

Regards,
----------------
s.

On Thu, Jun 10, 2010 at 6:40 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
> On Thursday 10 June 2010 Saulo Alves wrote:
>> I'm facing a problem where I'm trying to map all the "events" on the query
>> sequence (the new genome) to the reference genome.
>
> Hello Saulo,
>
> I'm not sure I can follow you. What are "events"?
>
>> The problem is, MIRA inserts several * on the reference prior to assembly
>> and shifts frames (as reported in the AT field in the MAF file).
>
> Well, looking at a case like this:
>
>   ref ....acgta*cgt....
>   s1  ....acgtaGcgt....
>   s2  ....acgtaGcgt....
>   s3  ....acgtaGcgt....
>   s4  ....acgtaGcgt....
>   s5  ....acgtaGcgt....
>
> then why should MIRA not insert a gap? This is what makes the most sense and
> accurately reflects the change of the new genome against the reference.
>
>> After all those modifications I'm not able to map BACK base-by-base each
>> problem.
>
> You can parse the positions with gaps in the reference sequence by looking at
> "AO" lines in the reference reads. It's pretty easy actually: fill an array
> with "-1", then apply all "AO" lines from the reference read. Positions having
> "-1" at the end of this procedure are gaps. From there, creating a mapping to
> your original sequence is a breeze.
>
>> I'm planning on creating a "multiple alignment" of the two sequences for
>> high density annotation.
>
> This actually I do not understand: what do you want to do?
>
> Regards,
>   Bastien
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
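Bastien's AO recipe (fill an array with -1, apply every AO line of the
reference read, treat the leftover -1 entries as gaps) can be sketched in
Python as below. This is a minimal sketch, not MIRA code: it assumes each AO
line has already been parsed into four 1-based, inclusive integers
(padded_from, padded_to, orig_from, orig_to) on the forward strand. That field
layout is an assumption to check against the MAF format description, and the
helper names (build_padded_map, gap_positions, original_position) are
hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# ASSUMED shape of one parsed AO line (verify against the MAF docs):
# (padded_from, padded_to, orig_from, orig_to), 1-based and inclusive,
# forward strand only.
AOLine = Tuple[int, int, int, int]


def build_padded_map(padded_len: int, ao_lines: Iterable[AOLine]) -> List[int]:
    """Map each padded position to its original position; -1 marks a gap.

    Index i of the returned list corresponds to padded position i + 1.
    """
    mapping = [-1] * padded_len  # start with "every position is a gap"
    for p_from, p_to, o_from, o_to in ao_lines:
        # Under the assumed layout both intervals must have the same length.
        if p_to - p_from != o_to - o_from:
            raise ValueError(f"AO interval lengths differ: {(p_from, p_to, o_from, o_to)}")
        for offset in range(p_to - p_from + 1):
            mapping[p_from - 1 + offset] = o_from + offset
    return mapping


def gap_positions(mapping: List[int]) -> List[int]:
    """1-based padded positions still holding -1, i.e. gaps in the reference."""
    return [i + 1 for i, orig in enumerate(mapping) if orig == -1]


def original_position(mapping: List[int], padded_pos: int) -> Optional[int]:
    """Original position for a 1-based padded position, or None if it is a gap."""
    orig = mapping[padded_pos - 1]
    return None if orig == -1 else orig


if __name__ == "__main__":
    # Toy version of Bastien's example: reference "acgta*cgt",
    # i.e. one gap at padded position 6.
    ao = [(1, 5, 1, 5), (7, 9, 6, 8)]
    m = build_padded_map(9, ao)
    print(m)                        # [1, 2, 3, 4, 5, -1, 6, 7, 8]
    print(gap_positions(m))         # [6]
    print(original_position(m, 7))  # 6
```

With such a mapping, a SNP reported at a padded position can be placed on the
original coordinates with original_position(); a result of None means the
column is a gap in the reference, i.e. a base inserted in the new genome.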