[mira_talk] MIRA edits reads during assembly?

  • From: Lionel Guy <guy.lionel@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 22 Oct 2009 18:24:06 +0200

Hi!

I was trying to write a small script to produce phdballs for use in
consed. I started from the input reads and qualities (fasta/qual),
producing fake phds and concatenating them. That was working fine but,
darn, consed tells me that some nucleotides are inconsistent between the
phd and the ace file. I did a bit of digging into the files, and indeed,
some reads are edited during the assembly (I checked both the ace and
caf files). The same goes for the quality values: they change between
the input files and the caf files.

I also checked the tags, and it seems that the R454 tags correspond to
such deletions (marked between underscores in the sequences below).

What are the criteria that MIRA uses to decide to delete a nucleotide in
a read? Wouldn't it be more appropriate to make gaps in the other
sequences?

Well, to produce the phdball I guess I have to start from the caf
file... 
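A minimal sketch of writing one fake phd record from bases and qualities that were taken from the caf file rather than from the input fasta/qual. The function name `phd_record` is hypothetical, the peak positions are simply made up as a running index, and only two comment fields are written; consed may need more fields than this, and the TIME value probably has to agree with what the ace file says for the read, so treat the layout as an assumption to check against a real phd file.

```python
def phd_record(name, bases, quals, timestamp):
    """Return the text of one phd record (hypothetical helper).

    bases: unpadded read sequence as stored after assembly
    quals: one integer quality per base
    timestamp: free-text time stamp written to the TIME field
    """
    if len(bases) != len(quals):
        raise ValueError("need exactly one quality value per base")

    lines = [
        f"BEGIN_SEQUENCE {name}",
        "",
        "BEGIN_COMMENT",
        "",
        f"CHROMAT_FILE: {name}",
        f"TIME: {timestamp}",
        "",
        "END_COMMENT",
        "",
        "BEGIN_DNA",
    ]
    # One line per base: base, quality, peak position.
    # There is no trace here, so the peak position is a fake running index.
    for index, (base, qual) in enumerate(zip(bases, quals)):
        lines.append(f"{base.lower()} {qual} {index}")
    lines += ["END_DNA", "", "END_SEQUENCE", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(phd_record("example_read", "ACG", [20, 30, 40],
                     "Thu Oct 22 18:24:06 2009"))
```

Concatenating the text returned for each read would then give the phdball.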

Cheers,

Lionel


caf:
tcagTTTCCATTTGGTCTGGATCGATCGCACCTTGACGGTGATTCGCGCTTTATTAGACA
ATATTCGGTCGCGCGCATGTTCACCATCGTAGATTCCGAAGAACTCTTGTTGAACAGCTG
GTGCACTCAAAAGCGCTGTCGTGTCATAGTTCAACTGAATAGTATCATAGTCGTAGCTCT
CACGGTTGAGCACATACTGATTAAGCCAGTATTTGTCTACAACTTCACCATAACTCGTTT
CATGTTCACGCATTACCGAAATAGCATCGACTGCGCCGGTAGCGTTATCGACGCGCAGCA
CCAA-GGGGTA-GGaggcgtggttgggtaaaaacccggtaacgtaaactacgacgcacgc
tataccgagccaatccgaagactaacaagaccaaggcaccgcctcgtccgccaacgcggc
gcgtgcgcgatttacgtaacttcgctgagactgccaaggccacgaccagggagtaggnng
n

fasta:
tcagTTTCCATTTGGTCTGGATCGATCGCACCTTGACGGTGATTCGCGCTTTATTAGACA
ATATTCGGTCGCGCGCATGTTCACCATCGTAG_G_ATTCCGAAGAACTCTTGTTGAACAGCT
GGTGCACTCAAAAGCGCTGTCGTGTCATAGTTCAACTGAATAGTATCATAGTCGTAGCTC
TCACGGTTGAGCACATACTGATTAAGCCAGTATTTGTCTACAACTTCACCATAACTCGTT
TCATGTTCACGCATTACCGAAATAAGCATCGACTGCGCCGGTAGCGTTATCGACGCGCAG
CACCAAGGGGTAGGaggcgtggttgggtaaaaacccggtaacgtaaactacgacgcacgc
tataccgagccaatccgaagactaacaagaccaaggcaccgcctcgtccgccaacgcggc
gcgtgcgcgatttacgtaacttcgctgagactgccaaggccacgaccagggagtaggnng
n
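The comparison between the two copies of the read can be automated. Below is a minimal sketch using Python's standard `difflib`; the name `read_edits` is hypothetical, and it assumes that `-` marks alignment pads in the caf sequence and that `_` only appears as a hand-added marker in the fasta copy. In a run of identical bases the reported position is one of several equivalent choices.

```python
import difflib


def read_edits(original, assembled):
    """List differences between an input read and its assembled copy.

    Returns tuples (kind, position, original_bases, assembled_bases),
    where position is 1-based in the original (unpadded) read.
    """
    # Drop hand-added markers and alignment pads, ignore case (clipping).
    orig = original.replace("_", "").upper()
    asm = assembled.replace("-", "").upper()

    # autojunk=False: with a four-letter alphabet every base is "popular",
    # and the default heuristic would ignore them in reads over 200 bases.
    matcher = difflib.SequenceMatcher(None, orig, asm, autojunk=False)
    return [
        (kind, i1 + 1, orig[i1:i2], asm[j1:j2])
        for kind, i1, i2, j1, j2 in matcher.get_opcodes()
        if kind != "equal"
    ]


if __name__ == "__main__":
    # Toy example: the T at position 4 is missing from the assembled copy,
    # which also carries one pad.
    print(read_edits("ACGTACCA", "ACGAC-CA"))
```

Feeding it the fasta and caf sequences of one read, each joined into a single string, lists the bases that were removed during assembly.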

tags in caf file:
Sequence : FR3I0X301A0PY0
Is_read
Padded
Template "FR3I0X301A0PY0"
Strand Forward
SCF_File "FR3I0X301A0PY0"
Seq_vec SVEC 1 5
Seq_vec SVEC 315 481
Clipping QUAL 17 481
Align_to_SCF 1 92 1 92
Align_to_SCF 93 262 94 263
Align_to_SCF 263 304 265 306
Align_to_SCF 306 311 307 312
Align_to_SCF 313 481 313 481
Tag HAF3 29 309 ""
Tag R454 91 92 ""
Tag R454 262 263 ""
Tag MINF 1 1 "ST=454"
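The Align_to_SCF lines above already seem to encode where the read was changed. A small sketch, assuming the four numbers are read start, read end, original start and original end (all 1-based and inclusive): a jump in the original coordinates between two consecutive segments is a base removed from the read, and a jump in the read coordinates is a pad added to it. The function names are hypothetical.

```python
def parse_align_to_scf(caf_lines):
    """Collect (read_start, read_end, orig_start, orig_end) segments."""
    segments = []
    for line in caf_lines:
        fields = line.split()
        if len(fields) == 5 and fields[0] == "Align_to_SCF":
            segments.append(tuple(int(value) for value in fields[1:]))
    return sorted(segments)


def find_edits(segments):
    """Return (deleted_original_positions, pad_positions_in_read)."""
    deleted, pads = [], []
    for current, following in zip(segments, segments[1:]):
        _, read_end, _, orig_end = current
        next_read_start, _, next_orig_start, _ = following
        # Original positions skipped between segments: bases removed.
        deleted.extend(range(orig_end + 1, next_orig_start))
        # Read positions skipped between segments: pads inserted.
        pads.extend(range(read_end + 1, next_read_start))
    return deleted, pads


if __name__ == "__main__":
    lines = [
        "Align_to_SCF 1 92 1 92",
        "Align_to_SCF 93 262 94 263",
        "Align_to_SCF 263 304 265 306",
        "Align_to_SCF 306 311 307 312",
        "Align_to_SCF 313 481 313 481",
    ]
    deleted, pads = find_edits(parse_align_to_scf(lines))
    print("deleted original positions:", deleted)  # [93, 264]
    print("pads in assembled read:", pads)         # [305, 312]
```

Under that reading, original position 93 is the G marked between underscores in the fasta copy, and read positions 305 and 312 are the two `-` characters in the caf copy.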



