On Tuesday 22 March 2011 12:32:50 Yvan Wenger wrote:
> I'm in a similar situation, although I'm working on cDNA with a large
> eukaryotic transcriptome (without reference). I get a very high
> representation of sequences I know of with Mira, but frequent 1-base
> insertions/deletions when compared to Newbler 2.5 output.
>
> Finally in my experience, Newbler performs slightly better with the
> sff files as input than with fasta+qual, but the difference is not
> dramatic. I still see more "future frameshift" after in-silico
> translation of mira seqs than after newbler seqs even when the input
> is the same for both.

(here too: which version of MIRA?)

If you had some data (MAF or CAF) with a couple of places showing these wrong
calls, I'd be happy to have a look at whether I can improve consensus calling.

> Finally about the difference between chromatograms and fasta(+qual), I
> was wondering if there is any tool allowing to remove adapters/vector
> sequences directly in the sff or xml file used by mira? The problem
> here is that my sff file is correct, but some prior adapters used for
> normalisation are still in the sequences.

Use the SSAHA2/SMALT clipping functions of MIRA. In short: just use
FASTA+QUAL+XML as you do normally, but tell MIRA that you have some more info
in ssaha2/smalt format for it to look at. You create that ssaha2/smalt file by
running your sequences against the adaptor.

Note: nowadays I recommend using SMALT rather than SSAHA2.

B.

PS: because I had to implement some adaptor screening for Solexa, I think one
of the next versions will have a facility to let MIRA perform this kind of
screening for any sequencing tech.

PPS: related question: when screening for your adaptors in 454, are they of
type
  a) when an adaptor occurs, mask it and everything to the right of it in
     the read, or
  b) when an adaptor occurs, mask it and everything to the left of it in
     the read, or
  c) both a) and b)?
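
To make the three cases in the PPS concrete, here is a minimal Python sketch of what each masking rule would leave of a read. This is only an illustration, not how MIRA implements clipping: the function name `clip_read`, the `(start, end, side)` hit tuples and the half-open 0-based coordinates are all assumptions made up for this example.

```python
def clip_read(read_len, hits):
    """Return the (left, right) half-open interval of bases kept after masking.

    hits is an iterable of (start, end, side) adaptor matches (hypothetical
    format), with 0-based half-open coordinates on the read:
      side == "right": mask the adaptor and everything to its right (case a)
      side == "left":  mask the adaptor and everything to its left  (case b)
    Mixing both sides in one read corresponds to case c.
    """
    left, right = 0, read_len
    for start, end, side in hits:
        if not 0 <= start < end <= read_len:
            raise ValueError("adaptor hit lies outside the read")
        if side == "right":
            # Keep only what precedes the adaptor.
            right = min(right, start)
        elif side == "left":
            # Keep only what follows the adaptor.
            left = max(left, end)
        else:
            raise ValueError("side must be 'left' or 'right'")
    if left >= right:
        # The masked regions cover the whole read.
        return (0, 0)
    return (left, right)


if __name__ == "__main__":
    # Case a: adaptor at the 3' end of a 100-base read.
    print(clip_read(100, [(80, 100, "right")]))
    # Case c: one adaptor of each type in the same read.
    print(clip_read(100, [(0, 20, "left"), (80, 100, "right")]))
```

Knowing which case applies matters because the same adaptor hit leads to opposite halves of the read being kept under a) and b).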