On Thu, Sep 30, 2010 at 2:46 PM, Daniel Depledge <d.depledge@xxxxxxxxx> wrote:
>
> Hi Peter,
>
> I have now had a chance to run your script against a MIRA reference
> assembly of a paired-end Solexa run for H1N1 flu. I had no troubles with
> the updated script and the SAM output was then run through a number of
> samtools packages (and other scripts I have available) to ensure that full
> functionality was retained (it was).
>
> I have also checked the script against some single and paired-end 454 data
> and everything works very nicely.
>
> Regards
>
> Dan

Hi all,

My thanks to Dan for his testing. I have attached the current version for wider trial; use at your own risk etc.

There is plenty of scope for improvement. MIRA3 puts a lot of information into RT lines (read tags), and these might be stored as SAM/BAM read tags. There are other things that could be added too, like using the MAF ST and/or MT lines to record the sequencing type via the RG (read group) and PL (platform) tags. Perhaps the SN lines (strain information) could also be recorded with a SAM read group? And perhaps I should look at MIRA's AO lines (as well as the AT line) when constructing a CIGAR string?

Note this is a Python script, and it currently uses Biopython. That dependency could be avoided with a little more effort if there were demand for this (e.g. to include it with MIRA).

Peter
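For anyone wanting to experiment with the read group idea (MAF ST and SN lines feeding SAM @RG header lines with PL tags), here is a rough sketch. It is not part of the attached script. The ST spellings in `ST_TO_PL`, the choice of read group ID, and using the strain name as the @RG SM (sample) field are all assumptions for illustration; `build_read_groups` and `read_group_id` are hypothetical helper names.

```python
"""Sketch: turn per-read MAF ST (sequencing type) and SN (strain) values
into SAM @RG header lines plus a per-read RG:Z tag.

Assumptions (not from the attached script): the ST spellings below, the
read group ID format, and storing the strain as the @RG SM field.
"""

# Hypothetical mapping from MIRA ST values to SAM @RG PL platform names.
ST_TO_PL = {
    "sanger": "CAPILLARY",
    "454": "LS454",
    "solexa": "ILLUMINA",
    "solid": "SOLID",
}


def read_group_id(seq_type, strain):
    """Build a read group ID from the sequencing type and optional strain."""
    return f"{seq_type}_{strain}" if strain else seq_type


def build_read_groups(reads):
    """Given (read_name, seq_type, strain) tuples, return (header_lines, tags).

    header_lines: one tab-separated '@RG' line per distinct read group,
                  in order of first appearance.
    tags: dict mapping read name -> 'RG:Z:<id>' for that read's SAM record.
    """
    groups = {}
    tags = {}
    for name, seq_type, strain in reads:
        rg_id = read_group_id(seq_type, strain)
        if rg_id not in groups:
            fields = ["@RG", f"ID:{rg_id}"]
            platform = ST_TO_PL.get(seq_type.lower())
            if platform:  # unknown ST values simply get no PL tag
                fields.append(f"PL:{platform}")
            if strain:
                fields.append(f"SM:{strain}")
            groups[rg_id] = "\t".join(fields)
        tags[name] = f"RG:Z:{rg_id}"
    return list(groups.values()), tags


if __name__ == "__main__":
    example_reads = [
        ("read1", "Solexa", "H1N1"),
        ("read2", "Solexa", "H1N1"),
        ("read3", "454", None),
    ]
    header, tags = build_read_groups(example_reads)
    for line in header:
        print(line)
    print(tags["read3"])
```

A dict keyed on the read group ID keeps the @RG lines unique, as SAM requires, while preserving the order in which groups were first seen.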
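On the CIGAR question: MAF alignments are padded, so one simple approach is to walk the padded read alongside the padded consensus column by column. The sketch below does this and is not how the attached script works. It assumes the aligned part of the read and the matching consensus columns have already been extracted (for example using the AT coordinates), and that `*` is the gap character. It does not attempt to interpret AO lines. `padded_to_cigar` is a hypothetical helper name.

```python
from itertools import groupby


def padded_to_cigar(ref_padded, read_padded, gap="*", use_padding=True):
    """Build a CIGAR string from a padded read and the padded reference
    (consensus) covering exactly the same alignment columns.

    Per column (gap character assumed to be '*'):
      base / base -> M (match or mismatch)
      gap  / base -> I (read base not in the unpadded reference)
      base / gap  -> D (reference base missing from the read)
      gap  / gap  -> P (padding), or skipped if use_padding is False
    """
    if len(ref_padded) != len(read_padded):
        raise ValueError("padded sequences must be the same length")
    ops = []
    for ref_base, read_base in zip(ref_padded, read_padded):
        if ref_base != gap and read_base != gap:
            ops.append("M")
        elif ref_base == gap and read_base != gap:
            ops.append("I")
        elif ref_base != gap and read_base == gap:
            ops.append("D")
        elif use_padding:
            ops.append("P")
        # else: both are gaps and padding is not recorded, so drop the column
    # Run-length encode consecutive identical operations, e.g. MMM -> 3M.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


if __name__ == "__main__":
    print(padded_to_cigar("ACG*TA*CG", "ACGGT**CG"))
    print(padded_to_cigar("ACG*TA*CG", "ACGGT**CG", use_padding=False))
```

For that example the two calls give `3M1I1M1D1P2M` and `3M1I1M1D2M`. In both, the M and I lengths sum to the 7 unpadded read bases, and the M and D lengths sum to the 7 unpadded reference bases. Emitting P keeps enough information to recreate the padded alignment; dropping it gives a plain CIGAR against the unpadded reference.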