On Thu, Sep 30, 2010 at 7:22 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
>
> On Thursday 30 September 2010 Peter wrote:
>> MIRA3 puts a lot of stuff into RT lines (read tags). These might
>> be stored in the SAM/BAM read tags...
>
> These read tags actually help understanding some decisions MIRA made.
> Especially useful in finishing, though I am not aware of any finishing
> software using SAM/BAM as input.
>
> Oh, and when doing SNP analysis, MIRA also tags SNPs in reads,
> not only in the consensus (consensus tags).

Even in the SAM/BAM 1.3 draft spec, I don't see any obvious read
tags for this kind of thing. Bastien - are you on the samtools devel
mailing list? I think your input on this topic would be valuable.

In any case, do any of the current SAM/BAM viewers show much
or any of the read tag information?

>> There are lots of things that could be added, like using the MAF
>> ST and/or MT lines to record the sequencing type via the RG
>> (read group) and PL (platform) tags. Perhaps also the SN lines
>> (strain information) could be done with a SAM read group?
>
> Would certainly be helpful.

Again, from a lazy/practical point of view - do any of the current
SAM/BAM viewers show the read group information?

>> Perhaps I should look at MIRA's AO lines (as well as the AT
>> line) when constructing a CIGAR string?
>
> Hmmm ... I think you even have to if you want to reconstruct
> deletions in reads.

I currently take the gapped read sequence and the matching gapped
contig sequence as the sole input to build the CIGAR string. A
deletion in a read compared to the contig is just a gap character
in the read where the contig has a letter. That is unless you mean
something more subtle...

>> Note this is a Python script, and currently uses Biopython. That
>> dependency could be avoided with a little more effort if there
>> was demand for this (e.g. to include it with MIRA).
>
> Depending on "little more effort" I would suggest delaying that
> until things stabilise for the script. At the moment I suppose
> that Biopython helps you developing more effectively, that
> should be your primary concern at the moment
> :-)

I'm using a FASTA parser and reverse complement function from
Biopython, both of which are fairly easy to make self-contained
(even with IUPAC ambiguity support). Alternatively, I might go
to the other extreme and turn this into a full MAF parser (for
inclusion with Biopython). For now the script works, but it is
rather crude.

But first, let's see if MAF to SAM (and thus BAM) is useful.

Peter
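
P.S. To make the CIGAR point concrete, here is a minimal sketch of the
gapped-pair approach described above. It is an illustration only, not the
actual script: the function names (make_cigar, column_op) are hypothetical,
and it assumes gaps are written as '*' or '-' and that both strings cover
exactly the same alignment columns. Columns where both read and contig
have a gap are simply skipped here, which is a simplification.

```python
from itertools import groupby

# Assumption: gap (pad) characters used in the gapped sequences.
GAP_CHARS = "*-"


def column_op(read_base, contig_base):
    """Return the CIGAR operation for one alignment column, or None."""
    read_gap = read_base in GAP_CHARS
    contig_gap = contig_base in GAP_CHARS
    if read_gap and contig_gap:
        return None  # gap in both: skipped in this sketch
    if read_gap:
        return "D"   # contig has a letter, read does not: deletion in read
    if contig_gap:
        return "I"   # read has a letter, contig does not: insertion in read
    return "M"       # letters in both: alignment match or mismatch


def make_cigar(gapped_read, gapped_contig):
    """Build a CIGAR string from two equal-length gapped sequences."""
    if len(gapped_read) != len(gapped_contig):
        raise ValueError("gapped read and contig must have the same length")
    ops = [column_op(r, c) for r, c in zip(gapped_read, gapped_contig)]
    ops = [op for op in ops if op is not None]
    # Run-length encode consecutive identical operations.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


if __name__ == "__main__":
    # A gap in the read opposite a contig letter becomes a deletion.
    print(make_cigar("ACG*T", "ACGTT"))
```

Anything that cannot be seen from the two gapped strings alone (clipped
read ends, for instance) is outside the scope of this sketch.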