[mira_talk] Re: Paired end info - SAM/BAM output from MIRA?

  • From: Peter <peter@xxxxxxxxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 30 Sep 2010 19:43:07 +0100

On Thu, Sep 30, 2010 at 7:22 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
>
> On Thursday 30 September 2010 Peter wrote:
>> MIRA3 puts a lot of stuff into RT lines (read tags). These might
>> be stored in the SAM/BAM read tags...
>
> These read tags actually help in understanding some decisions MIRA made.
> Especially useful in finishing, though I am not aware of any finishing
> software using SAM/BAM as input.
>
> Oh, and when doing SNP analysis, MIRA also tags SNPs in reads,
> not only in the consensus (consensus tags).

Even in the SAM/BAM 1.3 draft spec, I don't see any obvious
read tags for this kind of thing. Bastien - are you on the samtools
devel mailing list? I think your input on this topic would be valuable.

In any case, do any of the current SAM/BAM viewers show much
or any of the read tag information?

>> There are lots of things that could be added, like using the MAF
>> ST and/or MT lines to record the sequencing type via the RG
>> (read group) and PL (platform) tags. Perhaps also the SN lines
>> (strain information) could be done with a SAM read group?
>
> Would certainly be helpful.

Again, from a lazy/practical point of view - do any of the current
SAM/BAM viewers show the read group information?
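To make the read group idea concrete, here is a rough Python sketch of how a SAM @RG header line might be built from the MAF sequencing type and strain. The MAF technology names in the table, their mapping onto SAM PL values, the use of SM for the strain, and the function name read_group_header are all assumptions for illustration, not what the script currently does.

```python
# Assumed mapping from MAF sequencing technology names to SAM PL values.
# Both sides of this table are illustrative and should be checked against
# the MAF documentation and the SAM specification.
PLATFORM_MAP = {
    "sanger": "CAPILLARY",
    "454": "LS454",
    "solexa": "ILLUMINA",
    "solid": "SOLID",
}


def read_group_header(rg_id, seq_tech, strain=None):
    """Return one tab-separated SAM @RG header line (hypothetical helper)."""
    fields = ["@RG", "ID:%s" % rg_id]
    platform = PLATFORM_MAP.get(seq_tech.lower())
    if platform is not None:
        # Unknown technologies are simply left without a PL tag.
        fields.append("PL:%s" % platform)
    if strain:
        # Assumption: the MAF strain name is recorded as the sample (SM).
        fields.append("SM:%s" % strain)
    return "\t".join(fields)
```

Each read would then carry an RG:Z: tag naming the matching read group ID.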

>> Perhaps I should look at MIRA's AO lines (as well as the AT
>> line) when constructing a CIGAR string?
>
> Hmmm ... I think you even have to if you want to reconstruct
> deletions in reads.

I currently take the gapped read sequence and the matching
gapped contig sequence as the sole input to build the CIGAR
string. A deletion in a read compared to the contig is just a
gap character in the read where the contig has a letter.
That is, unless you mean something more subtle...
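As a minimal sketch of that approach (not the actual script), the column-by-column comparison could look like this in Python. The function name make_cigar and the choice of "*" and "-" as gap characters are assumptions; columns that are gapped in both sequences are skipped here, which is a simplification (SAM has a P operator for padding).

```python
from itertools import groupby


def make_cigar(gapped_contig, gapped_read, gap_chars="*-"):
    """Build a CIGAR string from two equal-length gapped sequences."""
    if len(gapped_contig) != len(gapped_read):
        raise ValueError("Gapped sequences must have the same length")
    ops = []
    for contig_char, read_char in zip(gapped_contig, gapped_read):
        contig_gap = contig_char in gap_chars
        read_gap = read_char in gap_chars
        if contig_gap and read_gap:
            continue  # gap in both: contributes nothing in this sketch
        elif read_gap:
            ops.append("D")  # contig has a letter, read has a gap
        elif contig_gap:
            ops.append("I")  # read has a letter, contig has a gap
        else:
            ops.append("M")  # aligned letters (match or mismatch)
    # Run-length encode the per-column operations.
    return "".join("%i%s" % (len(list(run)), op) for op, run in groupby(ops))
```

For example, make_cigar("ACGT", "A**T") gives "1M2D1M".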

>> Note this is a Python script, and currently uses Biopython. That
>> dependency could be avoided with a little more effort if there
>> was demand for this (e.g. to include it with MIRA).
>
> Depending on "little more effort" I would suggest delaying that
> until things stabilise for the script. At the moment I suppose
> that Biopython helps you develop more effectively, and that
> should be your primary concern for now
> :-)

I'm using a FASTA parser and reverse complement function from
Biopython, both of which are fairly easy to make self-contained
(even with IUPAC ambiguity support). Alternatively, I might go
to the other extreme and turn this into a full MAF parser (for
inclusion with Biopython). For now it works, but is rather crude.
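For illustration, self-contained stand-ins for those two pieces might look roughly like this. The names parse_fasta and reverse_complement are hypothetical, this is not Biopython's code, and the FASTA parser assumes plain well-formed input.

```python
# IUPAC nucleotide complement table (upper and lower case); characters
# not listed, such as gap symbols, are left unchanged by translate().
_COMPLEMENT = str.maketrans(
    "ACGTUMRWSYKVHDBNacgtumrwsykvhdbn",
    "TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn",
)


def reverse_complement(seq):
    """Reverse complement a nucleotide string, ambiguity codes included."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(handle):
    """Yield (title, sequence) tuples from an iterable of FASTA lines."""
    title, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:].strip(), []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)
```

Usage would be along the lines of looping over parse_fasta(open("reads.fasta")), where the filename is just a placeholder.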

But first, let's see if MAF to SAM (and thus BAM) is useful.

Peter
