[mira_talk] Re: Paired end info - SAM/BAM output from MIRA?

  • From: Peter <peter@xxxxxxxxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 30 Sep 2010 15:14:08 +0100

On Thu, Sep 30, 2010 at 2:46 PM, Daniel Depledge <d.depledge@xxxxxxxxx> wrote:
>
> Hi Peter,
>
> I have now had a chance to run your script against a MIRA reference
> assembly of a paired-end Solexa run for H1N1 flu. I had no trouble with
> the updated script, and I then ran the SAM output through a number of
> samtools commands (and other scripts I have available) to check that full
> functionality was retained (it was).
>
> I have also checked the script against some single and paired-end 454 data
> and everything works very nicely.
>
> Regards
>
> Dan

Hi all,

My thanks to Dan for his testing. I have attached the current version
for wider trial; use at your own risk, etc.

There is plenty of scope for improvement:

MIRA3 puts a lot of information into RT lines (read tags). These might
be stored as SAM/BAM optional tags...
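
As a rough sketch of that idea, the RT lines for one read could be
packed into a single free-text SAM tag. This assumes each RT line looks
like "RT <type> <start> <end> [comment]" (check the MAF documentation
before relying on that), drops any comment, and uses a made-up tag name
"XR"; SAM leaves tags starting with X, Y or Z free for local use.

```python
# Hypothetical sketch: pack MIRA RT (read tag) lines into one SAM optional field.
# ASSUMPTION: each RT line is "RT <type> <start> <end> [comment]".
# "XR" is an illustrative tag name, not a registered SAM tag.

def rt_lines_to_sam_tag(rt_lines, tag="XR"):
    """Return a field like 'XR:Z:type,start,end;...' or None if there are no tags."""
    entries = []
    for line in rt_lines:
        # Split into at most 5 parts so a free-text comment stays in one piece
        parts = line.rstrip("\n").split(None, 4)
        if len(parts) < 4 or parts[0] != "RT":
            raise ValueError("Unexpected RT line: %r" % line)
        _, tag_type, start, end = parts[:4]
        entries.append("%s,%s,%s" % (tag_type, start, end))
    if not entries:
        return None
    # Z-type values may contain spaces but not tabs; ';' separates the entries
    return "%s:Z:%s" % (tag, ";".join(entries))
```

The resulting string would simply be appended as an extra tab-separated
column on the read's SAM line.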

There are lots of things that could be added, like using the MAF
ST and/or MT lines to record the sequencing type via the RG
(read group) and PL (platform) tags. Perhaps the SN lines
(strain information) could also be recorded with a SAM read group?
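
A minimal sketch of how the @RG header lines might be built, assuming
the ST line gives values like "Sanger", "454", "Solexa" or "SOLiD" and
the SN line gives a strain name. Putting the strain in the SM (sample)
field, and the "<type>_<strain>" ID scheme, are my own choices here.

```python
# Hypothetical sketch: derive SAM @RG header lines from MIRA MAF ST/SN values.
# ASSUMPTION: ST values look like "Sanger", "454", "Solexa", "SOLiD";
# one read group is made per (sequencing type, strain) combination.

# Map MIRA sequencing types onto PL values defined in the SAM specification
MIRA_TO_SAM_PLATFORM = {
    "sanger": "CAPILLARY",
    "454": "LS454",
    "solexa": "ILLUMINA",
    "solid": "SOLID",
}


def read_group_id(seq_type, strain):
    """Illustrative ID scheme; each read would then carry RG:Z:<this id>."""
    return "%s_%s" % (seq_type, strain or "unknown")


def make_rg_header(reads):
    """reads: iterable of (seq_type, strain) pairs; returns unique @RG lines."""
    lines = []
    seen = set()
    for seq_type, strain in reads:
        rg_id = read_group_id(seq_type, strain)
        if rg_id in seen:
            continue
        seen.add(rg_id)
        fields = ["@RG", "ID:" + rg_id]
        platform = MIRA_TO_SAM_PLATFORM.get(seq_type.lower())
        if platform:  # unknown types simply get no PL field
            fields.append("PL:" + platform)
        if strain:
            fields.append("SM:" + strain)
        lines.append("\t".join(fields))
    return lines
```

Each read line would then get an "RG:Z:..." optional field pointing at
the matching header entry.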

Perhaps I should look at MIRA's AO lines (as well as the AT
line) when constructing a CIGAR string?
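
For reference, the general idea of turning a padded alignment into a
CIGAR string can be sketched column by column. This assumes the read
and the padded consensus are already aligned to the same length with
"*" as the pad character, and that the SAM reference is the unpadded
consensus (so columns where both are pads become P). Clipped read ends
(S operations) are not handled.

```python
# Sketch: build a CIGAR string from a read aligned against a padded consensus.
# ASSUMPTION: equal-length aligned strings using "*" as the pad character.
# Column rules:
#   consensus base + read base -> M (alignment match or mismatch)
#   consensus pad  + read base -> I (insertion relative to the reference)
#   consensus base + read pad  -> D (deletion from the reference)
#   consensus pad  + read pad  -> P (silent padding)

def padded_to_cigar(consensus, read, pad="*"):
    if len(consensus) != len(read):
        raise ValueError("Aligned sequences must have equal length")
    ops = []  # list of [operation, run length]
    for c, r in zip(consensus, read):
        if c != pad and r != pad:
            op = "M"
        elif c == pad and r != pad:
            op = "I"
        elif c != pad and r == pad:
            op = "D"
        else:
            op = "P"
        # Extend the current run or start a new one
        if ops and ops[-1][0] == op:
            ops[-1][1] += 1
        else:
            ops.append([op, 1])
    return "".join("%d%s" % (n, op) for op, n in ops)
```

For example, consensus "ACG*TA" against read "AC*GTA" gives "2M1D1I2M".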

Note this is a Python script, and currently uses Biopython. That
dependency could be avoided with a little more effort if there
were demand for this (e.g. to include it with MIRA).
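
As a hypothetical example of what dropping Biopython might involve:
if it is used for sequence helpers such as Bio.Seq.reverse_complement,
a plain-Python replacement is short. Reverse-strand reads need this
because SAM stores them reverse complemented (FLAG 0x10) with their
qualities reversed. The helper names below are illustrative only.

```python
# Hypothetical Biopython-free helpers (names are illustrative, not from the script).
# IUPAC ambiguity codes are complemented too; "*" pads pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTUMRWSYKVHDBNacgtumrwsykvhdbn",
                            "TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_for_sam(seq, qual, is_reverse):
    """Return (SEQ, QUAL) as SAM expects them for a read on the given strand."""
    if is_reverse:
        # Bases are reverse complemented; qualities are only reversed
        return reverse_complement(seq), qual[::-1]
    return seq, qual
```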

Peter
