[mira_talk] Re: scaffolded contigs

  • From: Peter <peter@xxxxxxxxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 21 Jul 2010 14:48:28 +0100

On Wed, Jul 21, 2010 at 2:14 PM, Adnan Niazi <niazi84@xxxxxxxxxxx> wrote:
> you mean unpadded fasta, then yes.
> Regards
> /adnan

OK guys, here you go: one short Python script, attached. Feedback on
the MIRA list is fine with me (assuming Bastien doesn't mind), or on the
Biopython mailing list if you are more interested in the code details. It
should be self-documenting, but to summarise:

Requires Python; should be OK with versions 2.4 to 2.7 (I used 2.6).
Requires Biopython; should be OK with version 1.49 or later (I used 1.54).
Should work on Unix, Linux, Windows, etc. (I used Mac OS X).

The script takes two command-line arguments: the ACE input filename
and the FASTA output filename. It processes every contig in the ACE
file, producing a multi-entry FASTA file. Each sequence is masked
according to its coverage (see below), and the gaps (padding) are removed.
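For anyone curious how the per-contig step can work, here is a minimal sketch (not the attached script, and written for Python 3 rather than 2.x). It assumes the read placements have already been pulled out of the ACE file as 0-based, end-exclusive (start, end) intervals on the padded consensus; in practice they would come from Biopython's Bio.Sequencing.Ace parser, but that tuple format and the function names here are hypothetical simplifications. It counts the read depth for every padded column, then drops the pad characters ('*' in ACE) together with their depth values so the bases and coverage stay in step.

```python
from typing import List, Sequence, Tuple

PAD = "*"  # ACE marks padding (gap) columns in the consensus with '*'


def padded_coverage(length: int, placements: Sequence[Tuple[int, int]]) -> List[int]:
    """Count how many reads cover each column of a padded consensus.

    `placements` is a hypothetical list of (start, end) pairs, 0-based and
    end-exclusive, giving where each (clipped) read sits on the padded consensus.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends
    delta = [0] * (length + 1)
    for start, end in placements:
        start, end = max(start, 0), min(end, length)  # clip to the contig
        if start < end:
            delta[start] += 1
            delta[end] -= 1
    # Running sum of the differences gives the depth at each column
    depth = []
    running = 0
    for i in range(length):
        running += delta[i]
        depth.append(running)
    return depth


def unpad(padded_seq: str, depth: Sequence[int]) -> Tuple[str, List[int]]:
    """Remove pad columns, keeping the coverage list aligned with the bases."""
    bases, kept = [], []
    for base, d in zip(padded_seq, depth):
        if base != PAD:
            bases.append(base)
            kept.append(d)
    return "".join(bases), kept


if __name__ == "__main__":
    consensus = "ACG*TTA*C"
    reads = [(0, 6), (2, 9), (4, 9)]  # made-up placements for illustration
    depth = padded_coverage(len(consensus), reads)
    print(depth)                     # [1, 1, 2, 2, 3, 3, 2, 2, 2]
    print(unpad(consensus, depth))   # ('ACGTTAC', [1, 1, 2, 3, 3, 2, 2])
```

The difference-array trick keeps the coverage count linear in contig length plus number of reads, rather than touching every column of every read.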

As written, it uses a mixture of upper case (coverage of 10 or more), lower
case (coverage of 5 to 9) and "n" (coverage from zero to 4 inclusive). This
should be trivial to adjust; there is a commented-out alternative.
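The masking rule above can be sketched as a small function. This is an illustration only, not the code in the attachment (and it does not reproduce the commented-out alternative); the function name and the `high`/`low` keyword arguments are hypothetical. Making the two thresholds parameters is one way to keep the cut-offs trivial to adjust.

```python
from typing import Sequence


def mask_by_coverage(seq: str, depth: Sequence[int], high: int = 10, low: int = 5) -> str:
    """Case-mask an unpadded sequence by per-base read depth.

    depth >= high        -> upper case
    low <= depth < high  -> lower case
    depth < low          -> 'n'
    """
    if len(seq) != len(depth):
        raise ValueError("sequence and coverage lengths differ")
    out = []
    for base, d in zip(seq, depth):
        if d >= high:
            out.append(base.upper())
        elif d >= low:
            out.append(base.lower())
        else:
            out.append("n")
    return "".join(out)


if __name__ == "__main__":
    print(mask_by_coverage("ACGTA", [12, 10, 9, 5, 4]))  # ACgtn
```

With the defaults this matches the 10+ / 5 to 9 / 0 to 4 split described above; passing, say, `high=20` would tighten the upper-case cut-off without touching the loop.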

Maybe we should include this as an example script with Biopython...


