[mira_talk] New development version 4.9.0

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 26 Oct 2014 22:57:30 +0100

Dear all,

it took me a while to realise that SPAdes wasn’t just “Yet Another Assembler.” 
But indeed, after testing it out early this summer I understood why that 
beautiful piece of software from the team in St. Petersburg quickly spread 
throughout the assembly community: it’s simply great.

And it outclasses MIRA 4.0.2. Darn, SPAdes is the first assembler I ever tested 
which gave longer contigs AND still didn’t have way more misassemblies than 
MIRA. So I spent my summer vacation and quite a few after-work hours playing 
catch-up. Note that MIRA will never be as fast as SPAdes (MIRA being overlap 
based, SPAdes de Bruijn graph based), but from what I can see, MIRA is back on 
par and even gives better results for many of the data sets I’m working on.

Version 4.9.0 is the first public version of my current development tree which 
will lead up to 5.0. Quite a number of improvements I’d been working on since 
4.0, as well as those prompted by SPAdes, are included. There are still problems with 
some Illumina PE 300 sets I have (they display new … oddities), but I think I 
know how to tackle these soon.

In the meantime, I invite everyone interested to give 4.9.0 a thorough test. I 
do not expect major problems, but be prepared to encounter a few hiccups :-)

Binary packages for Linux and OS X can be found at

  https://sourceforge.net/projects/mira-assembler/files/MIRA/development/ 

while documentation for the development version is at

  http://mira-assembler.sourceforge.net/docs-dev/DefinitiveGuideToMIRA.html 

Among the highlights of this release (but see the CHANGES.txt file for a full 
list):
MIRA
- improvement: better overall assemblies.
- improvement: mira can now use kmer sizes up to 256 bases
- improvement: new functionality to automatically determine optimal number of
  passes and different kmer sizes in a de novo assembly (see -AS:nop=0)
- improvement: new parameter -AS:kms as one-stop-shop to configure number of
  passes and used kmer sizes. E.g.: -AS:kms=17,31,63,127,127
- improvement: better assembly of data with self-hybridising read chimeras
  (seen in Illumina 300bp data). Not perfect yet, but an improvement
- improvement: in manifest, new segment_naming scheme "SRA" for reads coming
  from the short read archive. New attribute 'rollcomment'.
- improvement: new MIRA parameter -CO:cmrs for better control on reads
  incorporated in contigs
- improvement: faster mapping of long Illumina reads with lots of differences
- improvement: MIRA now uses the SIOc tag also in mapping. Allows finding
  ploidy differences in multiploid genomes.
- improvement: new info file "*_readgroups.txt" for statistics on paired reads
- improvement: some temporary files compressed to minimise impact on disk
  space.
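
To make the -AS:kms entry above more concrete, here is a minimal Python sketch
(not MIRA code) of how such a value could be read. It assumes that each
comma-separated value corresponds to one pass, and it uses the 256-base limit
mentioned in the list. The function name parse_kms and the lower bound of 1
are hypothetical and chosen only for illustration.

```python
MAX_KMER = 256  # upper limit named in the release notes above


def parse_kms(value):
    """Split a comma-separated kmer list into one kmer size per pass.

    Hypothetical helper for illustration; MIRA's own parsing may differ.
    """
    sizes = [int(field) for field in value.split(",")]
    for k in sizes:
        if not 1 <= k <= MAX_KMER:
            raise ValueError(f"kmer size {k} outside 1..{MAX_KMER}")
    return sizes


# The example value from the list above: five entries, read here as five passes.
sizes = parse_kms("17,31,63,127,127")
for pass_number, k in enumerate(sizes, start=1):
    print(f"pass {pass_number}: kmer size {k}")
```

Under that assumption, the example value describes five passes, with the last
two both using kmer size 127.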

MIRABAIT
- all new mirabait functionality: work on read pairs; multiple bait files;
  simultaneous filtering of matches and non matches; safety checks on -L data
- change: mirabait lowercases all sequences, uppercasing just kmers hitting
  bait sequences. Use -c if not wanted.
- improvement: mirabait can now use kmer sizes up to 256 bases
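
The case-masking change described above can be pictured with a small Python
sketch. This is not how mirabait is implemented: it is a naive illustration
that only looks at the forward strand, makes no claim about how mirabait
treats reverse complements, and uses hypothetical function names (kmers,
mask_by_bait).

```python
def kmers(seq, k):
    """Yield (offset, kmer) for every kmer of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def mask_by_bait(read, baits, k):
    """Lowercase the read, keeping uppercase only where a kmer hits a bait."""
    # Collect all kmers present in the bait sequences (forward strand only).
    bait_kmers = {kmer for bait in baits for _, kmer in kmers(bait.upper(), k)}
    read_upper = read.upper()
    hit = [False] * len(read_upper)
    # Mark every base covered by at least one kmer that occurs in a bait.
    for i, kmer in kmers(read_upper, k):
        if kmer in bait_kmers:
            for j in range(i, i + k):
                hit[j] = True
    return "".join(c if h else c.lower() for c, h in zip(read_upper, hit))


print(mask_by_bait("TTACGTACGGGG", ["ACGTAC"], 4))  # prints ttACGTACgggg
```

Bases not covered by any bait kmer end up in lowercase, so matching regions
stand out at a glance in the output.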

As always: feedback appreciated.

Have a lot of fun with MIRA,
  Bastien
