[mira_announce] Release Candidate 1 for MIRA 4.0

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Wed, 21 Aug 2013 21:12:46 +0200

Dear all,

Some of you have seen it already: I uploaded MIRA 4.0rc1 as source and as 
Linux 64 bit and OSX 64 bit binaries to SourceForge this week-end:

  http://sourceforge.net/projects/mira-assembler/files/MIRA/stable/

I just didn't have the time to write a proper announcement.

Contrary to the MIRA 3.9.x releases, I put them into the "stable" download 
directories. This is the first Release Candidate of MIRA 4.0. Which means: 
things should be pretty stable by now and bugs eradicated. The documentation 
hasn't caught up completely yet; in particular, walk-throughs are missing. But 
things should be pretty understandable even so. I hope.

For the people who already used the later versions of the 3.9 line: 4.0rc1 is 
essentially a 3.9.18 where I added the last few things I felt were missing and 
fixed the bugs I knew of (or had the time to work on) until this week-end. 
E.g., MIRA still dumps a complete project file for de-novo assemblies, but 
it now also dumps a subset of contigs to "LargeContigs", as too many people 
either did not care or did not read the docs when comparing the number of 
contigs across assemblies. And I do admit: having a proposal for large, if not 
useful, contigs fetched out automatically does simplify life.

For the people who used only 3.4.x until now: 4.0 is quite a different beast 
compared to 3.4. Go get it, read about the new manifest files and you should be 
good to go.
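
As a rough orientation for newcomers, a minimal manifest for a de-novo 
assembly of paired-end Illumina data could look like the sketch below. The 
project name, read group name, file names and template size values are made 
up for illustration; the manifest chapter of the documentation remains the 
authoritative reference for keywords and their exact meaning.

```text
# Hypothetical example manifest; all names and numbers are placeholders.

# Part 1: name the project and tell MIRA what kind of job to run
project = MyFirstAssembly
job = genome,denovo,accurate

# Part 2: describe the input data, one "readgroup" per library
readgroup = MyPairedIlluminaLib
data = mylib_1.fastq mylib_2.fastq
technology = solexa
template_size = 100 500
segment_placement = ---> <---
```

MIRA is then started with the manifest file as its only argument, e.g. 
"mira manifest.conf".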

Please report oddities in 4.0rc1 immediately so that I can iron them out before 
the final release of 4.0.

Have fun with MIRA :-)

Bastien


List of main differences, MIRA 3.4.x --> 4.0rc1

Main improvements made to simplify life:
- more flexible parametrisation to easily define input data and assembly job:
  the "manifest" configuration files allow using concepts of read groups as
  well as segment orientation & segment placement
- SAM output via miraconvert simplifies interaction with outside world
  (gap5, Tablet, etc.)
- full GFF3 input and output compatibility, using Sequence Ontology,
  translation to and from gap4/gap5
- support for CASAVA 1.8 read naming, the new Illumina read name scheme
- new sequencing "technology" TEXT for unspecified data, e.g., from databases
  like NCBI
- lots of other improvements left and right which add up ;-)
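
To illustrate what the CASAVA 1.8 naming change in the list above means in 
practice: older Illumina read names carry the read number as a "/1" or "/2" 
suffix, while CASAVA 1.8 moved it into a second, space-separated field. The 
Python sketch below separates the template name from the read number for 
both schemes. It is not MIRA's own code; the function and type names are 
hypothetical and the regular expressions only cover the common cases.

```python
import re
from typing import NamedTuple, Optional


class ReadName(NamedTuple):
    template: str           # name shared by both reads of a pair
    segment: Optional[int]  # read number (1 or 2), None if not recognised


# CASAVA >= 1.8: "instrument:run:flowcell:lane:tile:x:y read:filtered:control:index"
_CASAVA18 = re.compile(r"^(\S+)\s+([0-9]+):[YN]:[0-9]+:\S*$")
# Older scheme: "machine:lane:tile:x:y#index/read"
_PRE18 = re.compile(r"^(\S+)/([12])$")


def parse_read_name(header: str) -> ReadName:
    """Split a FASTQ header line into template name and read number."""
    name = header.strip().lstrip("@")
    for pattern in (_CASAVA18, _PRE18):
        match = pattern.match(name)
        if match:
            return ReadName(match.group(1), int(match.group(2)))
    # Unknown scheme: keep the first word as the name, no read number
    return ReadName(name.split()[0] if name else "", None)


if __name__ == "__main__":
    print(parse_read_name("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"))
    print(parse_read_name("@HWUSI-EAS100R:6:73:941:1973#0/2"))
```

Both reads of a pair yield the same template name under either scheme, which 
is what an assembler needs in order to pair them up.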

Main speed improvements:
- faster contig handling routines, improves de-novo assembly times with Ion
  Torrent or 454 / Solexa hybrid by 30%
- faster mapping routines, allows MIRA to more or less gracefully handle
  projects with several thousand reference sequences. Useful for mapping
  against EST / RNASeq assemblies.
- faster handling of deep coverage RNASeq and genome data (to be improved
  still)
- faster kmer counting & smaller footprint in RAM and on disk
- faster checking of template restrictions
- optimized: faster data reading, does not need to count reads beforehand
  anymore
- lots of other improvements left and right which add up ;-)

Main assembly quality improvements:
- new "fire & forget" mode of MIRA which basically should reduce misassemblies
  in result files to zero: earlier versions of MIRA would dump out
  misassembled contigs (with markers pointing at the misassembly), now
  contigs dumped out do not contain any misassembly (at least none which MIRA
  could discover).
- improved assembly quality for ultra-high coverage Solexa RNASeq data contigs
  (new parameter -CL:rkm)
- lots of other improvements left and right which add up, like, handling of
  newer MiSeq and Ion data, lossless digital normalisation, etc. ;-)

Documentation:
- rewritten in large parts for manifest files
- started to update walkthroughs for newer public data sets


