Dear all,

some of you have seen it already: I uploaded MIRA 4.0rc1 as source, Linux 64 bit and OSX 64 bit binary to SourceForge this weekend:

http://sourceforge.net/projects/mira-assembler/files/MIRA/stable/

I just didn't have the time to write a proper announcement. Contrary to the MIRA 3.9.x releases, I put them into the "stable" download directories.

This is the first Release Candidate of MIRA 4.0, which means: things should be pretty stable by now and bugs eradicated. The documentation hasn't caught up completely yet; walk-throughs in particular are missing. But things should be pretty understandable even so. I hope.

For the people who already used the later versions of the 3.9 line: 4.0rc1 is essentially a 3.9.18 where I added the last few things I felt were missing and fixed the bugs I knew of (or had the time to work on) until this weekend. E.g., MIRA still dumps a complete project file for de-novo assemblies, but it now also dumps a subset of contigs to "LargeContigs", as too many people either did not care or did not read the docs when comparing the number of contigs in assembly comparisons. And I do admit: having a proposal for large, if not useful, contigs fetched out automatically does simplify life.

For the people who used only 3.4.x until now: 4.0 is quite a different beast compared to 3.4. Go get it, read about the new manifest files and you should be good to go.

Please report oddities in 4.0rc1 immediately so that I can iron them out before the final release of 4.0.

Have fun with MIRA :-)

Bastien


List of main differences MIRA 3.4.x --> 4.0rc1

Main improvements made to simplify life:
- more flexible parametrisation to easily define input data and assembly job: the "manifest" configuration files allow using concepts of read groups as well as segment orientation & segment placement
- SAM output via miraconvert simplifies interaction with the outside world (gap5, tablet etc.)
- full GFF3 input and output compatibility, using Sequence Ontology, translation to and from gap4/gap5
- CASAVA 1.8 read naming for the new Illumina read name scheme
- new sequencing "technology" TEXT for unspecified data, i.e., from databases like NCBI etc.
- lots of other improvements left and right which add up ;-)

Main speed improvements:
- faster contig handling routines, improves de-novo assembly times with Ion Torrent or 454 / Solexa hybrid by 30%
- faster mapping routines, allows MIRA to more or less gracefully handle projects with several thousand reference sequences. Useful for mapping against EST / RNASeq assemblies.
- faster handling of deep coverage RNASeq and genome data (to be improved still)
- faster kmer counting & smaller footprint in RAM and on disk
- faster checking of template restrictions
- optimized: faster data reading, does not need to count reads beforehand anymore
- lots of other improvements left and right which add up ;-)

Main assembly quality improvements:
- new "fire & forget" mode of MIRA which basically should reduce misassemblies in result files to zero: earlier versions of MIRA would dump out misassembled contigs (with markers pointing at the misassembly), now contigs dumped out do not contain any misassembly (at least none which MIRA could discover).
- improved assembly quality for ultra-high coverage Solexa RNASeq data contigs (new parameter -CL:rkm)
- lots of other improvements left and right which add up, like handling of newer MiSeq and Ion data, lossless digital normalisation etc. ;-)

Documentation:
- rewritten in large parts for manifest files
- started to update walkthrough for newer public data sets.