[mira_announce] New version 2.9.37
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_announce@xxxxxxxxxxxxx
- Date: Tue, 9 Dec 2008 00:02:55 +0100
Dear all,
upgrading to this release is recommended.
http://www.chevreux.org/uploads/media/mira_2.9.37_dev_linux-gnu_x86_64.tar.bz2
First, the previous releases had a very embarrassing bug in the assembly
report: the number of 'large' contigs was wrongly computed ... with the
effect that the real number of large contigs usually was lower than reported
there. I missed this nasty bug because I mostly work directly with the contig
statistics file *sigh*
Second, this release contains a new strategy to clean up reads, reducing 3'
junk and thus improving the assembly. This helps to resolve some borderline
cases where MIRA failed in earlier versions.
Third, the definition of which reads may be repetitive has been worked on,
leading to fewer false positives in borderline cases, which in turn reduces
the scatter in repetitive regions.
Fourth, a new option has been introduced to separate contigs with "normal"
sequence from contigs with repetitive sequence (-AS:klrs). It is not perfect
yet (I have ideas to improve it especially when used with paired-end data),
but I would be glad to have some feedback on it from people who are working
on especially ugly genomes. It is not switched on by default; please
use -AS:klrs=yes if you want to test it out.
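For those who script their assemblies, switching the option on could be sketched roughly as follows. This only assumes that MIRA parameters are passed as plain command-line arguments; the helper name with_klrs and the placeholder base command are hypothetical, and "-AS:klrs=no" as the "off" spelling is an assumption by analogy with -AS:klrs=yes.

```python
def with_klrs(base_args, enabled=True):
    """Return a copy of base_args with -AS:klrs set explicitly.

    base_args is a list of command-line arguments for a MIRA run
    (hypothetical; fill in whatever you normally use).
    """
    # Drop any earlier -AS:klrs setting so the option appears only once.
    args = [a for a in base_args if not a.startswith("-AS:klrs")]
    # "yes" comes from the announcement; "no" is an assumed counterpart.
    args.append("-AS:klrs=" + ("yes" if enabled else "no"))
    return args


if __name__ == "__main__":
    # "mira" plus your usual arguments; this base command is a placeholder.
    print(" ".join(with_klrs(["mira", "-AS:klrs=no"])))
```

The helper replaces an existing setting instead of appending a second one, so the command line never carries two conflicting values for the same option.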
And of course a fair number of smaller tweaks and bug fixes.
Here's the complete list of changes since 2.9.34:
2.9.37
------
- new option: -CL:prc (propose right clip). This is a new strategy to ensure a
good "high confidence region" (hcr) in reads, basically eliminating all junk
at the 3' end of reads. Extremely effective, but should not be used for very
low coverage data or for EST projects.
This option is now default for genome assemblies in "normal" or "accurate"
mode.
2.9.36
------
- renamed -AS:urdufrd:urdrdct to -AS:ard:ardct
- added -AS:ardml:ardgl. This allows for a better control of which reads are
defined as repeats.
- added -AS:klrs. Needs testing and is not switched on by default.
- bugfix: number of large contigs was reported too high in the report of the
assembly ... because of a really dumb bug in the statistics calculation
routine. This had no effect on the assembly itself, just on the
*_info_assembly.txt report and also on the summary given after the usage of
"convert_project".
- bugfix: SSAHA clips were wrongly logged to file
- change: log file with clips more verbose
- change: 454 reads without explicit forward/reverse naming scheme
(e.g. "somename" instead of "somename.f") are now considered to be forward
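The naming change for 454 reads in the last item can be illustrated with a small sketch. This is not MIRA's actual code: the function name read_direction is hypothetical, and the ".r" suffix for reverse reads is an assumption (the changelog only mentions ".f").

```python
def read_direction(name):
    """Classify a 454 read name as 'forward' or 'reverse'."""
    # ".r" as the reverse suffix is assumed; the changelog does not state it.
    if name.endswith(".r"):
        return "reverse"
    # Reads ending in ".f" and reads without any suffix both count as forward.
    return "forward"


if __name__ == "__main__":
    for name in ("somename", "somename.f", "somename.r"):
        print(name, read_direction(name))
```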
2.9.35x2
--------
- when running the SKIM in parallel threads, MIRA can give
different results when started with the same data and same
arguments. The effect is now reduced (though still present), at the price
of a table loaded after the SKIM run now being 25% larger; this cannot
be helped.
- a few fixes in "convert_project" to allow conversion of assemblies in CAF
format into clippedfasta and maskedfasta (was previously allowed only for
single reads)
- typo fix: -OUT:rrol:rld were shown as sequencing type dependent while they
are not.
2.9.35x1
--------
- CAF files with 454 data now contain the necessary info to allow gap4 to open
the flowgrams. Works only for reads that are NOT paired-end.
- slight tweak in the pathfinder that should enhance the assembly with
paired-end in a few cases
- changed sff_extract so that it runs again with the Python 2.4 series