[mira_announce] New version 2.9.37

Dear all,

Upgrading to this release is recommended.

http://www.chevreux.org/uploads/media/mira_2.9.37_dev_linux-gnu_x86_64.tar.bz2

First, the previous releases had a very embarrassing bug in the assembly
report: the number of 'large' contigs was wrongly computed ... with the
effect that the real number of large contigs was usually lower than reported
there. I missed this nasty bug because I mostly work directly with the contig
statistics file *sigh*

Second, this release contains a new strategy to clean up reads, reducing 3'
junk and thus improving the assembly. This helps to resolve some borderline
cases where MIRA failed in earlier versions.

Third, the definition of which reads may be repetitive has been worked on,
leading to fewer false positives in borderline cases, which in turn reduces
the scatter in repetitive regions.

Fourth, a new option has been introduced to separate contigs with "normal"
sequence from contigs with repetitive sequence (-AS:klrs). It is not perfect
yet (I have ideas to improve it, especially when used with paired-end data),
but I would be glad to have some feedback on it from people who are working
on especially ugly genomes. It is not switched on by default; please
use -AS:klrs=yes if you want to test it out.

And of course a fair number of smaller tweaks and bug fixes.


Here's the complete list of changes since 2.9.34:

2.9.37
------
- new option: -CL:prc (propose right clip). This is a new strategy to ensure a
  good "high confidence region" (hcr) in reads, basically eliminating all junk
  at the 3' end of reads. Extremely effective, but should not be used for very
  low coverage data or for EST projects.
  This option is now default for genome assemblies in "normal" or "accurate"
  mode.


2.9.36
------
- renamed -AS:urdufrd:urdrdct to -AS:ard:ardct
- added -AS:ardml:ardgl. This allows for a better control of which reads are
  defined as repeats.
- added -AS:klrs. Needs testing, is not switched on by default.
- bugfix: number of large contigs was reported too high in the report of the
  assembly ... because of a really dumb bug in the statistics calculation
  routine. This had no effect on the assembly itself, just on the
  *_info_assembly.txt report and also on the summary given after the usage of
  "convert_project".
- bugfix: SSAHA clips were wrongly logged to file
- change: log file with clips is more verbose
- change: 454 reads without explicit forward/reverse naming scheme
  (e.g. "somename" instead of "somename.f") are now considered to be forward


2.9.35x2
--------
- when running the SKIM in parallel threads, MIRA can give
  different results when started with the same data and same
  arguments. The effect is now reduced (though still present), at the price
  of the table loaded after the SKIM has run being 25% larger, but this
  cannot be helped.
- a few fixes in "convert_project" to allow conversion of assemblies in CAF
  format into clippedfasta and maskedfasta (was previously allowed only for
  single reads)
- typo fix: -OUT:rrol:rld were shown as sequencing type dependent although
  they are not.


2.9.35x1
--------
- CAF files with 454 data now contain the necessary info to allow gap4 to open
  the flowgrams. Works only for reads that are NOT paired-end.
- slight tweak in the pathfinder that should enhance the assembly with
  paired-end data in a few cases
- changed sff_extract so that it runs again with the Python 2.4 series

