[mira_announce] MIRA 2.9.29x2

Dear all,

http://www.chevreux.org/tmp/mira_2.9.29x2_dev_linux-gnu_x86_64.tar.bz2

This version comes with three important new features:
- first, as a premiere, MIRA now supports multiple processors. At the moment
  this applies only to the SKIM part; other parts will follow. By default two
  processors are used, as most machines nowadays have at least a dual core.
- second, there are some new routines to disentangle repeats. These became
  necessary because, after analysing a number of 454 FLX projects, I discovered
  that the quality value distribution of 454 data made MIRA blind to certain
  types of repeats (which would have been discovered with Sanger data).
- third, the default parameters for 454 data are slowly shifting towards
  optimal handling of FLX data. GS20 is still handled quite well, but FLX is
  handled better.

On the half a dozen FLX data sets I can play around with, the combination of
enhanced repeat disentangling algorithms and optimised parameter sets
increased the N50 contig size by between 50 and 100%.
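
For readers unfamiliar with the metric: N50 is the length of the contig at
which, going from the longest contig downwards, the running sum of lengths
first reaches half of the total assembly length. Below is a minimal sketch of
that calculation. The function name n50 and the contig lengths are made up for
illustration; they are not MIRA code or data from the FLX sets mentioned above.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    # Walk from the longest contig downwards.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least half of
        # the total assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (in bases) for two assemblies of equal size.
before = [400, 300, 200, 100]
after = [600, 250, 100, 50]
print(n50(before), n50(after))
```

With these invented numbers the N50 doubles although the total assembly size
stays the same, which is the kind of change an increase of 100% refers to.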

In short: please test this version thoroughly. If no obvious problems are 
reported, it'll make an official appearance on the web site in one or two 
weeks.

Regards,
  Bastien


Change log since 2.9.28x4

2.9.29x2
--------
- additional algorithms to search for and mark repeat marker bases that
  existing routines missed in 454 data.


2.9.29x1
--------
- MIRA now uses full overlap graph repeat resolving algorithms, which lead to
  better and quicker resolving of repeats in bacteria. May be slower for
  eukaryotes; more tests are needed.
- new clipping options: -CL:emrc:mrcr:smrc
- for 454 reads, MIRA now follows a strategy of cut back first (-CL:emrc),
  uncover afterwards via read extension. Highly recommended.
- default parameter -CO:mrpg=5 for repeat marker base detection in 454 data
  was too lax, changed back to 4.
- fixed bug: when mapping microread data (Solexa, SOLiD), -SB:sbuip was
  wrongly interpreted and the de-novo algorithm was started instead of
  mapping (error introduced in 2.9.28x4)
- change: when not being able to delete a temporary log file, MIRA now gives
  a warning but does not abort


2.9.28x7
--------
- added quality information of consensus sequence to output of CAF files.


2.9.28x6
--------
- Premiere for MIRA: multi-threading makes its appearance. At the moment only
  for the SKIM algorithm as it's the easiest part and no adverse effects
  are expected.
  New parameter -SK:not is for controlling the number of threads.
- Test: MIRA now saves more information on failed alignments to build a better
  overlap graph in following passes. The overall assembly quality gains, but
  memory consumption rises unpredictably. This may become a problem for highly
  repetitive genomes of eukaryotic size. To be monitored.
- the rawhashhit log file is not written anymore as it was useful only for
  debugging and just ate memory and time of SKIM.
- bugfix: the new read mapping chooser sometimes led to an abort() of the
  process (error introduced in 2.9.28x4)


2.9.28x5
--------
- renamed 'est_splitsplices' of the -AL:egpl parameter to 'reject_codongaps'
- when 454 data is used via the --job=...,454,... switch,
  -AL:egp=yes:egpl=reject_codongaps are now set for *all* technologies

