[mira_announce] MIRA 2.9.29x2
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_announce@xxxxxxxxxxxxx
- Date: Sun, 28 Sep 2008 18:55:35 +0200
Dear all,
http://www.chevreux.org/tmp/mira_2.9.29x2_dev_linux-gnu_x86_64.tar.bz2
This version comes with three important new features:
- first, as a premiere, multiple processors are now supported. At the moment
only in the SKIM part, others will follow. The default uses two processors
as most machines nowadays have at least a dual core.
- second, there are some new routines to disentangle repeats. These became
necessary as after analysing a number of 454 FLX projects, I discovered that
the quality value distribution of 454 data made MIRA blind to certain
types of repeats (which would have been discovered with Sanger data).
- third, the default parameters for 454 data are slowly shifting towards
optimal handling of FLX data. GS20 is still handled quite well, but FLX is
handled better.
On the half a dozen FLX data sets I can play around with, the combination of
enhanced repeat disentangling algorithms and optimised parameter sets
increased the N50 contig size by between 50 and 100%.
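For readers unfamiliar with the metric: N50 is the contig length at which the contigs of that length or longer together cover at least half of the total assembled bases. The following is only a minimal sketch of that calculation, not code from MIRA; the function name and the two lists of contig lengths are made up purely for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    # Walk through the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least
        # half of the total assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths (in bases) for two assemblies of the same data.
before = [100, 80, 60, 40, 20]
after = [150, 90, 40, 20]

# Relative N50 gain in percent.
gain = 100 * (n50(after) - n50(before)) / n50(before)
print(n50(before), n50(after), gain)
```

With these made-up numbers the N50 rises from 80 to 150, a gain of 87.5%, which is the kind of change meant above.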
In short: please test this version thoroughly. If no obvious problems are
reported, it'll make an official appearance on the web site in one or two
weeks.
Regards,
Bastien
Change log since 2.8.28x4
2.9.29x2
--------
- additional algorithms to search for and mark repeat marker bases that existing
routines missed in 454 data.
2.9.29x1
--------
- MIRA now uses full overlap graph repeat resolving algorithms which lead to
better and quicker resolving of repeats in bacteria. May be slower for
eukaryotes, more tests needed.
- new clipping options: -CL:emrc:mrcr:smrc
- for 454 reads, MIRA now follows a strategy of cut back first (-CL:emrc),
uncover afterwards via read extension. Highly recommended.
- default parameter -CO:mrpg=5 for repeat marker base detection in 454 data
was too lax, changed back to 4.
- fixed bug: when mapping microread data (Solexa, SOLiD), -SB:sbuip was
wrongly interpreted and de-novo algorithm started instead of mapping
(error introduced in 2.9.28x4)
- change: when not being able to delete a temporary log file, MIRA now gives
a warning but does not abort
2.9.28x7
--------
- added quality information of consensus sequence to output of CAF files.
2.9.28x6
--------
- Premiere for MIRA: multi-threading makes its appearance. At the moment only
for the SKIM algorithm as it's the easiest part and no adverse effects
are expected.
New parameter -SK:not is for controlling the number of threads.
- Test: MIRA now saves more information on failed alignments to build a better
overlap graph in following passes. The overall assembly quality gains, but
memory consumption rises unpredictably. This may become a problem for highly
repetitive genomes of eukaryotic size. To be monitored.
- the rawhashhit log file is not written anymore as it was useful only for
debugging and just ate memory and time of SKIM.
- bugfix: the new read mapping chooser sometimes led to an abort() of the
process (error introduced in 2.9.28x4)
2.9.28x5
--------
- renamed 'est_splitsplices' of the -AL:egpl parameter to 'reject_codongaps'
- when 454 data is used via the --job=...,454,... switch,
-AL:egp=yes:egpl=reject_codongaps are now set for *all* technologies