[mira_talk] Re: MIRA 3.1.15: test driving for interested parties

  • From: Davide Scaglione <gianza@xxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Thu, 3 Jun 2010 08:08:41 +0000

I'm so grateful to you; you are giving us really solid support.
Thanks. Thanks. Thanks.
I will write a good chunk of my PhD thesis on MIRA.

Bye

Davide

> From: bach@xxxxxxxxxxxx
> To: mira_talk@xxxxxxxxxxxxx
> Subject: [mira_talk] MIRA 3.1.15: test driving for interested parties
> Date: Wed, 2 Jun 2010 20:58:55 +0200
> 
> Dear all,
> 
> 3.0.5 contains a nasty bug (the "extendADS" problem) that some people are
> running into and which stops an assembly cold. While a workaround is simple
> (turning off -DP:ure), it robs some of the power of the de-novo assembly when
> Sanger sequences are present. I'm not ready yet to release a new full version as I
> made some important changes lately to improve speed while handling really 
> large read numbers.
> 
> The current head of the development branch (3.1.15) passes my usual tests for 
> de-novo assemblies and I also have worked on 4 mapping projects with it, so I 
> feel that it should be OK from an algorithm point of view.
> 
> However, the documentation is not up to date (I'm converting it to DocBook
> right now and reworking it a bit in the process) and I still want to polish
> a few things.
> 
> But if anyone is interested in test driving the current head and giving feedback,
> please feel free to do so:
> 
>   http://www.chevreux.org/tmp/mira_3.1.15_dev_linux-gnu_x86_64_static.tar.bz2
> 
> Note that docs are missing completely in this archive; please refer to the
> (rather terse) change log down below to learn about new features / parameters 
> of MIRA.
> 
> Regards,
>   Bastien
> 
> 3.1.15
> ------
> - new parameter -CO:emeas1clpec. Automatically sets emea to 1 if proposed end
>   clipping is used (ends will be "clean"). Improves recognition of
>   misassemblies in cases where only the outer fringes of reads differ.
> - change in template handling: to be lenient, MIRA internally added/subtracted
>   10% of the given insertsizes (or at least 1kb). Not anymore! This would give
>   problems with very small libraries (Solexa) or when the given values were
>   "lenient enough" and were made "too lenient" by this and subsequently
>   flagged in different post-processing tools.
> - change in handling template insert size info from XML: previously, MIRA set
>   stdev to a minimum of 500 bases and used 2*stdev to calculate minimum and
>   maximum insert sizes. The 500-base minimum has been removed, and MIRA now
>   uses 3*stdev.
> - new parameter: -GE:tpbd to give template partner build direction on the
>   command line. Defines whether the template partner of a read (in a
>   read-pair) must have the same direction (1) or reverse direction (-1) in a
>   contig.
> - change: when --job=...,454 is used, the default minimum overlap is not 40
>   anymore, but 20. 40 was too conservative, overlaps at weak contig joins were
>   discarded too often.
> - improved graph reduction algorithm: some more small overlaps at low coverage
>   sites are taken to Smith-Waterman. This helps to find some more weak contig
>   joins.
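
A minimal sketch of the XML insert-size change described in the 3.1.15 notes above. This is not MIRA code: the function name, the `legacy` switch, and the assumption that the window is centred on the library's mean insert size are all mine, for illustration only.

```python
def insert_size_window(mean, stdev, legacy=False):
    """Return (min_size, max_size) for a template library.

    legacy=True mimics the rule described for versions before 3.1.15:
    stdev is raised to at least 500 bases and the window is 2*stdev.
    legacy=False mimics the 3.1.15 rule: no minimum stdev, window of 3*stdev.
    Centring the window on `mean` is an assumption of this sketch.
    """
    if legacy:
        stdev = max(stdev, 500)  # old 500-base floor on stdev
        factor = 2
    else:
        factor = 3
    return mean - factor * stdev, mean + factor * stdev


# Hypothetical small paired-end library: 300 bp inserts, stdev 30 bp.
print(insert_size_window(300, 30, legacy=True))
print(insert_size_window(300, 30))
```

With these made-up numbers the old rule yields a window far wider than the library itself (the lower bound even goes negative), while the new rule stays close to the given values, which matches the motivation given for very small libraries.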
> 
> 
> 3.1.14
> ------
> - speed up of routine to find and mark IUPAC bases and unsure bases (IUPc &
>   UNSc). Very noticeable when using annotated genomes as mapping reference.
> - bugfix: IUPc & UNSc were not searched for anymore (introduced in 3.1.12 with
>   the -CO:asir bugfix)
> - re-activated '-d' in convert_project
> - adjusted miramem estimator for mapping of Solexa reads
> 
> 
> 3.1.13
> ------
> - improvements for large assemblies with millions of reads where setting up
>   data for new contigs during build is sped up. Especially noticeable in EST
>   assemblies, but also genome assemblies with Solexa.
> 
> 
> 3.1.12
> ------
> - new option to speed up assemblies with millions of reads: -AS:mrpc controls
>   the minimum number of reads a contig must potentially have before it is
>   really assembled. This prevents all the small junk contigs with very low
>   numbers of reads in, e.g., Solexa sequencing from being assembled and can
>   speed up the assembly by days.
> - MIRA now uses the tcmalloc library from Google perftools if available. It is
>   highly recommended as it optimises memory allocation and saves a lot of
>   memory on multiple pass assemblies. E.g., memory usage for 810k 454 FLX
>   reads, 45x coverage, 5 pass genome de-novo accurate:
>               3.0.5    8272988 kB
>              3.1.11    8273012 kB
>              3.1.12    9492956 kB
>      3.1.12tcmalloc    6758916 kB
> - change: adapted some estimators in miramem, hopefully giving better
>   estimates for RAM usage during MIRA assemblies.
> - bugfix: array iterator overrun in contig building which probably had no
>   noticeable effect. If it had any, it was perhaps rejecting weak matches
>   that would otherwise have been barely accepted.
> - bugfix: -CO:asir sometimes set repeat markers instead of SNP markers.
> - bugfix: mira could try to check physical presence of SCF data even for
>   non-Sanger reads
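
To put the memory figures from the 3.1.12 notes above into proportion, here is a small calculation over exactly those four numbers. The dictionary and the helper name are illustrative only and are not part of MIRA or miramem.

```python
# Peak memory in kB, copied from the 3.1.12 change log entry above.
peak_kb = {
    "3.0.5": 8272988,
    "3.1.11": 8273012,
    "3.1.12": 9492956,
    "3.1.12tcmalloc": 6758916,
}


def saving_percent(baseline, candidate):
    """Percentage of memory saved by `candidate` relative to `baseline`."""
    return 100.0 * (peak_kb[baseline] - peak_kb[candidate]) / peak_kb[baseline]


for baseline in ("3.0.5", "3.1.12"):
    saved = saving_percent(baseline, "3.1.12tcmalloc")
    print(f"{baseline} -> 3.1.12tcmalloc: {saved:.1f}% less memory")
```

For this one data set, the tcmalloc build needs roughly 18% less memory than 3.0.5 and roughly 29% less than 3.1.12 without tcmalloc.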
> 
> 
> 3.1.11
> ------
> - optimisation: memory pre-allocation routines for read growth help to
>   reduce memory fragmentation and hence the overall memory requirement.
> - bugfix: -CO:mr=no was not fully respected. While not used during contig
>   building, possible repeats were always marked in result files and then
>   transferred to following iterations.
> - bugfix extendADS(): acquireSequences() could throw due to 0 length of a
>   sequence
> 
> -- 
> You have received this mail because you are subscribed to the mira_talk 
> mailing list. For information on how to subscribe or unsubscribe, please 
> visit http://www.chevreux.org/mira_mailinglists.html
                                          