Dear all,

3.0.5 contains a nasty bug ("extendADS" problem) which some people are
running into and which stops an assembly cold. While the workaround is
simple (turning off -DP:ure), it robs the de-novo assembly of some of
its power when Sanger sequences are present.

I'm not ready yet to release a new full version as I made some important
changes lately to improve speed while handling really large read
numbers. The current head of the development branch (3.1.15) passes my
usual tests for de-novo assemblies and I have also worked on 4 mapping
projects with it, so I feel that it should be OK from an algorithm point
of view. However, the documentation is not up to date (I'm changing it
to DocBook right now and reworking it a bit in the process) and I still
want to polish a few things.

But if anyone is interested in test-driving the current head and giving
feedback, please feel free to do so:

  http://www.chevreux.org/tmp/mira_3.1.15_dev_linux-gnu_x86_64_static.tar.bz2

Note that docs are missing completely in this archive; please refer to
the (rather terse) change log below to learn about new features /
parameters of MIRA.

Regards,
  Bastien


3.1.15
------
- new parameter -CO:emeas1clpec. Automatically sets emea to 1 if
  proposed end clipping is used (ends will be "clean"). Improves
  recognition of misassemblies in cases where only the outer fringes of
  reads differ.
- change in template handling: to be lenient, MIRA used to add/subtract
  10% of the given insert sizes (or at least 1kb) internally. Not
  anymore! This caused problems with very small libraries (Solexa), or
  when the given values were already "lenient enough", were made "too
  lenient" by this, and were subsequently flagged by different
  post-processing tools.
- change in handling template insert size info from XML: previously,
  MIRA set stdev to a minimum of 500 bases and used 2*stdev to calculate
  minimum and maximum insert sizes.
  The 500 bases minimum rule has been removed, and MIRA now uses
  3*stdev.
- new parameter: -GE:tpbd to give the template partner build direction
  on the command line. Defines whether the template partner of a read
  (in a read-pair) must have the same direction (1) or the reverse
  direction (-1) in a contig.
- change: when --job=...,454 is used, the default minimum overlap is no
  longer 40 but 20. 40 was too conservative: overlaps at weak contig
  joins were discarded too often.
- improved graph reduction algorithm: some more small overlaps at low
  coverage sites are taken to Smith-Waterman. This helps to find some
  more weak contig joins.

3.1.14
------
- speed-up of the routine to find and mark IUPAC bases and unsure bases
  (IUPc & UNSc). Very noticeable when using annotated genomes as mapping
  reference.
- bugfix: IUPc & UNSc were not searched for anymore (introduced in
  3.1.12 with the -CO:asir bugfix)
- re-activated '-d' in convert_project
- adjusted miramem estimator for mapping of Solexa reads

3.1.13
------
- improvements for large assemblies with millions of reads: setting up
  data for new contigs during build is sped up. Especially noticeable in
  EST assemblies, but also in genome assemblies with Solexa.

3.1.12
------
- new option to speed up assemblies with millions of reads: -AS:mrpc
  controls the minimum number of reads a contig must potentially have
  before it is really assembled. This prevents all the small junk
  contigs with very low numbers of reads (in, e.g., Solexa sequencing)
  from being assembled and can speed up the assembly by days.
- MIRA now uses the tcmalloc library from Google perftools if available.
  It is highly recommended as it optimises memory allocation and saves a
  lot of memory on multiple pass assemblies.
  E.g., memory usage for 810k 454 FLX reads, 45x coverage, 5 pass
  genome de-novo accurate:

    3.0.5             8272988 kB
    3.1.11            8273012 kB
    3.1.12            9492956 kB
    3.1.12 tcmalloc   6758916 kB

- change: adapted some estimators in miramem, hopefully giving better
  estimates for RAM usage during MIRA assemblies.
- bugfix: array iterator overrun in contig building which probably had
  no noticeable effect. If it had any, it was perhaps the rejection of
  weak matches that would otherwise have barely been accepted.
- bugfix: -CO:asir sometimes set repeat markers instead of SNP markers.
- bugfix: mira could try to check the physical presence of SCF data even
  for non-Sanger reads

3.1.11
------
- optimisation: memory pre-allocation routines for read growth help to
  reduce memory fragmentation and hence the overall memory requirement.
- bugfix: -CO:mr=no was not fully respected. While not used during
  contig building, possible repeats were always marked in result files
  and then transferred to following iterations.
- bugfix extendADS(): acquireSequences() could throw due to 0 length of
  a sequence
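
To illustrate the 3.1.15 change in how template insert size info from
XML is handled, here is a minimal Python sketch of the old and the new
rule. This is not MIRA code: the function names are hypothetical, and
two details are assumptions not stated in the change log, namely that
the range is centred on the mean insert size given in the XML and that
the lower bound is clamped at 0.

```python
def insert_size_range_old(mean, stdev):
    """Rule before 3.1.15: stdev is raised to at least 500 bases and the
    range is taken as mean +/- 2*stdev (centring on mean is an assumption)."""
    stdev = max(stdev, 500)
    # Clamping the lower bound at 0 is an assumption for this sketch.
    return max(0, mean - 2 * stdev), mean + 2 * stdev


def insert_size_range_new(mean, stdev):
    """Rule from 3.1.15 on: no 500 bases minimum, range is mean +/- 3*stdev."""
    return max(0, mean - 3 * stdev), mean + 3 * stdev


if __name__ == "__main__":
    # Hypothetical libraries: a small Solexa one and a larger Sanger one.
    for name, mean, stdev in [("Solexa", 300, 30), ("Sanger", 3000, 600)]:
        print(name,
              "old:", insert_size_range_old(mean, stdev),
              "new:", insert_size_range_new(mean, stdev))
    # Solexa old: (0, 1300) new: (210, 390)
    # Sanger old: (1800, 4200) new: (1200, 4800)
```

With these made-up numbers, the old 500 bases floor makes the range for
the small Solexa library far wider than its stdev suggests, while the
new rule follows the given stdev directly.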