I'm so grateful to you; you are giving us such real, solid support.
Thanks, thanks, thanks. I will write a good chunk of my PhD thesis on MIRA.

Bye,
Davide

> From: bach@xxxxxxxxxxxx
> To: mira_talk@xxxxxxxxxxxxx
> Subject: [mira_talk] MIRA 3.1.15: test driving for interested parties
> Date: Wed, 2 Jun 2010 20:58:55 +0200
>
> Dear all,
>
> 3.0.5 contains a nasty bug ("extendADS" problem) which some people are
> running into and which stops an assembly cold. While a workaround is simple
> (turning off -DP:ure), it robs the de-novo assembly of some of its power
> when Sanger sequences are present. I'm not ready yet to release a new full
> version as I made some important changes lately to improve speed while
> handling really large read numbers.
>
> The current head of the development branch (3.1.15) passes my usual tests
> for de-novo assemblies and I have also worked on 4 mapping projects with
> it, so I feel that it should be OK from an algorithmic point of view.
>
> However, the documentation is not up to date (I'm converting it to DocBook
> right now and reworking it a bit in the process) and I still want to
> polish a few things.
>
> But if anyone is interested in test driving the current head and giving
> feedback, please feel free to do so:
>
> http://www.chevreux.org/tmp/mira_3.1.15_dev_linux-gnu_x86_64_static.tar.bz2
>
> Note that docs are missing completely from this archive; please refer to
> the (rather terse) change log below to learn about new features and
> parameters of MIRA.
>
> Regards,
> Bastien
>
> 3.1.15
> ------
> - new parameter -CO:emeas1clpec. Automatically sets emea to 1 if proposed
>   end clipping is used (ends will be "clean"). Improves recognition of
>   misassemblies in cases where only the outer fringes of reads differ.
> - change in template handling: to be lenient, MIRA internally
>   added/subtracted 10% of the given insert sizes (or at least 1kb). Not
>   anymore! This caused problems with very small libraries (Solexa) or when
>   the given values were "lenient enough", were made "too lenient" by this
>   and were subsequently flagged in different post-processing tools.
> - change in handling template insert size info from XML: previously, MIRA
>   set stdev to a minimum of 500 bases and used 2*stdev to calculate minimum
>   and maximum insert sizes. The 500 bases minimum rule has been removed,
>   and 3*stdev is now used.
> - new parameter: -GE:tpbd to give template partner build direction on the
>   command line. Defines whether the template partner of a read (in a
>   read-pair) must have the same direction (1) or reverse direction (-1) in
>   a contig.
> - change: when --job=...,454 is used, the default minimum overlap is not 40
>   anymore, but 20. 40 was too conservative; overlaps at weak contig joins
>   were discarded too often.
> - improved graph reduction algorithm: some more small overlaps at low
>   coverage sites are taken to Smith-Waterman. This helps to find some more
>   weak contig joins.
>
>
> 3.1.14
> ------
> - speed up of routine to find and mark IUPAC bases and unsure bases (IUPc &
>   UNSc). Very noticeable when using annotated genomes as mapping reference.
> - bugfix: IUPc & UNSc were not searched for anymore (introduced in 3.1.12
>   with the -CO:asir bugfix)
> - re-activated '-d' in convert_project
> - adjusted miramem estimator for mapping of Solexa reads
>
>
> 3.1.13
> ------
> - improvements for large assemblies with millions of reads: setting up
>   data for new contigs during build is sped up. Especially noticeable in
>   EST assemblies, but also in genome assemblies with Solexa.
>
>
> 3.1.12
> ------
> - new option to speed up assemblies with millions of reads: -AS:mrpc
>   controls the minimum number of reads a contig must potentially have
>   before it is really assembled. This prevents all the small junk contigs
>   with very low numbers of reads in, e.g., Solexa sequencing from being
>   assembled and can speed up the assembly by days.
> - MIRA now uses the tcmalloc library from Google perftools if available. It
>   is highly recommended as it optimises memory allocation and saves a lot
>   of memory on multiple pass assemblies. E.g., memory usage for 810k 454
>   FLX reads, 45x coverage, 5 pass genome de-novo accurate:
>
>     3.0.5                   8272988 kB
>     3.1.11                  8273012 kB
>     3.1.12                  9492956 kB
>     3.1.12 with tcmalloc    6758916 kB
>
> - change: adapted some estimators in miramem, hopefully giving better
>   estimates for RAM usage during MIRA assemblies.
> - bugfix: array iterator overrun in contig building which probably had no
>   noticeable effect. If it had any, then perhaps rejecting weak matches it
>   would otherwise have barely accepted.
> - bugfix: -CO:asir sometimes set repeat markers instead of SNP markers.
> - bugfix: mira could try to check physical presence of SCF data even for
>   non-Sanger reads
>
>
> 3.1.11
> ------
> - optimisation: memory pre-allocation routines for read growth help to
>   reduce memory fragmentation and hence lower the memory requirement
>   overall.
> - bugfix: -CO:mr=no was not fully respected. While not used during contig
>   building, possible repeats were always marked in result files and then
>   transferred to following iterations.
> - bugfix extendADS(): acquireSequences() could throw due to 0 length of a
>   sequence
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
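
For anyone trying to picture the 3.1.15 change in how insert size info from XML is handled, here is a rough sketch in Python. It is not MIRA code: the function name, the "mean plus or minus k*stdev" form of the range and the clamping of the lower bound at 0 are all assumptions made for illustration. Only the 500 bases floor, the 2*stdev factor (old) and the 3*stdev factor (new) come from the change log.

```python
def insert_size_range(mean, stdev, legacy=False):
    """Return (minimum, maximum) insert size for a template library.

    Hypothetical helper, not part of MIRA. Assumes the range is
    mean +/- k*stdev and that the lower bound is clamped at 0.
    """
    if legacy:
        # Pre-3.1.15 rule from the change log: stdev floor of 500 bases,
        # then 2*stdev on either side.
        stdev = max(stdev, 500)
        k = 2
    else:
        # 3.1.15 rule from the change log: no floor, 3*stdev on either side.
        k = 3
    low = max(0, mean - k * stdev)  # clamping is an assumption
    high = mean + k * stdev
    return low, high


# A small Solexa-like library: mean 300 bases, stdev 30 bases.
print(insert_size_range(300, 30, legacy=True))
print(insert_size_range(300, 30))
```

Under these assumptions the old rule stretches such a small library to a range of 0 to 1300 bases, while the new rule gives 210 to 390, which is in line with the remark that the old leniency caused problems with very small libraries.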