Dear Bastien, dear all,

Some of you may remember me from a series of mails about problems with my assembly more than a month ago. It's not that I did not want to share my results, as Bastien asked; it's just that the assembly (version 2.9.43) is still running (34 days now), because of massive memory use that caused a lot of swapping (13 GB).

By the way, I ran some tests on 2.9.44, comparing its assembly with those from 2.9.37 and 2.9.43.
I used a subset of my dataset:

280623 GS FLX reads
455 Sanger reads
estimated genome size: 1.7 Mb
estimated GC%: 36%
chimera presence: yes, due to MDA
contaminating DNA presence: probable (host DNA)
machine: Intel Core Duo 3.16 GHz, 8 GB of RAM

I ran the following:

mira -job=denovo,accurate,454,sanger,genome -GE:not=2 -AS:klrs=1 -AL:mrs=90 -CO:rodirs=15 -SK:mmhr=2 454_SETTINGS -AS:mrl=80 -AL:mrs=90 -CO:rodirs=20:mrpg=10
I set -SK:mmhr=2 because I had hubs.

Considerations:

2.9.44 seems to have run smoothly, with no particular memory hogging (2.9.43 seemed much more demanding). It found a number of possible chimeras and cut them.

The contig length is much higher than with both 2.9.43 and 2.9.37. 2.9.37 has longer contigs than 2.9.43, but some of them seem to be misassemblies (wrong GC%, wrong BLAST hits).
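For anyone who wants to repeat the GC% check on their own contigs, here is a minimal sketch of the idea. It assumes the contigs are available as a FASTA file, and the function names and the tolerance of 8 percentage points are my own illustrative choices, not anything from MIRA. A contig far from the expected 36% is only a candidate for contamination or misassembly and still needs a BLAST check.

```python
def gc_percent(seq):
    """Return the GC% of a sequence, counting only A, C, G and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def flag_by_gc(records, expected=36.0, tolerance=8.0):
    """Return (name, GC%) for contigs whose GC% is further than
    `tolerance` points from `expected`. Both values are assumptions
    to be tuned for the genome at hand."""
    flagged = []
    for name, seq in records:
        gc = gc_percent(seq)
        if abs(gc - expected) > tolerance:
            flagged.append((name, round(gc, 1)))
    return flagged
```

To use it, pass an open file handle to read_fasta (for example open("contigs.fasta"), a hypothetical file name) and hand the result to flag_by_gc.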
I attach the 2.9.44 log and a small assemblystats file in which I highlight some features (in the attached archive).

If you are interested, I can send the 2.9.37 and 2.9.43 logs as well. Hope it helps.

I also hope my 2.9.43 assembly will finish soon, so that I can try 2.9.44 on the entire dataset (but it seems it will take at least two more weeks).
Thanks to Bastien for the new version, it's very promising!

D.

--
Davide Sassera
Sezione di Patologia Generale e Parassitologia
Dipartimento di Patologia Animale, Igiene e Sanità Pubblica Veterinaria
Facoltà di Veterinaria
Università degli Studi di Milano
Via Celoria 10, 20133, Milano, ITALY
Tel: +39 0250318094
Fax: +39 0250318095