The miramem output was very close to your example, so I used your settings
unchanged apart from the project and job settings. They are now:

~/code/mira/mira_2.9.41x4_dev_linux-gnu_x86_64/bin/mira --project=clado --job=denovo,genome,accurate,454 -CL:pvlc=no -SK:mhpr=100:mchr=5120 &>log_assembly_pvlc

I'll report the results.

Bye,
Jan

On Mon, Mar 2, 2009 at 13:56, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
> On Monday 02 March 2009 Jan van Haarst wrote:
> > [...]
> > Is there a setting I can use, so I can assemble this thing ?
>
> Hi Jan,
>
> there is: reduce the number of SKIM hits (-SK:mhpr) to something lower,
> say 100, and try again ... but this might not be enough.
>
> I'm in the midst of a larger algorithm replacement, but if you feel
> adventurous you can try out the current head of my development branch:
> http://www.chevreux.org/tmp/mira_2.9.41x4_dev_linux-gnu_x86_64.tar.bz2
>
> (Note: it should work as expected, but I don't give any guarantee: it has
> a few new algorithms that have passed only a few tests.)
>
> As a few things have changed, I'll give you a short walkthrough on how to
> use it, as some things are still a bit bumpy.
>
> Step 1: estimating some memory parameters. Run "miramem" as in the
> transcript shown below, but entering your "correct" values. The transcript
> below simulates 900 thousand paired-end FLX reads (which are approximately
> the same size as the old GS20) and 5 million FLX reads. Additionally, I
> guessed a 50m genome (take here the avg. of Newbler and Celera) and the
> biggest chromosome/contig to be 5 megabases (take the largest value from
> Newbler / Celera):
>
> -------------------------------------------------------------------------------
> Is it a genome or transcript (EST/tag/etc.) project? (g/e/) [g]
> g
> Size of genome? [4.5m] 50m
> 50000000
> Looks like a larger eukaryote, guessing largest chromosome size: 30m
> Change if needed!
> Size of largest chromosome? [30000000] 5m
> 5000000
> Is it a denovo or mapping assembly? (d/m/) [d]
> d
> Number of Sanger reads? [40k] 0
> 0
> Are there 454 reads? (y/n/) [n] y
> y
> Number of 454 GS20 reads? [0] 900k
> 900000
> Number of 454 FLX reads? [0] 5m
> 5000000
> Number of 454 Titanium reads? [0]
> 0
> Are there Solexa reads? (y/n/) [n]
> n
>
>
> ************************* Estimates *************************
>
> The contigs will have an average coverage of ~ 25.2 (+/- 10%)
>
> RAM estimates:
>            reads+contigs (unavoidable): 22.2 GiB
>                 large tables (tunable): 1.1 GiB
>                                         ---------
>                           total (peak): 23.3 GiB
>
>        add if using -CL:pvlc (tunable): 10.8 GiB
>
> *************************************************************
>
> -------------------------------------------------------------------------------
>
> Now, mira estimated it would need 23.3 GiB with standard parameters
> (plus 10.8 GiB more if -CL:pvlc is used, which might be standard in some
> "--job=" configurations; check that for your call and turn it off if
> needed).
>
> The important number is the "total (peak)" (plus -CL:pvlc if used):
> assuming your machine had 32 GiB, this would leave ~8 GiB (8192 MiB)
> unused when -CL:pvlc is off. Take half of the unused memory in MiB (4096)
> and add this number to the -SK:mchr number (which should be 1024 by
> default), leading to a parameter "-SK:mchr=5120".
>
> Step 2: call mira like this
>
> mira --project=yournamehere
> --job=yourjobdefaultshere
> -CL:pvlc=no
> -SK:mhpr=100:mchr=5120
> >log_assembly.txt
>
> and see whether this helps.
>
> Regards,
> Bastien
>
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>

--
Dag,
Jan
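
For reference, the -SK:mchr arithmetic from Bastien's walkthrough can be
written down as a small Python sketch. This is only an illustration and not
part of MIRA: the function name suggest_mchr and its parameters are
hypothetical, the unused memory is rounded down to whole GiB as in the mail
(8.7 GiB is treated as ~8 GiB, i.e. 8192 MiB), and the default of 1024 is the
value quoted in the mail, which may differ in other MIRA versions.

```python
import math


def suggest_mchr(machine_ram_gib, peak_estimate_gib, pvlc_gib=0.0,
                 default_mchr=1024):
    """Return a value for -SK:mchr following the rule of thumb in the mail.

    machine_ram_gib   -- RAM installed in the machine, in GiB
    peak_estimate_gib -- "total (peak)" figure printed by miramem, in GiB
    pvlc_gib          -- extra GiB if -CL:pvlc is left on (0 when it is off)
    default_mchr      -- assumed default of -SK:mchr (1024 according to the mail)
    """
    unused_gib = machine_ram_gib - peak_estimate_gib - pvlc_gib
    if unused_gib <= 0:
        # Nothing to spare: keep the default instead of raising the value.
        return default_mchr
    # Round down to whole GiB, as the mail does (8.7 GiB -> 8 GiB).
    unused_mib = math.floor(unused_gib) * 1024
    # Half of the unused memory is added on top of the default.
    return default_mchr + unused_mib // 2


# Numbers from the miramem transcript: 32 GiB machine, 23.3 GiB peak, pvlc off.
print(f"-SK:mchr={suggest_mchr(32, 23.3)}")  # prints -SK:mchr=5120
```

With -CL:pvlc left on, the same machine would have no memory to spare
(32 - 23.3 - 10.8 is negative), so the sketch falls back to the default; that
matches the advice to switch -CL:pvlc off on a 32 GiB machine.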