On Monday 02 March 2009 Jan van Haarst wrote:
> [...]
> Is there a setting I can use, so I can assemble this thing ?

Hi Jan,

there is: reduce the number of SKIM hits (-SK:mhpr) to something lower, say 100, and try again ... but this might not be enough.

I'm in the midst of a larger algorithm replacement, but if you feel adventurous you can try out the current head of my development branch:

  http://www.chevreux.org/tmp/mira_2.9.41x4_dev_linux-gnu_x86_64.tar.bz2

(Note: it should work as expected, but I don't give any guarantee; it has a few new algorithms that have passed only a few tests.)

As a few things have changed, I'll give you a short walkthrough on how to use it, as some things are still a bit bumpy.

Step 1: estimate some memory parameters.

Run "miramem" as in the transcript shown below, but enter your own, correct values. The transcript below simulates 900 thousand paired-end FLX reads (which are approximately the same size as the old GS20 reads, which is why they are entered as GS20) and 5 million FLX reads. Additionally, I guessed a 50 megabase genome (take the average of Newbler and Celera here) and the biggest chromosome/contig to be 5 megabases (take the largest value from Newbler / Celera):

-------------------------------------------------------------------------------
Is it a genome or transcript (EST/tag/etc.) project?
(g/e/) [g] g
Size of genome?
[4.5m] 50m
50000000
Looks like a larger eukaryote, guessing largest chromosome size: 30m
Change if needed!
Size of largest chromosome?
[30000000] 5m
5000000
Is it a denovo or mapping assembly?
(d/m/) [d] d
Number of Sanger reads?
[40k] 0
0
Are there 454 reads?
(y/n/) [n] y
y
Number of 454 GS20 reads?
[0] 900k
900000
Number of 454 FLX reads?
[0] 5m
5000000
Number of 454 Titanium reads?
[0] 0
Are there Solexa reads?
(y/n/) [n] n

************************* Estimates *************************

The contigs will have an average coverage of ~ 25.2 (+/- 10%)

RAM estimates:
    reads+contigs (unavoidable):     22.2 GiB
    large tables (tunable):           1.1 GiB
                                     ---------
    total (peak):                    23.3 GiB

    add if using -CL:pvlc (tunable): 10.8 GiB
*************************************************************
-------------------------------------------------------------------------------

So mira estimated it would need 23.3 GiB with standard parameters, plus an additional 10.8 GiB if -CL:pvlc is used (which might be standard in some "--job=" configurations; check that for your call and turn it off if needed).

The important number is the "total (peak)" (plus -CL:pvlc if used): assuming your machine has 32 GiB, this would leave ~8 GiB (8192 MiB) unused when -CL:pvlc is off. Take half of the unused memory in MiB (4096) and add this number to the -SK:mchr number (which should be 1024 by default), leading to a parameter "-SK:mchr=5120".

Step 2: call mira like this

  mira --project=yournamehere --job=yourjobdefaultshere -CL:pvlc=no -SK:mhpr=100:mchr=5120 >log_assembly.txt

and see whether this helps.

Regards,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
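
For reference, the -SK:mchr arithmetic from step 1 can be sketched as a small Python helper. This is only an illustration and not part of MIRA: the function name suggest_mchr and its arguments are made up here, and rounding the unused memory down to whole GiB is an assumption that mirrors the "~8 GiB" used in the mail.

```python
import math


def suggest_mchr(machine_ram_gib, peak_estimate_gib, pvlc_gib=0.0, default_mchr=1024):
    """Return a value for -SK:mchr following the rule of thumb from step 1.

    machine_ram_gib   -- RAM installed in the machine (GiB)
    peak_estimate_gib -- "total (peak)" reported by miramem (GiB)
    pvlc_gib          -- extra GiB reported for -CL:pvlc, 0 if it is turned off
    default_mchr      -- default -SK:mchr (1024 according to the mail)
    """
    needed_gib = peak_estimate_gib + pvlc_gib
    # Assumption: round the unused memory down to whole GiB (8.7 -> 8).
    unused_gib = math.floor(machine_ram_gib - needed_gib)
    if unused_gib <= 0:
        raise ValueError("no unused memory left; the estimate exceeds the machine's RAM")
    unused_mib = unused_gib * 1024
    # Give half of the unused memory (in MiB) to -SK:mchr on top of the default.
    return default_mchr + unused_mib // 2


# Values from the transcript: 32 GiB machine, 23.3 GiB peak, -CL:pvlc off.
mchr = suggest_mchr(32, 23.3)
print(f"-SK:mhpr=100:mchr={mchr}")
```

With the numbers from the transcript this yields mchr=5120, the value used in the mira call in step 2. With -CL:pvlc left on (pvlc_gib=10.8) the estimate exceeds 32 GiB and the helper raises an error instead of returning a value.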