[mira_talk] Re: megahubs ?
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Mon, 2 Mar 2009 13:56:37 +0100
On Monday 02 March 2009 Jan van Haarst wrote:
> [...]
> Is there a setting I can use, so I can assemble this thing ?
Hi Jan,
there is: reduce the number of SKIM hits (-SK:mhpr) to something lower, say
100 and try again ... but this might not be enough.
I'm in the midst of a larger algorithm replacement, but if you feel
adventurous you can try out the current head of my development branch:
http://www.chevreux.org/tmp/mira_2.9.41x4_dev_linux-gnu_x86_64.tar.bz2
(Note: it should work as expected, but I give no guarantee: it contains a few
new algorithms that have passed only a few tests.)
As a few things have changed, I'll give you a short walkthrough on how to use
it, as some things are still a bit bumpy.
Step 1: estimate some memory parameters. Run "miramem" as in the transcript
shown below, but enter your own "correct" values. The transcript below
simulates 900 thousand paired-end FLX reads (entered as GS20 reads, since they
are approximately the same size as the old GS20 reads) and 5 million FLX reads.
Additionally, I guessed a 50 megabase genome (take the average of the Newbler
and Celera estimates here) and the biggest chromosome/contig to be 5 megabases
(take the largest value from Newbler / Celera):
-------------------------------------------------------------------------------
Is it a genome or transcript (EST/tag/etc.) project? (g/e/) [g]
g
Size of genome? [4.5m] 50m
50000000
Looks like a larger eukaryote, guessing largest chromosome size: 30m
Change if needed!
Size of largest chromosome? [30000000] 5m
5000000
Is it a denovo or mapping assembly? (d/m/) [d]
d
Number of Sanger reads? [40k] 0
0
Are there 454 reads? (y/n/) [n] y
y
Number of 454 GS20 reads? [0] 900k
900000
Number of 454 FLX reads? [0] 5m
5000000
Number of 454 Titanium reads? [0]
0
Are there Solexa reads? (y/n/) [n]
n
************************* Estimates *************************
The contigs will have an average coverage of ~ 25.2 (+/- 10%)
RAM estimates:
reads+contigs (unavoidable): 22.2 GiB
large tables (tunable): 1.1 GiB
---------
total (peak): 23.3 GiB
add if using -CL:pvlc (tunable): 10.8 GiB
*************************************************************
-------------------------------------------------------------------------------
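Note how the transcript expands shorthand answers: "50m" becomes 50000000 and "900k" becomes 900000, so the suffixes appear to be decimal (k = 1,000 and m = 1,000,000). A minimal Python sketch of that expansion, inferred only from the transcript above and not taken from miramem's actual parser (the function name is made up for illustration):

```python
def expand_count(text):
    """Expand answers such as '900k', '5m' or '4.5m' into plain integers.

    Assumption: suffixes are decimal, as the transcript suggests
    (50m -> 50000000, 900k -> 900000). This is not miramem's own code.
    """
    multipliers = {"k": 1_000, "m": 1_000_000}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        # round() guards against floating-point noise for inputs like "4.5m"
        return round(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


for answer in ("50m", "5m", "900k", "0"):
    print(answer, "->", expand_count(answer))
```

This can be handy for double-checking the numbers before typing them into miramem.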
Now, mira estimated it would need 23.3 GiB with standard parameters (plus
another 10.8 GiB if -CL:pvlc is used, which might be the default in some
"--job=" configurations; check that for your call and turn it off if needed).
The important number is the "total (peak)" (plus -CL:pvlc if used): assuming
your machine has 32 GiB, this would leave ~8 GiB (8192 MiB) unused when
-CL:pvlc is off. Take half of the unused memory in MiB (4096) and add this
number to the -SK:mchr number (which should be 1024 by default), leading to a
parameter "-SK:mchr=5120".
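The arithmetic behind that number can be written down as a small Python sketch. It assumes that -SK:mchr is given in MiB and that the unused memory is rounded down to whole GiB (mirroring the "~8 GiB" rounding above); the function name is made up for illustration and is not part of mira.

```python
import math


def mchr_setting(machine_ram_gib, peak_estimate_gib, default_mchr_mib=1024):
    """Suggest a -SK:mchr value: the default plus half of the unused RAM.

    Assumptions: -SK:mchr is expressed in MiB, and the unused memory is
    rounded down to whole GiB before halving.
    """
    unused = machine_ram_gib - peak_estimate_gib
    if unused <= 0:
        raise ValueError("estimated peak memory exceeds the machine's RAM")
    unused_mib = math.floor(unused) * 1024  # whole GiB converted to MiB
    return default_mchr_mib + unused_mib // 2


# 32 GiB machine, 23.3 GiB "total (peak)" from the miramem transcript
print(mchr_setting(32, 23.3))  # prints 5120
```

With -CL:pvlc switched on, the peak would be 23.3 + 10.8 = 34.1 GiB, which no longer fits into 32 GiB; the sketch raises an error in that case.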
Step 2: call mira like this:
  mira --project=yournamehere \
       --job=yourjobdefaultshere \
       -CL:pvlc=no \
       -SK:mhpr=100:mchr=5120 \
       >log_assembly.txt
and see whether this helps.
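If you want to script the call, the command line can be assembled from the two SKIM values. The following Python sketch uses only the options shown in this mail; the function name and the placeholder project/job strings are illustrative, and it only prints the command rather than running mira.

```python
import shlex


def build_mira_command(project, job, mhpr=100, mchr=5120):
    """Build the mira argument list with the options from this mail.

    -CL:pvlc is always switched off here, matching the memory estimate
    that was made without it.
    """
    return [
        "mira",
        f"--project={project}",
        f"--job={job}",
        "-CL:pvlc=no",
        f"-SK:mhpr={mhpr}:mchr={mchr}",
    ]


args = build_mira_command("yournamehere", "yourjobdefaultshere")
# Show the shell command, with output redirected to the log file as above
print(shlex.join(args), ">log_assembly.txt")
```

Keeping the arguments in a list makes it easy to retry with a different -SK:mhpr value if 100 is still too high.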
Regards,
Bastien