[mira_talk] Re: megahubs ?

  • From: Jan van Haarst <jan@xxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 2 Mar 2009 16:23:17 +0100

The miramem output was very close to your example, so I used your settings
without any changes except the project and job settings.
They are now:

~/code/mira/mira_2.9.41x4_dev_linux-gnu_x86_64/bin/mira --project=clado
--job=denovo,genome,accurate,454 -CL:pvlc=no -SK:mhpr=100:mchr=5120
&>log_assembly_pvlc


I'll report the results.

Bye,
Jan

On Mon, Mar 2, 2009 at 13:56, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Monday 02 March 2009 Jan van Haarst wrote:
> > [...]
> > Is there a setting I can use, so I can assemble this thing ?
>
> Hi Jan,
>
> there is: reduce the number of SKIM hits (-SK:mhpr) to something lower, say
> 100 and try again ... but this might not be enough.
>
> I'm in the midst of a larger algorithm replacement, but if you feel
> adventurous you can try out the current head of my development branch:
> http://www.chevreux.org/tmp/mira_2.9.41x4_dev_linux-gnu_x86_64.tar.bz2
>
> (Note: it should work as expected, but I give no guarantee: it has a few
> new algorithms that have passed only a few tests.)
>
> As a few things have changed, I'll give you a short walkthrough on how to
> use it, as some things are still a bit bumpy.
>
> Step 1: estimate some memory parameters. Run "miramem" as in the transcript
> shown below, but enter your own "correct" values. The transcript below
> simulates 900 thousand paired-end FLX reads (which are approximately the
> same size as the old GS20 reads) and 5 million FLX reads. Additionally, I
> guessed a 50m genome (take the avg. of Newbler and Celera here) and the
> biggest chromosome/contig to be 5 megabases (take the largest value from
> Newbler / Celera):
>
>
> -------------------------------------------------------------------------------
> Is it a genome or transcript (EST/tag/etc.) project? (g/e/) [g]
> g
> Size of genome? [4.5m] 50m
> 50000000
> Looks like a larger eukaryote, guessing largest chromosome size: 30m
> Change if needed!
> Size of largest chromosome? [30000000] 5m
> 5000000
> Is it a denovo or mapping assembly? (d/m/) [d]
> d
> Number of Sanger reads? [40k] 0
> 0
> Are there 454 reads? (y/n/) [n] y
> y
> Number of 454 GS20 reads? [0] 900k
> 900000
> Number of 454 FLX reads? [0] 5m
> 5000000
> Number of 454 Titanium reads? [0]
> 0
> Are there Solexa reads? (y/n/) [n]
> n
>
>
> ************************* Estimates *************************
>
> The contigs will have an average coverage of ~ 25.2 (+/- 10%)
>
> RAM estimates:
>           reads+contigs (unavoidable): 22.2 GiB
>                large tables (tunable): 1.1 GiB
>                                        ---------
>                          total (peak): 23.3 GiB
>
>       add if using -CL:pvlc (tunable): 10.8 GiB
>
> *************************************************************
>
> -------------------------------------------------------------------------------
>
> Now, mira estimated it would need 23.3 GiB with standard parameters, plus
> 10.8 GiB more if -CL:pvlc is used (which might be standard in some "--job="
> configurations; check that for your call and turn it off if needed).
>
> The important number is the "total (peak)" value (plus the -CL:pvlc amount
> if used). Assuming your machine has 32 GiB, this would leave ~8 GiB
> (8192 MiB) unused when -CL:pvlc is off. Take half of the unused memory in
> MiB (4096) and add it to the -SK:mchr value (which should be 1024 by
> default), leading to the parameter "-SK:mchr=5120".
>
> Step 2: call mira like this:
>
>  mira --project=yournamehere
>       --job=yourjobdefaultshere
>       -CL:pvlc=no
>       -SK:mhpr=100:mchr=5120
>       >log_assembly.txt
>
> and see whether this helps.
>
> Regards,
>  Bastien
>
>
>
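
The arithmetic behind "-SK:mchr=5120" in the walkthrough quoted above can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the function name suggest_mchr and its parameters are made up
here and are not part of MIRA, the -SK:mchr value is assumed to be in MiB, and
rounding the unused memory down to whole GiB simply mirrors the "~8 GiB"
approximation in the quoted text.

```python
GIB_IN_MIB = 1024  # 1 GiB = 1024 MiB


def suggest_mchr(machine_ram_gib, peak_estimate_gib, default_mchr_mib=1024):
    """Return a suggested -SK:mchr value (assumed to be in MiB).

    machine_ram_gib   -- physical RAM of the machine, in GiB
    peak_estimate_gib -- "total (peak)" figure reported by miramem, in GiB
                         (add the -CL:pvlc amount first if pvlc stays on)
    default_mchr_mib  -- the default the walkthrough mentions (1024)
    """
    # Unused memory, rounded down to whole GiB (32 - 23.3 -> 8).
    unused_gib = int(machine_ram_gib - peak_estimate_gib)
    if unused_gib <= 0:
        raise ValueError("estimated peak does not leave a whole GiB unused")
    unused_mib = unused_gib * GIB_IN_MIB
    # Hand half of the unused memory to the skim hit reduction.
    return default_mchr_mib + unused_mib // 2


# Values from the quoted example: a 32 GiB machine, 23.3 GiB peak, pvlc off.
mchr = suggest_mchr(32, 23.3)
print(f"-SK:mhpr=100:mchr={mchr}")
```

With -CL:pvlc left on, the peak in the example would be 23.3 + 10.8 = 34.1 GiB,
which exceeds 32 GiB, so the sketch raises an error instead of suggesting a
value. That matches the advice to switch pvlc off on such a machine.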



-- 
Dag,
Jan
