Dear MIRA users,

I want to use the MIRA assembler to assemble a whole potato genome from 454 and Solexa data. To test the assembler, I first tried to map the Solexa reads against a reference genome in GenBank format. However, when I run MIRA, it terminates after a few hours with an out-of-memory error. I tried to avoid the problem by switching off automatic memory management (amm) or by increasing the percentage of memory to keep free (kpmf). Unfortunately, this does not solve the problem, and I'm afraid that I will run into the same problem when I use MIRA as a de novo assembler.

Thank you in advance,
Joel Klein
Chinese Academy of Sciences, Beijing, China

P.S. These are the files in the directory I'm running the program in:

drwxrwxr-x. 6 joel joel 4096 Mar 14 15:48 Potatomap_assembly
-rw-rw-r--. 1 joel joel 819755346 Mar 2 16:24 Potatomap_backbone_in.gbf

This is the log output:

This is MIRA V3.4.1.1 (production version).

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence Assembly Using Trace Signals and Additional Sequence Information. Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.

To (un-)subscribe the MIRA mailing lists, see:
    http://www.chevreux.org/mira_mailinglists.html

After subscribing, mail general questions to the MIRA talk mailing list:
    mira_talk@xxxxxxxxxxxxx

To report bugs or ask for features, please use the new ticketing system at:
    http://sourceforge.net/apps/trac/mira-assembler/
This ensures that requests don't get lost.

Compiled by: root
Fri Mar 1 21:57:31 CST 2013
On: Linux localhost 2.6.32-279.22.1.el6.centos.plus.x86_64 #1 SMP Wed Feb 6 05:16:56 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compiled with ENABLE64 activated.
Runtime settings (sorry, for debug):
    Size of size_t  : 8
    Size of uint32  : 4
    Size of uint32_t: 4
    Size of uint64  : 8
    Size of uint64_t: 8
Current system: Linux localhost 2.6.32-279.22.1.el6.centos.plus.x86_64 #1 SMP Wed Feb 6 05:16:56 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Parsing parameters: --project=Potatomap --job=mapping,genome,accurate,solexa -GE:not=20 -AS:nop=1 -SB:lsd=yes:bsn=Potatomap_wt:bft=gbf:bbq=30 SOLEXA_SETTINGS -CO:msr=no -GE:uti=no:tismin=250:tismax=750 -SB:ads=yes:dsn=Potatomap

Parameters parsed without error, perfect.

-CL:pec and -CO:emeas1clpec are set, setting -CO:emea values to 1.
------------------------------------------------------------------------------
Parameter settings seen for:
Sanger data (also common parameters), Solexa data

Used parameter settings:
  General (-GE):
    Project name in (proin) : Potatomap
    Project name out (proout) : Potatomap
    Number of threads (not) : 20
    Automatic memory management (amm) : yes
        Keep percent memory free (kpmf) : 15
        Max. process size (mps) : 0
    EST SNP pipeline step (esps) : 0
    Use template information (uti) : [san] yes [sxa] no
        Template insert size minimum (tismin) : [san] -1 [sxa] 250
        Template insert size maximum (tismax) : [san] -1 [sxa] 750
        Template partner build direction (tpbd) : [san] -1 [sxa] -1
    Colour reads by hash frequency (crhf) : yes

  Load reads options (-LR):
    Load sequence data (lsd) : [san] no [sxa] yes
    File type (ft) : [san] fasta [sxa] fastq
    External quality (eq) : from SCF (scf)
    Ext. qual. override (eqo) : no
    Discard reads on e.q. error (droeqe): no
    Solexa scores in qual file (ssiqf) : no
    FASTQ qual offset (fqqo) : [san] 0 [sxa] 0
    Wants quality file (wqf) : [san] yes [sxa] yes
    Read naming scheme (rns) : [san] Sanger Institute (sanger) [sxa] Solexa (solexa)
    Merge with XML trace info (mxti) : [san] no [sxa] no
    Filecheck only (fo) : no

  Assembly options (-AS):
    Number of passes (nop) : 1
    Skim each pass (sep) : yes
    Maximum number of RMB break loops (rbl) : 1
    Maximum contigs per pass (mcpp) : 0
    Minimum read length (mrl) : [san] 80 [sxa] 20
    Minimum reads per contig (mrpc) : [san] 2 [sxa] 10
    Base default quality (bdq) : [san] 10 [sxa] 10
    Enforce presence of qualities (epoq) : [san] yes [sxa] yes
    Automatic repeat detection (ard) : yes
        Coverage threshold (ardct) : [san] 2 [sxa] 2
        Minimum length (ardml) : [san] 400 [sxa] 200
        Grace length (ardgl) : [san] 40 [sxa] 20
        Use uniform read distribution (urd) : no
        Start in pass (urdsip) : 3
        Cutoff multiplier (urdcm) : [san] 1.5 [sxa] 1.5
    Keep long repeats separated (klrs) : no
    Spoiler detection (sd) : yes
        Last pass only (sdlpo) : yes
    Use genomic pathfinder (ugpf) : yes
    Use emergency search stop (uess) : yes
        ESS partner depth (esspd) : 500
    Use emergency blacklist (uebl) : yes
    Use max. contig build time (umcbt) : no
        Build time in seconds (bts) : 10000

  Strain and backbone options (-SB):
    Load straindata (lsd) : yes
    Assign default strain (ads) : [san] no [sxa] yes
        Default strain name (dsn) : [san] StrainX [sxa] Potatomap
    Load backbone (lb) : yes
        Start backbone usage in pass (sbuip) : 0
        Backbone file type (bft) : gbf
        Backbone base quality (bbq) : 30
        Backbone strain name (bsn) : Potatomap_wt
            Force for all (bsnffa) : no
        Backbone rail from strain (brfs) :
        Backbone rail length (brl) : 0
        Backbone rail overlap (bro) : 0
        Also build new contigs (abnc) : no

  Dataprocessing options (-DP):
    Use read extensions (ure) : [san] yes [sxa] no
        Read extension window length (rewl) : [san] 30 [sxa] 30
        Read extension w. maxerrors (rewme) : [san] 2 [sxa] 2
        First extension in pass (feip) : [san] 0 [sxa] 0
        Last extension in pass (leip) : [san] 0 [sxa] 0

  Clipping options (-CL):
    Merge with SSAHA2/SMALT vector screen (msvs): [san] no [sxa] no
        Gap size (msvsgs) : [san] 10 [sxa] 1
        Max front gap (msvsmfg) : [san] 60 [sxa] 2
        Max end gap (msvsmeg) : [san] 120 [sxa] 2
        Strict front clip (msvssfc) : [san] 0 [sxa] 0
        Strict end clip (msvssec) : [san] 0 [sxa] 0
    Possible vector leftover clip (pvlc) : [san] yes [sxa] no
        maximum len allowed (pvcmla) : [san] 18 [sxa] 18
    Min qual. threshold for entire read (mqtfer): [san] 0 [sxa] 0
        Number of bases (mqtfernob) : [san] 0 [sxa] 15
    Quality clip (qc) : [san] no [sxa] no
        Minimum quality (qcmq) : [san] 20 [sxa] 20
        Window length (qcwl) : [san] 30 [sxa] 30
    Bad stretch quality clip (bsqc) : [san] yes [sxa] no
        Minimum quality (bsqcmq) : [san] 20 [sxa] 5
        Window length (bsqcwl) : [san] 30 [sxa] 20
    Masked bases clip (mbc) : [san] yes [sxa] no
        Gap size (mbcgs) : [san] 20 [sxa] 5
        Max front gap (mbcmfg) : [san] 40 [sxa] 12
        Max end gap (mbcmeg) : [san] 60 [sxa] 12
    Lower case clip (lcc) : [san] no [sxa] no
    Clip poly A/T at ends (cpat) : [san] no [sxa] no
        Keep poly-a signal (cpkps) : [san] no [sxa] no
        Minimum signal length (cpmsl) : [san] 12 [sxa] 12
        Max errors allowed (cpmea) : [san] 1 [sxa] 1
        Max gap from ends (cpmgfe) : [san] 9 [sxa] 9
    Clip 3 prime polybase (c3pp) : [san] no [sxa] yes
        Minimum signal length (c3ppmsl) : [san] 12 [sxa] 12
        Max errors allowed (c3ppmea) : [san] 2 [sxa] 2
        Max gap from ends (c3ppmgfe) : [san] 9 [sxa] 9
    Clip known adaptors right (ckar) : [san] no [sxa] yes
    Ensure minimum left clip (emlc) : [san] yes [sxa] no
        Minimum left clip req. (mlcr) : [san] 25 [sxa] 0
        Set minimum left clip to (smlc) : [san] 30 [sxa] 0
    Ensure minimum right clip (emrc) : [san] no [sxa] no
        Minimum right clip req. (mrcr) : [san] 10 [sxa] 10
        Set minimum right clip to (smrc) : [san] 20 [sxa] 20
    Apply SKIM chimera detection clip (ascdc) : no
    Apply SKIM junk detection clip (asjdc) : no
    Propose end clips (pec) : yes
        Bases per hash (pecbph) : 31
        Handle Solexa GGCxG problem (pechsgp) : yes
    Clip bad solexa ends (cbse) : yes

  Parameters for SKIM algorithm (-SK):
    Number of threads (not) : 20
    Also compute reverse complements (acrc) : yes
    Bases per hash (bph) : 10
    Hash save stepping (hss) : 1
    Percent required (pr) : [san] 60 [sxa] 60
    Max hits per read (mhpr) : 2000
    Max megahub ratio (mmhr) : 0
    SW check on backbones (swcob) : yes
    Freq. est. min normal (fenn) : 0.4
    Freq. est. max normal (fexn) : 1.6
    Freq. est. repeat (fer) : 1.9
    Freq. est. heavy repeat (fehr) : 8
    Freq. est. crazy (fecr) : 20
    Mask nasty repeats (mnr) : no
    Nasty repeat ratio (nrr) : 100
    Repeat level in info file (rliif) : 6
    Max hashes in memory (mhim) : 15000000
    MemCap: hit reduction (mchr) : 4096

  Pathfinder options (-PF):
    Use quick rule (uqr) : [san] yes [sxa] yes
        Quick rule min len 1 (qrml1) : [san] 200 [sxa] -90
        Quick rule min sim 1 (qrms1) : [san] 90 [sxa] 100
        Quick rule min len 2 (qrml2) : [san] 100 [sxa] -80
        Quick rule min sim 2 (qrms2) : [san] 95 [sxa] 100
    Backbone quick overlap min len (bqoml) : [san] 150 [sxa] 20
    Max. start cache fill time (mscft) : 5

  Align parameters for Smith-Waterman align (-AL):
    Bandwidth in percent (bip) : [san] 20 [sxa] 20
    Bandwidth max (bmax) : [san] 130 [sxa] 80
    Bandwidth min (bmin) : [san] 25 [sxa] 20
    Minimum score (ms) : [san] 30 [sxa] 15
    Minimum overlap (mo) : [san] 17 [sxa] 20
    Minimum relative score in % (mrs) : [san] 65 [sxa] 60
    Solexa_hack_max_errors (shme) : [san] -1 [sxa] -1
    Extra gap penalty (egp) : [san] no [sxa] no
        extra gap penalty level (egpl) : [san] low [sxa] low
        Max. egp in percent (megpp) : [san] 100 [sxa] 100

  Contig parameters (-CO):
    Name prefix (np) : Potatomap
    Reject on drop in relative alignment score in % (rodirs) : [san] 25 [sxa] 30
    Mark repeats (mr) : yes
        Only in result (mroir) : yes
        Assume SNP instead of repeats (asir) : no
        Minimum reads per group needed for tagging (mrpg) : [san] 2 [sxa] 3
        Minimum neighbour quality needed for tagging (mnq) : [san] 20 [sxa] 20
        Minimum Group Quality needed for RMB Tagging (mgqrt) : [san] 30 [sxa] 30
        End-read Marking Exclusion Area in bases (emea) : [san] 1 [sxa] 1
            Set to 1 on clipping PEC (emeas1clpec) : yes
        Also mark gap bases (amgb) : [san] yes [sxa] yes
            Also mark gap bases - even multicolumn (amgbemc) : [san] yes [sxa] yes
            Also mark gap bases - need both strands (amgbnbs): [san] yes [sxa] yes
    Force non-IUPAC consensus per sequencing type (fnicpst) : [san] no [sxa] no
    Merge short reads (msr) : [san] no [sxa] no
        Keep ends unmerged (msrkeu) : [san] -1 [sxa] -1
    Gap override ratio (gor) : [san] 66 [sxa] 66

  Edit options (-ED):
    Automatic contig editing (ace) : [san] no [sxa] no
    Sanger only:
        Strict editing mode (sem) : no
        Confirmation threshold in percent (ct) : 50

  Misc (-MI):
    Stop on NFS (sonfs) : yes
    Extended log (el) : no
    Large contig size (lcs) : 500
    Large contig size for stats(lcs4s) : 5000
    Stop on max read name length (somrnl) : 40

  Directories (-DI):
    Working directory :
    When loading EXP files :
    When loading SCF files :
    Top directory for writing files : Potatomap_assembly
    For writing result files : Potatomap_assembly/Potatomap_d_results
    For writing result info files : Potatomap_assembly/Potatomap_d_info
    For writing tmp files : Potatomap_assembly/Potatomap_d_tmp
        Tmp redirected to (trt) :
    For writing checkpoint files : Potatomap_assembly/Potatomap_d_chkpt

  File names (-FN):
    When loading sequences from FASTA : [san] Potatomap_in.sanger.fasta [sxa] Potatomap_in.solexa.fasta
    When loading qualities from FASTA quality : [san] Potatomap_in.sanger.fasta.qual [sxa] Potatomap_in.solexa.fasta.qual
    When loading sequences from FASTQ : [san] Potatomap_in.sanger.fastq [sxa] Potatomap_in.solexa.fastq
    When loading project from CAF : Potatomap_in.sanger.caf
    When loading project from MAF (disabled) : Potatomap_in.sanger.maf
    When loading EXP fofn : Potatomap_in.sanger.fofn
    When loading project from PHD : Potatomap_in.phd.1
    When loading strain data : Potatomap_straindata_in.txt
    When loading XML trace info files : [san] Potatomap_traceinfo_in.sanger.xml [sxa] Potatomap_traceinfo_in.solexa.xml
    When loading SSAHA2 vector screen results : Potatomap_ssaha2vectorscreen_in.txt
    When loading SMALT vector screen results : Potatomap_smaltvectorscreen_in.txt
    When loading backbone from MAF : Potatomap_backbone_in.maf
    When loading backbone from CAF : Potatomap_backbone_in.caf
    When loading backbone from GenBank : Potatomap_backbone_in.gbf
    When loading backbone from GFF3 : Potatomap_backbone_in.gff3
    When loading backbone from FASTA : Potatomap_backbone_in.fasta

  Output files (-OUTPUT/-OUT):
    Save simple singlets in project (sssip) : [san] no [sxa] no
    Save tagged singlets in project (stsip) : [san] yes [sxa] yes
    Remove rollover tmps (rrot) : yes
    Remove tmp directory (rtd) : no

    Result files:
        Saved as CAF (orc) : yes
        Saved as MAF (orm) : yes
        Saved as FASTA (orf) : yes
        Saved as GAP4 (directed assembly) (org) : no
        Saved as phrap ACE (ora) : yes
        Saved as GFF3 (org3) : no
        Saved as HTML (orh) : no
        Saved as Transposed Contig Summary (ors) : yes
        Saved as simple text format (ort) : no
        Saved as wiggle (orw) : yes

    Temporary result files:
        Saved as CAF (otc) : yes
        Saved as MAF (otm) : no
        Saved as FASTA (otf) : no
        Saved as GAP4 (directed assembly) (otg) : no
        Saved as phrap ACE (ota) : no
        Saved as HTML (oth) : no
        Saved as Transposed Contig Summary (ots) : no
        Saved as simple text format (ott) : no

    Extended temporary result files:
        Saved as CAF (oetc) : no
        Saved as FASTA (oetf) : no
        Saved as GAP4 (directed assembly) (oetg) : no
        Saved as phrap ACE (oeta) : no
        Saved as HTML (oeth) : no
        Save also singlets (oetas) : no

    Alignment output customisation:
        TEXT characters per line (tcpl) : 60
        HTML characters per line (hcpl) : 60
        TEXT end gap fill character (tegfc) :
        HTML end gap fill character (hegfc) :

    File / directory output names:
        CAF : Potatomap_out.caf
        MAF : Potatomap_out.maf
        FASTA : Potatomap_out.unpadded.fasta
        FASTA quality : Potatomap_out.unpadded.fasta.qual
        FASTA (padded) : Potatomap_out.padded.fasta
        FASTA qual.(pad): Potatomap_out.padded.fasta.qual
        GAP4 (directory): Potatomap_out.gap4da
        ACE : Potatomap_out.ace
        HTML : Potatomap_out.html
        Simple text : Potatomap_out.txt
        TCS overview : Potatomap_out.tcs
        Wiggle : Potatomap_out.wig
------------------------------------------------------------------------------
Deleting old directory Potatomap_assembly ... done.
Creating directory Potatomap_assembly ... done.
Creating directory Potatomap_assembly/Potatomap_d_tmp ... done.
Creating directory Potatomap_assembly/Potatomap_d_results ... done.
Creating directory Potatomap_assembly/Potatomap_d_info ... done.
Creating directory Potatomap_assembly/Potatomap_d_chkpt ... done.
Tmp directory is not on a NFS mount, good.
Localtime: Wed Mar 6 21:50:53 2013
Loading backbone from GBF file: Potatomap_backbone_in.gbf
Localtime: Wed Mar 6 21:53:15 2013
Generated 0 unique strain ids for 12 reads.
Done.
Adding sequences as backbones ... done.
Postprocessing backbone(s) ... this may take a while.
12 to process
chr01_bb 81482218
chr02_bb 47066339
chr03_bb 47880312
chr04_bb 64339883
chr05_bb 47045015
chr06_bb 54975350
chr07_bb 53427889
chr08_bb 43648432
chr09_bb 53640104
chr10_bb 52313507
chr11_bb 42253929
chr12_bb 59100875
Strain "default" has 12 reads.
Loading data (Solexa) from FASTQ files,
Localtime: Wed Mar 6 21:58:48 2013
Counting sequences in FASTQ file: found 1583666226 sequences.
Localtime: Wed Mar 6 22:49:06 2013
Solexa will load 1583666226 reads.
Longest Sanger: 0
Longest 454: 0
Longest IonTor: 0
Longest PacBio: 0
Longest Solexa: 75
Longest Solid: 0
Longest overall: 75
Total reads to load: 1583666226
-AL:shme is <0, automatically determining optimal value.
set -AL:shme 11
-SB:brl is 0, automatically determining optimal value.
brl: 172
-SB:bro is 0, automatically determining optimal value.
bro: 86
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
makeIntelligentConsensus() complete calc
Reserving space for reads (this may take a while)

========================== Memory self assessment ==============================
Running in 64 bit mode.
Dump from /proc/meminfo
--------------------------------------------------------------------------------
MemTotal: 140433584 kB
MemFree: 2339944 kB
Buffers: 642316 kB
Cached: 58669384 kB
SwapCached: 56964 kB
Active: 78543716 kB
Inactive: 55517588 kB
Active(anon): 71423140 kB
Inactive(anon): 3327200 kB
Active(file): 7120576 kB
Inactive(file): 52190388 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 150994936 kB
SwapFree: 150729380 kB
Dirty: 94140 kB
Writeback: 0 kB
AnonPages: 74893612 kB
Mapped: 20292 kB
Shmem: 12 kB
Slab: 2186144 kB
SReclaimable: 2096852 kB
SUnreclaim: 89292 kB
KernelStack: 6648 kB
PageTables: 167500 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 221211728 kB
Committed_AS: 75697544 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 714952 kB
VmallocChunk: 34290457780 kB
HardwareCorrupted: 276 kB
AnonHugePages: 20023296 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 5504 kB
DirectMap2M: 2082816 kB
DirectMap1G: 140509184 kB
--------------------------------------------------------------------------------
Dump from /proc/self/status
--------------------------------------------------------------------------------
Name: mira
State: R (running)
Tgid: 22263
Pid: 22263
PPid: 22239
TracerPid: 0
Uid: 512 512 512 512
Gid: 514 514 514 514
Utrace: 0
FDSize: 256
Groups: 514
VmPeak: 68781076 kB
VmSize: 68384884 kB
VmLck: 0 kB
VmHWM: 68662152 kB
VmRSS: 68200576 kB
VmData: 68323136 kB
VmStk: 88 kB
VmExe: 3252 kB
VmLib: 23308 kB
VmPTE: 133512 kB
VmSwap: 59984 kB
Threads: 1
SigQ: 0/1096978
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: ffffff
Cpus_allowed_list: 0-23
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 170260
nonvoluntary_ctxt_switches: 742347
--------------------------------------------------------------------------------
Information on current assembly object:

AS_readpool: 12 reads.
AS_contigs: 0 contigs.
AS_bbcontigs: 12 contigs.

Mem used for reads: 5824570920 (5.4 GiB)

Memory used in assembly structures:
                    Eff. Size Free cap. LostByAlign
AS_writtenskimhitsperid: 0 24 B 0 B 0 B
AS_skim_edges: 0 24 B 0 B 0 B
AS_adsfacts: 0 24 B 0 B 0 B
AS_confirmed_edges: 0 24 B 0 B 0 B
AS_permanent_overlap_bans: 1 24 B 0 B 0 B
AS_readhitmiss: 0 24 B 0 B 0 B
AS_readhmcovered: 0 24 B 0 B 0 B
AS_count_rhm: 0 24 B 0 B 0 B
AS_clipleft: 0 24 B 0 B 0 B
AS_clipright: 0 24 B 0 B 0 B
AS_used_ids: 0 24 B 0 B 0 B
AS_multicopies: 0 24 B 0 B 0 B
AS_hasmcoverlaps: 0 24 B 0 B 0 B
AS_maxcoveragereached: 0 24 B 0 B 0 B
AS_coverageperseqtype: 0 24 B 0 B 0 B
AS_istroublemaker: 0 24 B 0 B 0 B
AS_isdebris: 0 24 B 0 B 0 B
AS_needalloverlaps: 0 40 B 0 B 0 B
AS_readsforrepeatresolve: 0 40 B 0 B 0 B
AS_allrmbsok: 0 24 B 0 B 0 B
AS_probablermbsnotok: 0 24 B 0 B 0 B
AS_weakrmbsnotok: 0 24 B 0 B 0 B
AS_readmaytakeskim: 0 40 B 0 B 0 B
AS_skimstaken: 0 40 B 0 B 0 B
AS_numskimoverlaps: 0 24 B 0 B 0 B
AS_numleftextendskims: 0 24 B 0 B 0 B
AS_rightextendskims: 0 24 B 0 B 0 B
AS_skimleftextendratio: 0 24 B 0 B 0 B
AS_skimrightextendratio: 0 24 B 0 B 0 B
AS_usedtmpfiles: 1 48 B 0 B 0 B
Total: 5824571728 (5.4 GiB)
================================================================================
Dynamic allocs: 0
Align allocs: 0

Out of memory detected, exception message is: std::bad_alloc

If you have questions on why this happened, please send the last 1000 lines of the output log (or better: the complete file) to the author together with a short summary of your assembly project.
For general help, you will probably get a quicker response on the MIRA talk mailing list than if you mailed the author directly.

To report bugs or ask for features, please use the new ticketing system at:
    http://sourceforge.net/apps/trac/mira-assembler/
This ensures that requests don't get lost.
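To put the numbers from the log side by side, here is a rough back-of-the-envelope check. It is a minimal sketch, not MIRA's own memory accounting, and it rests on two assumptions: every Solexa read is 75 bp (the log only reports 75 as the longest Solexa read, so the base count is an upper bound), and storing a base costs a deliberately naive 2 bytes (one for the base call, one for its quality value). The constant names are hypothetical; how much MIRA actually stores per read is not known here and is presumably more than that.

```python
"""Rough data-volume check using numbers copied from the MIRA log above.
A sketch under stated assumptions, not MIRA's own memory accounting."""

# Backbone lengths as printed after "Postprocessing backbone(s)".
BACKBONE_LENGTHS = {
    "chr01_bb": 81482218, "chr02_bb": 47066339, "chr03_bb": 47880312,
    "chr04_bb": 64339883, "chr05_bb": 47045015, "chr06_bb": 54975350,
    "chr07_bb": 53427889, "chr08_bb": 43648432, "chr09_bb": 53640104,
    "chr10_bb": 52313507, "chr11_bb": 42253929, "chr12_bb": 59100875,
}
NUM_READS = 1583666226      # "Solexa will load 1583666226 reads."
MAX_READ_LEN = 75           # "Longest Solexa: 75" (assumed for every read)
MEMTOTAL_KB = 140433584     # MemTotal from the /proc/meminfo dump
NAIVE_BYTES_PER_BASE = 2    # assumption: 1 byte base call + 1 byte quality

GIB = 1024 ** 3


def estimate():
    """Return the derived quantities as a dict."""
    genome_bp = sum(BACKBONE_LENGTHS.values())
    read_bases = NUM_READS * MAX_READ_LEN  # upper bound on total read bases
    return {
        "genome_bp": genome_bp,
        "read_bases": read_bases,
        "coverage": read_bases / genome_bp,
        "naive_read_gib": read_bases * NAIVE_BYTES_PER_BASE / GIB,
        "ram_gib": MEMTOTAL_KB * 1024 / GIB,
    }


if __name__ == "__main__":
    est = estimate()
    print(f"Backbone length    : {est['genome_bp']:,} bp")
    print(f"Read bases (max)   : {est['read_bases']:,} bp")
    print(f"Mean coverage      : {est['coverage']:.1f}x")
    print(f"Naive read storage : {est['naive_read_gib']:.1f} GiB")
    print(f"Physical RAM       : {est['ram_gib']:.1f} GiB")
```

Under these assumptions the reads amount to about 118.8 Gbp, roughly 183.5x coverage of the 647 Mbp backbone, and even the naive 2-bytes-per-base storage comes to about 221 GiB, well above the roughly 134 GiB of physical RAM. The process was already at about 65 GiB resident (VmRSS) when the out-of-memory exception was reported, right after the "Reserving space for reads" message.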