Hi all,

I'm trying to use MIRA's mapping assembly feature. As a reference I use genomic data (a FASTA file of around 400 MB), and an EST dataset from the same species is to be assembled against it. At a very early stage, MIRA quits with the message:

Out of memory detected, exception message is: std::bad_alloc

I would be very interested in what went wrong. I have attached the log file and the parameters to this message. I worked with mira-3rc3e.

Any hint is highly appreciated.

Cheers,
Charles

This is MIRA V3rc4e (development version).

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence
Assembly Using Trace Signals and Additional Sequence Information.
Computer Science and Biology: Proceedings of the German Conference on
Bioinformatics (GCB) 99, pp. 45-56.

Mail general questions to the MIRA talk mailing list:
mira_talk@xxxxxxxxxxxxx

To (un-)subsubcribe the MIRA mailing lists, see:
http://www.chevreux.org/mira_mailinglists.html

To report bugs or ask for features, please use the new ticketing system at:
http://sourceforge.net/apps/trac/mira-assembler/
This ensures that requests don't get lost.

Compiled by: bach
Mon Dec 21 23:37:44 CET 2009
On: Linux varcadia32 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compilation settings (sorry, for debug):
  Size of size_t : 4
  Size of uint32 : 4
  Size of uint32_t: 4
  Size of uint64 : 8
  Size of uint64_t: 8
Current system: Linux nougat 2.6.26-1-amd64 #1 SMP Fri Mar 13 17:46:45 UTC 2009 x86_64 GNU/Linux

Parsing parameters: -parameters=parameters.par

Loading parameters from file: parameters.par
Parameters parsed without error, perfect.
------------------------------------------------------------------------------
Parameter settings seen for:
Sanger data (also common parameters), 454 data

Used parameter settings:
  General (-GE):
    Project name in (proin) : Helicoverpa
    Project name out (proout) : Helicoverpa
    Number of threads (not) : 2
    Automatic memory management (amm) : yes
    Keep percent memory free (kpmf) : 10
    Max. process size (mps) : 0
    Keep contigs in memory (kcim) : no
    EST SNP pipeline step (esps) : 1
    Use template information (uti) : [san] yes [454] yes
    Template insert size minimum (tismin): [san] -1 [454] -1
    Template insert size maximum (tismax): [san] -1 [454] -1
    Colour reads by hash frequency (crhf) : no

  Load reads options (-LR):
    Load sequence data (lsd) : [san] no [454] yes
    File type (ft) : [san] fasta [454] fasta
    External quality (eq) : from SCF (scf)
    Ext. qual. override (eqo) : no
    Discard reads on e.q. error (droeqe): no
    Solexa scores in qual file (ssiqf) : no
    FASTQ qual offset (fqqo) : [san] 0 [454] 0
    Wants quality file (wqf) : [san] yes [454] yes
    Read naming scheme (rns) : [san] Sanger Institute (sanger) [454] forward/reverse (fr)
    Merge with XML trace info (mxti) : [san] no [454] yes
    Filecheck only (fo) : no

  Assembly options (-AS):
    Number of passes (nop) : 4
    Skim each pass (sep) : yes
    Maximum number of RMB break loops (rbl) : 2
    Minimum read length (mrl) : [san] 80 [454] 40
    Base default quality (bdq) : [san] 10 [454] 10
    Enforce presence of qualities (epoq) : [san] yes [454] yes
    Automatic repeat detection (ard) : no
    Coverage threshold (ardct) : [san] 2 [454] 2
    Minimum length (ardml) : [san] 400 [454] 200
    Grace length (ardgl) : [san] 40 [454] 20
    Use uniform read distribution (urd) : no
    Start in pass (urdsip) : 3
    Cutoff multiplier (urdcm) : [san] 1.5 [454] 1.5
    Keep long repeats separated (klrs) : no
    Spoiler detection (sd) : no
    Last pass only (sdlpo) : yes
    Use genomic pathfinder (ugpf) : no
    Use emergency search stop (uess) : yes
    ESS partner depth (esspd) : 500
    Use emergency blacklist (uebl) : yes
    Use max. contig build time (umcbt) : yes
    Build time in seconds (bts) : 3600

  Strain and backbone options (-SB):
    Load straindata (lsd) : no
    Load backbone (lb) : yes
    Start backbone usage in pass (sbuip) : 0
    Backbone file type (bft) : fasta
    Backbone base quality (bbq) : -1
    Backbone strain name (bsn) :
    Force for all (bsnffa) : no
    Backbone rail from strain (brfs) :
    Backbone rail length (brl) : 0
    Backbone rail overlap (bro) : 0
    Also build new contigs (abnc) : no

  Dataprocessing options (-DP):
    Use read extensions (ure) : [san] no [454] no
    Read extension window length (rewl) : [san] 30 [454] 15
    Read extension w. maxerrors (rewme) : [san] 2 [454] 2
    First extension in pass (feip) : [san] 0 [454] 0
    Last extension in pass (leip) : [san] 0 [454] 0

  Clipping options (-CL):
    Merge with SSAHA vector screen (msvs) : [san] no [454] no
    Gap size (msvsgs) : [san] 10 [454] 8
    Max front gap (msvsmfg) : [san] 60 [454] 8
    Max end gap (msvsmeg) : [san] 120 [454] 12
    Strict front clip (msvssfc) : [san] 0 [454] 0
    Strict end clip (msvssec) : [san] 0 [454] 0
    Possible vector leftover clip (pvlc) : [san] no [454] no
    maximum len allowed (pvcmla) : [san] 18 [454] 18
    Quality clip (qc) : [san] yes [454] no
    Minimum quality (qcmq) : [san] 20 [454] 20
    Window length (qcwl) : [san] 30 [454] 30
    Bad stretch quality clip (bsqc) : [san] no [454] no
    Minimum quality (bsqcmq) : [san] 20 [454] 5
    Window length (bsqcwl) : [san] 30 [454] 20
    Masked bases clip (mbc) : [san] yes [454] yes
    Gap size (mbcgs) : [san] 20 [454] 5
    Max front gap (mbcmfg) : [san] 40 [454] 12
    Max end gap (mbcmeg) : [san] 60 [454] 12
    Lower case clip (llc) : [san] no [454] yes
    Clip poly A/T at ends (cpat) : [san] yes [454] yes
    Keep poly-a signal (cpkps) : [san] no [454] no
    Minimum signal length (cpmsl) : [san] 12 [454] 12
    Max errors allowed (cpmea) : [san] 1 [454] 1
    Max gap from ends (cpmgfe) : [san] 20000 [454] 20000
    Ensure minimum left clip (emlc) : [san] no [454] no
    Minimum left clip req. (mlcr) : [san] 25 [454] 4
    Set minimum left clip to (smlc) : [san] 30 [454] 4
    Ensure minimum right clip (emrc) : [san] no [454] no
    Minimum right clip req. (mrcr) : [san] 10 [454] 10
    Set minimum right clip to (smrc) : [san] 20 [454] 15
    Propose end clips (pec) : no
    Bases per hash (pecbph) : 0

  Parameters for SKIM algorithm (-SK):
    Number of threads (not) : 2
    Bases per hash (bph) : 17
    Hash save stepping (hss) : 4
    Percent required (pr) : [san] 70 [454] 80
    Max hits per read (mhpr) : 30
    Mask nasty repeats (mnr) : no
    Nasty repeat ratio (nrr) : 100
    Max. megahub ratio (mmhr) : 0
    Max hashes in memory (mhim) : 15000000
    MemCap: hit reduction (mchr) : 2048

  Pathfinder options (-PF):
    Use quick rule (uqr) : [san] yes [454] yes
    Quick rule min len 1 (qrml1) : [san] 200 [454] 80
    Quick rule min sim 1 (qrms1) : [san] 90 [454] 90
    Quick rule min len 2 (qrml2) : [san] 100 [454] 60
    Quick rule min sim 2 (qrms2) : [san] 95 [454] 95
    Backbone quick overlap min len (bqoml) : [san] 150 [454] 80

  Align parameters for Smith-Waterman align (-AL):
    Bandwidth in percent (bip) : [san] 15 [454] 20
    Bandwidth max (bmax) : [san] 100 [454] 80
    Bandwidth min (bmin) : [san] 25 [454] 20
    Minimum score (ms) : [san] 30 [454] 15
    Minimum overlap (mo) : [san] 15 [454] 40
    Minimum relative score in % (mrs) : [san] 80 [454] 80
    Solexa_hack_max_errors (shme) : [san] 0 [454] 0
    Extra gap penalty (egp) : [san] yes [454] yes
    extra gap penalty level (egpl) : [san] reject_codongaps [454] reject_codongaps
    Max. egp in percent (megpp) : [san] 100 [454] 100

  Contig parameters (-CO):
    Name prefix (np) : Helicoverpa
    Reject on drop in relative alignment score in % (rodirs) : [san] 20 [454] 30
    Mark repeats (mr) : yes
    Only in result (mroir) : no
    Assume SNP instead of repeats (asir) : no
    Minimum reads per group needed for tagging (mrpg) : [san] 2 [454] 4
    Minimum neighbour quality needed for tagging (mnq) : [san] 20 [454] 20
    Minimum Group Quality needed for RMB Tagging (mgqrt) : [san] 30 [454] 25
    End-read Marking Exclusion Area in bases (emea) : [san] 25 [454] 10
    Also mark gap bases (amgb) : [san] yes [454] no
    Also mark gap bases - even multicolumn (amgbemc) : [san] yes [454] yes
    Also mark gap bases - need both strands (amgbnbs): [san] yes [454] yes
    Force non-IUPAC consensus per sequencing type (fnicpst) : [san] no [454] no
    Merge short reads (msr) : [san] no [454] no
    Gap override ratio (gor) : [san] 66 [454] 66

  Edit options (-ED):
    Automatic contig editing (ace) : [san] no [454] yes
    Sanger only:
        Strict editing mode (sem) : no
        Confirmation threshold in percent (ct) : 50

  Directories (-DI):
    When loading EXP files :
    When loading SCF files :
    Top directory for writing files : Helicoverpa_assembly
    For writing result files : Helicoverpa_assembly/Helicoverpa_d_results
    For writing result info files : Helicoverpa_assembly/Helicoverpa_d_info
    For writing log files : Helicoverpa_assembly/Helicoverpa_d_log
    For writing checkpoint files : Helicoverpa_assembly/Helicoverpa_d_chkpt

  File names (-FN):
    When loading sequences from FASTA : [san] Helicoverpa_in.sanger.fasta [454] Helicoverpa_in.454.fasta
    When loading qualities from FASTA quality : [san] Helicoverpa_in.sanger.fasta.qual [454] Helicoverpa_in.454.fasta.qual
    When loading sequences from FASTQ : [san] Helicoverpa_in.sanger.fastq [454] Helicoverpa_in.454.fastq
    When loading project from CAF : Helicoverpa_in.sanger.caf
    When loading project from MAF (disabled) : Helicoverpa_in.sanger.maf
    When loading EXP fofn : Helicoverpa_in.fofn
    When loading project from PHD : Helicoverpa_in.phd.1
    When loading strain data : Helicoverpa_straindata_in.txt
    When loading XML trace info files : [san] Helicoverpa_traceinfo_in.sanger.xml [454] Helicoverpa_traceinfo_in.454.xml
    When loading SSAHA vector screen results : Helicoverpa_ssaha2vectorscreen_in.txt
    When loading backbone from CAF : Helicoverpa_backbone_in.caf
    When loading backbone from GenBank : Helicoverpa_backbone_in.gbf
    When loading backbone from FASTA : Helicoverpa_backbone_in.fasta

  Output files (-OUTPUT/-OUT):
    Save simple singlets in project (sssip) : [san] no [454] no
    Save tagged singlets in project (stsip) : [san] yes [454] yes
    Remove rollover logs (rrol) : yes
    Remove log directory (rld) : no
    Result files:
        Saved as CAF (orc) : yes
        Saved as FASTA (orf) : yes
        Saved as GAP4 (directed assembly) (org) : no
        Saved as phrap ACE (ora) : yes
        Saved as HTML (orh) : no
        Saved as Transposed Contig Summary (ors) : yes
        Saved as simple text format (ort) : no
        Saved as wiggle (orw) : no
    Temporary result files:
        Saved as CAF (otc) : yes
        Saved as CAF (otm) : no
        Saved as FASTA (otf) : no
        Saved as GAP4 (directed assembly) (otg) : no
        Saved as phrap ACE (ota) : no
        Saved as HTML (oth) : no
        Saved as Transposed Contig Summary (ots) : no
        Saved as simple text format (ott) : no
    Extended temporary result files:
        Saved as CAF (oetc) : no
        Saved as FASTA (oetf) : no
        Saved as GAP4 (directed assembly) (oetg) : no
        Saved as phrap ACE (oeta) : no
        Saved as HTML (oeth) : no
        Save also singlets (oetas) : no
    Alignment output customisation:
        TEXT characters per line (tcpl) : 60
        HTML characters per line (hcpl) : 60
        TEXT end gap fill character (tegfc) :
        HTML end gap fill character (hegfc) :
    File / directory output names:
        CAF : Helicoverpa_out.caf
        MAF : Helicoverpa_out.maf
        FASTA : Helicoverpa_out.unpadded.fasta
        FASTA quality : Helicoverpa_out.unpadded.fasta.qual
        FASTA (padded) : Helicoverpa_out.padded.fasta
        FASTA qual.(pad): Helicoverpa_out.padded.fasta.qual
        GAP4 (directory): Helicoverpa_out.gap4da
        ACE : Helicoverpa_out.ace
        HTML : Helicoverpa_out.html
        Simple text : Helicoverpa_out.txt
        TCS overview : Helicoverpa_out.tcs
        Wiggle : Helicoverpa_out.wig
------------------------------------------------------------------------------
Creating directory Helicoverpa_assembly ... done.
Creating directory Helicoverpa_assembly/Helicoverpa_d_log ... done.
Creating directory Helicoverpa_assembly/Helicoverpa_d_results ... done.
Creating directory Helicoverpa_assembly/Helicoverpa_d_info ... done.
Creating directory Helicoverpa_assembly/Helicoverpa_d_chkpt ... done.
Localtime: Tue Dec 22 14:54:39 2009

Loading backbone from FASTA file: Helicoverpa_backbone_in.fasta (quality: Helicoverpa_backbone_in.fasta.qual)
Counting sequences in FASTA file:
 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Found 360369 sequences.
Loading data from FASTA file:
 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Could not find FASTA quality file Helicoverpa_backbone_in.fasta.qual, using default values for these reads.

Done.
Loaded 360369 reads, 0 of which have quality accounted for.

========================== Memory self assessment ==============================
Running in 64 bit mode.

Dump from /proc/meminfo
--------------------------------------------------------------------------------
MemTotal: 16474196 kB
MemFree: 88244 kB
Buffers: 706472 kB
Cached: 6774260 kB
SwapCached: 0 kB
Active: 6235844 kB
Inactive: 5476676 kB
SwapTotal: 3903752 kB
SwapFree: 3895480 kB
Dirty: 8 kB
Writeback: 0 kB
AnonPages: 4231868 kB
Mapped: 30932 kB
Slab: 4593788 kB
SReclaimable: 4555952 kB
SUnreclaim: 37836 kB
PageTables: 29572 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 12140848 kB
Committed_AS: 4504564 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 302424 kB
VmallocChunk: 34359435523 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
--------------------------------------------------------------------------------

Dump from /proc/self/status
--------------------------------------------------------------------------------
Name: mira
State: R (running)
Tgid: 3847
Pid: 3847
PPid: 3215
TracerPid: 0
Uid: 1009 1009 1009 1009
Gid: 1010 1010 1010 1010
FDSize: 256
Groups: 1007 1010 1011
VmPeak: 4192040 kB
VmSize: 4192040 kB
VmLck: 0 kB
VmHWM: 4171664 kB
VmRSS: 4171664 kB
VmData: 4187940 kB
VmStk: 84 kB
VmExe: 3992 kB
VmLib: 0 kB
VmPTE: 8192 kB
Threads: 1
SigQ: 0/137216
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: 000000ff
Cpus_allowed_list: 0-7
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 8
nonvoluntary_ctxt_switches: 434
--------------------------------------------------------------------------------

Information on current assembly object:

AS_readpool: 360369 reads.
AS_contigs: 0 contigs.
AS_bbcontigs: 1490 contigs.
Mem used for reads: 4103497564 (3.8 GiB)

Memory used in assembly structures:
                              Eff. Size  Free cap.  LostByAlign
AS_writtenskimhitsperid:    0      12 B        0 B          0 B
AS_skim_edges:              0      12 B        0 B          0 B
AS_adsfacts:                0      12 B        0 B          0 B
AS_confirmed_edges:         0      12 B        0 B          0 B
AS_permanent_overlap_bans:  0      12 B        0 B          0 B
AS_readhitmiss:             0      12 B        0 B          0 B
AS_readhmcovered:           0      12 B        0 B          0 B
AS_count_rhm:               0      12 B        0 B          0 B
AS_clipleft:                0      12 B        0 B          0 B
AS_clipright:               0      12 B        0 B          0 B
AS_used_ids:                0      12 B        0 B          0 B
AS_multicopies:             0      12 B        0 B          0 B
AS_hasmcoverlaps:           0      12 B        0 B          0 B
AS_maxcoveragereached:      0      12 B        0 B          0 B
AS_coverageperseqtype:      0      12 B        0 B          0 B
AS_istroublemaker:          0      12 B        0 B          0 B
AS_isdebris:                0      12 B        0 B          0 B
AS_needalloverlaps:         0      20 B        0 B          0 B
AS_readsforrepeatresolve:   0      20 B        0 B          0 B
AS_allrmbsok:               0      12 B        0 B          0 B
AS_probablermbsnotok:       0      12 B        0 B          0 B
AS_weakrmbsnotok:           0      12 B        0 B          0 B
AS_readmaytakeskim:         0      20 B        0 B          0 B
AS_skimstaken:              0      20 B        0 B          0 B
AS_numskimoverlaps:         0      12 B        0 B          0 B
AS_numleftextendskims:      0      12 B        0 B          0 B
AS_rightextendskims:        0      12 B        0 B          0 B
AS_skimleftextendratio:     0      12 B        0 B          0 B
AS_skimrightextendratio:    0      12 B        0 B          0 B
AS_usedlogfiles:            1      24 B        0 B          0 B
Total: 4103497968 (3.8 GiB)

================================================================================
Dynamic allocs: 0
Align allocs: 0
Out of memory detected, exception message is: std::bad_alloc

If you have questions on why this happened, please send the last 1000
lines (or better, the complete log) of the output log to the author
(together with a short summary of your assembly project).

For general help, you will probably get a quicker response on the
MIRA talk mailing list than if you mailed the author directly.

To report bugs or ask for features, please use the new ticketing system at:
http://sourceforge.net/apps/trac/mira-assembler/
This ensures that requests don't get lost.

-fasta --job=mapping,est,normal,454 -project=Helicoverpa -SB:bbq=-1:bft=fasta 454_SETTINGS -CL:quality_clip=no:mbc=yes -ED:automatic_contig_editing=yes
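
A rough sanity check on the numbers in the log, offered as a sketch rather than a diagnosis. It assumes that the reported "Size of size_t : 4" means the running binary is a 32-bit build, so that a single process cannot address more than 2^32 bytes no matter how much RAM is installed; the log does not state this outright. The constants are copied from the log, and the helper and variable names are made up for illustration.

```python
# Sketch only: compare the process size reported in the log with the
# address-space ceiling implied by a 4-byte size_t (assumed 32-bit build).

GIB = 1024 ** 3

# Values copied from the log.
SIZE_T_BYTES = 4               # "Size of size_t : 4"
VM_SIZE_KB = 4192040           # "VmSize: 4192040 kB" in /proc/self/status
MEM_READS_BYTES = 4103497564   # "Mem used for reads: 4103497564 (3.8 GiB)"
MEM_TOTAL_KB = 16474196        # "MemTotal: 16474196 kB" in /proc/meminfo


def address_space_limit(size_t_bytes):
    """Number of distinct byte addresses a size_t of this width can express
    (hypothetical helper, assumes a flat address space)."""
    return 2 ** (8 * size_t_bytes)


limit = address_space_limit(SIZE_T_BYTES)
vm_size = VM_SIZE_KB * 1024        # /proc reports kB as units of 1024 bytes
headroom = limit - vm_size

print(f"address-space ceiling : {limit / GIB:.2f} GiB")
print(f"process size (VmSize) : {vm_size / GIB:.2f} GiB")
print(f"memory used for reads : {MEM_READS_BYTES / GIB:.2f} GiB")
print(f"headroom below ceiling: {headroom / 1024 ** 2:.2f} MiB")
print(f"installed RAM         : {MEM_TOTAL_KB * 1024 / GIB:.2f} GiB")
```

Under that assumption the process sits within a few MiB of the ceiling right after the backbone is loaded, which would make the next allocation fail even though the machine has far more memory. Whether this is the actual cause is unconfirmed.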