[mira_talk] mapping assembly -> memory problem

  • From: Charles Imbusch <charles@xxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 22 Dec 2009 15:32:57 +0100

Hi all,

I'm trying to use MIRA's mapping assembly feature. The reference
is genomic data (a FASTA file of around 400 MB), and the reads to be
mapped are an EST dataset from the same species.

At a very early stage, MIRA quits with the message:

Out of memory detected, exception message is: std::bad_alloc

I would be very interested to know what went wrong. I have attached the
log file and the parameters to this message. I am using mira-3rc3e.

Any hints would be highly appreciated.

Cheers,
 Charles
This is MIRA V3rc4e (development version).

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence
Assembly Using Trace Signals and Additional Sequence Information.
Computer Science and Biology: Proceedings of the German Conference on
Bioinformatics (GCB) 99, pp. 45-56.

Mail general questions to the MIRA talk mailing list:
        mira_talk@xxxxxxxxxxxxx

To (un-)subsubcribe the MIRA mailing lists, see:
        http://www.chevreux.org/mira_mailinglists.html

To report bugs or ask for features, please use the new ticketing system at:
        http://sourceforge.net/apps/trac/mira-assembler/
This ensures that requests don't get lost.


Compiled by: bach
Mon Dec 21 23:37:44 CET 2009
On: Linux varcadia32 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compilation settings (sorry, for debug):
        Size of size_t  : 4
        Size of uint32  : 4
        Size of uint32_t: 4
        Size of uint64  : 8
        Size of uint64_t: 8
Current system: Linux nougat 2.6.26-1-amd64 #1 SMP Fri Mar 13 17:46:45 UTC 2009 x86_64 GNU/Linux



Parsing parameters: -parameters=parameters.par

Loading parameters from file: parameters.par





Parameters parsed without error, perfect.

------------------------------------------------------------------------------
Parameter settings seen for:
Sanger data (also common parameters), 454 data

Used parameter settings:
  General (-GE):
        Project name in (proin)                  : Helicoverpa
        Project name out (proout)                : Helicoverpa
        Number of threads (not)                  : 2
        Automatic memory management (amm)        : yes
            Keep percent memory free (kpmf)      : 10
            Max. process size (mps)              : 0
        Keep contigs in memory (kcim)            : no
        EST SNP pipeline step (esps)             : 1
        Use template information (uti)           :  [san]  yes
                                                    [454]  yes
            Template insert size minimum (tismin):  [san]  -1
                                                    [454]  -1
            Template insert size maximum (tismax):  [san]  -1
                                                    [454]  -1
        Colour reads by hash frequency (crhf)    : no

  Load reads options (-LR):
        Load sequence data (lsd)                    :  [san]  no
                                                       [454]  yes
            File type (ft)                          :  [san]  fasta
                                                       [454]  fasta
            External quality (eq)                   : from SCF (scf)
                Ext. qual. override (eqo)           : no
                Discard reads on e.q. error (droeqe): no
            Solexa scores in qual file (ssiqf)      : no
            FASTQ qual offset (fqqo)                :  [san]  0
                                                       [454]  0

        Wants quality file (wqf)                    :  [san]  yes
                                                       [454]  yes

        Read naming scheme (rns)                    :  [san] Sanger Institute (sanger)
                                                       [454] forward/reverse (fr)

        Merge with XML trace info (mxti)            :  [san]  no
                                                       [454]  yes

        Filecheck only (fo)                         : no

  Assembly options (-AS):
        Number of passes (nop)                      : 4
            Skim each pass (sep)                    : yes
        Maximum number of RMB break loops (rbl)     : 2

        Minimum read length (mrl)                   :  [san]  80
                                                       [454]  40
        Base default quality (bdq)                  :  [san]  10
                                                       [454]  10
        Enforce presence of qualities (epoq)        :  [san]  yes
                                                       [454]  yes

        Automatic repeat detection (ard)            : no
            Coverage threshold (ardct)              :  [san]  2
                                                       [454]  2
            Minimum length (ardml)                  :  [san]  400
                                                       [454]  200
            Grace length (ardgl)                    :  [san]  40
                                                       [454]  20
            Use uniform read distribution (urd)     : no
              Start in pass (urdsip)                : 3
              Cutoff multiplier (urdcm)             :  [san]  1.5
                                                       [454]  1.5
        Keep long repeats separated (klrs)          : no

        Spoiler detection (sd)                      : no
            Last pass only (sdlpo)                  : yes

        Use genomic pathfinder (ugpf)               : no

        Use emergency search stop (uess)            : yes
            ESS partner depth (esspd)               : 500
        Use emergency blacklist (uebl)              : yes
        Use max. contig build time (umcbt)          : yes
            Build time in seconds (bts)             : 3600

  Strain and backbone options (-SB):
        Load straindata (lsd)                       : no
        Load backbone (lb)                          : yes
            Start backbone usage in pass (sbuip)    : 0
            Backbone file type (bft)                : fasta
            Backbone base quality (bbq)             : -1
            Backbone strain name (bsn)              : 
                Force for all (bsnffa)              : no
            Backbone rail from strain (brfs)        : 
            Backbone rail length (brl)              : 0
            Backbone rail overlap (bro)             : 0
            Also build new contigs (abnc)           : no

  Dataprocessing options (-DP):
        Use read extensions (ure)                   :  [san]  no
                                                       [454]  no
            Read extension window length (rewl)     :  [san]  30
                                                       [454]  15
            Read extension w. maxerrors (rewme)     :  [san]  2
                                                       [454]  2
            First extension in pass (feip)          :  [san]  0
                                                       [454]  0
            Last extension in pass (leip)           :  [san]  0
                                                       [454]  0

  Clipping options (-CL):
        Merge with SSAHA vector screen (msvs)       :  [san]  no
                                                       [454]  no
            Gap size (msvsgs)                       :  [san]  10
                                                       [454]  8
            Max front gap (msvsmfg)                 :  [san]  60
                                                       [454]  8
            Max end gap (msvsmeg)                   :  [san]  120
                                                       [454]  12
            Strict front clip (msvssfc)             :  [san]  0
                                                       [454]  0
            Strict end clip (msvssec)               :  [san]  0
                                                       [454]  0
        Possible vector leftover clip (pvlc)        :  [san]  no
                                                       [454]  no
            maximum len allowed (pvcmla)            :  [san]  18
                                                       [454]  18
        Quality clip (qc)                           :  [san]  yes
                                                       [454]  no
            Minimum quality (qcmq)                  :  [san]  20
                                                       [454]  20
            Window length (qcwl)                    :  [san]  30
                                                       [454]  30
        Bad stretch quality clip (bsqc)             :  [san]  no
                                                       [454]  no
            Minimum quality (bsqcmq)                :  [san]  20
                                                       [454]  5
            Window length (bsqcwl)                  :  [san]  30
                                                       [454]  20
        Masked bases clip (mbc)                     :  [san]  yes
                                                       [454]  yes
            Gap size (mbcgs)                        :  [san]  20
                                                       [454]  5
            Max front gap (mbcmfg)                  :  [san]  40
                                                       [454]  12
            Max end gap (mbcmeg)                    :  [san]  60
                                                       [454]  12
        Lower case clip (llc)                       :  [san]  no
                                                       [454]  yes
        Clip poly A/T at ends (cpat)                :  [san]  yes
                                                       [454]  yes
            Keep poly-a signal (cpkps)              :  [san]  no
                                                       [454]  no
            Minimum signal length (cpmsl)           :  [san]  12
                                                       [454]  12
            Max errors allowed (cpmea)              :  [san]  1
                                                       [454]  1
            Max gap from ends (cpmgfe)              :  [san]  20000
                                                       [454]  20000
        Ensure minimum left clip (emlc)             :  [san]  no
                                                       [454]  no
            Minimum left clip req. (mlcr)           :  [san]  25
                                                       [454]  4
            Set minimum left clip to (smlc)         :  [san]  30
                                                       [454]  4
        Ensure minimum right clip (emrc)            :  [san]  no
                                                       [454]  no
            Minimum right clip req. (mrcr)          :  [san]  10
                                                       [454]  10
            Set minimum right clip to (smrc)        :  [san]  20
                                                       [454]  15

        Propose end clips (pec)                     : no
            Bases per hash (pecbph)                 : 0

  Parameters for SKIM algorithm (-SK):
        Number of threads (not)                     : 2

        Bases per hash (bph)                        : 17
        Hash save stepping (hss)                    : 4
        Percent required (pr)                       :  [san]  70
                                                       [454]  80

        Max hits per read (mhpr)                    : 30
        Mask nasty repeats (mnr)                    : no
            Nasty repeat ratio (nrr)                : 100
        Max. megahub ratio (mmhr)                   : 0

        Max hashes in memory (mhim)                 : 15000000
        MemCap: hit reduction (mchr)                : 2048

  Pathfinder options (-PF):
        Use quick rule (uqr)                        :  [san]  yes
                                                       [454]  yes
            Quick rule min len 1 (qrml1)            :  [san]  200
                                                       [454]  80
            Quick rule min sim 1 (qrms1)            :  [san]  90
                                                       [454]  90
            Quick rule min len 2 (qrml2)            :  [san]  100
                                                       [454]  60
            Quick rule min sim 2 (qrms2)            :  [san]  95
                                                       [454]  95
        Backbone quick overlap min len (bqoml)      :  [san]  150
                                                       [454]  80

  Align parameters for Smith-Waterman align (-AL):
        Bandwidth in percent (bip)             :  [san]  15
                                                  [454]  20
        Bandwidth max (bmax)                   :  [san]  100
                                                  [454]  80
        Bandwidth min (bmin)                   :  [san]  25
                                                  [454]  20
        Minimum score (ms)                     :  [san]  30
                                                  [454]  15
        Minimum overlap (mo)                   :  [san]  15
                                                  [454]  40
        Minimum relative score in % (mrs)      :  [san]  80
                                                  [454]  80
        Solexa_hack_max_errors (shme)          :  [san]  0
                                                  [454]  0
        Extra gap penalty (egp)                :  [san]  yes
                                                  [454]  yes
            extra gap penalty level (egpl)     :  [san]  reject_codongaps
                                                  [454]  reject_codongaps
            Max. egp in percent (megpp)        :  [san]  100
                                                  [454]  100

  Contig parameters (-CO):
        Name prefix (np)                                         : Helicoverpa
        Reject on drop in relative alignment score in % (rodirs) :  [san]  20
                                                                    [454]  30
        Mark repeats (mr)                                        : yes
            Only in result (mroir)                               : no
            Assume SNP instead of repeats (asir)                 : no
            Minimum reads per group needed for tagging (mrpg)    :  [san]  2
                                                                    [454]  4
            Minimum neighbour quality needed for tagging (mnq)   :  [san]  20
                                                                    [454]  20
            Minimum Group Quality needed for RMB Tagging (mgqrt) :  [san]  30
                                                                    [454]  25
            End-read Marking Exclusion Area in bases (emea)      :  [san]  25
                                                                    [454]  10
            Also mark gap bases (amgb)                           :  [san]  yes
                                                                    [454]  no
                Also mark gap bases - even multicolumn (amgbemc) :  [san]  yes
                                                                    [454]  yes
                Also mark gap bases - need both strands (amgbnbs):  [san]  yes
                                                                    [454]  yes
        Force non-IUPAC consensus per sequencing type (fnicpst)  :  [san]  no
                                                                    [454]  no
        Merge short reads (msr)                                  :  [san]  no
                                                                    [454]  no
        Gap override ratio (gor)                                 :  [san]  66
                                                                    [454]  66

  Edit options (-ED):
        Automatic contig editing (ace)              :  [san]  no
                                                       [454]  yes
     Sanger only:
        Strict editing mode (sem)                   : no
        Confirmation threshold in percent (ct)      : 50

  Directories (-DI):
        When loading EXP files            : 
        When loading SCF files            : 
        Top directory for writing files   : Helicoverpa_assembly
        For writing result files          : Helicoverpa_assembly/Helicoverpa_d_results
        For writing result info files     : Helicoverpa_assembly/Helicoverpa_d_info
        For writing log files             : Helicoverpa_assembly/Helicoverpa_d_log
        For writing checkpoint files      : Helicoverpa_assembly/Helicoverpa_d_chkpt

  File names (-FN):
        When loading sequences from FASTA            :  [san]  Helicoverpa_in.sanger.fasta
                                                        [454]  Helicoverpa_in.454.fasta
        When loading qualities from FASTA quality    :  [san]  Helicoverpa_in.sanger.fasta.qual
                                                        [454]  Helicoverpa_in.454.fasta.qual
        When loading sequences from FASTQ            :  [san]  Helicoverpa_in.sanger.fastq
                                                        [454]  Helicoverpa_in.454.fastq
        When loading project from CAF                : Helicoverpa_in.sanger.caf
        When loading project from MAF (disabled)     : Helicoverpa_in.sanger.maf
        When loading EXP fofn                        : Helicoverpa_in.fofn
        When loading project from PHD                : Helicoverpa_in.phd.1
        When loading strain data                     : Helicoverpa_straindata_in.txt
        When loading XML trace info files            :  [san]  Helicoverpa_traceinfo_in.sanger.xml
                                                        [454]  Helicoverpa_traceinfo_in.454.xml
        When loading SSAHA vector screen results     : Helicoverpa_ssaha2vectorscreen_in.txt

        When loading backbone from CAF               : Helicoverpa_backbone_in.caf
        When loading backbone from GenBank           : Helicoverpa_backbone_in.gbf
        When loading backbone from FASTA             : Helicoverpa_backbone_in.fasta


  Output files (-OUTPUT/-OUT):
        Save simple singlets in project (sssip)      :  [san]  no
                                                        [454]  no
        Save tagged singlets in project (stsip)      :  [san]  yes
                                                        [454]  yes

        Remove rollover logs (rrol)                  : yes
        Remove log directory (rld)                   : no

    Result files:
        Saved as CAF                       (orc)     : yes
        Saved as FASTA                     (orf)     : yes
        Saved as GAP4 (directed assembly)  (org)     : no
        Saved as phrap ACE                 (ora)     : yes
        Saved as HTML                      (orh)     : no
        Saved as Transposed Contig Summary (ors)     : yes
        Saved as simple text format        (ort)     : no
        Saved as wiggle                    (orw)     : no

    Temporary result files:
        Saved as CAF                       (otc)     : yes
        Saved as CAF                       (otm)     : no
        Saved as FASTA                     (otf)     : no
        Saved as GAP4 (directed assembly)  (otg)     : no
        Saved as phrap ACE                 (ota)     : no
        Saved as HTML                      (oth)     : no
        Saved as Transposed Contig Summary (ots)     : no
        Saved as simple text format        (ott)     : no

    Extended temporary result files:
        Saved as CAF                      (oetc)     : no
        Saved as FASTA                    (oetf)     : no
        Saved as GAP4 (directed assembly) (oetg)     : no
        Saved as phrap ACE                (oeta)     : no
        Saved as HTML                     (oeth)     : no
        Save also singlets               (oetas)     : no

    Alignment output customisation:
        TEXT characters per line (tcpl)              : 60
        HTML characters per line (hcpl)              : 60
        TEXT end gap fill character (tegfc)          :  
        HTML end gap fill character (hegfc)          :  

    File / directory output names:
        CAF             : Helicoverpa_out.caf
        MAF             : Helicoverpa_out.maf
        FASTA           : Helicoverpa_out.unpadded.fasta
        FASTA quality   : Helicoverpa_out.unpadded.fasta.qual
        FASTA (padded)  : Helicoverpa_out.padded.fasta
        FASTA qual.(pad): Helicoverpa_out.padded.fasta.qual
        GAP4 (directory): Helicoverpa_out.gap4da
        ACE             : Helicoverpa_out.ace
        HTML            : Helicoverpa_out.html
        Simple text     : Helicoverpa_out.txt
        TCS overview    : Helicoverpa_out.tcs
        Wiggle          : Helicoverpa_out.wig
------------------------------------------------------------------------------
Creating directory Helicoverpa_assembly ... done.
Creating directory Helicoverpa_assembly/Helicoverpa_d_log ... done.
Creating directory Helicoverpa_assembly/Helicoverpa_d_results ... done.
Creating directory Helicoverpa_assembly/Helicoverpa_d_info ... done.
Creating directory Helicoverpa_assembly/Helicoverpa_d_chkpt ... done.
Localtime: Tue Dec 22 14:54:39 2009

Loading backbone from FASTA file: Helicoverpa_backbone_in.fasta (quality: Helicoverpa_backbone_in.fasta.qual)
Counting sequences in FASTA file:
 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Found 360369 sequences.
Loading data from FASTA file:
 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Could not find FASTA quality file Helicoverpa_backbone_in.fasta.qual, using default values for these reads.

Done.
Loaded 360369 reads, 0 of which have quality accounted for.


========================== Memory self assessment ==============================
Running in 64 bit mode.

Dump from /proc/meminfo
--------------------------------------------------------------------------------
MemTotal:     16474196 kB
MemFree:         88244 kB
Buffers:        706472 kB
Cached:        6774260 kB
SwapCached:          0 kB
Active:        6235844 kB
Inactive:      5476676 kB
SwapTotal:     3903752 kB
SwapFree:      3895480 kB
Dirty:               8 kB
Writeback:           0 kB
AnonPages:     4231868 kB
Mapped:          30932 kB
Slab:          4593788 kB
SReclaimable:  4555952 kB
SUnreclaim:      37836 kB
PageTables:      29572 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
WritebackTmp:        0 kB
CommitLimit:  12140848 kB
Committed_AS:  4504564 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    302424 kB
VmallocChunk: 34359435523 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB
--------------------------------------------------------------------------------

Dump from /proc/self/status
--------------------------------------------------------------------------------
Name:   mira
State:  R (running)
Tgid:   3847
Pid:    3847
PPid:   3215
TracerPid:      0
Uid:    1009    1009    1009    1009
Gid:    1010    1010    1010    1010
FDSize: 256
Groups: 1007 1010 1011 
VmPeak:  4192040 kB
VmSize:  4192040 kB
VmLck:         0 kB
VmHWM:   4171664 kB
VmRSS:   4171664 kB
VmData:  4187940 kB
VmStk:        84 kB
VmExe:      3992 kB
VmLib:         0 kB
VmPTE:      8192 kB
Threads:        1
SigQ:   0/137216
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed:   000000ff
Cpus_allowed_list:      0-7
Mems_allowed:   00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        8
nonvoluntary_ctxt_switches:     434
--------------------------------------------------------------------------------

Information on current assembly object:

AS_readpool: 360369 reads.
AS_contigs: 0 contigs.
AS_bbcontigs: 1490 contigs.
Mem used for reads: 4103497564 (3.8 GiB)

Memory used in assembly structures:
                                           Eff. Size   Free cap. LostByAlign
     AS_writtenskimhitsperid:          0        12 B         0 B         0 B
               AS_skim_edges:          0        12 B         0 B         0 B
                 AS_adsfacts:          0        12 B         0 B         0 B
          AS_confirmed_edges:          0        12 B         0 B         0 B
   AS_permanent_overlap_bans:          0        12 B         0 B         0 B
              AS_readhitmiss:          0        12 B         0 B         0 B
            AS_readhmcovered:          0        12 B         0 B         0 B
                AS_count_rhm:          0        12 B         0 B         0 B
                 AS_clipleft:          0        12 B         0 B         0 B
                AS_clipright:          0        12 B         0 B         0 B
                 AS_used_ids:          0        12 B         0 B         0 B
              AS_multicopies:          0        12 B         0 B         0 B
            AS_hasmcoverlaps:          0        12 B         0 B         0 B
       AS_maxcoveragereached:          0        12 B         0 B         0 B
       AS_coverageperseqtype:          0        12 B         0 B         0 B
           AS_istroublemaker:          0        12 B         0 B         0 B
                 AS_isdebris:          0        12 B         0 B         0 B
          AS_needalloverlaps:          0        20 B         0 B         0 B
    AS_readsforrepeatresolve:          0        20 B         0 B         0 B
                AS_allrmbsok:          0        12 B         0 B         0 B
        AS_probablermbsnotok:          0        12 B         0 B         0 B
            AS_weakrmbsnotok:          0        12 B         0 B         0 B
          AS_readmaytakeskim:          0        20 B         0 B         0 B
               AS_skimstaken:          0        20 B         0 B         0 B
          AS_numskimoverlaps:          0        12 B         0 B         0 B
       AS_numleftextendskims:          0        12 B         0 B         0 B
         AS_rightextendskims:          0        12 B         0 B         0 B
      AS_skimleftextendratio:          0        12 B         0 B         0 B
     AS_skimrightextendratio:          0        12 B         0 B         0 B
             AS_usedlogfiles:          1        24 B         0 B         0 B
Total: 4103497968 (3.8 GiB)

================================================================================
Dynamic allocs: 0
Align allocs: 0
Out of memory detected, exception message is: std::bad_alloc


If you have questions on why this happened, please send the last 1000
lines (or better, the complete log) of the output log to the author
(together with a short summary of your assembly project).



For general help, you will probably get a quicker response on the
    MIRA talk mailing list
than if you mailed the author directly.

To report bugs or ask for features, please use the new ticketing system at:
        http://sourceforge.net/apps/trac/mira-assembler/
This ensures that requests don't get lost.

-fasta 

--job=mapping,est,normal,454 

-project=Helicoverpa

-SB:bbq=-1:bft=fasta

454_SETTINGS
-CL:quality_clip=no:mbc=yes
-ED:automatic_contig_editing=yes
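
As a rough check on the numbers in the log above, the following sketch compares the reported memory use with two limits: the memory the machine could plausibly free up, and the largest size a 4-byte size_t can express. All figures are copied from the log. The interpretation is an assumption, not a confirmed diagnosis: it supposes that "Size of size_t : 4" means the binary is a 32-bit build, in which case the process cannot address more than 4 GiB no matter how much RAM is installed. The "reclaimable" figure is a simple sum of MemFree, Buffers, Cached and SReclaimable and only approximates what the kernel could actually hand out.

```python
# Figures copied from the MIRA log; variable names are illustrative only.
mem_used_for_reads = 4_103_497_564  # "Mem used for reads", bytes
total_assembly_mem = 4_103_497_968  # "Total", bytes
vm_peak_kb = 4_192_040              # "VmPeak" from /proc/self/status, kB
backbone_reads = 360_369            # "Loaded 360369 reads"
size_of_size_t = 4                  # "Size of size_t : 4", bytes

# Selected /proc/meminfo values from the log, in kB.
meminfo_kb = {
    "MemFree": 88_244,
    "Buffers": 706_472,
    "Cached": 6_774_260,
    "SReclaimable": 4_555_952,
}

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


# Largest byte count representable in a size_t of this width.
# ASSUMPTION: this is also the address-space ceiling of the process.
address_limit = 2 ** (8 * size_of_size_t)

vm_peak_bytes = vm_peak_kb * KIB
headroom = address_limit - total_assembly_mem
bytes_per_read = mem_used_for_reads / backbone_reads

# Crude estimate of memory the system could still provide.
reclaimable_kb = sum(meminfo_kb.values())

print(f"Memory used for reads : {to_gib(mem_used_for_reads):.2f} GiB")
print(f"Peak virtual size     : {to_gib(vm_peak_bytes):.2f} GiB")
print(f"size_t ceiling        : {to_gib(address_limit):.2f} GiB")
print(f"Left below ceiling    : {headroom / MIB:.1f} MiB")
print(f"Average per backbone  : {bytes_per_read:.0f} bytes")
print(f"Roughly reclaimable   : {to_gib(reclaimable_kb * KIB):.2f} GiB")
```

If the assumption holds, the process was within a few MiB of the 4 GiB ceiling while the machine itself still had several GiB that could have been reclaimed, which would point at the build rather than at the amount of installed memory.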
