[mira_talk] mira assembly error

  • From: "Chauhan, Archana" <achauha1@xxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Tue, 24 Apr 2012 19:25:04 +0000

Hi,
      I am trying to assemble my 454 unpaired and paired reads. I used the 
following command to direct the output to the directory "$TMPDIR". This 
directory is guaranteed to be on a local hard drive.
"mira --project=HK44 --job=denovo,genome,accurate,454, -DI:$TMPDIR 
>&log_assembly"

and got the following error:

Creating directory HK44_assembly ... done.
Creating directory /6621804.1.medium_chi ...
Fatal error (may be due to problems of the input data or parameters):

"Could not make sure that a needed directory exists, aborting MIRA."

->Thrown: void Assembly::ensureStandardDirectories(bool purge)
->Caught: main

Aborting process, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.

I double checked with our administrator to make sure that the directory is not 
on the NFS mount.
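
One way to double-check this from the shell is to ask the system which filesystem backs the directory and whether it is writable. The sketch below assumes Linux with GNU coreutils (the `-T` option of `df` is not POSIX). The function names `fs_type` and `check_tmpdir` are made up for illustration and are not part of MIRA.

```shell
#!/bin/sh
# Print the filesystem type backing a directory (e.g. ext3, nfs, tmpfs).
# Assumes GNU df: -P keeps each entry on one line, -T adds the Type column.
fs_type() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Report whether a directory exists, is writable, and is not on NFS.
# Returns 0 only if all three hold.
check_tmpdir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir: not a directory"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "$dir: not writable"
        return 1
    fi
    fstype=$(fs_type "$dir")
    case $fstype in
        nfs*) echo "$dir: on NFS ($fstype)"; return 1 ;;
    esac
    echo "$dir: ok ($fstype)"
}
```

Running `check_tmpdir "$TMPDIR"` inside the batch job, and again on the exact path printed after "Creating directory" in the MIRA output, shows whether the two refer to the same location.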

I then ran the following command (not recommended) to make mira run:

"mira --project=HK44 --job=denovo,genome,accurate,454, -MI:sonfs=no 
>&log_assembly"

And I got the following error:

WARNING WARNING WARNING!

It looks like the tmp directory is on a NFS (Network File System) mount. This
will slow down MIRA *considerably* ... by about a factor of 10!
If you don't want that, you have three possibilities:

1) RECOMMENDED! Use -DI:trt to redirect the tmp directory somewhere else on a
   local disk or even SSD.
2) POSSIBLE: put the whole project somewhere else and restart MIRA.
3) ABSOLUTELY NOT RECOMMENDED AT ALL: use "-MI:sonfs=no" to tell MIRA not
   to stop when it finds the tmp directory on NFS.

If you do not know what NFS is and which directory to use in "-DI:trt", ask
your local system administrator to guide you.

Localtime: Tue Apr 24 13:50:39 2012

Loading data (454) from FASTQ files,
Localtime: Tue Apr 24 13:50:39 2012

Fatal error (may be due to problems of the input data or parameters):

"Could not open FASTQ file 'HK44_in.454.fastq'. Is it present? Is it readable? 
Did you want to load your data in another format?"

->Thrown: void ReadPool::loadDataFromFASTQ(const string & filename, const 
string & qualfilename, const bool generatefilenames, const uint8 seqtype, const 
uint8 loadaction)
->Caught: Assembly::loadFASTQ(const string & fastqfile, const string & 
fastaqualfile, const uint8 seqtype, const uint8 loadaction)
Aborting process, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.
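
The error names the file MIRA tried to open, and the "File names (-FN)" section of the attached log lists the other default 454 input names for this project. A minimal sketch for checking which of them are present and readable in the current directory follows. The helper name `list_inputs` is hypothetical, and the three file-name patterns are taken from the log, not from MIRA's documentation.

```shell
#!/bin/sh
# For a project name, report which 454 input files (named as in the log's
# "File names (-FN)" section) are present and readable in the current directory.
list_inputs() {
    project=$1
    for suffix in 454.fastq 454.fasta 454.fasta.qual; do
        file="${project}_in.${suffix}"
        if [ -r "$file" ]; then
            echo "found    $file"
        else
            echo "missing  $file"
        fi
    done
}
```

Calling `list_inputs HK44` from the directory where mira is started (the CWD reported at the end of the log) shows whether the FASTQ file that the loader asks for actually exists there.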

My data files look OK. Below are snapshots of my qual and fasta files:


[image: image003.jpg]

[image: image004.jpg]

Also attached please find the log files. Can someone help me? I want MIRA to 
work because I have to do a hybrid assembly using 454, Illumina and Sanger reads.

Thanks,
Arc

This is MIRA V3.4.0 (production version).

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence
Assembly Using Trace Signals and Additional Sequence Information.
Computer Science and Biology: Proceedings of the German Conference on
Bioinformatics (GCB) 99, pp. 45-56.

To (un-)subscribe the MIRA mailing lists, see:
        http://www.chevreux.org/mira_mailinglists.html

After subscribing, mail general questions to the MIRA talk mailing list:
        mira_talk@xxxxxxxxxxxxx

To report bugs or ask for features, please use the new ticketing system at:
        http://sourceforge.net/apps/trac/mira-assembler/
This ensures that requests don't get lost.


Compiled by: bach
Sun Aug 21 17:50:30 CEST 2011
On: Linux arcadia 2.6.38-11-generic #48-Ubuntu SMP Fri Jul 29 19:02:55 UTC 2011 
x86_64 x86_64 x86_64 GNU/Linux
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compiled with ENABLE64 activated.
Runtime settings (sorry, for debug):
        Size of size_t  : 8
        Size of uint32  : 4
        Size of uint32_t: 4
        Size of uint64  : 8
        Size of uint64_t: 8
Current system: sh: module: line 1: syntax error: unexpected end of file
sh: error importing function definition for `module'
Linux chi29 2.6.18-194.17.1.el5 #1 SMP Wed Sep 29 12:09:22 EDT 2010 x86_64 
x86_64 x86_64 GNU/Linux



Parsing parameters: --project=HK44 --job=denovo,genome,accurate,454, 
-DI:/tmp/6621804.1.medium_chi


/

Parameters parsed without error, perfect.

-CL:pec and -CO:emeas1clpec are set, setting -CO:emea values to 1.
------------------------------------------------------------------------------
Parameter settings seen for:
Sanger data (also common parameters), 454 data

Used parameter settings:
  General (-GE):
        Project name in (proin)                     : HK44
        Project name out (proout)                   : HK44
        Number of threads (not)                     : 2
        Automatic memory management (amm)           : yes
            Keep percent memory free (kpmf)         : 15
            Max. process size (mps)                 : 0
        EST SNP pipeline step (esps)                : 0
        Use template information (uti)              :  [san]  yes
                                                       [454]  yes
            Template insert size minimum (tismin)   :  [san]  -1
                                                       [454]  -1
            Template insert size maximum (tismax)   :  [san]  -1
                                                       [454]  -1
            Template partner build direction (tpbd) :  [san]  -1
                                                       [454]  -1
        Colour reads by hash frequency (crhf)       : yes

  Load reads options (-LR):
        Load sequence data (lsd)                    :  [san]  no
                                                       [454]  yes
            File type (ft)                          :  [san]  fasta
                                                       [454]  fastq
            External quality (eq)                   : from SCF (scf)
                Ext. qual. override (eqo)           : no
                Discard reads on e.q. error (droeqe): no
            Solexa scores in qual file (ssiqf)      : no
            FASTQ qual offset (fqqo)                :  [san]  0
                                                       [454]  0

        Wants quality file (wqf)                    :  [san]  yes
                                                       [454]  yes

        Read naming scheme (rns)                    :  [san] Sanger Institute (sanger)
                                                       [454] forward/reverse (fr)

        Merge with XML trace info (mxti)            :  [san]  no
                                                       [454]  yes

        Filecheck only (fo)                         : no

  Assembly options (-AS):
        Number of passes (nop)                      : 5
            Skim each pass (sep)                    : yes
        Maximum number of RMB break loops (rbl)     : 3
        Maximum contigs per pass (mcpp)             : 0

        Minimum read length (mrl)                   :  [san]  80
                                                       [454]  40
        Minimum reads per contig (mrpc)             :  [san]  2
                                                       [454]  5
        Base default quality (bdq)                  :  [san]  10
                                                       [454]  10
        Enforce presence of qualities (epoq)        :  [san]  yes
                                                       [454]  yes

        Automatic repeat detection (ard)            : yes
            Coverage threshold (ardct)              :  [san]  2
                                                       [454]  2
            Minimum length (ardml)                  :  [san]  400
                                                       [454]  200
            Grace length (ardgl)                    :  [san]  40
                                                       [454]  20
            Use uniform read distribution (urd)     : no
              Start in pass (urdsip)                : 4
              Cutoff multiplier (urdcm)             :  [san]  1.5
                                                       [454]  1.5
        Keep long repeats separated (klrs)          : no

        Spoiler detection (sd)                      : yes
            Last pass only (sdlpo)                  : yes

        Use genomic pathfinder (ugpf)               : yes

        Use emergency search stop (uess)            : yes
            ESS partner depth (esspd)               : 500
        Use emergency blacklist (uebl)              : yes
        Use max. contig build time (umcbt)          : no
            Build time in seconds (bts)             : 10000

  Strain and backbone options (-SB):
        Load straindata (lsd)                       : no
        Assign default strain (ads)                 :  [san]  no
                                                       [454]  no
            Default strain name (dsn)               :  [san]  StrainX
                                                       [454]  StrainX
        Load backbone (lb)                          : no
            Start backbone usage in pass (sbuip)    : 3
            Backbone file type (bft)                : fasta
            Backbone base quality (bbq)             : 30
            Backbone strain name (bsn)              : ReferenceStrain
                Force for all (bsnffa)              : no
            Backbone rail from strain (brfs)        : 
            Backbone rail length (brl)              : 0
            Backbone rail overlap (bro)             : 0
            Also build new contigs (abnc)           : yes

  Dataprocessing options (-DP):
        Use read extensions (ure)                   :  [san]  yes
                                                       [454]  no
            Read extension window length (rewl)     :  [san]  30
                                                       [454]  15
            Read extension w. maxerrors (rewme)     :  [san]  2
                                                       [454]  2
            First extension in pass (feip)          :  [san]  0
                                                       [454]  0
            Last extension in pass (leip)           :  [san]  0
                                                       [454]  0

  Clipping options (-CL):
        Merge with SSAHA2/SMALT vector screen (msvs):  [san]  no
                                                       [454]  no
            Gap size (msvsgs)                       :  [san]  10
                                                       [454]  8
            Max front gap (msvsmfg)                 :  [san]  60
                                                       [454]  8
            Max end gap (msvsmeg)                   :  [san]  120
                                                       [454]  12
            Strict front clip (msvssfc)             :  [san]  0
                                                       [454]  0
            Strict end clip (msvssec)               :  [san]  0
                                                       [454]  0
        Possible vector leftover clip (pvlc)        :  [san]  yes
                                                       [454]  no
            maximum len allowed (pvcmla)            :  [san]  18
                                                       [454]  18
        Min qual. threshold for entire read (mqtfer):  [san]  0
                                                       [454]  0
            Number of bases (mqtfernob)             :  [san]  0
                                                       [454]  0
        Quality clip (qc)                           :  [san]  no
                                                       [454]  no
            Minimum quality (qcmq)                  :  [san]  20
                                                       [454]  20
            Window length (qcwl)                    :  [san]  30
                                                       [454]  30
        Bad stretch quality clip (bsqc)             :  [san]  yes
                                                       [454]  no
            Minimum quality (bsqcmq)                :  [san]  20
                                                       [454]  5
            Window length (bsqcwl)                  :  [san]  30
                                                       [454]  20
        Masked bases clip (mbc)                     :  [san]  yes
                                                       [454]  yes
            Gap size (mbcgs)                        :  [san]  20
                                                       [454]  5
            Max front gap (mbcmfg)                  :  [san]  40
                                                       [454]  12
            Max end gap (mbcmeg)                    :  [san]  60
                                                       [454]  12
        Lower case clip (lcc)                       :  [san]  no
                                                       [454]  yes
        Clip poly A/T at ends (cpat)                :  [san]  no
                                                       [454]  no
            Keep poly-a signal (cpkps)              :  [san]  no
                                                       [454]  no
            Minimum signal length (cpmsl)           :  [san]  12
                                                       [454]  12
            Max errors allowed (cpmea)              :  [san]  1
                                                       [454]  1
            Max gap from ends (cpmgfe)              :  [san]  9
                                                       [454]  9
        Clip 3 prime polybase (c3pp)                :  [san]  no
                                                       [454]  no
            Minimum signal length (c3ppmsl)         :  [san]  12
                                                       [454]  12
            Max errors allowed (c3ppmea)            :  [san]  2
                                                       [454]  2
            Max gap from ends (c3ppmgfe)            :  [san]  9
                                                       [454]  9
        Clip known adaptors right (ckar)            :  [san]  no
                                                       [454]  yes
        Ensure minimum left clip (emlc)             :  [san]  yes
                                                       [454]  no
            Minimum left clip req. (mlcr)           :  [san]  25
                                                       [454]  4
            Set minimum left clip to (smlc)         :  [san]  30
                                                       [454]  4
        Ensure minimum right clip (emrc)            :  [san]  no
                                                       [454]  no
            Minimum right clip req. (mrcr)          :  [san]  10
                                                       [454]  10
            Set minimum right clip to (smrc)        :  [san]  20
                                                       [454]  15

        Apply SKIM chimera detection clip (ascdc)   : yes
        Apply SKIM junk detection clip (asjdc)      : no

        Propose end clips (pec)                     : yes
            Bases per hash (pecbph)                 : 27
            Handle Solexa GGCxG problem (pechsgp)   : yes

        Clip bad solexa ends (cbse)                 : yes

  Parameters for SKIM algorithm (-SK):
        Number of threads (not)                     : 2

        Also compute reverse complements (acrc)     : yes
        Bases per hash (bph)                        : 21
        Hash save stepping (hss)                    : 1
        Percent required (pr)                       :  [san]  70
                                                       [454]  80

        Max hits per read (mhpr)                    : 2000
        Max megahub ratio (mmhr)                    : 0

        SW check on backbones (swcob)               : no

        Freq. est. min normal (fenn)                : 0.4
        Freq. est. max normal (fexn)                : 1.6
        Freq. est. repeat (fer)                     : 1.9
        Freq. est. heavy repeat (fehr)              : 8
        Freq. est. crazy (fecr)                     : 20
        Mask nasty repeats (mnr)                    : yes
            Nasty repeat ratio (nrr)                : 100
        Repeat level in info file (rliif)           : 6

        Max hashes in memory (mhim)                 : 15000000
        MemCap: hit reduction (mchr)                : 2048

  Pathfinder options (-PF):
        Use quick rule (uqr)                        :  [san]  yes
                                                       [454]  yes
            Quick rule min len 1 (qrml1)            :  [san]  200
                                                       [454]  80
            Quick rule min sim 1 (qrms1)            :  [san]  90
                                                       [454]  90
            Quick rule min len 2 (qrml2)            :  [san]  100
                                                       [454]  60
            Quick rule min sim 2 (qrms2)            :  [san]  95
                                                       [454]  95
        Backbone quick overlap min len (bqoml)      :  [san]  150
                                                       [454]  80
        Max. start cache fill time (mscft)          : 5

  Align parameters for Smith-Waterman align (-AL):
        Bandwidth in percent (bip)             :  [san]  20
                                                  [454]  20
        Bandwidth max (bmax)                   :  [san]  130
                                                  [454]  80
        Bandwidth min (bmin)                   :  [san]  25
                                                  [454]  20
        Minimum score (ms)                     :  [san]  30
                                                  [454]  15
        Minimum overlap (mo)                   :  [san]  17
                                                  [454]  20
        Minimum relative score in % (mrs)      :  [san]  70
                                                  [454]  70
        Solexa_hack_max_errors (shme)          :  [san]  0
                                                  [454]  0
        Extra gap penalty (egp)                :  [san]  no
                                                  [454]  yes
            extra gap penalty level (egpl)     :  [san] low
                                                  [454] reject_codongaps
            Max. egp in percent (megpp)        :  [san]  100
                                                  [454]  100

  Contig parameters (-CO):
        Name prefix (np)                                         : HK44
        Reject on drop in relative alignment score in % (rodirs) :  [san]  25
                                                                    [454]  30
        Mark repeats (mr)                                        : yes
            Only in result (mroir)                               : no
            Assume SNP instead of repeats (asir)                 : no
            Minimum reads per group needed for tagging (mrpg)    :  [san]  2
                                                                    [454]  4
            Minimum neighbour quality needed for tagging (mnq)   :  [san]  20
                                                                    [454]  20
            Minimum Group Quality needed for RMB Tagging (mgqrt) :  [san]  30
                                                                    [454]  25
            End-read Marking Exclusion Area in bases (emea)      :  [san]  1
                                                                    [454]  1
                Set to 1 on clipping PEC (emeas1clpec)           : yes
            Also mark gap bases (amgb)                           :  [san]  yes
                                                                    [454]  no
                Also mark gap bases - even multicolumn (amgbemc) :  [san]  yes
                                                                    [454]  yes
                Also mark gap bases - need both strands (amgbnbs):  [san]  yes
                                                                    [454]  yes
        Force non-IUPAC consensus per sequencing type (fnicpst)  :  [san]  no
                                                                    [454]  no
        Merge short reads (msr)                                  :  [san]  no
                                                                    [454]  no
            Keep ends unmerged (msrkeu)                          :  [san]  -1
                                                                    [454]  -1
        Gap override ratio (gor)                                 :  [san]  66
                                                                    [454]  66

  Edit options (-ED):
        Automatic contig editing (ace)              :  [san]  no
                                                       [454]  yes
     Sanger only:
        Strict editing mode (sem)                   : no
        Confirmation threshold in percent (ct)      : 50

  Misc (-MI):
        Stop on NFS (sonfs)                         : yes
        Extended log (el)                           : no
        Large contig size (lcs)                     : 500
        Large contig size for stats(lcs4s)          : 5000
        Stop on max read name length (somrnl)       : 40

  Directories (-DI):
        Working directory                 : 
        When loading EXP files            : 
        When loading SCF files            : 
        Top directory for writing files   : HK44_assembly
        For writing result files          : HK44_assembly/HK44_d_results
        For writing result info files     : HK44_assembly/HK44_d_info
        For writing tmp files             : /6621804.1.medium_chi
        Tmp redirected to (trt)           : 
        For writing checkpoint files      : HK44_assembly/HK44_d_chkpt

  File names (-FN):
        When loading sequences from FASTA            :  [san]  HK44_in.sanger.fasta
                                                        [454]  HK44_in.454.fasta
        When loading qualities from FASTA quality    :  [san]  HK44_in.sanger.fasta.qual
                                                        [454]  HK44_in.454.fasta.qual
sh: module: line 1: syntax error: unexpected end of file
sh: error importing function definition for `module'
sh: module: line 1: syntax error: unexpected end of file
sh: error importing function definition for `module'
mkdir: cannot create directory `/6621804.1.medium_chi': Read-only file system
Could not create directory '/6621804.1.medium_chi'.: No such file or directory
        When loading sequences from FASTQ            :  [san]  HK44_in.sanger.fastq
                                                        [454]  HK44_in.454.fastq
        When loading project from CAF                : HK44_in.sanger.caf
        When loading project from MAF (disabled)     : HK44_in.sanger.maf
        When loading EXP fofn                        : HK44_in.sanger.fofn
        When loading project from PHD                : HK44_in.phd.1
        When loading strain data                     : HK44_straindata_in.txt
        When loading XML trace info files            :  [san]  HK44_traceinfo_in.sanger.xml
                                                        [454]  HK44_traceinfo_in.454.xml
        When loading SSAHA2 vector screen results    : HK44_ssaha2vectorscreen_in.txt
        When loading SMALT vector screen results     : HK44_smaltvectorscreen_in.txt

        When loading backbone from MAF               : HK44_backbone_in.maf
        When loading backbone from CAF               : HK44_backbone_in.caf
        When loading backbone from GenBank           : HK44_backbone_in.gbf
        When loading backbone from GFF3              : HK44_backbone_in.gff3
        When loading backbone from FASTA             : HK44_backbone_in.fasta


  Output files (-OUTPUT/-OUT):
        Save simple singlets in project (sssip)      :  [san]  no
                                                        [454]  no
        Save tagged singlets in project (stsip)      :  [san]  yes
                                                        [454]  yes

        Remove rollover tmps (rrot)                  : yes
        Remove tmp directory (rtd)                   : no

    Result files:
        Saved as CAF                       (orc)     : yes
        Saved as MAF                       (orm)     : yes
        Saved as FASTA                     (orf)     : yes
        Saved as GAP4 (directed assembly)  (org)     : no
        Saved as phrap ACE                 (ora)     : yes
        Saved as GFF3                     (org3)     : no
        Saved as HTML                      (orh)     : no
        Saved as Transposed Contig Summary (ors)     : yes
        Saved as simple text format        (ort)     : no
        Saved as wiggle                    (orw)     : yes

    Temporary result files:
        Saved as CAF                       (otc)     : yes
        Saved as MAF                       (otm)     : no
        Saved as FASTA                     (otf)     : no
        Saved as GAP4 (directed assembly)  (otg)     : no
        Saved as phrap ACE                 (ota)     : no
        Saved as HTML                      (oth)     : no
        Saved as Transposed Contig Summary (ots)     : no
        Saved as simple text format        (ott)     : no

    Extended temporary result files:
        Saved as CAF                      (oetc)     : no
        Saved as FASTA                    (oetf)     : no
        Saved as GAP4 (directed assembly) (oetg)     : no
        Saved as phrap ACE                (oeta)     : no
        Saved as HTML                     (oeth)     : no
        Save also singlets               (oetas)     : no

    Alignment output customisation:
        TEXT characters per line (tcpl)              : 60
        HTML characters per line (hcpl)              : 60
        TEXT end gap fill character (tegfc)          :  
        HTML end gap fill character (hegfc)          :  

    File / directory output names:
        CAF             : HK44_out.caf
        MAF             : HK44_out.maf
        FASTA           : HK44_out.unpadded.fasta
        FASTA quality   : HK44_out.unpadded.fasta.qual
        FASTA (padded)  : HK44_out.padded.fasta
        FASTA qual.(pad): HK44_out.padded.fasta.qual
        GAP4 (directory): HK44_out.gap4da
        ACE             : HK44_out.ace
        HTML            : HK44_out.html
        Simple text     : HK44_out.txt
        TCS overview    : HK44_out.tcs
        Wiggle          : HK44_out.wig
------------------------------------------------------------------------------
Creating directory HK44_assembly ... done.
Creating directory /6621804.1.medium_chi ... 
Fatal error (may be due to problems of the input data or parameters):

"Could not make sure that a needed directory exists, aborting MIRA."

->Thrown: void Assembly::ensureStandardDirectories(bool purge)
->Caught: main

Aborting process, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.

Subscribing / unsubscribing to mira talk, see: 
//www.freelists.org/list/mira_talk

CWD: /data/achauha1/grid/assembly/arc_042312
Thank you for noticing that this is *NOT* a crash, but a
controlled program stop.
