[mira_talk] mira Error

From: "zhangzan125@xxxxxxxxx" <zhangzan125@xxxxxxxxx>
To: mira_talk <mira_talk@xxxxxxxxxxxxx>
Date: Wed, 25 Jun 2014 09:37:32 +0800
Dear sir,
I installed MIRA successfully and started a run, but it stopped with the error below. I have also attached my manifest. My data are Illumina RNA-seq reads from Drosophila. Could you give me some suggestions?
Thanks & Regards,
Zan Zhang

ERROR:

This is MIRA 4.0.2 .



Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence

Assembly Using Trace Signals and Additional Sequence Information.

Computer Science and Biology: Proceedings of the German Conference on

Bioinformatics (GCB) 99, pp. 45-56.



To (un-)subscribe the MIRA mailing lists, see:

        http://www.chevreux.org/mira_mailinglists.html



After subscribing, mail general questions to the MIRA talk mailing list:

        mira_talk@xxxxxxxxxxxxx





To report bugs or ask for features, please use the SourceForge ticketing

system at:

        http://sourceforge.net/p/mira-assembler/tickets/

This ensures that requests do not get lost.





Compiled by: bach

Fri Apr 18 14:57:20 CEST 2014

On: Linux vk10464 2.6.32-41-generic #94-Ubuntu SMP Fri Jul 6 18:00:34 UTC 2012 x86_64 GNU/Linux

Compiled in boundtracking mode.

Compiled in bugtracking mode.

Compiled with ENABLE64 activated.

Runtime settings (sorry, for debug):

        Size of size_t  : 8

        Size of uint32  : 4

        Size of uint32_t: 4

        Size of uint64  : 8

        Size of uint64_t: 8

Current system: Linux node7 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

Looking for files named in data ...Pushing back filename: 
"/data-SATA/home/liuying/transcriptome/illumina/Drosophila_melanogaster/SRR927153_assemble/remove_low_quality_data/SRR927153_1.fastq.cleaned.renamed.fastq"

Pushing back filename: 
"/data-SATA/home/liuying/transcriptome/illumina/Drosophila_melanogaster/SRR927153_assemble/remove_low_quality_data/SRR927153_2.fastq.cleaned.renamed.fastq"

Manifest:

projectname: MyFirstAssembly

job: est,denovo,accurate

parameters: COMMON_SETTINGS -GE:not=1 amm=off -HS:mnr=yes:nrr=8

Manifest load entries: 1

MLE 1:

RGID: 1

RGN: DataIlluminaPairedLib      SN: StrainX

SP:     SPio: 0 SPC: 0  IF: -1  IT: -1  TSio: 0

ST: 6 (Solexa)  namschem: 4     SID: 0

DQ: 30

BB: 0   Rail: 0 CER: 0



/data-SATA/home/liuying/transcriptome/illumina/Drosophila_melanogaster/SRR927153_assemble/remove_low_quality_data/SRR927153_1.fastq.cleaned.renamed.fastq
 
/data-SATA/home/liuying/transcriptome/illumina/Drosophila_melanogaster/SRR927153_assemble/remove_low_quality_data/SRR927153_2.fastq.cleaned.renamed.fastq
 



Parameters parsed without error, perfect.



-CL:pec and -CO:emeas1clpec are set, setting -CO:emea values to 1.

------------------------------------------------------------------------------

Parameter settings seen for:

Sanger data



Used parameter settings:

  General (-GE):

        Project name                                : MyFirstAssembly

        Number of threads (not)                     : 1

        Automatic memory management (amm)           : no

            Keep percent memory free (kpmf)         : 15

            Max. process size (mps)                 : 0

        EST SNP pipeline step (esps)                : 0

        Colour reads by hash frequency (crhf)       : no



  Load reads options (-LR):

        Wants quality file (wqf)                    :  [sxa]  yes



        Filecheck only (fo)                         : no



  Assembly options (-AS):

        Number of passes (nop)                      : 4

            Skim each pass (sep)                    : yes

        Maximum number of RMB break loops (rbl)     : 2

        Maximum contigs per pass (mcpp)             : 0



        Minimum read length (mrl)                   :  [sxa]  20

        Minimum reads per contig (mrpc)             :  [sxa]  4

        Enforce presence of qualities (epoq)        :  [sxa]  yes

        Automatic repeat detection (ard)            : no

            Coverage threshold (ardct)              :  [sxa]  2.5

            Minimum length (ardml)                  :  [sxa]  300

            Grace length (ardgl)                    :  [sxa]  20

            Use uniform read distribution (urd)     : no

              Start in pass (urdsip)                : 3

              Cutoff multiplier (urdcm)             :  [sxa]  1.5



        Spoiler detection (sd)                      : no

            Last pass only (sdlpo)                  : yes



        Use genomic pathfinder (ugpf)               : no



        Use emergency search stop (uess)            : yes

            ESS partner depth (esspd)               : 500

        Use emergency blacklist (uebl)              : yes

        Use max. contig build time (umcbt)          : yes

            Build time in seconds (bts)             : 360



  Strain and backbone options (-SB):

        Bootstrap new backbone (bnb)                : yes

        Start backbone usage in pass (sbuip)        : 3

        Backbone rail from strain (brfs)            : 

        Backbone rail length (brl)                  : 0

        Backbone rail overlap (bro)                 : 0

        Trim overhanging reads (tor)                : yes



        (Also build new contigs (abnc))             : yes



  Dataprocessing options (-DP):

        Use read extensions (ure)                   :  [sxa]  no

            Read extension window length (rewl)     :  [sxa]  30

            Read extension w. maxerrors (rewme)     :  [sxa]  2

            First extension in pass (feip)          :  [sxa]  0

            Last extension in pass (leip)           :  [sxa]  0



  Clipping options (-CL):

        SSAHA2 or SMALT clipping:

            Gap size (msvsgs)                       :  [sxa]  1

            Max front gap (msvsmfg)                 :  [sxa]  2

            Max end gap (msvsmeg)                   :  [sxa]  2

            Strict front clip (msvssfc)             :  [sxa]  0

            Strict end clip (msvssec)               :  [sxa]  0

        Possible vector leftover clip (pvlc)        :  [sxa]  no

            maximum len allowed (pvcmla)            :  [sxa]  18

        Min qual. threshold for entire read (mqtfer):  [sxa]  5

            Number of bases (mqtfernob)             :  [sxa]  15

        Quality clip (qc)                           :  [sxa]  no

            Minimum quality (qcmq)                  :  [sxa]  20

            Window length (qcwl)                    :  [sxa]  30

        Bad stretch quality clip (bsqc)             :  [sxa]  no

            Minimum quality (bsqcmq)                :  [sxa]  5

            Window length (bsqcwl)                  :  [sxa]  20

        Masked bases clip (mbc)                     :  [sxa]  yes

            Gap size (mbcgs)                        :  [sxa]  5

            Max front gap (mbcmfg)                  :  [sxa]  12

            Max end gap (mbcmeg)                    :  [sxa]  12

        Lower case clip front (lccf)                :  [sxa]  no

        Lower case clip back (lccb)                 :  [sxa]  no

        Clip poly A/T at ends (cpat)                :  [sxa]  yes

            Keep poly-a signal (cpkps)              :  [sxa]  yes

            Minimum signal length (cpmsl)           :  [sxa]  15

            Max errors allowed (cpmea)              :  [sxa]  1

            Max gap from ends (cpmgfe)              :  [sxa]  20000

        Clip 3 prime polybase (c3pp)                :  [sxa]  yes

            Minimum signal length (c3ppmsl)         :  [sxa]  15

            Max errors allowed (c3ppmea)            :  [sxa]  3

            Max gap from ends (c3ppmgfe)            :  [sxa]  9

        Clip known adaptors right (ckar)            :  [sxa]  yes

        Ensure minimum left clip (emlc)             :  [sxa]  no

            Minimum left clip req. (mlcr)           :  [sxa]  0

            Set minimum left clip to (smlc)         :  [sxa]  0

        Ensure minimum right clip (emrc)            :  [sxa]  no

            Minimum right clip req. (mrcr)          :  [sxa]  10

            Set minimum right clip to (smrc)        :  [sxa]  20



        Apply SKIM chimera detection clip (ascdc)   : yes

        Apply SKIM junk detection clip (asjdc)      : no



        Propose end clips (pec)                     :  [sxa]  yes

            Bases per hash (pecbph)                 : 31

            Handle Solexa GGCxG problem (pechsgp)   : yes

            Front freq (pffreq)                     :  [sxa]  0

            Back freq (pbfreq)                      :  [sxa]  0

            Minimum kmer for forward-rev (pmkfr)    : 1

            Front forward-rev (pffore)              :  [sxa]  yes

            Back forward-rev (pbfore)               :  [sxa]  yes

            Front conf. multi-seq type (pfcmst)     :  [sxa]  yes

            Back conf. multi-seq type (pbcmst)      :  [sxa]  yes

            Front seen at low pos (pfsalp)          :  [sxa]  no

            Back seen at low pos (pbsalp)           :  [sxa]  no



        Clip bad solexa ends (cbse)                 :  [sxa]  yes

        Search PhiX174 (spx174)                     :  [sxa]  yes

            Filter PhiX174 (fpx174)                 :  [sxa]  yes



        Rare kmer mask (rkm)                        :  [sxa]  2



  Parameters for SKIM algorithm (-SK):

        Number of threads (not)                     : 1



        Also compute reverse complements (acrc)     : yes

        Bases per hash (bph)                        : 23

            Automatic increase per pass (bphaipp)   : 1

            Automatic incr. cov. threshold (bphaict): 20

        Hash save stepping (hss)                    : 1

        Percent required (pr)                       :  [sxa]  95



        Max hits per read (mhpr)                    : 30

        Max megahub ratio (mmhr)                    : 0



        SW check on backbones (swcob)               : no



        Max hashes in memory (mhim)                 : 15000000

        MemCap: hit reduction (mchr)                : 4096



  Parameters for Hash Statistics (-HS):

        Freq. cov. estim. min (fcem)                : 30

        Freq. estim. min normal (fenn)              : 0.4

        Freq. estim. max normal (fexn)              : 1.6

        Freq. estim. repeat (fer)                   : 1.9

        Freq. estim. heavy repeat (fehr)            : 8

        Freq. estim. crazy (fecr)                   : 20

        Mask nasty repeats (mnr)                    : yes

            Nasty repeat ratio (nrr)                : 8

            Nasty repeat coverage (nrc)             : 200

            Lossless digital normalisation (ldn)    : yes



        Repeat level in info file (rliif)           : 6



        Million hashes per buffer (mhpb)            : 16

        Rare kmer early kill (rkek)                 : no



  Pathfinder options (-PF):

        Use quick rule (uqr)                        :  [sxa]  yes

            Quick rule min len 1 (qrml1)            :  [sxa]  -95

            Quick rule min sim 1 (qrms1)            :  [sxa]  100

            Quick rule min len 2 (qrml2)            :  [sxa]  -85

            Quick rule min sim 2 (qrms2)            :  [sxa]  100

        Backbone quick overlap min len (bqoml)      :  [sxa]  20

        Max. start cache fill time (mscft)          : 5



  Align parameters for Smith-Waterman align (-AL):

        Bandwidth in percent (bip)             :  [sxa]  20

        Bandwidth max (bmax)                   :  [sxa]  80

        Bandwidth min (bmin)                   :  [sxa]  20

        Minimum score (ms)                     :  [sxa]  15

        Minimum overlap (mo)                   :  [sxa]  25

        Minimum relative score in % (mrs)      :  [sxa]  90

        Solexa_hack_max_errors (shme)          :  [sxa]  -1

        Extra gap penalty (egp)                :  [sxa]  yes

            extra gap penalty level (egpl)     :  [sxa] reject_codongaps

            Max. egp in percent (megpp)        :  [sxa]  100



  Contig parameters (-CO):

        Name prefix (np)                                         : MyFirstAssembly

        Reject on drop in relative alignment score in % (rodirs) :  [sxa]  15

        Mark repeats (mr)                                        : yes

            Only in result (mroir)                               : no

            Assume SNP instead of repeats (asir)                 : no

            Minimum reads per group needed for tagging (mrpg)    :  [sxa]  4

            Minimum neighbour quality needed for tagging (mnq)   :  [sxa]  20

            Minimum Group Quality needed for RMB Tagging (mgqrt) :  [sxa]  30

            End-read Marking Exclusion Area in bases (emea)      :  [sxa]  1

                Set to 1 on clipping PEC (emeas1clpec)           : yes

            Also mark gap bases (amgb)                           :  [sxa]  yes

                Also mark gap bases - even multicolumn (amgbemc) :  [sxa]  yes

                Also mark gap bases - need both strands (amgbnbs):  [sxa]  yes

        Force non-IUPAC consensus per sequencing type (fnicpst)  :  [sxa]  no

        Merge short reads (msr)                                  :  [sxa]  yes

            Max errors (msrme)                                   :  [sxa]  0

            Keep ends unmerged (msrkeu)                          :  [sxa]  -1

        Gap override ratio (gor)                                 :  [sxa]  66



  Edit options (-ED):

        Mira automatic contig editing (mace)        : yes

            Edit kmer singlets (eks)                : yes

            Edit homopolymer overcalls (ehpo)       :  [sxa]  no



  Misc (-MI):

        Large contig size (lcs)                     : 500

        Large contig size for stats (lcs4s)         : 1000



        I know what I do (ikwid)                    : no



        Extra flag 1 / sanity track check (ef1)     : no

        Extra flag 2 / dnredreadsatpeaks (ef2)      : yes

        Extra flag 3 / pelibdisassemble (ef3)       : yes

        Extended log (el)                           : no



  Nag and Warn (-NW):

        Check NFS (cnfs)                            : stop

        Check multi pass mapping (cmpm)             : stop

        Check template problems (ctp)               : stop

        Check duplicate read names (cdrn)           : stop

        Check max read name length (cmrnl)          : stop

            Max read name length (mrnl)             : 40

        Check average coverage (cac)                : stop

            Average coverage value (acv)            : 80



  Directories (-DI):

        Top directory for writing files   : MyFirstAssembly_assembly

        For writing result files          : MyFirstAssembly_assembly/MyFirstAssembly_d_results

        For writing result info files     : MyFirstAssembly_assembly/MyFirstAssembly_d_info

        For writing tmp files             : MyFirstAssembly_assembly/MyFirstAssembly_d_tmp

        Tmp redirected to (trt)           : 

        For writing checkpoint files      : MyFirstAssembly_assembly/MyFirstAssembly_d_chkpt



  Output files (-OUTPUT/-OUT):

        Save simple singlets in project (sssip)      :  [sxa]  no

        Save tagged singlets in project (stsip)      :  [sxa]  yes



        Remove rollover tmps (rrot)                  : yes

        Remove tmp directory (rtd)                   : no



    Result files:

        Saved as CAF                       (orc)     : yes

        Saved as MAF                       (orm)     : yes

        Saved as FASTA                     (orf)     : yes

        Saved as GAP4 (directed assembly)  (org)     : no

        Saved as phrap ACE                 (ora)     : no

        Saved as GFF3                     (org3)     : no

        Saved as HTML                      (orh)     : no

        Saved as Transposed Contig Summary (ors)     : yes

        Saved as simple text format        (ort)     : no

        Saved as wiggle                    (orw)     : no



    Temporary result files:

        Saved as CAF                       (otc)     : yes

        Saved as MAF                       (otm)     : no

        Saved as FASTA                     (otf)     : no

        Saved as GAP4 (directed assembly)  (otg)     : no

        Saved as phrap ACE                 (ota)     : no

        Saved as HTML                      (oth)     : no

        Saved as Transposed Contig Summary (ots)     : no

        Saved as simple text format        (ott)     : no



    Extended temporary result files:

        Saved as CAF                      (oetc)     : no

        Saved as FASTA                    (oetf)     : no

        Saved as GAP4 (directed assembly) (oetg)     : no

        Saved as phrap ACE                (oeta)     : no

        Saved as HTML                     (oeth)     : no

        Save also singlets               (oetas)     : no



    Alignment output customisation:

        TEXT characters per line (tcpl)              : 60

        HTML characters per line (hcpl)              : 60

        TEXT end gap fill character (tegfc)          :  

        HTML end gap fill character (hegfc)          :  



    File / directory output names:

        CAF             : MyFirstAssembly_out.caf

        MAF             : MyFirstAssembly_out.maf

        FASTA           : MyFirstAssembly_out.unpadded.fasta

        FASTA quality   : MyFirstAssembly_out.unpadded.fasta.qual

        FASTA (padded)  : MyFirstAssembly_out.padded.fasta

        FASTA qual.(pad): MyFirstAssembly_out.padded.fasta.qual

        GAP4 (directory): MyFirstAssembly_out.gap4da

        ACE             : MyFirstAssembly_out.ace

        HTML            : MyFirstAssembly_out.html

        Simple text     : MyFirstAssembly_out.txt

        TCS overview    : MyFirstAssembly_out.tcs

        Wiggle          : MyFirstAssembly_out.wig

------------------------------------------------------------------------------

Creating directory MyFirstAssembly_assembly ... done.

Creating directory MyFirstAssembly_assembly/MyFirstAssembly_d_results ... done.

Creating directory MyFirstAssembly_assembly/MyFirstAssembly_d_info ... done.

Creating directory MyFirstAssembly_assembly/MyFirstAssembly_d_chkpt ... done.

Creating directory MyFirstAssembly_assembly/MyFirstAssembly_d_tmp ... done.



Tmp directory is not on a NFS mount, good.



Localtime: Thu Jun 12 15:42:14 2014



Loading reads from 
/data-SATA/home/liuying/transcriptome/illumina/Drosophila_melanogaster/SRR927153_assemble/remove_low_quality_data/SRR927153_1.fastq.cleaned.renamed.fastq
 type fastq

Localtime: Thu Jun 12 15:42:14 2014

Loading data from FASTQ file: 
/data-SATA/home/liuying/transcriptome/illumina/Drosophila_melanogaster/SRR927153_assemble/remove_low_quality_data/SRR927153_1.fastq.cleaned.renamed.fastq

(sorry, no progress indicator for that, possible only with zlib >=1.34)





Done.

Loaded 31236815 reads, Localtime: Thu Jun 12 15:52:54 2014

Looking at FASTQ type ... guessing FASTQ-33 (Sanger)

Running quality values adaptation ... done.

Loading reads from 
/data-SATA/home/liuying/transcriptome/illumina/Drosophila_melanogaster/SRR927153_assemble/remove_low_quality_data/SRR927153_2.fastq.cleaned.renamed.fastq
 type fastq

Localtime: Thu Jun 12 15:53:17 2014

Loading data from FASTQ file: 
/data-SATA/home/liuying/transcriptome/illumina/Drosophila_melanogaster/SRR927153_assemble/remove_low_quality_data/SRR927153_2.fastq.cleaned.renamed.fastq

(sorry, no progress indicator for that, possible only with zlib >=1.34)


Done.

Loaded 31236815 reads, Localtime: Thu Jun 12 16:05:03 2014

Looking at FASTQ type ... guessing FASTQ-33 (Sanger)

Running quality values adaptation ... done.

Checking reads for trace data (loading qualities if needed):

 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] 

No SCF data present in any read, EdIt automatic contig editing for Sanger data 
is now switched off.

62473630 reads with valid data for assembly.

Localtime: Thu Jun 12 16:07:07 2014



Generated 31236815 unique DNA template ids for 62473630 valid reads.

TODO: Like Readpool: strain x has y reads

Have read pool with 62473630 reads.



===========================================================================

Pool statistics:

Backbones: 0    Backbone rails: 0



                Sanger  454     IonTor  PcBioHQ PcBioLQ Text    Solexa  SOLiD

                ------------------------------------------------------------

Total reads     0       0       0       0       0       0       62473630        0

Reads wo qual   0       0       0       0       0       0       0       0

Used reads      0       0       0       0       0       0       62473630        0

Avg tot rlen    0       0       0       0       0       0       99      0

Avg rlen used   0       0       0       0       0       0       99      0

W/o clips       0       0       0       0       0       0       62473630        0



Solexa  total bases: 6201635932 used bases in used reads: 6201635932

===========================================================================

...........................


Searching for possible overlaps:

Localtime: Sat Jun 21 14:16:56 2014

Now running threaded and partitioned skimmer with 38 partitions in 1 threads:

 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]  done.



truncating 
MyFirstAssembly_assembly/MyFirstAssembly_d_tmp/MyFirstAssembly_int_posmatchf_pass.2.bin

truncated 
MyFirstAssembly_assembly/MyFirstAssembly_d_tmp/MyFirstAssembly_int_posmatchf_pass.2.bin
 from 6901222944 to 3943452432



truncating 
MyFirstAssembly_assembly/MyFirstAssembly_d_tmp/MyFirstAssembly_int_posmatchc_pass.2.bin

truncated 
MyFirstAssembly_assembly/MyFirstAssembly_d_tmp/MyFirstAssembly_int_posmatchc_pass.2.bin
 from 6449929560 to 3850855680





Hits chosen: 649525676



Localtime: Sun Jun 22 00:49:31 2014



Total megahubs: 4948





MIRA has detected megahubs in your data.This may not be a problem, but most 
probably is, especially for eukaryotes.







You have more than 0.0000000000% of your reads found to be megahubs.



You should check the following:



        1) for Sanger sequences: are all the sequencing vectors masked / clipped?

        2) for 454 sequences: are all the adaptors masked / clipped?



You will find in the info directory a file called

    '*_info_readrepeats.lst',

consult the MIRA manual on how to extract repeat information from there.



*ONLY* when you are sure that no (or only a very negligible number) of sequencing

vector / adaptor sequence is remaining, try this:



        3) for organisms with complex repeats (eukaryots & some bacteria):

                - reduce the -HS:nrr parameter (divide by 2)



*ONLY* if the above fails, try increasing the -SK:mmhr parameter

Note that the number of present megahubs will increase computation time in

an exponential way, so be careful when changing -SK:mmhr.





You have 0.0079201417% of your reads as megahubs.

You have set a maximum allowed ratio of: 0.0000000000



Ending the assembly because the maximum ratio has been reached/surpassed.

Failure, wrapped MIRA process aborted.
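
The abort at the end of the log follows from two numbers it reports: 4948 megahubs among 62473630 reads, against a maximum allowed ratio (-SK:mmhr) of 0. Below is a minimal sketch of that check. It assumes the share is simply megahubs divided by total reads, expressed in percent, and that the run stops once the share is above the limit. The function names are illustrative and not part of MIRA.

```python
# Values copied from the log above.
MEGAHUBS = 4948
TOTAL_READS = 62473630
MAX_MEGAHUB_RATIO = 0.0  # -SK:mmhr as shown in the parameter dump


def megahub_percent(megahubs: int, total_reads: int) -> float:
    """Share of reads flagged as megahubs, in percent (assumed formula)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * megahubs / total_reads


def exceeds_limit(percent: float, max_percent: float) -> bool:
    """Assumed stop rule: abort once the share is above the allowed ratio."""
    return percent > max_percent


if __name__ == "__main__":
    pct = megahub_percent(MEGAHUBS, TOTAL_READS)
    print(f"megahub share: {pct:.10f}%")
    print("would abort:", exceeds_limit(pct, MAX_MEGAHUB_RATIO))
```

With the values from this run the share comes out at roughly 0.0079%, which is above the configured limit of 0 and so consistent with the assembly being stopped. Under this assumed rule, a limit of 0 means any megahub at all stops the run.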
Attachment: manifest.conf
Description: Binary data
Follow-Ups:
- [mira_talk] Re: mira Error
  - From: Bastien Chevreux