[mira_talk] Re: Solexa seq assembly

From: "Ragupathy, Raja " <Raja.Ragupathy@xxxxxxxxx>
To: <mira_talk@xxxxxxxxxxxxx>
Date: Tue, 30 Mar 2010 15:54:15 -0400

Thanks, Jeremiah and Bastien, for the suggestions.
It still didn't work.
Hi Bastien, since the output log folder is empty, I am attaching the on-screen
output from the run instead.
Thanks again.
Raja

Raja Ragupathy PhD,
Post-Doctoral Fellow
Genomics and Sequencing labs,
AAFC-Cereal Research Centre
Winnipeg, Manitoba
Canada R3T2M9

Phone: 204-983 8194
E-mail: ragupathyr@xxxxxxxxx


-----Original Message-----
From: mira_talk-bounce@xxxxxxxxxxxxx [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On 
Behalf Of Bastien Chevreux
Sent: March 29, 2010 5:38 PM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: Solexa seq assembly

On Monday 29 March 2010 Ragupathy, Raja wrote:
> I just started working with MIRA. When I try a de novo genome
> assembly of Solexa short reads (FASTQ files), MIRA gives the error
> message 'program aborted due to error in input data or parametrisation'.
> Could you please advise?

Can you please post the complete output log?

Regards,
  Bastien


tbanks@mbwinnr502727:~/Downloads/mira/bin$ ./mira -project=Lu000z -job=denovo,genome,draft,solexa SOLEXA_SETTINGS -AS:epoq=no -AS:mrl=20
This is MIRA V3.0.3 (production version).

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence
Assembly Using Trace Signals and Additional Sequence Information.
Computer Science and Biology: Proceedings of the German Conference on
Bioinformatics (GCB) 99, pp. 45-56.

Mail general questions to the MIRA talk mailing list:
        mira_talk@xxxxxxxxxxxxx

To (un-)subsubcribe the MIRA mailing lists, see:
        http://www.chevreux.org/mira_mailinglists.html

To report bugs or ask for features, please use the new ticketing system at:
        http://sourceforge.net/apps/trac/mira-assembler/
This ensures that requests don't get lost.


Compiled by: bach
Sat Mar 13 10:04:34 CET 2010
On: Linux varcadia32 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compilation settings (sorry, for debug):
        Size of size_t  : 4
        Size of uint32  : 4
        Size of uint32_t: 4
        Size of uint64  : 8
        Size of uint64_t: 8
Current system: Linux mbwinnr502727 2.6.27-11-generic #1 SMP Wed Apr 1 20:57:48 UTC 2009 i686 GNU/Linux



Parsing parameters: -project=Lu000z -job=denovo,genome,draft,solexa SOLEXA_SETTINGS -AS:epoq=no -AS:mrl=20


-SB:sbuip is 3, but must be no more than 1. Setting to 1


Parameters parsed without error, perfect.

------------------------------------------------------------------------------
Parameter settings seen for:
Sanger data (also common parameters), Solexa data

Used parameter settings:
  General (-GE):
        Project name in (proin)                  : Lu000z
        Project name out (proout)                : Lu000z
        Number of threads (not)                  : 2
        Automatic memory management (amm)        : yes
            Keep percent memory free (kpmf)      : 10
            Max. process size (mps)              : 0
        Keep contigs in memory (kcim)            : no
        EST SNP pipeline step (esps)             : 1
        Use template information (uti)           :  [san]  yes
                                                    [sxa]  yes
            Template insert size minimum (tismin):  [san]  -1
                                                    [sxa]  -1
            Template insert size maximum (tismax):  [san]  -1
                                                    [sxa]  -1
        Colour reads by hash frequency (crhf)    : yes

  Load reads options (-LR):
        Load sequence data (lsd)                    :  [san]  no
                                                       [sxa]  yes
            File type (ft)                          :  [san]  fasta
                                                       [sxa]  fastq
            External quality (eq)                   : from SCF (scf)
                Ext. qual. override (eqo)           : no
                Discard reads on e.q. error (droeqe): no
            Solexa scores in qual file (ssiqf)      : no
            FASTQ qual offset (fqqo)                :  [san]  0
                                                       [sxa]  0

        Wants quality file (wqf)                    :  [san]  yes
                                                       [sxa]  yes

        Read naming scheme (rns)                    :  [san] Sanger Institute (sanger)
                                                       [sxa] Solexa (solexa)

        Merge with XML trace info (mxti)            :  [san]  no
                                                       [sxa]  no

        Filecheck only (fo)                         : no

  Assembly options (-AS):
        Number of passes (nop)                      : 1
            Skim each pass (sep)                    : yes
        Maximum number of RMB break loops (rbl)     : 1

        Minimum read length (mrl)                   :  [san]  80
                                                       [sxa]  20
        Base default quality (bdq)                  :  [san]  10
                                                       [sxa]  10
        Enforce presence of qualities (epoq)        :  [san]  yes
                                                       [sxa]  no

        Automatic repeat detection (ard)            : yes
            Coverage threshold (ardct)              :  [san]  2
                                                       [sxa]  2.5
            Minimum length (ardml)                  :  [san]  400
                                                       [sxa]  300
            Grace length (ardgl)                    :  [san]  40
                                                       [sxa]  20
            Use uniform read distribution (urd)     : no
              Start in pass (urdsip)                : 3
              Cutoff multiplier (urdcm)             :  [san]  1.5
                                                       [sxa]  1.5
        Keep long repeats separated (klrs)          : no

        Spoiler detection (sd)                      : no
            Last pass only (sdlpo)                  : yes

        Use genomic pathfinder (ugpf)               : yes

        Use emergency search stop (uess)            : yes
            ESS partner depth (esspd)               : 500
        Use emergency blacklist (uebl)              : yes
        Use max. contig build time (umcbt)          : no
            Build time in seconds (bts)             : 10000

  Strain and backbone options (-SB):
        Load straindata (lsd)                       : no
        Load backbone (lb)                          : no
            Start backbone usage in pass (sbuip)    : 1
            Backbone file type (bft)                : fasta
            Backbone base quality (bbq)             : 30
            Backbone strain name (bsn)              : 
                Force for all (bsnffa)              : no
            Backbone rail from strain (brfs)        : 
            Backbone rail length (brl)              : 0
            Backbone rail overlap (bro)             : 0
            Also build new contigs (abnc)           : yes

  Dataprocessing options (-DP):
        Use read extensions (ure)                   :  [san]  no
                                                       [sxa]  no
            Read extension window length (rewl)     :  [san]  30
                                                       [sxa]  30
            Read extension w. maxerrors (rewme)     :  [san]  2
                                                       [sxa]  2
            First extension in pass (feip)          :  [san]  0
                                                       [sxa]  0
            Last extension in pass (leip)           :  [san]  0
                                                       [sxa]  0

  Clipping options (-CL):
        Merge with SSAHA vector screen (msvs)       :  [san]  no
                                                       [sxa]  no
            Gap size (msvsgs)                       :  [san]  10
                                                       [sxa]  1
            Max front gap (msvsmfg)                 :  [san]  60
                                                       [sxa]  2
            Max end gap (msvsmeg)                   :  [san]  120
                                                       [sxa]  2
            Strict front clip (msvssfc)             :  [san]  0
                                                       [sxa]  0
            Strict end clip (msvssec)               :  [san]  0
                                                       [sxa]  0
        Possible vector leftover clip (pvlc)        :  [san]  no
                                                       [sxa]  no
            maximum len allowed (pvcmla)            :  [san]  18
                                                       [sxa]  18
        Quality clip (qc)                           :  [san]  no
                                                       [sxa]  no
            Minimum quality (qcmq)                  :  [san]  20
                                                       [sxa]  20
            Window length (qcwl)                    :  [san]  30
                                                       [sxa]  30
        Bad stretch quality clip (bsqc)             :  [san]  yes
                                                       [sxa]  no
            Minimum quality (bsqcmq)                :  [san]  20
                                                       [sxa]  5
            Window length (bsqcwl)                  :  [san]  30
                                                       [sxa]  20
        Masked bases clip (mbc)                     :  [san]  yes
                                                       [sxa]  no
            Gap size (mbcgs)                        :  [san]  20
                                                       [sxa]  5
            Max front gap (mbcmfg)                  :  [san]  40
                                                       [sxa]  12
            Max end gap (mbcmeg)                    :  [san]  60
                                                       [sxa]  12
        Lower case clip (lcc)                       :  [san]  no
                                                       [sxa]  no
        Clip poly A/T at ends (cpat)                :  [san]  no
                                                       [sxa]  no
            Keep poly-a signal (cpkps)              :  [san]  no
                                                       [sxa]  no
            Minimum signal length (cpmsl)           :  [san]  12
                                                       [sxa]  12
            Max errors allowed (cpmea)              :  [san]  1
                                                       [sxa]  1
            Max gap from ends (cpmgfe)              :  [san]  9
                                                       [sxa]  9
        Ensure minimum left clip (emlc)             :  [san]  yes
                                                       [sxa]  no
            Minimum left clip req. (mlcr)           :  [san]  25
                                                       [sxa]  0
            Set minimum left clip to (smlc)         :  [san]  30
                                                       [sxa]  0
        Ensure minimum right clip (emrc)            :  [san]  no
                                                       [sxa]  no
            Minimum right clip req. (mrcr)          :  [san]  10
                                                       [sxa]  10
            Set minimum right clip to (smrc)        :  [san]  20
                                                       [sxa]  20

        Apply SKIM chimera detection clip (ascdc)   : yes
        Apply SKIM junk detection clip (asjdc)      : yes

        Propose end clips (pec)                     : yes
            Bases per hash (pecbph)                 : 21

  Parameters for SKIM algorithm (-SK):
        Number of threads (not)                     : 2

        Bases per hash (bph)                        : 17
        Hash save stepping (hss)                    : 4
        Percent required (pr)                       :  [san]  70
                                                       [sxa]  90

        Max hits per read (mhpr)                    : 200
        Max megahub ratio (mmhr)                    : 0

        Freq. est. min normal (fenn)                : 0.4
        Freq. est. max normal (fexn)                : 1.6
        Freq. est. repeat (fer)                     : 1.9
        Freq. est. heavy repeat (fehr)              : 8
        Freq. est. crazy (fecr)                     : 20
        Mask nasty repeats (mnr)                    : yes
            Nasty repeat ratio (nrr)                : 100

        Max hashes in memory (mhim)                 : 15000000
        MemCap: hit reduction (mchr)                : 4096

  Pathfinder options (-PF):
        Use quick rule (uqr)                        :  [san]  yes
                                                       [sxa]  yes
            Quick rule min len 1 (qrml1)            :  [san]  200
                                                       [sxa]  33
            Quick rule min sim 1 (qrms1)            :  [san]  90
                                                       [sxa]  100
            Quick rule min len 2 (qrml2)            :  [san]  100
                                                       [sxa]  30
            Quick rule min sim 2 (qrms2)            :  [san]  95
                                                       [sxa]  100
        Backbone quick overlap min len (bqoml)      :  [san]  150
                                                       [sxa]  20

  Align parameters for Smith-Waterman align (-AL):
        Bandwidth in percent (bip)             :  [san]  15
                                                  [sxa]  20
        Bandwidth max (bmax)                   :  [san]  70
                                                  [sxa]  80
        Bandwidth min (bmin)                   :  [san]  25
                                                  [sxa]  20
        Minimum score (ms)                     :  [san]  30
                                                  [sxa]  15
        Minimum overlap (mo)                   :  [san]  15
                                                  [sxa]  25
        Minimum relative score in % (mrs)      :  [san]  65
                                                  [sxa]  90
        Solexa_hack_max_errors (shme)          :  [san]  0
                                                  [sxa]  0
        Extra gap penalty (egp)                :  [san]  no
                                                  [sxa]  no
            extra gap penalty level (egpl)     :  [san]  low
                                                  [sxa]  low
            Max. egp in percent (megpp)        :  [san]  100
                                                  [sxa]  100

  Contig parameters (-CO):
        Name prefix (np)                                         : Lu000z
        Reject on drop in relative alignment score in % (rodirs) :  [san]  15
                                                                    [sxa]  30
        Mark repeats (mr)                                        : yes
            Only in result (mroir)                               : no
            Assume SNP instead of repeats (asir)                 : no
            Minimum reads per group needed for tagging (mrpg)    :  [san]  2
                                                                    [sxa]  4
            Minimum neighbour quality needed for tagging (mnq)   :  [san]  20
                                                                    [sxa]  20
            Minimum Group Quality needed for RMB Tagging (mgqrt) :  [san]  30
                                                                    [sxa]  30
            End-read Marking Exclusion Area in bases (emea)      :  [san]  25
                                                                    [sxa]  4
            Also mark gap bases (amgb)                           :  [san]  yes
                                                                    [sxa]  yes
                Also mark gap bases - even multicolumn (amgbemc) :  [san]  yes
                                                                    [sxa]  yes
                Also mark gap bases - need both strands (amgbnbs):  [san]  yes
                                                                    [sxa]  yes
        Force non-IUPAC consensus per sequencing type (fnicpst)  :  [san]  no
                                                                    [sxa]  no
        Merge short reads (msr)                                  :  [san]  no
                                                                    [sxa]  no
        Gap override ratio (gor)                                 :  [san]  66
                                                                    [sxa]  66

  Edit options (-ED):
        Automatic contig editing (ace)              :  [san]  no
                                                       [sxa]  no
     Sanger only:
        Strict editing mode (sem)                   : no
        Confirmation threshold in percent (ct)      : 50

  Directories (-DI):
        When loading EXP files            : 
        When loading SCF files            : 
        Top directory for writing files   : Lu000z_assembly
        For writing result files          : Lu000z_assembly/Lu000z_d_results
        For writing result info files     : Lu000z_assembly/Lu000z_d_info
        For writing log files             : Lu000z_assembly/Lu000z_d_log
        For writing checkpoint files      : Lu000z_assembly/Lu000z_d_chkpt

  File names (-FN):
        When loading sequences from FASTA            :  [san]  Lu000z_in.sanger.fasta
                                                        [sxa]  Lu000z_in.solexa.fasta
        When loading qualities from FASTA quality    :  [san]  Lu000z_in.sanger.fasta.qual
                                                        [sxa]  Lu000z_in.solexa.fasta.qual
        When loading sequences from FASTQ            :  [san]  Lu000z_in.sanger.fastq
                                                        [sxa]  Lu000z_in.solexa.fastq
        When loading project from CAF                : Lu000z_in.sanger.caf
        When loading project from MAF (disabled)     : Lu000z_in.sanger.maf
        When loading EXP fofn                        : Lu000z_in.fofn
        When loading project from PHD                : Lu000z_in.phd.1
        When loading strain data                     : Lu000z_straindata_in.txt
        When loading XML trace info files            :  [san]  Lu000z_traceinfo_in.sanger.xml
                                                        [sxa]  Lu000z_traceinfo_in.solexa.xml
        When loading SSAHA vector screen results     : Lu000z_ssaha2vectorscreen_in.txt

        When loading backbone from MAF               : Lu000z_backbone_in.maf
        When loading backbone from CAF               : Lu000z_backbone_in.caf
        When loading backbone from GenBank           : Lu000z_backbone_in.gbf
        When loading backbone from FASTA             : Lu000z_backbone_in.fasta


  Output files (-OUTPUT/-OUT):
        Save simple singlets in project (sssip)      :  [san]  no
                                                        [sxa]  no
        Save tagged singlets in project (stsip)      :  [san]  yes
                                                        [sxa]  yes

        Remove rollover logs (rrol)                  : yes
        Remove log directory (rld)                   : no

    Result files:
        Saved as CAF                       (orc)     : yes
        Saved as FASTA                     (orf)     : yes
        Saved as GAP4 (directed assembly)  (org)     : no
        Saved as phrap ACE                 (ora)     : yes
        Saved as HTML                      (orh)     : no
        Saved as Transposed Contig Summary (ors)     : yes
        Saved as simple text format        (ort)     : no
        Saved as wiggle                    (orw)     : yes

    Temporary result files:
        Saved as CAF                       (otc)     : yes
        Saved as CAF                       (otm)     : no
        Saved as FASTA                     (otf)     : no
        Saved as GAP4 (directed assembly)  (otg)     : no
        Saved as phrap ACE                 (ota)     : no
        Saved as HTML                      (oth)     : no
        Saved as Transposed Contig Summary (ots)     : no
        Saved as simple text format        (ott)     : no

    Extended temporary result files:
        Saved as CAF                      (oetc)     : no
        Saved as FASTA                    (oetf)     : no
        Saved as GAP4 (directed assembly) (oetg)     : no
        Saved as phrap ACE                (oeta)     : no
        Saved as HTML                     (oeth)     : no
        Save also singlets               (oetas)     : no

    Alignment output customisation:
        TEXT characters per line (tcpl)              : 60
        HTML characters per line (hcpl)              : 60
        TEXT end gap fill character (tegfc)          :  
        HTML end gap fill character (hegfc)          :  

    File / directory output names:
        CAF             : Lu000z_out.caf
        MAF             : Lu000z_out.maf
        FASTA           : Lu000z_out.unpadded.fasta
        FASTA quality   : Lu000z_out.unpadded.fasta.qual
        FASTA (padded)  : Lu000z_out.padded.fasta
        FASTA qual.(pad): Lu000z_out.padded.fasta.qual
        GAP4 (directory): Lu000z_out.gap4da
        ACE             : Lu000z_out.ace
        HTML            : Lu000z_out.html
        Simple text     : Lu000z_out.txt
        TCS overview    : Lu000z_out.tcs
        Wiggle          : Lu000z_out.wig
------------------------------------------------------------------------------
Creating directory Lu000z_assembly ... done.
Creating directory Lu000z_assembly/Lu000z_d_log ... done.
Creating directory Lu000z_assembly/Lu000z_d_results ... done.
Creating directory Lu000z_assembly/Lu000z_d_info ... done.
Creating directory Lu000z_assembly/Lu000z_d_chkpt ... done.
Localtime: Tue Mar 30 14:09:47 2010

Loading data (Solexa) from FASTQ files,

Fatal Error (may be due to problems of the input data):
"Could not open FASTQ file 'Lu000z_in.solexa.fastq'. Is it present? Is it readable? Did you want to load your data in another format?"

->Thrown: void ReadPool::loadDataFromFASTQ(const string & filename, const string & qualfilename, const bool generatefilenames, const uint8 seqtype, const uint8 loadaction)

->Caught: Assembly::loadFASTQ(const string & fastqfile, const string & fastaqualfile, const uint8 seqtype, const uint8 loadaction)

Program aborted, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.

CWD: /home/tbanks/Downloads/mira/bin
Aborted
tbanks@mbwinnr502727:~/Downloads/mira/bin$
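
The fatal error above says MIRA could not open 'Lu000z_in.solexa.fastq', and the
CWD line shows it was looking in /home/tbanks/Downloads/mira/bin. Below is a
minimal shell sketch for checking this before starting a run. The file name
pattern <project>_in.solexa.fastq is taken from the -FN section of the log; the
helper names expected_input and check_input are made up for illustration, and
the hint about copying or symlinking the reads is an assumption about the
likely fix, not something the log confirms.

```shell
#!/bin/sh
# Sketch only: check that the Solexa FASTQ input MIRA looked for in the log
# above exists in the current directory before launching an assembly.

# expected_input PROJECT
# Prints the input file name following the pattern seen in the log
# (<project>_in.solexa.fastq). Hypothetical helper, not part of MIRA.
expected_input() {
    printf '%s_in.solexa.fastq\n' "$1"
}

# check_input PROJECT
# Returns 0 if the expected file is readable in the current directory,
# 1 otherwise. Hypothetical helper, not part of MIRA.
check_input() {
    f=$(expected_input "$1")
    if [ -r "$f" ]; then
        echo "found: $f"
    else
        echo "missing: $f (looked in $(pwd))"
        return 1
    fi
}

project=Lu000z
# Assumption: placing the reads under the expected name resolves the error.
check_input "$project" || echo "hint: copy or symlink the reads to ./$(expected_input "$project")"
```

Run from the directory where mira is started; in the session above that would be
~/Downloads/mira/bin.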

Follow-Ups:
- [mira_talk] Re: Solexa seq assembly
  - From: Sven Klages

References:
- [mira_talk] Re: Solexa seq assembly
  - From: Bastien Chevreux
