Thanks, Jeremiah and Bastien, for the suggestions. It still didn't work.

Hi Bastien, since the output log folder is empty, I am attaching the on-screen output instead.

Thanks again.
Raja

Raja Ragupathy PhD, Post-Doctoral Fellow
Genomics and Sequencing labs, AAFC-Cereal Research Centre
Winnipeg, Manitoba Canada R3T2M9
Phone: 204-983 8194
E-mail: ragupathyr@xxxxxxxxx

-----Original Message-----
From: mira_talk-bounce@xxxxxxxxxxxxx [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On Behalf Of Bastien Chevreux
Sent: March 29, 2010 5:38 PM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: Solexa seq assembly

On Monday 29 March 2010 Ragupathy, Raja wrote:
> I just started working with Mira. When I want to do denovo genome
> assembly of solexa short reads (fastq files), MIRA gives the error
> message-'program aborted due to error in input data or parametrisation.
> Could you please advice?

Can you please post the complete output log?

Regards,
Bastien
tbanks@mbwinnr502727:~/Downloads/mira/bin$ ./mira -project=Lu000z -job=denovo,genome,draft,solexa SOLEXA_SETTINGS -AS:epoq=no -AS:mrl=20
This is MIRA V3.0.3 (production version).

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence
Assembly Using Trace Signals and Additional Sequence Information.
Computer Science and Biology: Proceedings of the German Conference on
Bioinformatics (GCB) 99, pp. 45-56.

Mail general questions to the MIRA talk mailing list:
mira_talk@xxxxxxxxxxxxx

To (un-)subsubcribe the MIRA mailing lists, see:
http://www.chevreux.org/mira_mailinglists.html

To report bugs or ask for features, please use the new ticketing system at:
http://sourceforge.net/apps/trac/mira-assembler/
This ensures that requests don't get lost.

Compiled by: bach
Sat Mar 13 10:04:34 CET 2010
On: Linux varcadia32 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compilation settings (sorry, for debug):
  Size of size_t : 4
  Size of uint32 : 4
  Size of uint32_t: 4
  Size of uint64 : 8
  Size of uint64_t: 8
Current system: Linux mbwinnr502727 2.6.27-11-generic #1 SMP Wed Apr 1 20:57:48 UTC 2009 i686 GNU/Linux

Parsing parameters: -project=Lu000z -job=denovo,genome,draft,solexa SOLEXA_SETTINGS -AS:epoq=no -AS:mrl=20

-SB:sbuip is 3, but must be no more than 1. Setting to 1
Parameters parsed without error, perfect.
------------------------------------------------------------------------------
Parameter settings seen for:
Sanger data (also common parameters), Solexa data

Used parameter settings:
  General (-GE):
    Project name in (proin) : Lu000z
    Project name out (proout) : Lu000z
    Number of threads (not) : 2
    Automatic memory management (amm) : yes
    Keep percent memory free (kpmf) : 10
    Max.
process size (mps) : 0 Keep contigs in memory (kcim) : no EST SNP pipeline step (esps) : 1 Use template information (uti) : [san] yes [sxa] yes Template insert size minimum (tismin): [san] -1 [sxa] -1 Template insert size maximum (tismax): [san] -1 [sxa] -1 Colour reads by hash frequency (crhf) : yes Load reads options (-LR): Load sequence data (lsd) : [san] no [sxa] yes File type (ft) : [san] fasta [sxa] fastq External quality (eq) : from SCF (scf) Ext. qual. override (eqo) : no Discard reads on e.q. error (droeqe): no Solexa scores in qual file (ssiqf) : no FASTQ qual offset (fqqo) : [san] 0 [sxa] 0 Wants quality file (wqf) : [san] yes [sxa] yes Read naming scheme (rns) : [san] Sanger Institute (sanger) [sxa] Solexa (solexa) Merge with XML trace info (mxti) : [san] no [sxa] no Filecheck only (fo) : no Assembly options (-AS): Number of passes (nop) : 1 Skim each pass (sep) : yes Maximum number of RMB break loops (rbl) : 1 Minimum read length (mrl) : [san] 80 [sxa] 20 Base default quality (bdq) : [san] 10 [sxa] 10 Enforce presence of qualities (epoq) : [san] yes [sxa] no Automatic repeat detection (ard) : yes Coverage threshold (ardct) : [san] 2 [sxa] 2.5 Minimum length (ardml) : [san] 400 [sxa] 300 Grace length (ardgl) : [san] 40 [sxa] 20 Use uniform read distribution (urd) : no Start in pass (urdsip) : 3 Cutoff multiplier (urdcm) : [san] 1.5 [sxa] 1.5 Keep long repeats separated (klrs) : no Spoiler detection (sd) : no Last pass only (sdlpo) : yes Use genomic pathfinder (ugpf) : yes Use emergency search stop (uess) : yes ESS partner depth (esspd) : 500 Use emergency blacklist (uebl) : yes Use max. 
contig build time (umcbt) : no Build time in seconds (bts) : 10000 Strain and backbone options (-SB): Load straindata (lsd) : no Load backbone (lb) : no Start backbone usage in pass (sbuip) : 1 Backbone file type (bft) : fasta Backbone base quality (bbq) : 30 Backbone strain name (bsn) : Force for all (bsnffa) : no Backbone rail from strain (brfs) : Backbone rail length (brl) : 0 Backbone rail overlap (bro) : 0 Also build new contigs (abnc) : yes Dataprocessing options (-DP): Use read extensions (ure) : [san] no [sxa] no Read extension window length (rewl) : [san] 30 [sxa] 30 Read extension w. maxerrors (rewme) : [san] 2 [sxa] 2 First extension in pass (feip) : [san] 0 [sxa] 0 Last extension in pass (leip) : [san] 0 [sxa] 0 Clipping options (-CL): Merge with SSAHA vector screen (msvs) : [san] no [sxa] no Gap size (msvsgs) : [san] 10 [sxa] 1 Max front gap (msvsmfg) : [san] 60 [sxa] 2 Max end gap (msvsmeg) : [san] 120 [sxa] 2 Strict front clip (msvssfc) : [san] 0 [sxa] 0 Strict end clip (msvssec) : [san] 0 [sxa] 0 Possible vector leftover clip (pvlc) : [san] no [sxa] no maximum len allowed (pvcmla) : [san] 18 [sxa] 18 Quality clip (qc) : [san] no [sxa] no Minimum quality (qcmq) : [san] 20 [sxa] 20 Window length (qcwl) : [san] 30 [sxa] 30 Bad stretch quality clip (bsqc) : [san] yes [sxa] no Minimum quality (bsqcmq) : [san] 20 [sxa] 5 Window length (bsqcwl) : [san] 30 [sxa] 20 Masked bases clip (mbc) : [san] yes [sxa] no Gap size (mbcgs) : [san] 20 [sxa] 5 Max front gap (mbcmfg) : [san] 40 [sxa] 12 Max end gap (mbcmeg) : [san] 60 [sxa] 12 Lower case clip (lcc) : [san] no [sxa] no Clip poly A/T at ends (cpat) : [san] no [sxa] no Keep poly-a signal (cpkps) : [san] no [sxa] no Minimum signal length (cpmsl) : [san] 12 [sxa] 12 Max errors allowed (cpmea) : [san] 1 [sxa] 1 Max gap from ends (cpmgfe) : [san] 9 [sxa] 9 Ensure minimum left clip (emlc) : [san] yes [sxa] no Minimum left clip req. 
(mlcr) : [san] 25 [sxa] 0 Set minimum left clip to (smlc) : [san] 30 [sxa] 0 Ensure minimum right clip (emrc) : [san] no [sxa] no Minimum right clip req. (mrcr) : [san] 10 [sxa] 10 Set minimum right clip to (smrc) : [san] 20 [sxa] 20 Apply SKIM chimera detection clip (ascdc) : yes Apply SKIM junk detection clip (asjdc) : yes Propose end clips (pec) : yes Bases per hash (pecbph) : 21 Parameters for SKIM algorithm (-SK): Number of threads (not) : 2 Bases per hash (bph) : 17 Hash save stepping (hss) : 4 Percent required (pr) : [san] 70 [sxa] 90 Max hits per read (mhpr) : 200 Max megahub ratio (mmhr) : 0 Freq. est. min normal (fenn) : 0.4 Freq. est. max normal (fexn) : 1.6 Freq. est. repeat (fer) : 1.9 Freq. est. heavy repeat (fehr) : 8 Freq. est. crazy (fecr) : 20 Mask nasty repeats (mnr) : yes Nasty repeat ratio (nrr) : 100 Max hashes in memory (mhim) : 15000000 MemCap: hit reduction (mchr) : 4096 Pathfinder options (-PF): Use quick rule (uqr) : [san] yes [sxa] yes Quick rule min len 1 (qrml1) : [san] 200 [sxa] 33 Quick rule min sim 1 (qrms1) : [san] 90 [sxa] 100 Quick rule min len 2 (qrml2) : [san] 100 [sxa] 30 Quick rule min sim 2 (qrms2) : [san] 95 [sxa] 100 Backbone quick overlap min len (bqoml) : [san] 150 [sxa] 20 Align parameters for Smith-Waterman align (-AL): Bandwidth in percent (bip) : [san] 15 [sxa] 20 Bandwidth max (bmax) : [san] 70 [sxa] 80 Bandwidth min (bmin) : [san] 25 [sxa] 20 Minimum score (ms) : [san] 30 [sxa] 15 Minimum overlap (mo) : [san] 15 [sxa] 25 Minimum relative score in % (mrs) : [san] 65 [sxa] 90 Solexa_hack_max_errors (shme) : [san] 0 [sxa] 0 Extra gap penalty (egp) : [san] no [sxa] no extra gap penalty level (egpl) : [san] low [sxa] low Max. 
egp in percent (megpp) : [san] 100 [sxa] 100 Contig parameters (-CO): Name prefix (np) : Lu000z Reject on drop in relative alignment score in % (rodirs) : [san] 15 [sxa] 30 Mark repeats (mr) : yes Only in result (mroir) : no Assume SNP instead of repeats (asir) : no Minimum reads per group needed for tagging (mrpg) : [san] 2 [sxa] 4 Minimum neighbour quality needed for tagging (mnq) : [san] 20 [sxa] 20 Minimum Group Quality needed for RMB Tagging (mgqrt) : [san] 30 [sxa] 30 End-read Marking Exclusion Area in bases (emea) : [san] 25 [sxa] 4 Also mark gap bases (amgb) : [san] yes [sxa] yes Also mark gap bases - even multicolumn (amgbemc) : [san] yes [sxa] yes Also mark gap bases - need both strands (amgbnbs): [san] yes [sxa] yes Force non-IUPAC consensus per sequencing type (fnicpst) : [san] no [sxa] no Merge short reads (msr) : [san] no [sxa] no Gap override ratio (gor) : [san] 66 [sxa] 66 Edit options (-ED): Automatic contig editing (ace) : [san] no [sxa] no Sanger only: Strict editing mode (sem) : no Confirmation threshold in percent (ct) : 50 Directories (-DI): When loading EXP files : When loading SCF files : Top directory for writing files : Lu000z_assembly For writing result files : Lu000z_assembly/Lu000z_d_results For writing result info files : Lu000z_assembly/Lu000z_d_info For writing log files : Lu000z_assembly/Lu000z_d_log For writing checkpoint files : Lu000z_assembly/Lu000z_d_chkpt File names (-FN): When loading sequences from FASTA : [san] Lu000z_in.sanger.fasta [sxa] Lu000z_in.solexa.fasta When loading qualities from FASTA quality : [san] Lu000z_in.sanger.fasta.qual [sxa] Lu000z_in.solexa.fasta.qual When loading sequences from FASTQ : [san] Lu000z_in.sanger.fastq [sxa] Lu000z_in.solexa.fastq When loading project from CAF : Lu000z_in.sanger.caf When loading project from MAF (disabled) : Lu000z_in.sanger.maf When loading EXP fofn : Lu000z_in.fofn When loading project from PHD : Lu000z_in.phd.1 When loading strain data : Lu000z_straindata_in.txt When 
loading XML trace info files : [san] Lu000z_traceinfo_in.sanger.xml [sxa] Lu000z_traceinfo_in.solexa.xml When loading SSAHA vector screen results : Lu000z_ssaha2vectorscreen_in.txt When loading backbone from MAF : Lu000z_backbone_in.maf When loading backbone from CAF : Lu000z_backbone_in.caf When loading backbone from GenBank : Lu000z_backbone_in.gbf When loading backbone from FASTA : Lu000z_backbone_in.fasta Output files (-OUTPUT/-OUT): Save simple singlets in project (sssip) : [san] no [sxa] no Save tagged singlets in project (stsip) : [san] yes [sxa] yes Remove rollover logs (rrol) : yes Remove log directory (rld) : no Result files: Saved as CAF (orc) : yes Saved as FASTA (orf) : yes Saved as GAP4 (directed assembly) (org) : no Saved as phrap ACE (ora) : yes Saved as HTML (orh) : no Saved as Transposed Contig Summary (ors) : yes Saved as simple text format (ort) : no Saved as wiggle (orw) : yes Temporary result files: Saved as CAF (otc) : yes Saved as CAF (otm) : no Saved as FASTA (otf) : no Saved as GAP4 (directed assembly) (otg) : no Saved as phrap ACE (ota) : no Saved as HTML (oth) : no Saved as Transposed Contig Summary (ots) : no Saved as simple text format (ott) : no Extended temporary result files: Saved as CAF (oetc) : no Saved as FASTA (oetf) : no Saved as GAP4 (directed assembly) (oetg) : no Saved as phrap ACE (oeta) : no Saved as HTML (oeth) : no Save also singlets (oetas) : no Alignment output customisation: TEXT characters per line (tcpl) : 60 HTML characters per line (hcpl) : 60 TEXT end gap fill character (tegfc) : HTML end gap fill character (hegfc) : File / directory output names: CAF : Lu000z_out.caf MAF : Lu000z_out.maf FASTA : Lu000z_out.unpadded.fasta FASTA quality : Lu000z_out.unpadded.fasta.qual FASTA (padded) : Lu000z_out.padded.fasta FASTA qual.(pad): Lu000z_out.padded.fasta.qual GAP4 (directory): Lu000z_out.gap4da ACE : Lu000z_out.ace HTML : Lu000z_out.html Simple text : Lu000z_out.txt TCS overview : Lu000z_out.tcs Wiggle : 
Lu000z_out.wig
------------------------------------------------------------------------------
Creating directory Lu000z_assembly ... done.
Creating directory Lu000z_assembly/Lu000z_d_log ... done.
Creating directory Lu000z_assembly/Lu000z_d_results ... done.
Creating directory Lu000z_assembly/Lu000z_d_info ... done.
Creating directory Lu000z_assembly/Lu000z_d_chkpt ... done.

Localtime: Tue Mar 30 14:09:47 2010
Loading data (Solexa) from FASTQ files,

Fatal Error (may be due to problems of the input data):
"Could not open FASTQ file 'Lu000z_in.solexa.fastq'. Is it present? Is it readable? Did you want to load your data in another format?"

->Thrown: void ReadPool::loadDataFromFASTQ(const string & filename, const string & qualfilename, const bool generatefilenames, const uint8 seqtype, const uint8 loadaction)
->Caught: Assembly::loadFASTQ(const string & fastqfile, const string & fastaqualfile, const uint8 seqtype, const uint8 loadaction)

Program aborted, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.

CWD: /home/tbanks/Downloads/mira/bin
Aborted
tbanks@mbwinnr502727:~/Downloads/mira/bin$
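
The fatal error in the log says MIRA could not open 'Lu000z_in.solexa.fastq', and the reported CWD is /home/tbanks/Downloads/mira/bin. A quick way to rule out a missing or misnamed input file is to check for it before starting the run. The following is only a rough sketch in Python: it assumes the input name follows the pattern seen in the log (<project>_in.solexa.fastq) and that the file is looked up relative to the directory MIRA is started from. The helper names expected_fastq and check_input are made up for illustration and are not part of MIRA.

```python
import os


def expected_fastq(project, workdir="."):
    # Assumption: name pattern taken from the log above,
    # i.e. <project>_in.solexa.fastq inside the working directory.
    return os.path.join(workdir, f"{project}_in.solexa.fastq")


def check_input(project, workdir="."):
    # Hypothetical pre-flight check; returns a short status string.
    path = expected_fastq(project, workdir)
    if not os.path.isfile(path):
        return f"missing: {path}"
    if not os.access(path, os.R_OK):
        return f"not readable: {path}"
    return f"ok: {path}"


if __name__ == "__main__":
    # Project name as used on the command line in the log.
    print(check_input("Lu000z"))
```

If this reports the file as missing when run from ~/Downloads/mira/bin, the FASTQ file is probably stored elsewhere or under a different name, which would match the error message above.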