[mira_talk] Re: FW: 454 assembly

  • From: "Shabhonam Caim (TGAC)" <Shabhonam.Caim@xxxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Thu, 22 Jul 2010 11:04:07 +0100

Thanks, Thomas.
With your second suggestion the files loaded without any problem, but I now get
what I hope is the last error:

Command : mira --project=test --job=denovo,genome,draft,454 454_SETTINGS 
-FN:fai=2.GAC.454Reads.fna -FN:fqui=2.GAC.454Reads.qual

Error: Done.
Loaded 784833 reads with 211718637 raw bases.
784833 reads have quality accounted for.
Loaded 784833 454 reads.
Total reads loaded: 784833
Localtime: Thu Jul 22 10:57:50 2010

Merging data from XML trace info file test_traceinfo_in.454.xml ...


MIRA tried to load a XML TRACEINFO file containing ancillary data, but failed.
Loading ancillary data when using FASTA files as input is
really,
        really,
                REALLY encouraged, and therefore MIRA sets this as default.

However, if you are really sure that you do not want to load ancillary data
in TRACEINFO files, you can switch it off.
Either use '<technology>_SETTINGS -LR:mxti=no' (e.g. SANGER_SETTING 
-LR:mxti=no),
or use the '-notraceinfo' quickswitch to kill loading traceinfo files for all
types of sequencing technologies. (place it after -fasta and -job quickswitches)



Fatal Error (may be due to problems of the input data):
"TraceInfo XML file not found for loading: test_traceinfo_in.454.xml"

->Thrown: void NCBIInfoXML::readXMLFile(string filename)

->Caught: void ReadPool::mergeXMLTraceInfo(const string & filename)

Program aborted, probably due to error in the input data or parametrisation.
Please check the output log for more information.

Cheers

Shab
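
Going by the error text above, there seem to be two ways to stop MIRA from looking for test_traceinfo_in.454.xml: the '-notraceinfo' quickswitch, or '-LR:mxti=no' under 454_SETTINGS. A minimal sketch of both, assuming the same project and file names as in the command above; the variable names are only for illustration, and the script prints the command lines rather than running MIRA:

```shell
# Sketch only: builds the two candidate command lines and prints them.
# It does not run MIRA. File names are the ones used earlier in this thread.
project=test
reads=2.GAC.454Reads.fna
quals=2.GAC.454Reads.qual
job=denovo,genome,draft,454

# Variant 1: the -notraceinfo quickswitch, placed after the --job quickswitch
# as the error message asks.
cmd_quickswitch="mira --project=$project --job=$job -notraceinfo 454_SETTINGS -FN:fai=$reads -FN:fqui=$quals"

# Variant 2: switch trace-info loading off for 454 reads only,
# via -LR:mxti=no under 454_SETTINGS.
cmd_per_tech="mira --project=$project --job=$job 454_SETTINGS -LR:mxti=no -FN:fai=$reads -FN:fqui=$quals"

echo "$cmd_quickswitch"
echo "$cmd_per_tech"
```

Either printed line can then be pasted into the shell. Note that the MIRA message itself strongly encourages supplying a TRACEINFO XML file when one is available, so this only applies if no such file exists for the reads.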

From: mira_talk-bounce@xxxxxxxxxxxxx [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On 
Behalf Of Thomas Müller
Sent: 22 July 2010 10:49
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: FW: 454 assembly

Sorry my bad!!!
I meant:

--job=denovo,genome,draft,454

cheers
Thomas


On Jul 22, 2010, at 11:36 AM, Shabhonam Caim (TGAC) wrote:


Thanks, Thomas, for your reply, but I am still getting an error.
I used the following command and provided the qual file as well:

mira --project=test --job=denovo,genome,draft 454_SETTINGS 
-FN:fai=2.GAC.454Reads.fna -FN:fqui=2.GAC.454Reads.qual

            Minimum reads per group needed for tagging (mrpg)    :  [san]  2
                                                                    [454]  4
            Minimum neighbour quality needed for tagging (mnq)   :  [san]  20
                                                                    [454]  20
            Minimum Group Quality needed for RMB Tagging (mgqrt) :  [san]  30
                                                                    [454]  25
            End-read Marking Exclusion Area in bases (emea)      :  [san]  25
                                                                    [454]  10
            Also mark gap bases (amgb)                           :  [san]  yes
                                                                    [454]  no
                Also mark gap bases - even multicolumn (amgbemc) :  [san]  yes
                                                                    [454]  yes
                Also mark gap bases - need both strands (amgbnbs):  [san]  yes
                                                                    [454]  yes
        Force non-IUPAC consensus per sequencing type (fnicpst)  :  [san]  no
                                                                    [454]  no
        Merge short reads (msr)                                  :  [san]  no
                                                                    [454]  no
        Gap override ratio (gor)                                 :  [san]  66
                                                                    [454]  66

  Edit options (-ED):
        Automatic contig editing (ace)              :  [san]  no
                                                       [454]  yes
     Sanger only:
        Strict editing mode (sem)                   : no
        Confirmation threshold in percent (ct)      : 50

  Directories (-DI):
        When loading EXP files            :
        When loading SCF files            :
        Top directory for writing files   : test_assembly
        For writing result files          : test_assembly/test_d_results
        For writing result info files     : test_assembly/test_d_info
        For writing log files             : test_assembly/test_d_log
        For writing checkpoint files      : test_assembly/test_d_chkpt

  File names (-FN):
        When loading sequences from FASTA            :  [san]  test_in.sanger.fasta
                                                        [454]  2.GAC.454Reads.fna
        When loading qualities from FASTA quality    :  [san]  test_in.sanger.fasta.qual
                                                        [454]  2.GAC.454Reads.qual
        When loading sequences from FASTQ            :  [san]  test_in.sanger.fastq
                                                        [454]  test_in.454.fastq
        When loading project from CAF                : test_in.sanger.caf
        When loading project from MAF (disabled)     : test_in.sanger.maf
        When loading EXP fofn                        : test_in.fofn
        When loading project from PHD                : test_in.phd.1
        When loading strain data                     : test_straindata_in.txt
        When loading XML trace info files            :  [san]  test_traceinfo_in.sanger.xml
                                                        [454]  test_traceinfo_in.454.xml
        When loading SSAHA vector screen results     : test_ssaha2vectorscreen_in.txt

        When loading backbone from MAF               : test_backbone_in.maf
        When loading backbone from CAF               : test_backbone_in.caf
        When loading backbone from GenBank           : test_backbone_in.gbf
        When loading backbone from FASTA             : test_backbone_in.fasta


  Output files (-OUTPUT/-OUT):
        Save simple singlets in project (sssip)      :  [san]  no
                                                        [454]  no
        Save tagged singlets in project (stsip)      :  [san]  yes
                                                        [454]  yes

        Remove rollover logs (rrol)                  : yes
        Remove log directory (rld)                   : no

    Result files:
        Saved as CAF                       (orc)     : yes
        Saved as FASTA                     (orf)     : yes
        Saved as GAP4 (directed assembly)  (org)     : no
        Saved as phrap ACE                 (ora)     : yes
        Saved as HTML                      (orh)     : no
        Saved as Transposed Contig Summary (ors)     : yes
        Saved as simple text format        (ort)     : no
        Saved as wiggle                    (orw)     : yes

    Temporary result files:
        Saved as CAF                       (otc)     : yes
        Saved as CAF                       (otm)     : no
        Saved as FASTA                     (otf)     : no
        Saved as GAP4 (directed assembly)  (otg)     : no
        Saved as phrap ACE                 (ota)     : no
        Saved as HTML                      (oth)     : no
        Saved as Transposed Contig Summary (ots)     : no
        Saved as simple text format        (ott)     : no

    Extended temporary result files:
        Saved as CAF                      (oetc)     : no
        Saved as FASTA                    (oetf)     : no
        Saved as GAP4 (directed assembly) (oetg)     : no
        Saved as phrap ACE                (oeta)     : no
        Saved as HTML                     (oeth)     : no
        Save also singlets               (oetas)     : no

    Alignment output customisation:
        TEXT characters per line (tcpl)              : 60
        HTML characters per line (hcpl)              : 60
        TEXT end gap fill character (tegfc)          :
        HTML end gap fill character (hegfc)          :

    File / directory output names:
        CAF             : test_out.caf
        MAF             : test_out.maf
        FASTA           : test_out.unpadded.fasta
        FASTA quality   : test_out.unpadded.fasta.qual
        FASTA (padded)  : test_out.padded.fasta
        FASTA qual.(pad): test_out.padded.fasta.qual
        GAP4 (directory): test_out.gap4da
        ACE             : test_out.ace
        HTML            : test_out.html
        Simple text     : test_out.txt
        TCS overview    : test_out.tcs
        Wiggle          : test_out.wig
------------------------------------------------------------------------------
Deleting old directory test_assembly ... done.
Creating directory test_assembly ... done.
Creating directory test_assembly/test_d_log ... done.
Creating directory test_assembly/test_d_results ... done.
Creating directory test_assembly/test_d_info ... done.
Creating directory test_assembly/test_d_chkpt ... done.
Localtime: Thu Jul 22 10:31:49 2010



========================== Memory self assessment ==============================
Running in 64 bit mode.

Dump from /proc/meminfo
--------------------------------------------------------------------------------
MemTotal:      8174224 kB
MemFree:         94968 kB
Buffers:          4644 kB
Cached:        4976444 kB
SwapCached:     287504 kB
Active:        6618732 kB
Inactive:      1307228 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8174224 kB
LowFree:         94968 kB
SwapTotal:     2031608 kB
SwapFree:      1476740 kB
Dirty:            1984 kB
Writeback:           0 kB
AnonPages:     2936468 kB
Mapped:          32408 kB
Slab:            81840 kB
PageTables:      37252 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6118720 kB
Committed_AS:  5290616 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    263728 kB
VmallocChunk: 34359474579 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
--------------------------------------------------------------------------------

Dump from /proc/self/status
--------------------------------------------------------------------------------
Name:   mira
State:  R (running)
SleepAVG:       0%
Tgid:   6158
Pid:    6158
PPid:   17617
TracerPid:      0
Uid:    8395    8395    8395    8395
Gid:    3658    3658    3658    3658
FDSize: 256
Groups: 3658
VmPeak:     4972 kB
VmSize:     4920 kB
VmLck:         0 kB
VmHWM:      1744 kB
VmRSS:      1744 kB
VmData:      464 kB
VmStk:        84 kB
VmExe:      4336 kB
VmLib:         0 kB
VmPTE:        28 kB
StaBrk: 00a7e000 kB
Brk:    017f0000 kB
StaStk: 7fffc798fa70 kB
Threads:        1
SigQ:   0/71680
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Cpus_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff
Mems_allowed:   00000000,00000001
--------------------------------------------------------------------------------

Information on current assembly object:

AS_readpool: 0 reads.
AS_contigs: 0 contigs.
AS_bbcontigs: 0 contigs.
Mem used for reads: 112 (112 B)

Memory used in assembly structures:
                                           Eff. Size   Free cap. LostByAlign
     AS_writtenskimhitsperid:          0        24 B         0 B         0 B
               AS_skim_edges:          0        24 B         0 B         0 B
                 AS_adsfacts:          0        24 B         0 B         0 B
          AS_confirmed_edges:          0        24 B         0 B         0 B
   AS_permanent_overlap_bans:          0        24 B         0 B         0 B
              AS_readhitmiss:          0        24 B         0 B         0 B
            AS_readhmcovered:          0        24 B         0 B         0 B
                AS_count_rhm:          0        24 B         0 B         0 B
                 AS_clipleft:          0        24 B         0 B         0 B
                AS_clipright:          0        24 B         0 B         0 B
                 AS_used_ids:          0        24 B         0 B         0 B
              AS_multicopies:          0        24 B         0 B         0 B
            AS_hasmcoverlaps:          0        24 B         0 B         0 B
       AS_maxcoveragereached:          0        24 B         0 B         0 B
       AS_coverageperseqtype:          0        24 B         0 B         0 B
           AS_istroublemaker:          0        24 B         0 B         0 B
                 AS_isdebris:          0        24 B         0 B         0 B
          AS_needalloverlaps:          0        40 B         0 B         0 B
    AS_readsforrepeatresolve:          0        40 B         0 B         0 B
                AS_allrmbsok:          0        24 B         0 B         0 B
        AS_probablermbsnotok:          0        24 B         0 B         0 B
            AS_weakrmbsnotok:          0        24 B         0 B         0 B
          AS_readmaytakeskim:          0        40 B         0 B         0 B
               AS_skimstaken:          0        40 B         0 B         0 B
          AS_numskimoverlaps:          0        24 B         0 B         0 B
       AS_numleftextendskims:          0        24 B         0 B         0 B
         AS_rightextendskims:          0        24 B         0 B         0 B
      AS_skimleftextendratio:          0        24 B         0 B         0 B
     AS_skimrightextendratio:          0        24 B         0 B         0 B
             AS_usedlogfiles:          1        48 B         0 B         0 B
Total: 920 (920 B)

================================================================================
Dynamic allocs: 0
Align allocs: 0

Fatal Error (may be due to problems of the input data):
"You did not specify any input sequences to be loaded."

->Thrown: void Assembly::loadSequenceData_new()

->Caught: main

Cheers
Shab


From: mira_talk-bounce@xxxxxxxxxxxxx [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On 
Behalf Of Thomas Müller
Sent: 22 July 2010 10:12
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: FW: 454 assembly

try:
mira --project=test --job=denovo,genome,draft,est 454_SETTINGS 
-FN:fai=2.GAC.454Reads.fna

But you should really also add the .qual file with -FN:fqui=2.GAC.454Reads.qual

cheers
Thomas

On Jul 22, 2010, at 10:53 AM, Shabhonam Caim (TGAC) wrote:




Hello Mira Users

I am trying to assemble 454 reads with MIRA using the following command:
mira-3.0.0 mira --project=test --job=denovo,genome,draft,2.GAC.454Reads.fna

and I am getting the following error:

This is MIRA V3.0.0 (production version).

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence
Assembly Using Trace Signals and Additional Sequence Information.
Computer Science and Biology: Proceedings of the German Conference on
Bioinformatics (GCB) 99, pp. 45-56.

Mail general questions to the MIRA talk mailing list:
        mira_talk@xxxxxxxxxxxxx

To (un-)subsubcribe the MIRA mailing lists, see:
        http://www.chevreux.org/mira_mailinglists.html

To report bugs or ask for features, please use the new ticketing system at:
        http://sourceforge.net/apps/trac/mira-assembler/
This ensures that requests don't get lost.


Compiled by: bach
Sun Jan 31 20:23:36 CET 2010
On: Linux arcadia64 2.6.27-11-generic #1 SMP Wed Apr 1 20:53:41 UTC 2009 x86_64 
GNU/Linux
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compilation settings (sorry, for debug):
        Size of size_t  : 8
        Size of uint32  : 4
        Size of uint32_t: 4
        Size of uint64  : 8
        Size of uint64_t: 8
Current system: Linux n57140 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 
x86_64 x86_64 x86_64 GNU/Linux



Parsing parameters: --project=454asembly --job=denovo, genome, draft 
pk.454.fasta

Seen no assembly quality in job definition, assuming 'normal'.
Seen no assembly type in job definition, assuming 'genome'.

,..

========================= Parameter parsing error(s) ==========================

* Parameter section: '(none)'
*       unrecognised string or unexpected character: genome

* Parameter section: '(none)'
*       unrecognised string or unexpected character: draft

* Parameter section: '(none)'
*       unrecognised string or unexpected character: pk

* Parameter section: '(none)'
*       unrecognised string or unexpected character: 454

* Parameter section: '(none)'
*       unrecognised string or unexpected character: fasta

===============================================================================

Fatal Error (may be due to problems of the input data):
"Error while parsing parameters, sorry."

->Thrown: void MIRAParameters::parse(istream & is, vector<MIRAParameters> & Pv, 
MIRAParameters * singlemp)

->Caught: main

Alternatively, could someone please give me the commands to run a 454 assembly
(a basic de novo assembly with default parameters)?

cheers

Shab


--
Crop Plant Biodiversity and Breeding Informatics Group (350b)
Institute of Plant Breeding, Seed Science and Population Genetics
University of Hohenheim
Fruwirthstrasse 21
D-70599 Stuttgart
Phone: +49-711-459 24293


