[mira_talk] Re: Problems using sff extract

From: Ganga Jeena <ganga.jeena@xxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Tue, 14 Dec 2010 17:50:44 +0530
[dc103@compute-0-0 orig_data]$  ~/apps/mira/bin/sff_extract -l
flx_linker.fa  -i "insert_size:8000, insert_stdev:1000"
/home/dc103/CP_Data/sff_lf/GSMQLLJ01.sff >LOG&

[dc103@compute-0-0 orig_data]$ Working on
'/home/dc103/CP_Data/sff_lf/GSMQLLJ01.sff':
Creating temporary sequences from reads in
'/home/dc103/CP_Data/sff_lf/GSMQLLJ01.sff' ...  done.
Searching linker sequences with SSAHA2 (this may take a while) ...

An error occured during the SSAHA2 execution, aborting.

On Tue, Dec 14, 2010 at 2:28 PM, Ganga Jeena <ganga.jeena@xxxxxxxxx> wrote:

> [dc103@hpc orig_data]$ which ssaha2
> /usr/local/bin/ssaha2
>
>
> [dc103@hpc orig_data]$  /usr/local/bin/ssaha2
>
>
> ========================
> = SSAHA2 version 2.5.3 =
> ========================
>
> by Hannes Ponstingl, Adam Spargo and Zemin Ning.
>
> Copyright (C) 2003-2010 The Wellcome Trust Sanger Institute, Cambridge, UK.
> All Rights Reserved.
>
> SSAHA2 is a package combining SSAHA with cross_match developed
> by Phil Green at the University of Washington.
>
> Reference: Ning Z, Cox AJ, Mullikin JC.
>            SSAHA: a fast search method for large DNAdatabases.
>            Genome Res. 2001 Oct;11(10):1725-9.
>
> DESCRIPTION:
>
> ssaha2Build  This program constructs the hashtable
>              required by the other ssaha2 programs.
>              This provides an index for the given subject
>              sequence.
>
> ssaha2       This program aligns query sequences against a
>              subject hashtable.
>
> ssahaSNP     ssahaSNP is a polymorphism detection tool.
>              It detects homozygous SNPs and indels by aligning
>              shotgun reads to the finished genome sequence.
>              From the best alignment, SNP candidates are
>              screened, taking into account the quality value
>              of the bases with variation as well as the quality
>              values in the neighbouring bases, using
>              neighbourhood quality standard (NQS).
>
> USAGE:
>    *************************************************************
>    * NOTE: SSAHA2 & SSAHASNP COMMAND LINE FORMATS HAVE CHANGED *
>    *       with version 1.0.9                                  *
>    *************************************************************
>
>       ssaha2Build [OPTIONS] -save <hash name> <subject file>
>
>       ssaha2 [OPTIONS] -save <hash name> <query file> [<query file 2nd>]
>       ssaha2 [OPTIONS] <subject file> <query file> [<query file 2nd>]
>
>       ssahaSNP [OPTIONS] -save <hash name> <query file>
>       ssahaSNP [OPTIONS] <subject file> <query file>
>
>    where <subject file> and <query file> are fasta or fastq files with
>    the reference (subject) and query sequence files. <hash name> is
>    the root name of the hash table files created from the reference
>    sequence using ssaha2Build.
>    The paired-end module invoked with the -pair option (see below)
>    expects two query files with file <query file> containing the first
>    and <query file 2nd> containing the second mate of each pair. The
>    mates of a paired-end read must occur at the same postion in the
>    sequence of reads in <query file> and <query file 2nd>. The paired-
>    end module is not available for ssahaSNP.
>
> OPTIONS:
>
>  -h, -help     Print this page.
>  -v, -version  Print version information.
>  -c, -cookbook Print some example parameter sets, suitable
>                for common tasks.
>
> All other options, except the '-solexa' and '-454' flags and the
> '-output ssaha2 cigar' option, are key-value pairs. Options are described
> below
> (default values in brackets):
>
>  -save <FILENAME>
>             ssaha2Build: Root name of the files to which the
>                          hash table is saved. The set of files
>                          have the extensions FILENAME.head FILENAME.body
>                          FILENAME.name FILENAME.base FILENAME.size
>             ssaha2, ssahaSNP: Read hash table from files created by
> ssaha2Build.
>                            If -save is not specified then the first file
> name
>                          specifies the reference sequence for which a hash
>                          table is constructed on the fly
>
>  -kmer      Word size for ssaha hashing (13).
>
>  -skip      Step size for ssaha hashing (13).
>
>  -ckmer     Word size for cross_match matching (10).
>
>  -cmatch    Minimum match length for cross_match matching (14).
>
>  -cut       Number of repeats allowed before this kmer is ignored (10000).
>
>  -seeds     Number of kmer matches required to flag a hit (5).
>
>  -depth     Number of hits to consider for alignment (50).
>
>  -memory:   Memory assigned in MBs for the alignment matrix (200).
>
>  -score     Minimum score for match to be reported (30).
>
>  -identity  Minimum identity (as percentage) for match to be reported
> (50.000000).
>
>  -port      Port number for server (60000).
>
>  -align     If set to > 0, output graphical alignment (0).
>             If set to 2 and -solexa flag is set: output also quality score.
>
>  -edge      (obsolete as of release 2.0.0) Augment hit by this many bases
>             before alignment (200).
>
>  -array:    Memory assigned in bytes for frequency arrays (4000000).
>
>  -start:    first sequence to process in query (0).
>
>  -end:      last sequence to process in query, 0 means process all (0).
>
>  -sense     (obsolete as of release 2.0.0) Allow really patchy hits to go
>             for alignment (0).
>
>  -best      If set to 1, only report the best alignment for each
>             match, if multiple best scores report all (65). If set to a
>             negative value '-best -<b>' report alignment only if there are
>             at most <b> best mappings.
>
>  -454:      (see -rtype) Tune for 454 reads (0). Implies '-skip 3 -seeds 2
>             -score 30 -sense 1 -cmatch 10 -ckmer 6'
>
>  -NQS:      Use NQS to filter SNPs if set to 1, otherwise output all
>             candidates (1).
>
>  -quality:  Quality value to use for variation base in NQS (23).
>
>  -tags:     If set to 1, prefix added to output summary lines to
>             aid parsing, the prefix depends upon the chosen
>             output format, e.g. if output is ssaha2 then the
>             prefix is ALIGNMENT (1).
>
>  -output:   ssaha2       - original ssaha2 line only (default)
>             sugar        - Simple UnGapped Alignment Report
>             cigar        - Compact Idiosyncratic Gapped Alignment Report
>             vulgar       - Verbose Useful Labelled Gapped Alignment Report
>             psl          - Tab separated format similar to BLT
>
> http://genome.ucsc.edu/goldenPath/help/customTrack.html
>             pslx         - Tab separated format with sequence
>             gff          - http://www.sanger.ac.uk/Software/formats/GFF/
>             ssaha2 cigar - alternate between ssaha2 and cigar format lines
>             aln          - Tony Cox' alignment output format.
>             sam          - SAM format (http://samtools.sourceforge.net).
> Output
>                            is written to a file specified with the -oufile
> flag.
>
>             sam_soft     - SAM format using soft clipping. The entire read
> is
>                            output./n/n            for a full description of
> output formats see:
>
> http://www.sanger.ac.uk/Software/analysis/SSAHA2/formats.shtml
>
>  -name      Flag that modifies option '-output cigar' such that read name
>             and length are also reported when there was no hit found.
>
>  -diff:     Output all hits within diff of the best (-1).
>
>  -udiff:    Ignore best hit if second best score within udiff (0).
>
>  -fix:      If set to 1, fix -edge, -seeds, -score so that they
>             are not updated according to read length in ssahaSNP (0).
>
>  -disk:     0: map all files into memory including the query sequences. (0)
>             1: read hashtable body, the subject and query sequences from
> disk
>                rather than loading them into memory.
>             2: like 1 but the subject sequences are read into memory. The
> body
>                is left on the disk.
>             In both cases the query sequence file is not mapped into
> memory.
>  -weight:   If >0, apply this much weighting to rare kmers (0).
>
>  -solexa:   (see -rtype) implies (ssaha2 and ssahaSNP only):
>             -kmer 13 -skip 2 -seeds 2 -score 12 -cmatch 9 -ckmer 6.
>             Top scoring hits with lower quality at the mismatch positions
>             have their Smith-Waterman score incremented by 1. Mapping
>             scores are changed accordingly. SsahaSNP reports in such cases
>             only the top scoring hit (no Repeat lines).
>
>  -rtype:    solexa          - like -solexa flag
>             454             - like -454 flag
>             abi             - tunes for ABI reads (default)
>
>  -pair <dist_lower>,<dist_upper>: Map paired-end reads with an insert
> length in
>             the interval [<dist_lower>,<dist_upper>]. The reads for each
> mate
>             are in a separate input file named by the last two tokens of
> the
>             command line. The sequence for both mates are read in the same
>             direction. Paired-end reads are reported if the mates match
> within
>             <dist_upper> bases from each other, but at least <dist_lower>
> bases
>             apart. No mate is reported, if the mapping is ambiguous, i.e.
> if
>             there are multipe hits with the same Smith-Waterman alignment
> score
>             in the specified distance range.
>             If hits are found only for one mate, the highest scoring hit is
>             reported for this mate unless it is ambiguous.
>
>  -outfile <file_name> Output file for output in SAM format or, if the -pair
>             option is set, for mate pairs that map outside the specified
>             distance range unambiguously (each with a mapping score > 30,
> or a
>             value set with the -mthresh option below).
>
>  -mthresh <mapping_score>  Threshold in the mapping score for mate pairs to
> be
>             reported that map outside the distance range (see option
> -outfile).
>             <mapping_score> is an integer between 0 and 50 (default: 30).
>             This option requires the -pair option to be set.
>
>  -multi <rand_seed> Flag modifying the way paired-end reads with multiple
>             (repetitive) mappings are reported. If rand_seed == 0: skip
> such
>             pairs entirely. if rand_seed> 0: select one pair at random.
>             <rand_seed> seeds the random number generator.
>
>
>
>
> On Tue, Dec 14, 2010 at 2:17 PM, Bastien Chevreux <bach@xxxxxxxxxxxx>wrote:
>
>> On Dienstag 14 Dezember 2010 Ganga Jeena wrote:
>> > [dc103@hpc orig_data]$ /usr/local/bin/sff_extract -o cp_454 -a -l
>> > flx_linker.fa  -i "insert_size:8000, insert_stdev:1000"
>> > /home/dc103/CP_Data/sff_lf/GSMQLLJ01.sff
>> > [Errno 22] Invalid argument
>> >
>>
>> No, not
>>
>>  /usr/local/bin/sff_extract
>>
>> but
>>
>>  /usr/local/bin/ssaha2
>>
>> While we are at it: what does
>>
>>  which ssaha2
>>
>> say?
>>
>> B.
>>
>> --
>> You have received this mail because you are subscribed to the mira_talk
>> mailing list. For information on how to subscribe or unsubscribe, please
>> visit http://www.chevreux.org/mira_mailinglists.html
>>
>
>
Follow-Ups:
- [mira_talk] Re: Problems using sff extract
  - From: Bastien Chevreux
References:
- [mira_talk] Problems using sff extract
  - From: Ganga Jeena
- [mira_talk] Re: Problems using sff extract
  - From: Bastien Chevreux
- [mira_talk] Re: Problems using sff extract
  - From: Ganga Jeena
- [mira_talk] Re: Problems using sff extract
  - From: Bastien Chevreux
- [mira_talk] Re: Problems using sff extract
  - From: Ganga Jeena
[mira_talk] Re: Problems using sff extract

Other related posts: