[mira_talk] Re: vector clipping
- From: "Jan Paces" <Jan.Paces@xxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Sun, 15 Mar 2009 14:32:50 +0100 (CET)
Dear Bastien,
> found the problem. ssaha must be invoked like this:
>
> ssaha vector.fa cra_in.454.fasta -da 0 -pf >cra_ssahavectorscreen_in.txt
>
> That is, first the file with the vector sequences, then the file with the
> sequencing data. I'm not sure whether this is the "correct" order as ssaha
> defines it, but when I implemented the parsing options I made some tests
> during which I found out that ssaha somehow lost valid hits when it was
> called the other way round.
Thanks, that was probably my mistake for not reading the mira manual
carefully enough. Now it works.
---------------------
[maybe someone here finds this interesting:]
I did some tests with ssaha myself, and when invoking ssaha with the vector
file as the second argument I get ~3% more hits. However, most of the
additional hits are short:
$ wc *
81217 730953 4345766 ssaha_reads_vector.txt
78704 708336 4218736 ssaha_vector_reads.txt
$ cat ssaha_reads_vector.txt | sort -k 2,3
...
FF FRLDO2B01A0019 269 292 pCC1BAC 1981 2004 24 100.00
RF FRLDO2B01A007S 12 347 pCC1BAC 5317 5652 336 100.00
RF FRLDO2B01A007S 440 451 pCC1BAC 5209 5220 12 100.00 <-
FF FRLDO2B01A01K8 309 356 pCC1BAC 1261 1308 48 100.00
...
$ cat ssaha_vector_reads.txt | sort -k 5,6 | head
...
FF pCC1BAC 1952 1975 FRLDO2B01A0019 241 264 24 100.00
RF pCC1BAC 5316 5651 FRLDO2B01A007S 13 348 336 100.00
FF pCC1BAC 966 1253 FRLDO2B01A01K8 13 300 288 100.00
...
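A rough way to put a number on "short" is to count the hits below some length cutoff in each file. This is only a sketch: the function name is made up, the cutoff of 20 bases is an arbitrary choice, and it assumes that column 8 is the match length, as it appears to be in the listings above.

```shell
# Hypothetical helper: count all hits and those shorter than a cutoff.
# Assumption: column 8 of the ssaha output is the match length.
# The default cutoff of 20 bases is arbitrary, not a mira or ssaha setting.
count_short_hits() {
    awk -v min="${1:-20}" '
        NF >= 9 { total++; if ($8 < min) nshort++ }
        END { printf "%d hits, %d shorter than %d\n", total, nshort, min }
    '
}
```

It reads from standard input, so it can be run once per file, e.g. `count_short_hits 20 < ssaha_reads_vector.txt` and then the same for ssaha_vector_reads.txt.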
and this is how to convert it for mira:
cat ssaha_reads_vector.txt | \
awk '{print $1 "\t" $5 "\t" $6 "\t" $7 "\t" $2 "\t" $3 "\t" $4 "\t" $8 "\t" $9}' \
> ssaha_switched.txt
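The same column swap can also be wrapped with a field-count check, so that lines which do not look like a hit are reported instead of being silently reshuffled. Again only a sketch: the function name is hypothetical, and it assumes every hit line has exactly nine whitespace-separated fields, as in the listings above.

```shell
# Hypothetical wrapper around the column swap above.
# Assumption: a valid hit line has exactly nine whitespace-separated fields.
# Any other line is reported on stderr and left out of the output.
swap_ssaha_columns() {
    awk 'BEGIN { OFS = "\t" }
         NF == 9 { print $1, $5, $6, $7, $2, $3, $4, $8, $9; next }
         { print "skipped line " NR ": " $0 > "/dev/stderr" }'
}
```

Usage would be `swap_ssaha_columns < ssaha_reads_vector.txt > ssaha_switched.txt`; anything printed to the terminal is a line that was skipped.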
----------------------
> As a side question: are you perhaps using an older version of
> "sff_extract" to create the input files for mira? In the files you sent
> me, the XML looks like it
> ... it has quality clips instead of sequencing vector clips.
I am using sffinfo2mirafiles.tcl, according to this line in the documentation:
tsunami:/path/to/myProject> sffinfo bchoc.sff | sffinfo2mirafiles.tcl -project bchoc
(I found it somewhere, but I am not sure where, and I wasn't able to track it
down again. Maybe it is outdated info.)
I am curious why there should be a "vector clip" in the XML instead of a
"quality clip". I thought the information extracted from an sff file is
mostly about quality clipping. Anyway, I shall use sff_extract from now on.
Thank you again,
Jan