[mira_talk] Re: vector clipping

Dear Bastien,

> found the problem. ssaha must be invoked like this:
>
> ssaha vector.fa cra_in.454.fasta -da 0 -pf >cra_ssahavectorscreen_in.txt
>
> That is, first the file with the vector sequences, then the file with the
> sequencing data. I'm not sure whether this is the "correct" order as ssaha
> defines it, but when I implemented the parsing options I made some tests
> during which I found out that ssaha somehow lost valid hits when it was
> called the other way round.

Thanks, that was probably my mistake for not reading the mira manual
carefully enough. Now it works.

---------------------
[maybe someone here finds this interesting:]
I did some tests with ssaha myself, and when invoking ssaha with the vector
file as the second argument I get ~3% more hits. However, most of them are
short:

$ wc *
   81217   730953  4345766 ssaha_reads_vector.txt
   78704   708336  4218736 ssaha_vector_reads.txt

$ cat ssaha_reads_vector.txt | sort -k 2,3
...
FF FRLDO2B01A0019 269 292 pCC1BAC 1981 2004 24  100.00
RF FRLDO2B01A007S 12  347 pCC1BAC 5317 5652 336 100.00
RF FRLDO2B01A007S 440 451 pCC1BAC 5209 5220 12  100.00 <-
FF FRLDO2B01A01K8 309 356 pCC1BAC 1261 1308 48  100.00
...

$ cat ssaha_vector_reads.txt | sort -k 5,6 | head
...
FF pCC1BAC 1952 1975 FRLDO2B01A0019 241 264 24  100.00
RF pCC1BAC 5316 5651 FRLDO2B01A007S 13  348 336 100.00
FF pCC1BAC 966  1253 FRLDO2B01A01K8 13  300 288 100.00
...

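One way to check how many hits in each output are short is to count lines whose match length falls below a cutoff. This is only a minimal sketch: it assumes the nine-column layout shown above with the match length in column 8, and both the helper name count_short_hits and the default cutoff of 30 bases are arbitrary choices made for illustration, not anything defined by ssaha or mira.

```shell
# Count hits whose match length (column 8) is below a cutoff.
# Assumption: nine whitespace-separated columns per hit, as in the listings above.
# count_short_hits is a hypothetical helper; the default cutoff of 30 is arbitrary.
count_short_hits() {
    # $1 = ssaha output file, $2 = optional length cutoff
    awk -v max="${2:-30}" 'NF >= 9 && $8 + 0 < max { n++ } END { print n + 0 }' "$1"
}

# Usage (file names as in the wc listing above):
#   count_short_hits ssaha_reads_vector.txt
#   count_short_hits ssaha_vector_reads.txt 20
```

Running it on both files with the same cutoff would show whether the extra hits are concentrated among the short ones.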
And this is how to convert the reads-first output for mira:

cat ssaha_reads_vector.txt | \
awk '{print $1 "\t" $5 "\t" $6 "\t" $7 "\t" $2 "\t" $3 "\t" $4 "\t" $8 "\t" $9}' \
> ssaha_switched.txt
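
To see what the column swap does, the same reordering can be applied to a single hit taken from the listing above. This is a sketch under the assumption of the nine-column layout; swap_hit_columns is a made-up helper name, and whether mira needs anything beyond this column order is not verified here.

```shell
# Swap the read columns (2-4) with the vector columns (5-7); output is tab-separated.
# swap_hit_columns is a hypothetical name used only for this illustration.
# Unlike a bare print, this skips lines with fewer than nine fields.
swap_hit_columns() {
    awk 'BEGIN { OFS = "\t" } NF >= 9 { print $1, $5, $6, $7, $2, $3, $4, $8, $9 }'
}

# One hit from the reads-first listing, converted to vector-first order
echo 'RF FRLDO2B01A007S 12  347 pCC1BAC 5317 5652 336 100.00' | swap_hit_columns
```

The printed line has pCC1BAC 5317 5652 ahead of FRLDO2B01A007S 12 347, with the length and identity columns unchanged at the end.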

----------------------

> As a side question: are you perhaps using an older version of "sff_extract"
> to create the input files for mira? In the files you sent me, the XML looks
> like it ... it has quality clips instead of sequencing vector clips.

I am using sffinfo2mirafiles.tcl, following this line in the documentation:

tsunami:/path/to/myProject> sffinfo bchoc.sff | sffinfo2mirafiles.tcl -project bchoc

(I found it somewhere, but I am not sure where; I wasn't able to track it
down again. Maybe it is outdated info.)

I am curious: why should there be a "vector clip" in the XML instead of a
"quality clip"? I thought the information extracted from an sff file is mostly
about quality clipping. Anyway, I shall use sff_extract from now on.

Thank you again,

Jan





