[mira_talk] Re: Problems using sff extract

  • From: "Bastien Chevreux" <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 16 Dec 2010 09:02:18 +0100 (MET)

>From: Ganga Jeena
> Thanks, it worked with 0.2.9.

0.2.8 and 0.2.9 are identical, except that I re-activated a check that ssaha2
can be run from Python. If 0.2.9 worked, 0.2.8 should have too.

> Here, for only 80709 paired-end reads, the sequences generated could have
> been at most double that (161418), but they are 7 times that: 754371
> ("Converted 441320 reads into 754371 sequences.")
> How is that possible?
> How exactly does sff_extract work?

It uses SSAHA2 to find the linker sequences and splits the reads at these
places. For details, please read the description comment of the function

split_paired_end(data, sff_fh, seq_fh, qual_fh, xml_fh):

of sff_extract
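
As a rough illustration of the idea only (this is not the code from
sff_extract): the sketch below uses an exact string search where sff_extract
uses SSAHA2, and the function name, the min_len parameter and the choice of
which side gets ".r" or ".f" are assumptions made for the example. The
function comment in sff_extract remains the reference for the real behaviour.

```python
def split_at_linker(name, seq, linker, min_len=1):
    """Return a list of (name, sequence) tuples for one read.

    Hypothetical helper: exact matching stands in for the SSAHA2 search.
    """
    pos = seq.find(linker)
    if pos == -1:
        # No linker found: keep the read as a single sequence.
        return [(name + ".fn", seq)]

    left = seq[:pos]
    right = seq[pos + len(linker):]

    if len(left) >= min_len and len(right) >= min_len:
        # Linker in the middle: the read yields two sequences.
        # Which side is ".r" and which is ".f" is an assumption here.
        return [(name + ".r", left), (name + ".f", right)]

    # Linker at one end: only one usable side remains, keep it.
    keep = left if len(left) >= len(right) else right
    return [(name + ".fn", keep)]


if __name__ == "__main__":
    print(split_at_linker("read1", "ACGTACGTGTTGGTTTTCC", "GTTGG"))
    print(split_at_linker("read2", "ACGTACGT", "GTTGG"))
```

The point of the sketch is that a read never disappears: it comes out either
as two sequences or as one.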

> Does it not take only the sequences with a linker and discard the others,
> which either had no linker or had the linker at the far end, so that after
> splitting there would be no other pair end?

No, why throw away data which could still be useful?
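
The numbers quoted above fit this, if one assumes that every read ends up as
either one or two sequences (an assumption for this back-of-the-envelope
check, not something taken from the sff_extract output):

```python
reads = 441320       # "Converted 441320 reads ..."
sequences = 754371   # "... into 754371 sequences."

# Assumption: each read yields exactly one or two sequences.
split_reads = sequences - reads      # reads that yielded two sequences
single_reads = reads - split_reads   # reads that yielded one sequence

print(split_reads, single_reads)
```

Under that assumption, the surplus of sequences over reads is simply the
number of reads that were split in two.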

> Is the .r for reverse and .f for forward strand of same sequence ends ? 

Yes.

> What does this .fn indicate??
> Why are nnn appended to end of the sequences ?

See the function comments pointed to above.

> Why are most of the sequences in lowercase letters?

Lowercase marks the parts of the sequences that are clipped in the SFF; see
the Roche documentation.
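
A minimal sketch of how clip points can be turned into lowercase masking. The
function name is made up for the example, and it assumes 1-based, inclusive
clip positions where 0 means "no clip set"; please check the Roche
documentation for the authoritative definition.

```python
def mask_clipped(seq, clip_left, clip_right):
    """Lowercase the bases outside the clip range, uppercase the rest.

    Assumption: clip points are 1-based and inclusive, 0 means unset.
    """
    start = clip_left - 1 if clip_left > 0 else 0
    end = clip_right if clip_right > 0 else len(seq)
    return seq[:start].lower() + seq[start:end].upper() + seq[end:].lower()


if __name__ == "__main__":
    print(mask_clipped("ACGTACGTAC", 3, 8))
```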

B.
