[mira_talk] Re: fn in paired-end reads?

  • From: Cleo HC Ho <cleoho175@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 31 May 2011 10:43:47 -0400

Hi Sven,

Thanks a lot! May I ask where you obtained this documentation? I couldn't find
it on the sff_extract website.

Also, why does the reverse read have to be written in reverse complement? Aren't the
right and left sides of the linker from the same strand of DNA?

cheers,
Cleo


On Tue, May 31, 2011 at 3:26 AM, Sven Klages
<sir.svencelot@xxxxxxxxxxxxxx> wrote:

> Hi Cleo,
>
>
> 2011/5/30 Cleo HC Ho <cleoho175@xxxxxxxxx>
>
>> Hi all,
>>
>> After the SSAHA2 extraction, some reads are tagged with fn. What does "fn"
>> mean? I tried searching the archive but did not find anything. Thanks in
>> advance!
>>
>
> Did you split paired-end data using sff_extract?
>
> Taken from sff_extract:
>
> Splits a paired end read and writes sequences into FASTA, FASTA qual
>     and XML traceinfo file. Returns the number of sequences created.
>
>     As the linker sequence may be anywhere in the read, including the ends
>     and overlapping with bad quality sequence, we need to perform some
>     computing and possibly set new clip points.
>
>     If the resulting split yields only one sequence (because linker
>     was not present or overlapping with left or right clip), only one
>     sequence will be written with ".fn" appended to the name.
>
>     If the read can be split, two reads will be written. The side left of
>     the linker will be named ".r" and will be written in reverse complement
>     into the file to conform with what nearly all assemblers expect
>     when reading paired-end data: reads in forward direction in file. The
>     side right of the linker will be named ".f".
>
>     If SSAHA found a partial linker (linker sequence shorter than the full
>     linker), the sequences will get "_pl" in their names and will
>     furthermore be cut back thoroughly.
>
>     If SSAHA found multiple occurrences of the linker, the names will get an
>     additional "_mlc" within the name to show that there was "multiple
>     linker contamination".
>
>     For multiple or partial linkers, the "good" parts of the reads are
>     stored with a ".part<number>" name; additionally, they will not get
>     template information in the XML.
>
> hth,
> Sven
>
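For readers who want to see the ".fn" / ".r" / ".f" rules from the quoted docstring concretely, here is a minimal Python sketch. It is not sff_extract's actual code: `split_paired_read` and `reverse_complement` are hypothetical names, only a single full-length linker hit is handled (no "_pl", "_mlc" or ".part" cases), both the clip window and the linker hit are assumed to be 0-based with an exclusive right end, and the FASTA qual and XML traceinfo output is left out.

```python
from typing import List, Optional, Tuple

# Complement table for plain DNA letters (A, C, G, T, N); other characters pass through.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_paired_read(
    name: str,
    seq: str,
    clip: Tuple[int, int],
    linker: Optional[Tuple[int, int]],
) -> List[Tuple[str, str]]:
    """Hypothetical re-implementation of the naming rules quoted above.

    clip   -- (left, right) good-quality window, 0-based, right-exclusive
    linker -- (start, end) of one full linker hit, or None if no linker was found
    """
    left_clip, right_clip = clip
    if linker is None:
        # No linker at all: the clipped read is written once, named ".fn".
        return [(name + ".fn", seq[left_clip:right_clip])]

    lstart, lend = linker
    # Keep only the parts of each side that fall inside the clip window;
    # a side becomes empty when the linker overlaps that clip point.
    left_part = seq[left_clip:min(lstart, right_clip)]
    right_part = seq[max(lend, left_clip):right_clip]

    if left_part and right_part:
        return [
            # Left of the linker: ".r", stored as its reverse complement.
            (name + ".r", reverse_complement(left_part)),
            # Right of the linker: ".f", stored as read.
            (name + ".f", right_part),
        ]

    # Linker overlaps a clip point: only one usable side survives, named ".fn".
    remaining = left_part or right_part
    return [(name + ".fn", remaining)] if remaining else []
```

For example, `split_paired_read("r1", "AACCTTTTGGTA", (0, 12), (4, 8))` returns `[("r1.r", "GGTT"), ("r1.f", "GGTA")]`. Keeping the surviving side as a single ".fn" read when the linker overlaps a clip point is one reading of "set new clip points"; the exact trimming sff_extract performs may differ.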

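To see how many reads in an extracted FASTA file ended up as unsplit ".fn" reads, or carry the other markers, a small tally over the header lines is enough. This is a hypothetical helper, not part of sff_extract or MIRA. It assumes the markers appear in read names exactly as the docstring describes (suffixes ".fn", ".r", ".f" and ".part<number>", plus "_pl" and "_mlc" somewhere in the name); the plain substring test would misfire on read names that already contain "_pl" or "_mlc".

```python
import re
from collections import Counter
from typing import Dict, Iterable

# ".part<number>" at the end of a name; how it combines with "_pl"/"_mlc"
# in real sff_extract output is an assumption.
_PART = re.compile(r"\.part\d+$")


def classify_read_name(name: str) -> Dict[str, bool]:
    """Report which of the documented markers appear in one read name."""
    return {
        "unsplit_fn": name.endswith(".fn"),
        "left_r": name.endswith(".r"),
        "right_f": name.endswith(".f"),
        "partial_linker": "_pl" in name,
        "multiple_linker": "_mlc" in name,
        "part": bool(_PART.search(name)),
    }


def count_fasta_markers(lines: Iterable[str]) -> Counter:
    """Count markers over the '>' header lines of a FASTA file."""
    counts: Counter = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line
        fields = line[1:].split()
        if not fields:
            continue  # empty header, nothing to classify
        # The read name is the first whitespace-separated token of the header.
        for marker, present in classify_read_name(fields[0]).items():
            if present:
                counts[marker] += 1
    return counts
```

An open file handle can be passed directly, e.g. `count_fasta_markers(open("extracted.fasta"))` (file name hypothetical), which returns a `Counter` keyed by marker name.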