[mira_talk] Re: sff_extract

Blimey, I'm stupid!

It's actually *_mlc that denotes the presence of multiple linkers. *_pl denotes the presence of a _partial_ linker (with a 10% tolerance) in a sequence.

From the comments of the program:

'''Splits a paired end read and writes sequences into FASTA, FASTA qual
    and XML traceinfo file. Returns the number of sequences created.

    As the linker sequence may be anywhere in the read, including the ends
    and overlapping with bad quality sequence, we need to perform some
    computing and eventually set new clip points.

    If the resulting split yields only one sequence (because linker
    was not present or overlapping with left or right clip), only one
    sequence will be written with ".fn" appended to the name.

    If the read can be split, two reads will be written. The side left of
    the linker will be named ".r" and will be written in reverse complement
    into the file to conform with what approximately all assemblers expect
    when reading paired-end data: reads in forward direction in file. The
    side right of the linker will be named ".f".

    If SSAHA found partial linker (linker sequences < length of linker),
    the sequences will get a "_pl" and furthermore be cut back thoroughly.

    If SSAHA found multiple occurrences of the linker, the names will get an
    additional "_mlc" within the name to show that there was "multiple
    linker contamination".

    For multiple or partial linker, the "good" parts of the reads are
    stored with a ".part<number>" name, additionally they will not get
    template information in the XML
    '''
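
To make the naming scheme above concrete, here is a minimal sketch of how the read names could be sorted into those categories. It is not part of sff_extract: the function name classify_read_name, the labels it returns, and the assumed name layout (<read>[_pl|_mlc].<fn|f|r|partN>) are illustrative only, inferred from the names quoted in this thread.

```python
import re

# Assumed layout (inferred from names quoted in this thread, not taken
# from the sff_extract source): <read>[_pl|_mlc].<fn|f|r|partN>
_NAME_RE = re.compile(
    r"^(?P<base>.+?)(?P<flag>_pl|_mlc)?\.(?P<kind>fn|f|r|part(?P<num>\d+))$"
)

# Hypothetical labels for the suffixes described in the docstring.
_KINDS = {
    "fn": "unsplit",                # only one sequence came out of the read
    "f": "right_of_linker",         # side right of the linker
    "r": "left_of_linker_revcomp",  # side left of the linker, reverse complemented
}
_LINKER = {"_pl": "partial", "_mlc": "multiple"}


def classify_read_name(name):
    """Return a dict describing a read name, or None if it does not fit."""
    m = _NAME_RE.match(name)
    if m is None:
        return None
    num = m.group("num")
    return {
        "base": m.group("base"),
        "linker": _LINKER.get(m.group("flag")),  # None when no flag is present
        "kind": "part" if num else _KINDS[m.group("kind")],
        "part": int(num) if num else None,
    }
```

Running every header of the FASTA output through such a function and counting the "linker" values would give a quick picture of how many reads had partial or multiple linker hits.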


Sorry for the mess...

Lionel

On 22 May 2009, at 11:24 , Brian Forde wrote:

Thanks Lionel,

While I'm on the topic, I have also noticed *_mlc.part extensions. Can you tell me what these are?

On Thu, May 21, 2009 at 6:05 PM, Lionel Guy <guy.lionel@xxxxxxxxx> wrote:
Hi Brian,

*_pl reads mean that there were several linker matches found in the read.

By the way, there have been some issues with sff_extract; which version are you using? The latest one is (AFAIK) at http://www.chevreux.org/tmp/mira_3rdparty_05-05-2009.tar.bz2

Cheers,

Lionel


On 21 May 2009, at 18:09 , Brian Forde wrote:

Hello all,

I was having a look through the multi-FASTA output of sff_extract and noticed something. All the read names have an extension: *.fn (normal non-paired shotgun reads), *.f/*.r (paired reads) and *_pl.part1. What I would like to know is: what are these *_pl.part1 reads?

regards

Brian

--
"If scientists knew what they were doing they wouldn't call it research"


--
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html




============================================
Lionel Guy
Thunmansgatan 25, SE-75421 Uppsala

phone: +46 (0)18 245596
mobile: +46 (0)73 9760618
email: guy.lionel@xxxxxxxxx
============================================


