[mira_talk] Re: sff_extract

Thanks again,

Does having multiple linker sequences affect the paired status of the reads,
or will the *.part1 link with the *.part1 and part2 with part2, etc.?

On Fri, May 22, 2009 at 10:42 AM, Lionel Guy <guy.lionel@xxxxxxxxx> wrote:

> Blimey, I'm stupid!
>
> It's actually *_mlc that denotes the presence of multiple linkers.
> *_pl denotes the presence of a _partial_ linker (with a 10% tolerance) in a
> sequence.
>
> From the comments of the program:
>
>    '''Splits a paired end read and writes sequences into FASTA, FASTA qual
>    and XML traceinfo file. Returns the number of sequences created.
>
>    As the linker sequence may be anywhere in the read, including the ends
>    and overlapping with bad quality sequence, we need to perform some
>    computing and eventually set new clip points.
>
>    If the resulting split yields only one sequence (because linker
>    was not present or overlapping with left or right clip), only one
>    sequence will be written with ".fn" appended to the name.
>
>    If the read can be split, two reads will be written. The side left of
>    the linker will be named ".r" and will be written in reverse complement
>    into the file to conform with what approximately all assemblers expect
>    when reading paired-end data: reads in forward direction in file. The
>    side right of the linker will be named ".f"
>
>    If SSAHA found partial linker (linker sequences < length of linker),
>    the sequences will get a "_pl" furthermore be cut back thoroughly.
>
>    If SSAHA found multiple occurences of the linker, the names will get an
>    additional "_mlc" within the name to show that there was "multiple
>    linker contamination".
>
>    For multiple or partial linker, the "good" parts of the reads are
>    stored with a ".part<number>" name, additionally they will not get
>    template information in the XML
>    '''
>
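
The naming rules in the docstring above can be sketched as a small name
parser. This is only an illustration, not code from sff_extract: the regular
expression, parse_read_name() and find_pairs() are hypothetical names, and
the layout <template>[_pl|_mlc].(fn|f|r|part<N>) is an assumption inferred
from the docstring and from the read names mentioned in this thread.

```python
import re
from collections import defaultdict

# Assumed name layout (inferred from the docstring, not taken from the tool):
#   <template>[_pl|_mlc].(fn|f|r|part<N>)
NAME_RE = re.compile(
    r"^(?P<template>.+?)(?P<flag>_pl|_mlc)?"
    r"\.(?P<kind>fn|f|r|part(?P<part>\d+))$"
)


def parse_read_name(name):
    """Split a read name into template, linker flag, kind and part number.

    Returns None if the name does not follow the assumed layout.
    """
    m = NAME_RE.match(name)
    if m is None:
        return None
    kind = m.group("kind")
    return {
        "template": m.group("template"),
        "flag": m.group("flag"),  # "_pl", "_mlc" or None
        "kind": "part" if kind.startswith("part") else kind,
        "part": int(m.group("part")) if m.group("part") else None,
    }


def find_pairs(names):
    """Map each template to its (.f, .r) read names when both are present.

    ".fn" and ".part<N>" reads are ignored here, since the docstring says
    the latter get no template information in the XML.
    """
    by_template = defaultdict(dict)
    for name in names:
        info = parse_read_name(name)
        if info and info["kind"] in ("f", "r"):
            by_template[info["template"]][info["kind"]] = name
    return {
        t: (d["f"], d["r"]) for t, d in by_template.items() if len(d) == 2
    }


if __name__ == "__main__":
    # Made-up read names, for illustration only.
    reads = ["A.f", "A.r", "B.fn", "C_pl.part1", "D_mlc.part1", "D_mlc.part2"]
    for r in reads:
        print(r, parse_read_name(r))
    print(find_pairs(reads))
```

Under these assumptions only the .f/.r reads of a template are treated as a
pair; the .part<N> reads are reported individually, which matches the
docstring's note that they get no template information.
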
>
> Sorry for the mess...
>
> Lionel
>
>
> On 22 May 2009, at 11:24 , Brian Forde wrote:
>
>  Thanks Lionel,
>>
>> While I'm on the topic, I have also noticed *_mlc.part extensions.
>> Can you tell me what these are?
>>
>> On Thu, May 21, 2009 at 6:05 PM, Lionel Guy <guy.lionel@xxxxxxxxx> wrote:
>> Hi Brian,
>>
>> *_pl reads mean that there were several linker matches found in the read.
>>
>> By the way, there have been some issues with sff_extract; what version are
>> you using? The latest one is (AFAIK) in
>> http://www.chevreux.org/tmp/mira_3rdparty_05-05-2009.tar.bz2
>>
>> Cheers,
>>
>> Lionel
>>
>>
>> On 21 May 2009, at 18:09 , Brian Forde wrote:
>>
>> Hello all,
>>
>> I was having a look through the multi-FASTA file output of sff_extract and
>> noticed something. All the reads have an extension: *.fn (normal
>> non-paired shotgun reads), *.f/*.r (paired reads) and *_pl.part1. What I
>> would like to know is: what are these *_pl.part1 reads?
>>
>> regards
>>
>> Brian
>>
>> --
>> "If scientists knew what they were doing they wouldn't call it research"
>>
>>
>> --
>> You have received this mail because you are subscribed to the mira_talk
>> mailing list. For information on how to subscribe or unsubscribe, please
>> visit http://www.chevreux.org/mira_mailinglists.html
>
> ============================================
> Lionel Guy
> Thunmansgatan 25, SE-75421 Uppsala
>
> phone: +46 (0)18 245596
> mobile: +46 (0)73 9760618
> email: guy.lionel@xxxxxxxxx
> ============================================



-- 
"If scientists knew what they were doing they wouldn't call it research"
