[mira_talk] Re: sff_extract
- From: Lionel Guy <guy.lionel@xxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Sat, 23 May 2009 15:06:22 +0200
I'm not quite sure about your question... there is just one part1 and
one part2, so how would you link part1 with part1? If you have more
than two parts (meaning that there would be a sequence, a linker, a
second sequence, another linker, and a third sequence), it would mean
your paired-end sequences are kind of screwed... It would raise an
error or a warning, AFAIK.
The reads will get names that reflect their status and you could link
part1 to part2, but the linking information doesn't make it to the xml
file, which I think is sound. If you have multiple linker
contamination, you shouldn't really trust the pair, should you?
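To illustrate linking the parts by name alone, here is a minimal Python
sketch. It assumes fragment names end in ".part<number>", as described in
this thread. The function name group_fragments and the example read names
are hypothetical and not part of sff_extract.

```python
from collections import defaultdict

def group_fragments(names):
    """Group read names ending in '.part<N>' by their shared stem.

    Assumption: fragments of one original read share everything before
    '.part<N>' (for example 'READ_mlc'). Names without such a suffix
    are ignored.
    """
    groups = defaultdict(list)
    for name in names:
        stem, sep, tail = name.rpartition(".part")
        if sep and tail.isdigit():
            groups[stem].append((int(tail), name))
    # Sort each group by part number so part1 comes before part2.
    return {stem: [n for _, n in sorted(parts)] for stem, parts in groups.items()}

# Hypothetical read names, for illustration only.
example = ["R1_mlc.part2", "R1_mlc.part1", "R2.fn", "R3_pl.part1"]
for stem, parts in group_fragments(example).items():
    print(stem, parts)
```

This only recovers which fragments came from the same original read. It
does not make them a trustworthy pair, which is presumably why no template
information is written to the XML for them.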
Lionel
On 22 May 2009, at 18:32 , Brian Forde wrote:
Thanks again,
Does having multiple linker sequences affect the paired status of the
reads, or will the *.part1 link with the *.part1 and *.part2 with
*.part2, etc.?
On Fri, May 22, 2009 at 10:42 AM, Lionel Guy <guy.lionel@xxxxxxxxx>
wrote:
Blimey, I'm stupid!
It's actually *_mlc that denotes the presence of multiple linker
sequences. *_pl denotes the presence of a _partial_ linker (with a 10%
tolerance) in a sequence.
From the comments of the program:
'''Splits a paired end read and writes sequences into FASTA, FASTA qual
and XML traceinfo file. Returns the number of sequences created.

As the linker sequence may be anywhere in the read, including the ends
and overlapping with bad quality sequence, we need to perform some
computing and eventually set new clip points.

If the resulting split yields only one sequence (because linker
was not present or overlapping with left or right clip), only one
sequence will be written with ".fn" appended to the name.

If the read can be split, two reads will be written. The side left of
the linker will be named ".r" and will be written in reverse complement
into the file to conform with what approximately all assemblers expect
when reading paired-end data: reads in forward direction in file. The side
right of the linker will be named ".f".

If SSAHA found partial linker (linker sequences < length of linker),
the sequences will get a "_pl" and will furthermore be cut back thoroughly.

If SSAHA found multiple occurrences of the linker, the names will get an
additional "_mlc" within the name to show that there was "multiple
linker contamination".

For multiple or partial linker, the "good" parts of the reads are
stored with a ".part<number>" name; additionally, they will not get
template information in the XML.
'''
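The naming rules in the comment above can be sketched as a small name
classifier. This is only an illustration under stated assumptions: the
name shapes (<base>.fn, <base>.f, <base>.r, <base>_pl.part<N>,
<base>_mlc.part<N>) are inferred from the suffixes mentioned in this
thread, and classify_read_name, NAME_RE and LINKER are hypothetical names
that do not exist in sff_extract.

```python
import re

# Assumed name shapes, inferred from the suffixes discussed in this thread:
#   <base>.fn, <base>.f, <base>.r, <base>_pl.part<N>, <base>_mlc.part<N>
NAME_RE = re.compile(
    r"^(?P<base>.+?)"                          # original read name (shortest match)
    r"(?:_(?P<flag>pl|mlc))?"                  # optional linker flag
    r"\.(?P<kind>fn|f|r|part(?P<part>\d+))$"   # status suffix
)

LINKER = {None: "none flagged", "pl": "partial", "mlc": "multiple"}

def classify_read_name(name):
    """Return a dict describing the read, or None if the name is not recognised."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    kind = m.group("kind")
    if kind == "fn":
        status = "unpaired"      # no usable split: single read
    elif kind in ("f", "r"):
        status = "paired"        # one side of a split read
    else:
        status = "fragment"      # ".part<N>": no template info in the XML
    part = m.group("part")
    return {
        "base": m.group("base"),
        "status": status,
        "linker": LINKER[m.group("flag")],
        "part": int(part) if part is not None else None,
    }

# Hypothetical read names, for illustration only.
for read_name in ["R1.fn", "R2.f", "R2.r", "R3_pl.part1", "R4_mlc.part2"]:
    print(read_name, classify_read_name(read_name))
```

A classifier like this could be used to filter out the "_pl" and "_mlc"
fragments, or to count them, before handing the reads to an assembler.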
Sorry for the mess...
Lionel
On 22 May 2009, at 11:24 , Brian Forde wrote:
Thanks Lionel,
While I'm on the topic, I have also noticed *_mlc.part extensions.
Can you tell me what these are?
On Thu, May 21, 2009 at 6:05 PM, Lionel Guy <guy.lionel@xxxxxxxxx>
wrote:
Hi Brian,
*_pl reads mean that there were several linker matches found in the
read.
By the way, there have been some issues with sff_extract; what
version are you using? The latest one is (AFAIK) in http://www.chevreux.org/tmp/mira_3rdparty_05-05-2009.tar.bz2
Cheers,
Lionel
On 21 May 2009, at 18:09 , Brian Forde wrote:
Hello all,
I was having a look through the multi-FASTA file output of
sff_extract and noticed something. All the reads have an
extension: *.fn (normal non-paired shotgun reads),
*.f/*.r (paired reads) and *_pl.part1. What I would like to know is:
what are these *_pl.part1 reads?
regards
Brian
--
"If scientists knew what they were doing they wouldn't call it
research"
--
You have received this mail because you are subscribed to the
mira_talk mailing list. For information on how to subscribe or
unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
============================================
Lionel Guy
Thunmansgatan 25, SE-75421 Uppsala
phone: +46 (0)18 245596
mobile: +46 (0)73 9760618
email: guy.lionel@xxxxxxxxx
============================================