[mira_talk] dependencies between X masked data and traceinfo.xml data in mira clipping behaviour
- From: Jorge.DUARTE@xxxxxxxxxxxx
- To: mira_talk@xxxxxxxxxxxxx
- Date: Wed, 8 Oct 2008 12:04:10 +0200
Hi all,
Does anyone have experience using traceinfo.xml data together with
X-masked sequences in mira?
The mira documentation says:
maskedbases_clip(mbc)=on|yes|1, off|no|0
Default is dependent of the sequencing technology used. This will let mira
perform a 'clipping' of bases that were masked out (replaced with the
character X). It is generally not a good idea to use mask bases to remove
unwanted portions of a sequence, the EXP file format and the NCBI
traceinfo format have excellent possibilities to circumvent this. But
because a lot of preprocessing software are built around cross_match,
scylla- and phrap-style of base masking, the need arised for mira to be
able to handle this, too. mira will look at the start and end of each
sequence to see whether there are masked bases that should be 'clipped'.
So it looks like the Xs are clipped at some point, but the documentation
doesn't say when. I wonder whether this could conflict with the clipping
based on the data in the XML file if one clipping is done after the
other (assuming that clipping means actually removing a region).
For example, if I have a sequence like this one:
>FAINC1H01EKDXQ
TCAGACGAGTGCGTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXAGCAGA
GATGATGTGGGCAAGTTCCTTCCCACATACTTGGCGCAGGGAATCCTTCA
GAGCGCTGAGCGGGCTGGCAAGGC
and traceinfo data like this:
<trace>
<trace_name>FAINC1H01EKDXQ</trace_name>
<trace_type_code>454</trace_type_code>
<program_id>454Basecaller</program_id>
<clip_quality_left>15</clip_quality_left>
<clip_quality_right>105</clip_quality_right>
</trace>
if mira first clips the Xs and then tries to clip the sequence using the
traceinfo data,
won't the sequence be too short to be clipped at base 105?
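
To make the concern concrete, here is a minimal Python sketch of the two possible interpretations, using the read and the traceinfo record quoted above. It says nothing about what mira actually does internally. The helper names (parse_quality_clips, first_mask_end, remove_then_clip, clip_by_coordinates) are hypothetical, and treating the clip values as 1-based positions on the original read is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# The read and the traceinfo record quoted in this message.
READ = (
    "TCAGACGAGTGCGTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXAGCAGA"
    "GATGATGTGGGCAAGTTCCTTCCCACATACTTGGCGCAGGGAATCCTTCA"
    "GAGCGCTGAGCGGGCTGGCAAGGC"
)

TRACE_XML = """<trace>
<trace_name>FAINC1H01EKDXQ</trace_name>
<trace_type_code>454</trace_type_code>
<program_id>454Basecaller</program_id>
<clip_quality_left>15</clip_quality_left>
<clip_quality_right>105</clip_quality_right>
</trace>"""


def parse_quality_clips(xml_text):
    """Return (clip_quality_left, clip_quality_right) as integers."""
    trace = ET.fromstring(xml_text)
    left = int(trace.findtext("clip_quality_left"))
    right = int(trace.findtext("clip_quality_right"))
    return left, right


def first_mask_end(seq):
    """0-based index just past the first run of X, or 0 if there is none."""
    match = re.search(r"X+", seq)
    return match.end() if match else 0


def remove_then_clip(seq, q_right):
    """Interpretation A (hypothetical): physically cut off everything up to
    the end of the X run, then check whether the right clip still fits."""
    trimmed = seq[first_mask_end(seq):]
    return trimmed, q_right <= len(trimmed)


def clip_by_coordinates(seq, q_left, q_right):
    """Interpretation B (hypothetical): leave the read untouched and keep
    every clip as a coordinate on the original sequence.
    Assumption: clip values are 1-based positions on the original read."""
    start = max(q_left - 1, first_mask_end(seq))  # 0-based, inclusive
    end = min(q_right, len(seq))                  # 0-based, exclusive
    return start, end


if __name__ == "__main__":
    q_left, q_right = parse_quality_clips(TRACE_XML)
    trimmed, fits = remove_then_clip(READ, q_right)
    print("A: length after removing mask:", len(trimmed), "right clip fits:", fits)
    start, end = clip_by_coordinates(READ, q_left, q_right)
    print("B: kept region on original read:", start, end)
```

Under interpretation A the right clip points past the end of the shortened read, which is exactly the worry. Under interpretation B both clips stay valid because nothing is removed before all clip positions have been combined.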
I apologize if this is a stupid question; I'm not familiar with the
general clipping behaviour of bioinformatics tools.
But if someone could tell me that mira will have no trouble handling this,
that would be great!
Regards,
Jorge.
---
Jorge Duarte
Bioinformatics Research Engineer
BIOGEMMA - Upstream Genomics Group
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte@xxxxxxxxxxxx