[mira_talk] Re: mid-tags

From: Ross Whetten <ross_whetten@xxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Wed, 18 Feb 2009 09:28:36 -0500

It should be possible to assemble tagged 454 reads using the existing tools.
More details following Bastien's last comment.

Bastien Chevreux wrote:

On Tuesday 17 February 2009 Oscar Franzén wrote:
I have a question about MIRA (great software by the way):
is it possible to assemble 454 data sequenced with MID-tags?
Hello Oscar,
yes and no. You must "remove" the MID tags from the input sequence as else they'd wreak havoc.
Assuming that in the following read
demo
tcag ttgccaggtaac ctcgattgagtactatctgacgagcgacgactgtctgcat
the "tcag" is the 'normal' remainder of the 454 adaptor (clipped away by sff_extract via a corresponding left clip entry in the ancillary XML data) and "ttgccaggtaac" is one of your MID tags, you can:
1) physically remove the whole stretch (I do not recommend this), leading to
demo
ctcgattgagtactatctgacgagcgacgactgtctgcat

2) mask the MID tag (and perhaps also the remainder of the adaptor) and use -CL:mbc
demo
xxxx xxxxxxxxxxxx ctcgattgagtactatctgacgagcgacgactgtctgcat
3) (preferred) keep the whole sequence as is and use a script that sets correct values in the XML file with ancillary data.
The problem with all three possibilities above: even though a number of people have inquired previously by mail regarding this topic, I haven't yet got back any script that performs this kind of data mangling[*]. Feel free to be the first :-)
Regards,
  Bastien
[*] I would assume that this belongs to "normal" data processing that the Roche software should perform, but until now this is not part of their software pipeline.
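
A minimal sketch of option 2 above, in Python. It assumes the MID tag sits directly after the 4-base adaptor remainder at the very start of the read and matches exactly (no sequencing errors are tolerated). The function name mask_mid_tag and its parameters are hypothetical and are not part of MIRA or sff_extract.

```python
def mask_mid_tag(read, mid_tag, adaptor="tcag", mask_adaptor=True):
    """Replace a leading adaptor remainder + MID tag with 'x' characters.

    Hypothetical helper: exact, case-insensitive prefix match only.
    Reads that do not start with adaptor + tag are returned unchanged.
    """
    prefix = adaptor.lower() + mid_tag.lower()
    if not read.lower().startswith(prefix):
        return read  # tag not at the expected position; leave the read alone
    # Mask from position 0, or only from the end of the adaptor remainder.
    start = 0 if mask_adaptor else len(adaptor)
    return read[:start] + "x" * (len(prefix) - start) + read[len(prefix):]


# The demo read from the message above, without the spaces.
read = "tcagttgccaggtaacctcgattgagtactatctgacgagcgacgactgtctgcat"
print(mask_mid_tag(read, "ttgccaggtaac"))
```

Masking keeps the read length and base positions intact, so the quality values and clip coordinates in the ancillary data still line up with the sequence.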

The program SSAHA (http://www.sanger.ac.uk/Software/analysis/SSAHA/) can screen your 454 reads for the presence of MID tag sequences, and produce an output file that can be read by MIRA using the *-CL:msvs* switch. These tools are intended for use in screening Sanger reads for the presence of sequencing vector, but with the appropriate parameter settings in SSAHA, it can find short sequences such as the MID tags as well. The risk is that any coincidental matches of the same sequence elsewhere in the read will be marked as "vector" and not used for assembly, but for 10-bp MID tags the frequency of such an artifact should be low. Key parameter settings for SSAHA are -pf -da 0 (required for MIRA to read the output file), and the -wl, -mg, -mi, and -sl parameters. For 10-bp MID tags, you could try -wl 10 -mg 1 -mi 1 -sl 1.

To avoid assembling reads with different MID tags into the same contigs, you would use the bin_fasta_on_mid_primers.pl Perl script from the 3rdparty script package to sort the 454 reads into different files based on the presence of the MID tags, and assemble each set of reads independently. This would require running SSAHA on each of the input files, so that each input file has its own "vector clipping" information specific to the appropriate MID tag.
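
The binning idea can be sketched in Python as follows. This is not the bin_fasta_on_mid_primers.pl script itself, whose options are not shown in this thread; it only illustrates sorting reads into per-tag groups. The function name bin_reads_by_mid, the tag labels MID_A and MID_B, and the second tag sequence are made up for illustration, and matching is exact and anchored right after the "tcag" adaptor remainder.

```python
from collections import defaultdict


def bin_reads_by_mid(reads, mid_tags, adaptor="tcag"):
    """Group read names by the MID tag found at the start of each read.

    reads    -- iterable of (name, sequence) pairs
    mid_tags -- dict mapping a tag label to its sequence
    Reads matching no tag are collected under "unassigned".
    """
    bins = defaultdict(list)
    for name, seq in reads:
        s = seq.lower()
        # Skip the adaptor remainder if it is still present.
        body = s[len(adaptor):] if s.startswith(adaptor.lower()) else s
        for label, tag in mid_tags.items():
            if body.startswith(tag.lower()):
                bins[label].append(name)
                break
        else:
            bins["unassigned"].append(name)
    return dict(bins)


# Hypothetical input: MID_A is the tag from the demo read, MID_B is invented.
mids = {"MID_A": "ttgccaggtaac", "MID_B": "gatcgatcgatc"}
reads = [
    ("r1", "tcagttgccaggtaacctcgatt"),
    ("r2", "tcaggatcgatcgatcaaaccc"),
    ("r3", "tcagcccccccccccc"),
]
bins = bin_reads_by_mid(reads, mids)
print(bins)
```

Each group would then be written to its own FASTA file, screened with SSAHA against its own MID tag, and assembled separately, as described above.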


Regards,
Ross

--
Ross Whetten
Associate Professor
Dept of Forestry & Environmental Resources
NC State University Raleigh NC 27695-8008



--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html

References:
- [mira_talk] mid-tags
  - From: Oscar Franzén
- [mira_talk] Re: mid-tags
  - From: Bastien Chevreux
