[mira_talk] Re: mid-tags

  • From: Gregory Harhay <gregory.harhay@xxxxxxxxxxxx>
  • To: <mira_talk@xxxxxxxxxxxxx>
  • Date: Wed, 18 Feb 2009 13:35:16 -0600

Hi fellow Mira users:

I've sent Bastien a revamped bin_fasta_on_mid_primers.pl. The script bins
the reads according to their MID primers and removes the primers, as in
case 1 below. My rationale for doing this is that these MID primers are not
biological entities, and I don't want the assembler to use them or the
space they occupy in "assembly space". I assemble with -LR:mxti=no so that
mira doesn't use the XML file.
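
For anyone who wants to see the idea without the Perl script in hand, here
is a minimal Python sketch of the binning and trimming step. It is not the
script itself: the function names, the "unassigned" bin, and the exact
prefix matching (no mismatches allowed) are all assumptions made for
illustration. The demo read is the one from Bastien's mail below.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the read name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def bin_and_trim(reads, mid_primers):
    """Group reads by the MID primer they start with and strip that primer.

    reads:       iterable of (name, sequence)
    mid_primers: dict of MID name -> primer sequence (exact prefix match,
                 an assumption; a real script may tolerate mismatches)
    Reads matching no primer go, untrimmed, into the hypothetical bin
    "unassigned".
    """
    bins = defaultdict(list)
    for name, seq in reads:
        lowered = seq.lower()
        for mid_name, primer in mid_primers.items():
            if lowered.startswith(primer.lower()):
                bins[mid_name].append((name, seq[len(primer):]))
                break
        else:
            bins["unassigned"].append((name, seq))
    return dict(bins)


reads = list(parse_fasta(""">demo
tcagttgccaggtaacctcgattgagtactatctgacgagcgacgactgtctgcat
"""))
# The primer entry already carries the leading tcag.
mids = {"mid_demo": "tcagttgccaggtaac"}
print(bin_and_trim(reads, mids))
```

Each bin can then be written to its own FASTA file and assembled
separately.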

The script depends critically on the FASTA file of MID primers. If the
user's sequences always begin with tcag, then this sequence must be tacked
onto the beginning of every sequence in the MID primer FASTA file. I think
this depends on the particular 454 machine's configuration and the way
sff_extract is invoked, but I'm not sure. If you have problems with this
script, just send me an email.
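
As a rough illustration of that check, the Python sketch below tests
whether every read starts with the key sequence and, if so, prepends it to
each MID primer before writing the primer FASTA. The function names and the
demo data are hypothetical; the key "tcag" is the one discussed in this
thread.

```python
def reads_all_start_with(reads, key="tcag"):
    """True if there is at least one read and every read starts with key."""
    seqs = [seq for _, seq in reads]
    return bool(seqs) and all(s.lower().startswith(key.lower()) for s in seqs)


def prepend_key(mid_primers, key="tcag"):
    """Return a new dict with key tacked onto the front of every primer."""
    return {name: key + seq for name, seq in mid_primers.items()}


def format_fasta(records):
    """Render a dict of name -> sequence as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


reads = [("demo", "tcagttgccaggtaacctcgattgagtactatctgacgagcgacgactgtctgcat")]
mids = {"mid_demo": "ttgccaggtaac"}  # bare MID tag, without the key

if reads_all_start_with(reads):
    mids = prepend_key(mids)

print(format_fasta(mids), end="")
```

If the reads do not all start with the key, the primer file is left as it
is.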

FYI: As the TI chemistry doesn't support MID primers yet, we tried to run
old chemistry with MID primers on TI. Roche acknowledges a glitch in their
new TI cluster software when it is used this way: the B fusion primer
appears on the 3' end of most reads.

One of these days, when I've got a chance, I'll generate the XML that
describes the clipping, as you suggest in case 3. I'm in no rush though, as
case 1 works pretty well for us.
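
For what it's worth, the clip position that case 3 would need can be
computed without touching the sequence. The Python sketch below only works
out the number; it does not write any XML, and whether the ancillary file
expects this 1-based "first kept base" or an adjacent value is an
assumption that should be checked against what sff_extract produces. The
function name is hypothetical.

```python
def left_clip_after_mid(seq, mid_primers, key="tcag"):
    """Find the MID a read starts with and where the real sequence begins.

    Returns (mid_name, first_kept_base), where first_kept_base is the
    1-based position of the first base after key + MID, or (None, None)
    if no MID matches. Matching is an exact prefix match (an assumption).
    """
    lowered = seq.lower()
    for mid_name, mid in mid_primers.items():
        prefix = (key + mid).lower()
        if lowered.startswith(prefix):
            return mid_name, len(prefix) + 1
    return None, None


demo = "tcagttgccaggtaacctcgattgagtactatctgacgagcgacgactgtctgcat"
mids = {"mid_demo": "ttgccaggtaac"}  # bare MID tag, without the key

mid_name, first_kept = left_clip_after_mid(demo, mids)
print(mid_name, first_kept)          # mid_demo 17
print(demo[first_kept - 1:])         # the part the assembler should use
```

The read itself stays untouched, which is the point of case 3.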

Greg



On 2/18/09 7:59 AM, "Bastien Chevreux" <bach@xxxxxxxxxxxx> wrote:

> On Tuesday 17 February 2009 Oscar Franzén wrote:
>> I have a question about MIRA (great software by the way):
>> is it possible to assemble 454 data sequenced with MID-tags?
> 
> Hello Oscar,
> 
> yes and no. You must "remove" the MID tags from the input sequence, as
> otherwise they'd wreak havoc.
> 
> Assuming that in the following read
> 
> >demo
> tcag ttgccaggtaac ctcgattgagtactatctgacgagcgacgactgtctgcat
> 
> the "tcag" is the 'normal' remainder of the 454 adaptor (clipped away by
> sff_extract via a corresponding left clip entry in the ancillary XML data) and
> "ttgccaggtaac" is one of your MID tags, you can:
> 
> 1) physically remove the whole stretch (I do not recommend this), leading to
> >demo
> ctcgattgagtactatctgacgagcgacgactgtctgcat
> 
> 2) mask the MID tag (and perhaps also the remainder of the adaptor) and use
> -CL:mbc
> >demo
> xxxx xxxxxxxxxxxx ctcgattgagtactatctgacgagcgacgactgtctgcat
> 
> 3) (preferred) keep the whole sequence as is and use a script that sets correct
> values in the XML file with ancillary data.
> 
> The problem with all three possibilities above: even though a number of people
> have inquired previously by mail regarding this topic, I haven't yet got back
> any script that performs this kind of data mangling[*]. Feel free to be the
> first :-)
> 
> Regards,
>   Bastien
> 
> [*] I would assume that this belongs to "normal" data processing that the
> Roche software should perform, but until now this is not part of their
> software pipeline.
> 

Gregory P. Harhay, PhD
Computational Biologist
Animal Health Research Unit
USDA-ARS-Roman L. Hruska  U.S. Meat Animal Research Center
Clay Center, NE 68933
v - 402.762.4250



