[mira_talk] Re: Assemble extremely similar amplicons
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Fri, 23 Sep 2016 07:46:07 -0400
On 22 Sep 2016, at 20:40, Chris Hoefler <hoeflerb@xxxxxxxxx> wrote:
Although not really intended for what you're doing, you can try to see if
Mira's repeat masking can help you. Play around with HS:nrr and HS:nrc, set
HS:ldn=no. The latter turns off digital normalization. Normally for EST
assemblies you want this on so that Mira can take a decent shot at assembling
a repeat, but in your case you actually want the contig to break at the
repeat, so better to leave it off.
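In MIRA the HS: parameters belong to the hash (k-mer) statistics. The sketch below is a rough, self-contained illustration of the general idea behind frequency-based repeat masking. It is explicitly not MIRA's own algorithm. It counts k-mers over all reads, flags those that occur far more often than the median k-mer, and lower-cases the read bases they cover. The function names and the `k` / `ratio` knobs are hypothetical and do not map one-to-one onto HS:nrr or HS:nrc.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def flag_repetitive_kmers(reads, k=17, ratio=3.0):
    """Return the k-mers whose count is more than `ratio` times the median count.

    Hypothetical, simplified heuristic: `k` and `ratio` are illustration knobs,
    not MIRA's HS:nrr / HS:nrc parameters.
    """
    counts = Counter(km for read in reads for km in kmers(read.upper(), k))
    if not counts:
        return set()
    cutoff = ratio * median(counts.values())
    return {km for km, c in counts.items() if c > cutoff}


def mask_read(read, repetitive, k=17):
    """Lower-case every base covered by at least one flagged k-mer."""
    covered = [False] * len(read)
    for i, km in enumerate(kmers(read.upper(), k)):
        if km in repetitive:
            for j in range(i, i + k):
                covered[j] = True
    return "".join(b.lower() if c else b.upper() for b, c in zip(read, covered))


if __name__ == "__main__":
    # Toy data: one poly-A read (highly repetitive) and one unique read.
    reads = ["A" * 20, "CGTACGGATC"]
    rep = flag_repetitive_kmers(reads, k=3)
    print(rep)                                # {'AAA'}
    print(mask_read("CCAAAAGG", rep, k=3))    # CCaaaaGG
```

This also shows why the approach can misfire on a target that is itself extremely repetitive: the "repeat" k-mers are then the very sequence you want to assemble.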
For the data Sven has, this may lead to all sorts of interesting side effects.
As his target sequence is itself extremely repetitive, many algorithms will
either take a long time or simply switch to “survival mode” (silently disregarding
that piece of data).
The above may help, but it doesn't really solve your problem because you
actually want to assemble some of the repeats (internal to your PCR product),
so another approach could be to use SMALT to soft clip the repetitive
sequences. Mira knows how to use these soft clips during assembly. You can
play with the bait sequences a bit to see what works best, but maybe screen
out the ends of the PCR product so that contig building is forced to stop
there, but not in the middle. There is still a fairly substantial danger of
ending up with hybrid sequences, but if you are careful with the PE
constraints you can probably sort it out.
Clipping / masking is the way to go, I think. You basically want to treat the
primer sequence as if it were sequencing vector.
Method 1: hard / soft clip your data
Find a program that can clip your reads at the ends wherever it finds the
PCR primers. Chris mentioned SMALT, and MIRA supports it out of the box. Then
either hard clip the data (remove the clipped bases completely) or soft clip it.
Soft clipping can be done in several ways:
- write the read ends to be clipped in lower case, together with -CL:lccf:lccb
- mark the clipped ends (and really just the ends, not something amidst the
read) with the tag SVEC, and MIRA will soft-clip them. Of course, you would need
to mark them in a file format that supports tags, i.e. EXP or MAF. I’d take MAF
(it’s similar to EXP for simple input of reads).
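A minimal sketch of Method 1 follows. It makes some simplifying assumptions: the primers sit exactly at the read ends, only substitutions are tolerated (no indels), and the primer and read sequences are made up. It produces both a hard clip and the lower-case soft clip that -CL:lccf:lccb is meant to pick up. The function names are hypothetical, and a proper screen with a tool such as SMALT would be more robust.

```python
# Reverse-complement table for upper- and lower-case DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Count mismatching positions of two equal-length strings (substitutions only)."""
    return sum(x != y for x, y in zip(a, b))


def primer_clip_points(read, fwd_primer, rev_primer, max_mismatch=1):
    """Return (left, right) such that read[left:right] is the part to keep.

    Assumes the forward primer sits at the very start of the read and the
    reverse complement of the reverse primer at the very end; no indels.
    """
    seq = read.upper()
    fwd = fwd_primer.upper()
    rc = revcomp(rev_primer.upper())
    left, right = 0, len(seq)
    if fwd and len(seq) >= len(fwd) and mismatches(seq[:len(fwd)], fwd) <= max_mismatch:
        left = len(fwd)
    # Only look for the 3' primer in the part not already clipped on the left.
    if rc and len(seq) - left >= len(rc) and mismatches(seq[-len(rc):], rc) <= max_mismatch:
        right = len(seq) - len(rc)
    return left, right


def hard_clip(read, left, right):
    """Hard clip: drop the clipped bases completely."""
    return read[left:right]


def soft_clip_lowercase(read, left, right):
    """Soft clip: keep all bases, lower-case the clipped ends, upper-case the rest."""
    return read[:left].lower() + read[left:right].upper() + read[right:].lower()


if __name__ == "__main__":
    # Made-up primers and read, for illustration only.
    fwd, rev = "ACGTAC", "TTGCAG"
    read = "ACGTAC" + "GGGGCCCC" + "CTGCAA"   # CTGCAA = revcomp("TTGCAG")
    left, right = primer_clip_points(read, fwd, rev)
    print(left, right)                              # 6 14
    print(hard_clip(read, left, right))             # GGGGCCCC
    print(soft_clip_lowercase(read, left, right))   # acgtacGGGGCCCCctgcaa
```

The middle of the read is forced to upper case so that only the primer ends are treated as clippable lower-case stretches.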
Method 2: the hack-y way.
MIRA has routines to clip Illumina adaptors or filter for PhiX-174. While there
is no interface that exposes these sequences to users, one could add your primer
sequences to the source code and recompile a “special version.” It’s not even
difficult: add your sequences to
src/mira/adaptorsforclip.solexa.xxd
in forward and reverse direction and it should work. Although … maybe not.
There could be a problem at the border between adaptor and primer. Maybe you
should test that first.
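Method 2 needs every primer in both orientations. The small helper below assumes that "reverse direction" means the reverse complement, which is the usual reading when both strands have to be screened. It emits FASTA only as a neutral text format. The format actually expected inside src/mira/adaptorsforclip.solexa.xxd is not described here and should be checked in the MIRA source before editing. The primer names are hypothetical.

```python
# Complement table for DNA; other characters (e.g. IUPAC codes) are not complemented.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence, returned in upper case."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def both_orientations_fasta(primers):
    """Render each primer as two FASTA records: as given (_fwd) and reverse-complemented (_rc)."""
    lines = []
    for name, seq in primers.items():
        lines += [f">{name}_fwd", seq.upper(), f">{name}_rc", revcomp(seq)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical primer names and sequences.
    print(both_orientations_fasta({"primerF": "ACGTAC", "primerR": "TTGCAG"}), end="")
```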
Then again, I *could* add yet another clipping routine which uses the
adaptor-clip algorithm on user-supplied data. That would certainly work.
B.
--
You have received this mail because you are subscribed to the mira_talk mailing
list. For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html