[mira_talk] Re: reducing spurious nearly identical contigs
- From: Jeremy Volkening <jdv@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Wed, 4 Oct 2017 13:58:00 -0500
On Wed, Oct 04, 2017 at 01:33:27PM -0500, Jeremy Volkening wrote:
> I do many mid-size viral assemblies from non-clonal populations and
> often end up with many "spurious" contigs in my assemblies. By this I
> mean small contigs that overlap entirely with larger contigs and differ
> by often <1%, sometimes with only one or two mismatches in the entire
> alignment. These mismatches usually occur in regions of the smaller
> contig covered by only a single or few reads, often at the ends, and so
> seem like sequencing errors rather than true SNVs. As it is now, I
> often get a dozen or more of these even after contig depth and length
> filtering, and just end up merging them during Gap5 manual finishing.
Another angle to approach this: is there a way to force mira to trim off
the tails of contigs consisting of a single read prior to the final
round of assembly? I thought the various clipping options would do this
but they don't always seem to. If I could reliably avoid long
single-read contig tails I believe it would mitigate much of the issue
described above.
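
To make the idea concrete, here is a rough post-assembly sketch of the tail
trimming described above. It is not a MIRA option. It assumes the per-base
read depth of each contig is already available as a list of integers; the
function name and the min_depth parameter are hypothetical, chosen only
for illustration.

```python
def trim_low_coverage_tails(sequence, depth, min_depth=2):
    """Trim contig ends whose read depth is below min_depth.

    sequence: contig consensus as a string
    depth:    per-base read depth, same length as sequence (assumed input)
    Returns (trimmed_sequence, start, end), where start/end are the
    0-based, end-exclusive coordinates kept from the original contig.
    Only the tails are trimmed; low-depth positions inside the contig
    are left untouched.
    """
    if len(sequence) != len(depth):
        raise ValueError("sequence and depth must have the same length")

    start = 0
    end = len(sequence)

    # Walk inward from the left end while depth is too low.
    while start < end and depth[start] < min_depth:
        start += 1

    # Walk inward from the right end while depth is too low.
    while end > start and depth[end - 1] < min_depth:
        end -= 1

    return sequence[start:end], start, end


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    cov = [1, 1, 3, 4, 5, 1, 4, 2, 1, 1]
    print(trim_low_coverage_tails(seq, cov))
```

With min_depth=2 this removes only single-read tails. A contig covered
entirely by a single read comes back empty and could then be dropped.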
Thanks,
Jeremy
--
Wagner's music is better than it sounds.
-- Mark Twain