[mira_talk] Re: reducing spurious nearly identical contigs

  • From: Jeremy Volkening <jdv@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 4 Oct 2017 13:58:00 -0500

On Wed, Oct 04, 2017 at 01:33:27PM -0500, Jeremy Volkening wrote:

I do many mid-size viral assemblies from non-clonal populations and often end up with many "spurious" contigs. By this I mean small contigs that overlap entirely with larger contigs and often differ from them by <1%, sometimes by only one or two mismatches across the entire alignment. These mismatches usually occur in regions of the smaller contig covered by only one or a few reads, often at the ends, and so look like sequencing errors rather than true SNVs. As it stands, I often get a dozen or more of these even after filtering contigs by depth and length, and just end up merging them during manual finishing in Gap5.
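
To make the definition of "spurious" concrete, here is a minimal sketch of a post-assembly check that flags contigs fully contained in a longer contig within a divergence threshold. It is not a MIRA feature. The function names and the `max_divergence` parameter are made up for illustration, and the comparison is ungapped (substitutions only, both strands), so a real pipeline would more likely use a proper aligner.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (ACGT only, case preserved)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def best_ungapped_hit(small, large):
    """Slide `small` along `large` without gaps.

    Returns (offset, mismatches) for the placement with the fewest
    mismatches, or None if `small` is longer than `large`.
    """
    if len(small) > len(large):
        return None
    best = None
    for offset in range(len(large) - len(small) + 1):
        window = large[offset:offset + len(small)]
        mismatches = sum(1 for a, b in zip(small, window) if a != b)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def find_contained_contigs(contigs, max_divergence=0.01):
    """Flag contigs contained in a longer contig within `max_divergence`.

    `contigs` maps contig name -> sequence (hypothetical input format).
    Returns a dict: flagged contig name -> (containing contig name, mismatches).
    """
    by_length = sorted(contigs.items(), key=lambda kv: len(kv[1]), reverse=True)
    redundant = {}
    for i, (small_name, small_seq) in enumerate(by_length):
        for large_name, large_seq in by_length[:i]:
            if large_name in redundant:
                continue  # only compare against contigs that are being kept
            for oriented in (small_seq, revcomp(small_seq)):
                hit = best_ungapped_hit(oriented, large_seq)
                if hit is not None and hit[1] <= max_divergence * len(small_seq):
                    redundant[small_name] = (large_name, hit[1])
                    break
            if small_name in redundant:
                break
    return redundant
```

The brute-force scan is quadratic in sequence length, which is tolerable for a handful of viral-sized contigs but not for large assemblies.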

Another angle on this: is there a way to force mira to trim off contig tails that consist of a single read prior to the final round of assembly? I thought the various clipping options would do this, but they don't always seem to. If I could reliably avoid long single-read contig tails, I believe it would mitigate much of the issue described above.
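
As a rough illustration of the tail trimming described above, done as post-processing rather than inside mira, the sketch below cuts low-depth ends off a contig given its per-base read depth. The function name, the `min_depth` parameter, and the assumption that a per-base depth list is available are all hypothetical; nothing here corresponds to an actual mira option.

```python
def trim_low_depth_tails(sequence, depth, min_depth=2):
    """Trim contig ends where read depth is below `min_depth`.

    `sequence` is the contig consensus and `depth` is a list of per-base
    read depths of the same length (assumed to be available from the
    assembly output). Only the two ends are trimmed; internal low-depth
    stretches are left untouched.

    Returns (trimmed_sequence, start, end), where start/end are 0-based,
    half-open coordinates of the retained region in the original contig.
    """
    if len(sequence) != len(depth):
        raise ValueError("sequence and depth must have the same length")
    start, end = 0, len(sequence)
    # Walk inwards from the left end while coverage is too low.
    while start < end and depth[start] < min_depth:
        start += 1
    # Walk inwards from the right end while coverage is too low.
    while end > start and depth[end - 1] < min_depth:
        end -= 1
    return sequence[start:end], start, end
```

With the default `min_depth=2`, this removes exactly the single-read tails and keeps everything between the first and last position covered by at least two reads.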

Thanks,
Jeremy

--
Wagner's music is better than it sounds.
                -- Mark Twain

