On 22 May 2014, at 21:41, Bayles, Darrell <Darrell.Bayles@xxxxxxxxxxxx> wrote:

> The bacterium does have a large number of large repetitive elements, and yes
> most are transposons. We have considered PacBio and that would be the
> simplest way to work through the big repeats; however, I’d still like to get
> some clarification regarding the questions of performance differences between
> MIRA v. 3.9.15 and MIRA v. 4.0.2, and clarification about the questions
> resulting from your comments about short reads. While I didn’t expect there
> to be a big improvement in assembling with v. 4.0.2, I certainly didn’t
> expect a substantial decline in the goodness of assembly either.

There are a couple of things to consider when trying to explain the differences you see. Remember, I do not know the data set; I’m just guessing.

1. No guess here: 3.9.15 belongs to those development versions where more misassemblies happened than in later versions, as people had given me a lot of tough data to optimise MIRA for.

2. The “default” settings changed all along the 3.9.x development as I adjusted MIRA to more current data sets. This means that some heuristics are now (much) better adapted to “short” reads in the 100+ bp range and will utterly fail for shorter reads. I did not bother to investigate the “why”: 100bp Illuminas have been around since 2010 or so, and why should I spend time optimising for data sets no one has been generating for 3+ years?

The failing heuristics in particular probably lead to the very long assembly times you’re seeing: MIRA is spending far more time in Smith-Waterman alignments than I’d expect. That means too many hits from the SKIM phase are not being pruned out, which in turn leads to sub-optimal overlap graphs, and that leads to … well, in the end, worse assemblies with shorter reads when you’re using the 4.x series of MIRA.
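To see whether a data set falls into the problem range described above before starting a 4.x run, here is a rough sketch in Python (hypothetical, not part of MIRA) that flags every read group whose median read length is below a cutoff. The 80bp cutoff, the use of the median, and the function names are assumptions for illustration only. The FASTQ reader also assumes plain four-line records without wrapped sequences.

```python
from statistics import median

# Hypothetical cutoff for illustration; this is not a MIRA parameter.
MIN_MEDIAN_LEN = 80


def read_lengths_from_fastq(lines):
    """Yield sequence lengths from FASTQ lines.

    Assumes simple four-line records (header, sequence, '+', qualities)
    with no line-wrapped sequences.
    """
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            yield len(line.strip())


def short_readgroups(readgroups, cutoff=MIN_MEDIAN_LEN):
    """Return the names of read groups whose median read length is below cutoff.

    readgroups maps a read group name to an iterable of read lengths.
    Empty read groups are skipped rather than flagged.
    """
    flagged = []
    for name, lengths in readgroups.items():
        lengths = list(lengths)
        if lengths and median(lengths) < cutoff:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    fastq = [
        "@r1", "ACGTACGTAC" * 4, "+", "I" * 40,
        "@r2", "ACGTACGTAC" * 5, "+", "I" * 50,
    ]
    groups = {
        "short_reads": read_lengths_from_fastq(fastq),  # lengths 40 and 50
        "long_reads": [100, 101, 100, 99],
    }
    print(short_readgroups(groups))  # prints ['short_reads']
```

The median rather than the mean is used so that a few long or trimmed reads do not hide a read group that is mostly short; either choice is a judgement call.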
Maybe I should add another “Nag and Warn” flag that stops the assembly if it detects a readgroup with reads distinctly shorter than, say, 80bp.

B.

-- 
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html