On Tuesday 19 April 2011 18:14:47 Egon Ozer wrote:
> I'd be happy to provide my data to you for testing. Do you want the sff
> files or my extracted fasta, qual, and xml files for the 454 data?

Hello Egon,

your data set made MIRA (and me) sweat quite a lot, actually. It's not so much that version 3.2.1 crashed on it, but that my newer development version, while not crashing, performed ... really not well: way too many contigs for my liking.

I spent the weekend trying to understand why MIRA absolutely did not like that data set, and found the reason: it looks like this paired-end FLX data contains a lot more false duplicates than I have ever seen up to now. These false duplicates contain, I think, PCR artefacts ... and these "sequencing errors" make MIRA believe that there are repeats and/or ploidy differences.

I had to develop a couple of new algorithms to deal with these kinds of things. Not everything I thought of has been implemented yet, but I think the improvements are already good enough to test. E.g., here are the results of 3.2.1.15:

  Number of contigs: 116
  Largest contig:    893586
  N50 contig size:   172613
  N90 contig size:   34046
  N95 contig size:   21118

and here for my current development version:

  Number of contigs: 75
  Largest contig:    901116
  N50 contig size:   397873
  N90 contig size:   108334
  N95 contig size:   52586

The number of contigs dropped by about a third (116 to 75) and the N50 more than doubled. Taking then a hybrid assembly with your 454 and Solexa data, I get this:

  Number of contigs: 55
  Largest contig:    894849
  N50 contig size:   588120
  N90 contig size:   139889
  N95 contig size:   62263

Compared to 3.2.1.15, the number of contigs was more than halved and the N50/N90/N95 values grew roughly three- to fourfold.

The next release on SourceForge will contain these enhancements (but it may take a week or two). Contact me if you want to test the current head of the development tree before that :-)

B.
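For readers comparing the numbers above: an Nx value (N50, N90, N95) is commonly defined as the contig length L such that contigs of length L or longer together cover at least x% of the total assembled length. The short Python sketch below assumes that definition; MIRA's own statistics code may differ in detail (some tools use an estimated genome size instead of the assembly size, which is then called NGx). The function name `nx` and the toy contig lengths are purely illustrative and are not taken from the assemblies discussed here.

```python
from typing import Iterable


def nx(lengths: Iterable[int], x: float) -> int:
    """Return the Nx statistic (e.g. x=50 for N50) of a set of contig lengths.

    Assumes the common definition: sort contigs longest first and return the
    length of the contig at which the running total first reaches x% of the
    summed length of all contigs.
    """
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("need at least one contig length")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Multiply instead of dividing so the threshold check avoids fractions.
        if running * 100 >= total * x:
            return length
    return sorted_lengths[-1]  # only reached through float edge cases


# Hypothetical toy assembly (total length 300), not real data.
toy = [100, 80, 60, 40, 20]
print(nx(toy, 50), nx(toy, 90), nx(toy, 95))  # prints: 80 40 20
```

Because Nx is weighted by length, merging a few short contigs into long ones (as the new duplicate handling apparently allows) can raise N50 far more than the drop in contig count alone would suggest.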