Hi,

I have Ion Torrent data from two strains of Pseudomonas (~6Mb genome). The datasets are similar in terms of numbers of reads, but the assembly seems to work better for the first strain than the second. Here are some of the stats (I have removed some lines for brevity):

Strain 1
----------
Num. reads assembled: 1339574

Coverage assessment (calculated from contigs >= 5000):
  Avg. total coverage: 19.56
  IonTor: 19.58

  Number of contigs: 742
  Total consensus: 7099193
  Largest contig: 92520
  N50 contig size: 18758
  N90 contig size: 5115
  N95 contig size: 3074

  Coverage assessment:
    Max coverage (total): 678
    IonTor: 678

  Quality assessment:
    Average consensus quality: 55
    Consensus bases with IUPAC: 101 (you might want to check these)

All contigs:
  Number of contigs: 1226
  Total consensus: 7220071
  Largest contig: 92520
  N50 contig size: 18641
  N90 contig size: 4332
  N95 contig size: 2189

    Max coverage (total): 727
    IonTor: 818

  Quality assessment:
    Average consensus quality: 54
    Consensus bases with IUPAC: 554 (you might want to check these)

Strain 2
-----------
Num. reads assembled: 1346193

Coverage assessment (calculated from contigs >= 5000):
  Avg. total coverage: 20.38
  IonTor: 20.77

  Length assessment:
    Number of contigs: 1205
    Total consensus: 7063729
    Largest contig: 57644
    N50 contig size: 11611
    N90 contig size: 2624
    N95 contig size: 1576

  Coverage assessment:
    Max coverage (total): 793
    IonTor: 793

  Quality assessment:
    Average consensus quality: 57
    Consensus bases with IUPAC: 69 (you might want to check these)

All contigs:
  Length assessment:
    Number of contigs: 1715
    Total consensus: 7192230
    Largest contig: 57644
    N50 contig size: 10977
    N90 contig size: 2241
    N95 contig size: 1217

  Coverage assessment:
    Max coverage (total): 793
    IonTor: 793

  Quality assessment:
    Average consensus quality: 56
    Consensus bases with IUPAC: 227 (you might want to check these)

My definition of 'better', by the way, is higher N50, larger contigs, fewer contigs, etc.
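For anyone comparing the N50/N90 figures above, here is a minimal sketch of how such values are usually derived from a list of contig lengths. It assumes the common definition (the length of the contig at which the running total of the longest-first sorted contigs first reaches the given fraction of the total consensus); I have not checked that MIRA computes its numbers in exactly this way. The function name `nxx` and the toy lengths are made up for illustration and are not taken from either strain.

```python
def nxx(lengths, fraction):
    """Return the Nxx contig size under the common definition (an assumption,
    not necessarily MIRA's exact method): walk the contigs from longest to
    shortest and return the length of the contig at which the running total
    first reaches `fraction` of all assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    threshold = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical contig lengths, purely for illustration.
toy = [100, 80, 60, 40, 20]
print("N50:", nxx(toy, 0.50))
print("N90:", nxx(toy, 0.90))
```

Since the total consensus is nearly the same for both strains (about 7.1 to 7.2 Mb), the N50 values should be directly comparable between the two assemblies.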

Other things I noticed were that the first strain assembled more quickly and used less temp disk space than the second.

My question is: is this just down to natural variation when sequencing two strains (although these two strains do look quite similar by BLAST etc.), or is there something else in the data I should filter out before assembling? Note there may also be a plasmid, although I have not yet found any.

This is MIRA V3.4.0 (production version).

Thanks for any help,
Adam