[mira_talk] one strain assembles better than a second similar strain?

  • From: Adam Witney <awitney@xxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 20 Oct 2011 10:10:29 +0100

Hi,

I have Ion Torrent data from two strains of Pseudomonas (~6Mb genome). The 
datasets are similar in terms of numbers of reads, but the assembly seems to 
work better for the first strain than the second. Here are some of the stats (I 
have removed some lines for brevity):

Strain 1
----------

Num. reads assembled: 1339574

Coverage assessment (calculated from contigs >= 5000):
  Avg. total coverage: 19.56
        IonTor: 19.58

  Number of contigs:    742
  Total consensus:      7099193
  Largest contig:       92520
  N50 contig size:      18758
  N90 contig size:      5115
  N95 contig size:      3074

  Coverage assessment:
  Max coverage (total): 678
        IonTor: 678

  Quality assessment:
  Average consensus quality:                    55
  Consensus bases with IUPAC:                   101     (you might want to check these)

All contigs:
  Number of contigs:    1226
  Total consensus:      7220071
  Largest contig:       92520
  N50 contig size:      18641
  N90 contig size:      4332
  N95 contig size:      2189

  Max coverage (total): 727
        IonTor: 818

  Quality assessment:
  Average consensus quality:                    54
  Consensus bases with IUPAC:                   554     (you might want to check these)

Strain 2
-----------

Num. reads assembled: 1346193

Coverage assessment (calculated from contigs >= 5000):
  Avg. total coverage: 20.38
        IonTor: 20.77

  Length assessment:
  Number of contigs:    1205
  Total consensus:      7063729
  Largest contig:       57644
  N50 contig size:      11611
  N90 contig size:      2624
  N95 contig size:      1576

  Coverage assessment:
  Max coverage (total): 793
        IonTor: 793

  Quality assessment:
  Average consensus quality:                    57
  Consensus bases with IUPAC:                   69      (you might want to check these)

All contigs:
  Length assessment:
  Number of contigs:    1715
  Total consensus:      7192230
  Largest contig:       57644
  N50 contig size:      10977
  N90 contig size:      2241
  N95 contig size:      1217

  Coverage assessment:
  Max coverage (total): 793
        IonTor: 793

  Quality assessment:
  -------------------
  Average consensus quality:                    56
  Consensus bases with IUPAC:                   227     (you might want to check these)

By 'better' I mean a higher N50, larger contigs, fewer contigs, etc. I also 
noticed that the first strain assembled more quickly and used less temporary 
disk space than the second.
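
For reference, the N50 figures above follow the usual definition: the contig 
length L such that contigs of length >= L hold at least half of the total 
consensus bases (N90 and N95 use 90% and 95%). Below is a minimal sketch in 
Python, assuming that standard definition; the n50 function and the contig 
lengths are made up for illustration and are not taken from either assembly, 
and MIRA may calculate its own figures slightly differently.

```python
def n50(lengths, fraction=0.5):
    """Return the contig length L such that contigs of length >= L
    contain at least `fraction` of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)   # largest contigs first
    threshold = sum(ordered) * fraction       # bases that must be covered
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths (illustration only, not real assembly data).
example = [100, 80, 60, 40, 20]
print(n50(example))        # N50
print(n50(example, 0.9))   # N90
```

With these example lengths the total is 300 bases, so N50 is reached at the 
second contig (100 + 80 >= 150) and N90 at the fourth (280 >= 270).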

My question is: is this just down to natural variation when sequencing two 
strains (although the two strains do look quite similar by BLAST etc.), or is 
there something else in the data I should filter out before assembling? Note 
that there may also be a plasmid, although I have not yet found one.

This is MIRA V3.4.0 (production version).

Thanks for any help.

adam



