[mira_talk] Singlets after hybrid assembly

  • From: Andrew Gracey <mirabilis@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sat, 5 Sep 2009 22:18:49 -0700

 Hi Folks/Bastien,

I'm an absolute beginner with MIRA, and my question is about singlets, or
the lack thereof, in the results of a recent assembly.  I have ~170K 5' and
3' Sanger reads and ~1 million 454 FLX reads from a eukaryote transcriptome.
First I assembled just the Sanger reads to verify that MIRA was working, and
I got a nice assembly of 28,389 contigs and singlets.  Then I tried a hybrid
assembly, which ran for >48 hours on a 64-bit, 24 GB Linux box.  But the
assembly yielded just 5,495 contigs (147,039 reads were in the contigs), and
I can't find my singlets.  The manual mentions the following option:

savesimplesingletsinproject(sssip)=on|yes|1,off|no|0  Default is no.
Controls whether 'unimportant' singlets are written to the result files.

What is the definition of 'unimportant', and could this option explain why
no singlets were written to the output files?

These are the call parameters that I used:
./mira -project=Bot -job=denovo,est,normal,sanger,454

I have FASTA and qual files for the Sanger data, and the 454 data were
extracted using the sff_extract script.

Below I've pasted what was in my assembly info file.

Thanks in advance for any insights you can offer.

Andrew



Localtime: Sat Sep  5 07:28:45 2009

Assembly information:
=====================

Num. reads assembled: 1079084
Num. singlets: 2340

Large contigs:
--------------
With    Contig size        >= 500
    AND (Total avg. Cov    >= 3
         OR Cov(san)    >= 0
         OR Cov(454)    >= 3
         OR Cov(sxa)    >= 0
         OR Cov(sid)    >= 0
        )

  Length assessment:
  ------------------
  Number of contigs:    30951
  Total consensus:    37182812
  Largest contig:    9906
  N50 contig size:    1352
  N90 contig size:    685
  N95 contig size:    620

  Coverage assessment:
  --------------------
  Max coverage (total):    4780
  Max coverage
    Sanger:    367
    454:    6169
    Solexa:    0
    Solid:    0
  Avg. total coverage (size >= 5000): 10.29
  Avg. coverage (contig size >= 5000)
    Sanger:    0.49
    454:    8.89
    Solexa:    0.00
    Solid:    0.00

  Quality assessment:
  -------------------
  Average consensus quality:            70
  Consensus bases with IUPAC (IUPc):        96774    (you might want to check these)
  Strong unresolved repeat positions (SRMc):    186    (you might want to check these)
  Weak unresolved repeat positions (WRMc):    208    (you might want to check these)
  Sequencing Type Mismatch Unsolved (STMU):    0    (excellent)
  Contigs having only reads wo qual:        0    (excellent)
  Contigs with reads wo qual values:        0    (excellent)


All contigs:
------------
  Length assessment:
  ------------------
  Number of contigs:    75734
  Total consensus:    67118803
  Largest contig:    9906
  N50 contig size:    961
  N90 contig size:    507
  N95 contig size:    450

  Coverage assessment:
  --------------------
  Max coverage (total):    4780
  Max coverage
    Sanger:    367
    454:    6169
    Solexa:    0
    Solid:    0
  Avg. total coverage (size >= 5000): 10.29
  Avg. coverage (contig size >= 5000)
    Sanger:    0.49
    454:    8.89
    Solexa:    0.00
    Solid:    0.00

  Quality assessment:
  -------------------
  Average consensus quality:            57
  Consensus bases with IUPAC (IUPc):        202037    (you might want to check these)
  Strong unresolved repeat positions (SRMc):    186    (you might want to check these)
  Weak unresolved repeat positions (WRMc):    212    (you might want to check these)
  Sequencing Type Mismatch Unsolved (STMU):    0    (excellent)
  Contigs having only reads wo qual:        14    (you might want to check these)
  Contigs with reads wo qual values:        3    (you might want to check these)
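
For anyone wanting to check the same figures, the two counts at the top of
the info file above can be pulled out with a short script.  This is only a
sketch in Python: the function name parse_counts is made up for illustration,
and it assumes the "Label: number" layout seen in the pasted file rather than
any documented MIRA format.

```python
import re

def parse_counts(info_text):
    """Collect 'Label: integer' lines that appear before the contig sections.

    Hypothetical helper: it relies only on the layout of the info file
    pasted above, not on a documented MIRA file format.
    """
    counts = {}
    for line in info_text.splitlines():
        # Stop at the per-contig statistics; only the header counts are wanted.
        if line.strip() in ("Large contigs:", "All contigs:"):
            break
        match = re.match(r"^\s*([^:]+):\s+(\d+)\s*$", line)
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return counts

# Header of the info file, copied from the post above.
sample = """\
Localtime: Sat Sep  5 07:28:45 2009

Assembly information:
=====================

Num. reads assembled: 1079084
Num. singlets: 2340

Large contigs:
"""

counts = parse_counts(sample)
singlet_fraction = counts["Num. singlets"] / counts["Num. reads assembled"]
print(counts)
print(f"Singlets as a fraction of assembled reads: {singlet_fraction:.4%}")
```

The script only reports what the info file itself states; it does not say
where (or whether) the singlets were written to the result files.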
