[mira_talk] Reference assembly issues...

From: Shankar Manoharan <shankarmanostar@xxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Mon, 19 Mar 2012 18:42:21 +0530
Hi all,
I ran a reference assembly of my 454 bacterial data using a closely
related strain as the backbone, and got a single contig roughly the size of
the reference genome. I also ran a de novo assembly of the same data and
got around 60 good-quality contigs with sufficient length and coverage.
I have a few questions about these two assemblies. Any help would be
greatly appreciated :)

De-novo assembly
All contigs:
============
  Length assessment:
  ------------------
  Number of contigs:    72
  Total consensus:    4510449
  Largest contig:    691407
  N50 contig size:    150832
  N90 contig size:    50418
  N95 contig size:    27370

  Coverage assessment:
  --------------------
  Max coverage (total):    215
  Max coverage per sequencing technology
    Sanger:    0
    454:    262
    IonTor:    0
    PacBio:    0
    Solexa:    0
    Solid:    0

  Quality assessment:
  -------------------
  Average consensus quality:            85
  Consensus bases with IUPAC:            4    (you might want to check these)
  Strong unresolved repeat positions (SRMc):    0    (excellent)
  Weak unresolved repeat positions (WRMc):    0    (excellent)
  Sequencing Type Mismatch Unsolved (STMU):    0    (excellent)
  Contigs having only reads wo qual:        0    (excellent)
  Contigs with reads wo qual values:        0    (excellent)

Reference assembly
All contigs:
============
  Length assessment:
  ------------------
  Number of contigs:    1
  Total consensus:    4909964
  Largest contig:    4909964
  N50 contig size:    4909964
  N90 contig size:    4909964
  N95 contig size:    4909964

  Coverage assessment:
  --------------------
  Max coverage (total):    272
  Max coverage per sequencing technology
    Sanger:    3
    454:    269
    IonTor:    0
    PacBio:    0
    Solexa:    0
    Solid:    0

  Quality assessment:
  -------------------
  Average consensus quality:            79
  Consensus bases with IUPAC:            10440    (you might want to check these)
  Strong unresolved repeat positions (SRMc):    1592    (you might want to check these)
  Weak unresolved repeat positions (WRMc):    0    (excellent)
  Sequencing Type Mismatch Unsolved (STMU):    0    (excellent)
  Contigs having only reads wo qual:        0    (excellent)
  Contigs with reads wo qual values:        1    (you might want to check these)
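
For anyone comparing the two length summaries above: under the commonly used definition, the Nx contig size is the length of the shortest contig in the smallest set of largest contigs that together cover at least x% of the total consensus. The sketch below assumes that definition (MIRA's own calculation may differ in detail). The function name `nx` and the example lengths are made up for illustration and are not taken from either assembly.

```python
def nx(contig_lengths, x):
    """Return the Nx contig size for a list of contig lengths.

    Assumes the common definition: walk the contigs from largest to
    smallest and report the length at which the running sum first
    reaches x percent of the total length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return min(contig_lengths)


# Hypothetical contig lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50 = nx(example_lengths, 50)
n90 = nx(example_lengths, 90)
```

With a single contig, as in the reference assembly, every Nx value is simply the length of that contig, which is why N50, N90 and N95 are identical there.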


   1. When visualizing the reference assembly with Tablet, I see regions
   where no reads span the region at all; only the template (backbone)
   sequence is present. How is this acceptable? It appears that MIRA
   fills these stretches of the consensus with the template sequence, which
   may or may not be present in the sequenced genome. So how far can this
   assembly be trusted?
   2. Secondly, wasn't the reference assembly feature of MIRA developed to
   identify SNPs and other genomic changes in pre-sequenced genomes? So is
   it technically right to assemble against a closely related organism?
   3. Third, if I were to accept the reference assembly that MIRA has
   output, what kind of validation tests are essential before annotation?
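
One check related to question 1 is to list the stretches of the backbone that no read covers, so they can be inspected before the consensus there is trusted. The following is only a minimal sketch: it assumes a per-position read depth list (backbone excluded) has already been obtained by some means, which is not shown here, and the function name `uncovered_regions` and the example depths are hypothetical.

```python
def uncovered_regions(depths, min_depth=1):
    """Return 1-based inclusive (start, end) intervals with depth < min_depth.

    `depths` is assumed to hold the read depth at each backbone position,
    in order, counting only real reads and not the backbone itself.
    """
    regions = []
    start = None
    for pos, depth in enumerate(depths, start=1):
        if depth < min_depth:
            if start is None:
                start = pos  # a low-coverage stretch begins here
        elif start is not None:
            regions.append((start, pos - 1))  # the stretch ended at the previous position
            start = None
    if start is not None:
        regions.append((start, len(depths)))  # stretch runs to the end of the backbone
    return regions


# Hypothetical depths for a 7-base backbone, for illustration only.
example_depths = [5, 0, 0, 3, 0, 2, 0]
gaps = uncovered_regions(example_depths)
total_uncovered = sum(end - start + 1 for start, end in gaps)
```

Raising `min_depth` above 1 would also flag thinly covered stretches, where the consensus rests on very few reads.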

Many thanks in advance.

Cheers :)
Shankar Manoharan
Graduate Student
Department of Genetics
Madurai Kamaraj University
Ph. +919790167534

I strongly believe in doing my best and leaving the rest to God
Follow-Ups:
- [mira_talk] Re: Reference assembly issues...
  - From: Bastien Chevreux