[mira_talk] tweaking Manifest for polyploid genome

  • From: "Gutierrez, Juan" <Juan.Gutierrez@xxxxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Sat, 22 Feb 2014 19:55:38 +0000

Hi,

I am trying to do a de novo RNA-seq assembly of a polyploid (hexaploid) genome
using a combination of 454 and Illumina 100 bp paired-end reads. The three
copies of each gene are highly similar to one another, and I am having trouble
separating them into three distinct full-length assemblies. Most of the time I
just get a fragment of each of the three copies. My guess is that when MIRA
finds a difference between highly similar transcripts, it cannot tell whether
it is a polymorphism between two of the copies or a sequencing error. In any
case, MIRA seems to end the contigs well before reaching the end of the
transcript.
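As a rough illustration of the dilemma described above, the Python sketch below classifies a single column of a read alignment. It is a hypothetical, simplified rule for illustration only, not MIRA's actual algorithm: the second most frequent base is treated as a real difference between gene copies only if enough reads support it, otherwise it is written off as a sequencing error. The function name and the `min_reads` / `min_fraction` thresholds are made up for this example.

```python
from collections import Counter


def classify_column(bases, min_reads=2, min_fraction=0.1):
    """Classify one alignment column as 'uniform', 'polymorphism' or 'error'.

    Hypothetical rule for illustration only (not MIRA's algorithm): the
    second most frequent base is accepted as a real copy-specific variant
    if at least `min_reads` reads carry it and it makes up at least
    `min_fraction` of the non-gap bases in the column.
    """
    counts = Counter(b for b in bases.upper() if b != "-")  # ignore gap characters
    if len(counts) <= 1:
        return "uniform"
    total = sum(counts.values())
    minor_count = counts.most_common(2)[1][1]  # count of the second most common base
    if minor_count >= min_reads and minor_count / total >= min_fraction:
        return "polymorphism"
    return "error"


# One read base per character, e.g. 10 reads covering the same position.
print(classify_column("AAAAAAAAAC"))  # -> error (a single discordant read)
print(classify_column("AAAAAAACCC"))  # -> polymorphism (three reads agree)
```

With three near-identical copies expressed at similar levels, a base unique to one copy would appear in roughly a third of the reads at that position and pass a rule like this; a weakly expressed copy or a low-coverage position could fall below the thresholds and be merged into another copy.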

I have prepared and run two different manifests. I am getting better results
with Manifest.conf (fewer but longer contigs) than with Manifest2.conf (more
but shorter contigs), so I suspect that I could eventually separate the three
copies of each gene by fine-tuning the parameters.
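The comparison above rests on contig count and contig length. The N50/N90/N95 figures in the attached reports follow the usual Nx definition, which the sketch below computes from a plain list of contig lengths. MIRA's own computation may differ in details, such as which contigs are included (the reports give separate figures for "Large contigs" and "All contigs"); the function names here are illustrative.

```python
def nx(lengths, percent):
    """Return the Nx contig size for an integer `percent` (50 -> N50).

    Nx is the length L such that contigs of length >= L together contain
    at least `percent` % of all assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):  # longest contigs first
        running += length
        # integer arithmetic avoids floating-point rounding at the boundary
        if running * 100 >= percent * total:
            return length
    return 0  # empty input


def length_assessment(lengths):
    """Summarize contig lengths roughly like the 'Length assessment' blocks."""
    return {
        "contigs": len(lengths),
        "total_consensus": sum(lengths),
        "largest": max(lengths, default=0),
        "N50": nx(lengths, 50),
        "N90": nx(lengths, 90),
        "N95": nx(lengths, 95),
    }


print(length_assessment([100, 200, 300, 400]))
```

For example, with contig lengths [100, 200, 300, 400] the N50 is 300: the two longest contigs (400 + 300 = 700 bp) already hold at least half of the 1000 assembled bases.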

Any suggestion would be greatly appreciated,
Thanks so much!
Juan




This electronic message contains information generated by the USDA solely for 
the intended recipients. Any unauthorized interception of this message or the 
use or disclosure of the information it contains may violate the law and 
subject the violator to civil or criminal penalties. If you believe you have 
received this message in error, please notify the sender and delete the email 
immediately.

Assembly information:
=====================

Localtime: Tue Jan 21 03:11:57 2014
MIRA version: 4.0rc4 

Num. reads assembled: 25787099
Num. singlets: 0


Coverage assessment (calculated from contigs >= 1000 with coverage >= 12):
=========================================================
  Avg. total coverage: 30.84
  Avg. coverage per sequencing technology
        Sanger: 0.00
        454:    0.78
        IonTor: 0.00
        PcBioHQ:        0.00
        PcBioLQ:        0.00
        Text:   0.00
        Solexa: 28.34
        Solid:  0.00


Large contigs (makes less sense for EST assemblies):
====================================================
With    Contig size             >= 500
        AND (Total avg. Cov     >= 10
             OR Cov(san)        >= 0
             OR Cov(454)        >= 0
             OR Cov(ion)        >= 0
             OR Cov(pbh)        >= 0
             OR Cov(pbl)        >= 0
             OR Cov(txt)        >= 0
             OR Cov(sxa)        >= 9
             OR Cov(sid)        >= 0
            )

  Length assessment:
  ------------------
  Number of contigs:    67143
  Total consensus:      71552356
  Largest contig:       9612
  N50 contig size:      1161
  N90 contig size:      616
  N95 contig size:      559

  Coverage assessment:
  --------------------
  Max coverage (total): 184917
  Max coverage per sequencing technology
        Sanger: 0
        454:    180830
        IonTor: 0
        PcBioHQ:        0
        PcBioLQ:        0
        Text:   0
        Solexa: 61823
        Solid:  0

  Quality assessment:
  -------------------
  Average consensus quality:                    81
  Consensus bases with IUPAC:                   49550   (you might want to check these)
  Strong unresolved repeat positions (SRMc):    0       (excellent)
  Weak unresolved repeat positions (WRMc):      322     (you might want to check these)
  Sequencing Type Mismatch Unsolved (STMU):     0       (excellent)
  Contigs having only reads wo qual:            0       (excellent)
  Contigs with reads wo qual values:            0       (excellent)


All contigs:
============
  Length assessment:
  ------------------
  Number of contigs:    480338
  Total consensus:      175708905
  Largest contig:       9612
  N50 contig size:      545
  N90 contig size:      165
  N95 contig size:      133

  Coverage assessment:
  --------------------
  Max coverage (total): 184917
  Max coverage per sequencing technology
        Sanger: 0
        454:    180830
        IonTor: 0
        PcBioHQ:        0
        PcBioLQ:        0
        Text:   0
        Solexa: 61823
        Solid:  0

  Quality assessment:
  -------------------
  Average consensus quality:                    66
  Consensus bases with IUPAC:                   138517  (you might want to check these)
  Strong unresolved repeat positions (SRMc):    0       (excellent)
  Weak unresolved repeat positions (WRMc):      380     (you might want to check these)
  Sequencing Type Mismatch Unsolved (STMU):     0       (excellent)
  Contigs having only reads wo qual:            0       (excellent)
  Contigs with reads wo qual values:            0       (excellent)


Assembly information:
=====================

Localtime: Fri Feb 21 03:18:07 2014
MIRA version: 4.0 

Num. reads assembled: 12655197
Num. singlets: 0


Coverage assessment (calculated from contigs >= 1000 with coverage >= 12):
=========================================================
  Avg. total coverage: 41.27
  Avg. coverage per sequencing technology
        Sanger: 0.00
        454:    2.49
        IonTor: 0.00
        PcBioHQ:        0.00
        PcBioLQ:        0.00
        Text:   0.00
        Solexa: 36.21
        Solid:  0.00


Large contigs (makes less sense for EST assemblies):
====================================================
With    Contig size             >= 200
        AND (Total avg. Cov     >= 21
             OR Cov(san)        >= 0
             OR Cov(454)        >= 1
             OR Cov(ion)        >= 0
             OR Cov(pbh)        >= 0
             OR Cov(pbl)        >= 0
             OR Cov(txt)        >= 0
             OR Cov(sxa)        >= 18
             OR Cov(sid)        >= 0
            )

  Length assessment:
  ------------------
  Number of contigs:    67511
  Total consensus:      39791618
  Largest contig:       5919
  N50 contig size:      609
  N90 contig size:      399
  N95 contig size:      314

  Coverage assessment:
  --------------------
  Max coverage (total): 3910
  Max coverage per sequencing technology
        Sanger: 0
        454:    3726
        IonTor: 0
        PcBioHQ:        0
        PcBioLQ:        0
        Text:   0
        Solexa: 2921
        Solid:  0

  Quality assessment:
  -------------------
  Average consensus quality:                    55
  Consensus bases with IUPAC:                   9244    (you might want to check these)
  Strong unresolved repeat positions (SRMc):    0       (excellent)
  Weak unresolved repeat positions (WRMc):      5       (you might want to check these)
  Sequencing Type Mismatch Unsolved (STMU):     0       (excellent)
  Contigs having only reads wo qual:            0       (excellent)
  Contigs with reads wo qual values:            0       (excellent)


All contigs:
============
  Length assessment:
  ------------------
  Number of contigs:    826277
  Total consensus:      194908914
  Largest contig:       5919
  N50 contig size:      234
  N90 contig size:      150
  N95 contig size:      135

  Coverage assessment:
  --------------------
  Max coverage (total): 3910
  Max coverage per sequencing technology
        Sanger: 0
        454:    3726
        IonTor: 0
        PcBioHQ:        0
        PcBioLQ:        0
        Text:   0
        Solexa: 2921
        Solid:  0

  Quality assessment:
  -------------------
  Average consensus quality:                    58
  Consensus bases with IUPAC:                   13244   (you might want to check these)
  Strong unresolved repeat positions (SRMc):    0       (excellent)
  Weak unresolved repeat positions (WRMc):      5       (you might want to check these)
  Sequencing Type Mismatch Unsolved (STMU):     0       (excellent)
  Contigs having only reads wo qual:            0       (excellent)
  Contigs with reads wo qual values:            0       (excellent)
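The "Large contigs" criteria in both reports are printed as a boolean expression: a minimum contig size AND (a minimum total average coverage OR per-technology minimums). The sketch below encodes one possible reading of that expression. It rests on an assumption not confirmed by the reports: a per-technology threshold of 0 is taken to mean "not used as a criterion", because read literally, `Cov(...) >= 0` is always true and would make the coverage clause pass for every contig. The `Contig` class and `is_large` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Contig:
    length: int
    total_cov: float  # average total coverage
    # average coverage per sequencing technology, keyed by the report's
    # short names ("san", "454", "ion", "pbh", "pbl", "txt", "sxa", "sid")
    tech_cov: dict = field(default_factory=dict)


def is_large(contig, min_length, min_total_cov, min_tech_cov):
    """One possible reading of the 'Large contigs' criteria in the reports.

    Assumption: a per-technology threshold of 0 means the technology is not
    used as a criterion. Taken literally, 'Cov(...) >= 0' is always true and
    would let every contig of sufficient length pass the coverage clause.
    """
    if contig.length < min_length:
        return False
    if contig.total_cov >= min_total_cov:
        return True
    return any(
        contig.tech_cov.get(tech, 0.0) >= threshold
        for tech, threshold in min_tech_cov.items()
        if threshold > 0  # skip the zero thresholds (assumption above)
    )


# Thresholds as printed in the second report (MIRA 4.0 run).
second = dict(min_length=200, min_total_cov=21, min_tech_cov={"454": 1, "sxa": 18})
print(is_large(Contig(300, 15.0, {"sxa": 18.5}), **second))  # -> True (passes via Solexa coverage)
```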

Attachment: Manifest2.conf
Description: Manifest2.conf

Attachment: Manifest.conf
Description: Manifest.conf
