[mira_talk] Making the best assembly

  • From: C Jenkins <cej.jenkins@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 6 Dec 2015 20:00:05 -0800

I have a largely undescribed species of trematode parasite. It is similar
in life cycle to Schistosoma mansoni.

I have 454 and Illumina single-end reads from 4 different populations. I
first need to create a reference transcriptome.

The Illumina data is... rough. I first assembled it using Trinity and
found only 531 contigs, which is orders of magnitude fewer than I
expected.

So I used MIRA to do a 454 assembly, an Illumina assembly and a hybrid
assembly. Now I'm trying to figure out which of them, if any, is any good.

MIRA Assembly Statistics

Platform   Reads              # Contigs   Max     Coverage   Average Quality
454                           33779       1938    43505      45
Illumina                      72276       3033    34767      43
Hybrid     454 and Illumina   98259       6726    77610      55

Obviously there are the most contigs in the hybrid assembly, but the
percentage of reads from each population that map to the reference is
significantly lower (~55% of reads from each population map to the
reference).
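
Contig count alone says little about which assembly is better, so one
option is to compare length-based summaries as well. The following is a
minimal sketch, assuming each assembly has been exported as a plain FASTA
file; the function names (contig_lengths, n50, summarize) are hypothetical
helpers written for illustration and are not part of MIRA or Trinity.

```python
from typing import Dict, Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in a FASTA stream."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


def summarize(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Collect a few length-based statistics for one assembly."""
    lengths = contig_lengths(fasta_lines)
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Tiny in-memory example; a real run would pass an open file handle instead.
    example = [">c1", "ACGTACGT", ">c2", "AC", ">c3", "ACGTAC"]
    print(summarize(example))
```

Running summarize on the 454, Illumina and hybrid contig files would give
numbers that can be set side by side with the mapping percentages. An
assembly with many very short contigs will show a low N50 even when its
contig count is high.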

How could I improve this? I'm drowning a bit in the literature and any/all
help is welcome.

Thanks!

CJ
--
CJenkins, MS
PhD Candidate
Washington State University/University of Idaho
