MIRA is a great assembler and the right tool for many types of jobs. But for
these larger genomes MIRA may not scale. How much data do you have? Read
lengths, insert sizes, etc.? You can email me directly. How close are you to
the Tri-Cities? We have partnerships with WSU and UI.
My direct email is raw937@xxxxxxxxx.
Let me know, CJ. Have you looked into the other omes, e.g. DNA, proteins,
lipids, etc.?
Cheers
Rick
On Dec 6, 2015 8:19 PM, "C Jenkins" <cej.jenkins@xxxxxxxxx> wrote:
That's one of the problems, this is the first data we have for this
parasite. We don't really know the average genome size, the chromosome
number or how many contigs we really should have. I said "significantly
fewer contigs than expected" because we compare it to Schistosoma, which
has 13,000 contigs. So 530... is significantly fewer than that.
I have a cDNA library from an mRNA isolation, so only transcribed sequences.
I do have access to a large terabyte single node.
I was under the impression that MIRA would be good at hybrid assemblies?
On Sun, Dec 6, 2015 at 8:04 PM, Rick White <raw937@xxxxxxxxx> wrote:
What type of data do you have? RNA or DNA? Avg genome size? Chromosome
number? I am close to you if you need help. I am at PNNL in central
Washington. Do you have access to a large terabyte single node? You can try
minimus2 to merge assemblies.
Cheers
Rick
On Dec 6, 2015 8:00 PM, "C Jenkins" <cej.jenkins@xxxxxxxxx> wrote:
I have a largely undescribed species of a trematode parasite. It is
similar in life cycle to Schistosoma mansoni.
I have 454 and Illumina single-end reads from 4 different populations. I
need to first create a reference transcriptome.
The Illumina data is... rough. I first assembled it using Trinity, and
found only 531 contigs... which is orders of magnitude fewer than I
expected.
So I used MIRA to do a 454 assembly, an Illumina assembly and a hybrid
assembly. Now I'm trying to figure out which is any good.
MIRA Assembly Statistics

Platform                    Reads   # Contigs   Max Coverage   Average Quality
454                         33779        1938          43505                45
Illumina                    72276        3033          34767                43
Hybrid (454 and Illumina)   98259        6726          77610                55

Obviously there are the most contigs in the hybrid assembly, but the
percentage of reads from each population that map to the reference is
significantly lower (~55% of reads from each population map to the
reference).
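
Contig count on its own says little about which assembly is "any good", so it can help to look at the contig length distribution as well. What follows is only a minimal sketch in plain Python, assuming each assembly has been exported as a FASTA file. The function names read_fasta_lengths and assembly_stats are hypothetical helpers, not part of MIRA or Trinity, and N50 is only a rough guide for a transcriptome.

```python
def read_fasta_lengths(lines):
    """Return the length of every sequence found in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Summarise contig lengths: contig count, total bases, longest contig, N50."""
    if not lengths:
        return {"contigs": 0, "total_bases": 0, "longest": 0, "n50": 0}
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    # N50: length of the contig at which the running sum of lengths,
    # taken from longest to shortest, first reaches half of the total.
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_bases": total,
        "longest": ordered[0],
        "n50": n50,
    }


if __name__ == "__main__":
    # Tiny made-up example; a real run would pass an open FASTA file handle.
    example = [">contig_1", "ACGTACGT", ">contig_2", "AC", ">contig_3", "ACGTAC"]
    print(assembly_stats(read_fasta_lengths(example)))
```

Running the same summary on the 454, Illumina and hybrid contig files would put the three assemblies side by side on more than contig count. It does not replace the read-mapping comparison described above.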
How could I improve this? I'm drowning a bit in the literature and
any/all help is welcome.
Thanks!
CJ
--
CJenkins, MS
PhD Candidate
Washington State University/University of Idaho