[mira_talk] Re: poor quality assembly results

  • From: Jens Christian Froslev Nielsen <jens.c.nielsen@xxxxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Sun, 3 May 2015 17:53:14 +0000

Thanks for the answer.

But how do you assess “how accurate is the assembly” in the best way? I know
that N50 doesn’t tell you everything (I use an Nx plot to compare assemblies),
but in this case when the difference is so big, I guess it still tells me that
the ABySS assembler performs better. How do I check if the ABySS assembly is
full of errors as you suggest?
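
For reference, the Nx statistic behind such a plot can be computed from a list
of contig lengths roughly as follows. This is only a minimal sketch in Python,
not taken from any assembler's code; the function names nx and nx_curve and the
example lengths are made up for illustration.

```python
def nx(lengths, x=50):
    """Return Nx: the length of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first reaches
    x percent of the total assembly size. x=50 gives the N50."""
    if not lengths:
        raise ValueError("no contig lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


def nx_curve(lengths, step=10):
    """Return (x, Nx) pairs for x = step, 2*step, ..., 100 (data for an Nx plot)."""
    return [(x, nx(lengths, x)) for x in range(step, 101, step)]


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    contigs = [80, 70, 50, 40, 30, 20, 10]
    print("N50:", nx(contigs))
    print("Nx curve:", nx_curve(contigs))
```

Note that this only describes contiguity. It says nothing about whether the
contigs are joined correctly, which is the point of the question above.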

In the end couldn’t it be that for this particular genome MIRA is a poor
assembler, while other genomes might work fine with MIRA?

Best,

Jens

________________________________
From: mira_talk-bounce@xxxxxxxxxxxxx [mira_talk-bounce@xxxxxxxxxxxxx] on behalf
of Robert Bruccoleri [bruc@xxxxxxxxxxxxxxxxxxxxx]
Sent: Sunday, May 03, 2015 6:38 PM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: poor quality assembly results

The key question that you have to ask about any assembler is 'how accurate is
the assembly?' The N50 value doesn't tell you that.

If your genome has lots of repeats, the MIRA N50 might reflect the reality of
the genome, whereas the ABySS assembly might be filled with errors.

For bacteria and other small genomes, the best genome sequencing technology
today is PacBio. If you want an accurate assessment of your assembly, get it
done using PacBio with enough coverage to correct the read errors (around 100x).

However, PacBio is more expensive and more difficult than Illumina sequencing,
and for your application, Illumina with MIRA might be acceptable. Please use
sequencing accuracy to judge.

Best regards,
Bob


On 04/29/2015 07:46 AM, Rick Westerman wrote:
I wouldn’t say that ABySS is a better assembler, but it does handle bigger
projects. I use ABySS, MIRA and SPAdes as appropriate.

--
Rick Westerman
westerman@xxxxxxxxxx




On Apr 29, 2015, at 2:59 AM, Jens Christian Froslev Nielsen
<jens.c.nielsen@xxxxxxxxxxx> wrote:

I have a genome sequenced to a coverage of 137x, with 2 G PE 125 bp reads
(Illumina HiSeq 2500).
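
As a rough sketch of how read count, read length and coverage relate when
picking a subset size: coverage is approximately the total number of sequenced
bases divided by the genome size. The genome size of 30 Mb and the 50x target
used below are purely hypothetical placeholders, not figures from this thread,
and the function names are made up for illustration.

```python
import math


def coverage(n_pairs, read_len, genome_size):
    """Approximate coverage from n_pairs read pairs (two reads per pair)."""
    return n_pairs * 2 * read_len / genome_size


def pairs_for_coverage(target, read_len, genome_size):
    """Number of read pairs needed to reach roughly the target coverage."""
    return math.ceil(target * genome_size / (2 * read_len))


if __name__ == "__main__":
    GENOME_SIZE = 30_000_000  # ASSUMPTION: hypothetical genome size in bp
    READ_LEN = 125            # read length in bp, as stated in the post

    # Coverage of a subset of 10 M read pairs under the assumed genome size.
    print(round(coverage(10_000_000, READ_LEN, GENOME_SIZE), 1))

    # Pairs needed for an arbitrary 50x target under the same assumption.
    print(pairs_for_coverage(50, READ_LEN, GENOME_SIZE))
```

With a real genome size estimate in place of the placeholder, the same
arithmetic shows how far a subset needs to be reduced.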

I tried running MIRA twice with subsets of the reads:
Firstly: with a subset of 15 M PE reads, where MIRA stopped and complained
about too high coverage.
Secondly: with a subset of 10 M PE reads. This finished successfully but gave a
horrible assembly compared to the de novo assembly using ABySS:

MIRA N50: 1805 bp
ABySS N50: 780383 bp

Is ABySS just a better assembler for my genome, or am I doing something wrong?

These are my MIRA specifications:

$ mira -t 16 manifest.conf

$ cat manifest.conf

project = P12
job = genome,denovo,accurate
parameters = COMMON_SETTINGS -GENERAL:number_of_threads=16 -NW:cmrnl=no

readgroup = Penicillium_data
data = path/to/P12_R1_10M.fastq.gz path/to/P12_R2_10M.fastq.gz
technology = solexa
template_size = 500 700 autorefine
segment_placement = ---> <---
segment_naming = solexa

Best,

Jens

