[mira_talk] Re: Assembling nanopore data only E. coli

From: Chris Hoefler <hoeflerb@xxxxxxxxx>
To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
Date: Sat, 8 Aug 2015 23:30:51 -0500

You will need the corrected reads they used for that assembly, which they do
not appear to have deposited alongside the raw reads. The pcbiohq technology
setting might work once you have some corrected reads.

On Aug 8, 2015, at 8:10 PM, Adrian Pelin <apelin20@xxxxxxxxx> wrote:

Hello,

I am trying to use the mira assembler to assemble the data presented here:
http://www.nature.com/nmeth/journal/v12/n8/full/nmeth.3444.html

Briefly, these are nanopore sequencing reads produced with a small device
called the MinION.

I have downloaded their 4 datasets and converted all reads to FASTQ format.
I have removed all sequences longer than 29 kb so that the remaining reads
pass the read length requirements.
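
For anyone wanting to reproduce the length filtering step, here is a minimal
sketch of one way to do it. It is not the script actually used in this thread.
It assumes plain 4-line FASTQ records (no wrapped sequence lines) and takes
"29 kb" to mean 29,000 bases; the function name and the max_len parameter are
made up for illustration.

```python
import io


def filter_fastq_by_length(src, dst, max_len=29_000):
    """Copy 4-line FASTQ records from src to dst, skipping long reads.

    Assumption: every record is exactly 4 lines (header, sequence, '+',
    qualities). Returns a (kept, dropped) tuple of record counts.
    """
    kept = dropped = 0
    while True:
        header = src.readline()
        if not header:          # end of file
            break
        if not header.strip():  # tolerate stray blank lines between records
            continue
        seq = src.readline()
        plus = src.readline()
        qual = src.readline()
        if len(seq.rstrip("\n")) <= max_len:
            dst.write(header + seq + plus + qual)
            kept += 1
        else:
            dropped += 1
    return kept, dropped


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass open file handles instead.
    demo_in = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n")
    demo_out = io.StringIO()
    print(filter_fastq_by_length(demo_in, demo_out, max_len=5))
```

With real data the two arguments would be file objects opened on the input
and output FASTQ files.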

I have mapped these reads to the E. coli genome, and they give roughly 22x
coverage (st. dev. 5x) when allowing a maximum of 30% mismatch.
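
How the coverage figures above were computed is not stated. As a rough sketch
of one possible approach, the mean and standard deviation of per-base depth
can be derived from alignment intervals on the reference. The interval list
and the function name below are hypothetical; coordinates are assumed to be
0-based and half-open, as they would be after parsing a mapper's output.

```python
from itertools import accumulate
from statistics import mean, pstdev


def depth_stats(intervals, ref_len):
    """Return (mean, standard deviation) of per-base depth over a reference.

    intervals: iterable of (start, end) alignment positions, 0-based and
    half-open (an assumption of this sketch). ref_len: reference length.
    """
    # Difference array: +1 where an alignment starts, -1 where it ends.
    diff = [0] * (ref_len + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    # The running sum of the difference array is the depth at each base.
    depth = list(accumulate(diff[:-1]))
    return mean(depth), pstdev(depth)


if __name__ == "__main__":
    # Two toy alignments on a 6-base reference.
    print(depth_stats([(0, 4), (2, 6)], ref_len=6))
```

This reports the population standard deviation across all reference
positions, including positions with zero depth.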

I have tried to assemble this data with MIRA, but the assembler produced 0
contigs, apparently because no contig meets the minimum reads per contig
requirement.

i.e.
Contig does not meet requirement of minimum reads per contig.
Moved 2 reads to debris.
Timing BFC unused: 17
CUnused: 3
TUnused: 3
AS_used_ids.size(): 22245
bfc 1
Localtime: Sat Aug 8 21:03:59 2015

At the end I get:
MIRA warncode: CONCOV_SUSPICIOUS_DISTRIBUTION
Title: Suspicious distribution of contig coverages

- 0 contig(s) with a total of 0 bases (= -nan% of bases in all non-repetitive
large contigs) have an average coverage less than 75% of the average coverage
of all non-repetitive large contigs.
- 0 contig(s) with a total of 0 bases (= -nan% of bases in all non-repetitive
contigs) have an average coverage more than 125% of the average coverage of
all non-repetitive large contigs.
- 0 contig(s) with a total of 0 bases (= -nan% of bases in all non-repetitive
contigs) have an average coverage 25% above or below the average coverage of
all non-repetitive large contigs.
Summary: found 3 indicator(s) for coverage problem(s).

This is my manifest:
project = Ecoli_mira_nanopore_2D
job = genome,denovo,accurate
parameters = -GE:not=8:kpmf=15:
#parameters = -MI:somrnl=0
parameters = -NW:cmrnl=no
parameters = -NW:cac=warn
parameters = -NW:cnfs=warn
parameters = PCBIOHQ_SETTINGS -CL:pec=yes

# The second part defines the sequencing data MIRA should load and assemble
# The data is logically divided into "readgroups": this reflects the
# ... that read sequences ...

readgroup = Ecoli_nanopore_2D
data = Ecoli_nanopore_2D_29kb.fastq
technology = pcbiohq

Please let me know if there is anything I can do to get an assembly with
nanopore reads alone.

Thanks,
Adrian

