[mira_talk] Assembling nanopore data only E. coli

  • From: Adrian Pelin <apelin20@xxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Sat, 8 Aug 2015 21:10:42 -0400

Hello,

I am trying to use the MIRA assembler to assemble the data presented here:
http://www.nature.com/nmeth/journal/v12/n8/full/nmeth.3444.html

Briefly, these are nanopore sequencing reads produced with a small device
called the MinION.

I have downloaded their 4 datasets and converted all reads to FASTQ
format. I have removed all sequences longer than 29 kb so that the reads
pass MIRA's read length requirements.
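
A minimal sketch of this kind of length filter, in case it helps anyone
reproduce the input. It is not the exact procedure used for this data: the
29,000-base cutoff is read off the "29kb" above, the function names are
made up for illustration, and it assumes plain four-line FASTQ records
(no wrapped sequence lines).

```python
# Sketch only: assumes four-line FASTQ records and an illustrative cutoff.
MAX_READ_LENGTH = 29_000  # assumption: "29kb" taken as 29,000 bases


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from a four-line FASTQ.

    Stops at end of file or at the first empty header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def filter_by_length(src, dst, max_len=MAX_READ_LENGTH):
    """Copy reads of at most max_len bases from src to dst.

    Returns (kept, dropped) read counts.
    """
    kept = dropped = 0
    for header, seq, qual in read_fastq(src):
        if len(seq) <= max_len:
            dst.write(f"{header}\n{seq}\n+\n{qual}\n")
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

It can be called with two open text files, for example
filter_by_length(open("in.fastq"), open("out.fastq", "w")), where both
file names are placeholders.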

I have mapped these reads to the E. coli genome, and they give roughly 22x
coverage (st. dev. 5x) when the mapping allows a maximum of 30% mismatches.
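
For reference, a rough sketch of how a mean coverage and standard deviation
like these can be computed. It assumes a three-column, tab-separated
per-position depth table (reference name, position, depth), which is the
layout `samtools depth -a` writes for a single BAM file; the mapper and
tools actually used for the numbers above are not stated here, and the
function name is illustrative.

```python
import statistics


def coverage_stats(depth_lines):
    """Return (mean, population standard deviation) of per-base depth.

    Each non-empty line is expected to look like "ref<TAB>pos<TAB>depth".
    """
    depths = [int(line.split("\t")[2]) for line in depth_lines if line.strip()]
    return statistics.fmean(depths), statistics.pstdev(depths)
```

Passing an open depth file (any iterable of lines) returns the two values.
Positions with zero depth only count toward the mean if the depth table
lists them.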

I have tried to assemble this data with MIRA, but the assembler produced 0
contigs, mainly because contigs do not pass minimum requirements.

For example, the log shows:
Contig does not meet requirement of minimum reads per contig.
Moved 2 reads to debris.
Timing BFC unused: 17
CUnused: 3
TUnused: 3
AS_used_ids.size(): 22245
bfc 1
Localtime: Sat Aug 8 21:03:59 2015

At the end I get:
MIRA warncode: CONCOV_SUSPICIOUS_DISTRIBUTION
Title: Suspicious distribution of contig coverages

- 0 contig(s) with a total of 0 bases (= -nan% of bases in all
  non-repetitive large contigs) have an average coverage less than 75% of
  the average coverage of all non-repetitive large contigs.
- 0 contig(s) with a total of 0 bases (= -nan% of bases in all
  non-repetitive contigs) have an average coverage more than 125% of the
  average coverage of all non-repetitive large contigs.
- 0 contig(s) with a total of 0 bases (= -nan% of bases in all
  non-repetitive contigs) have an average coverage 25% above or below the
  average coverage of all non-repetitive large contigs.
Summary: found 3 indicator(s) for coverage problem(s).


This is my manifest:
project = Ecoli_mira_nanopore_2D
job = genome,denovo,accurate
parameters = -GE:not=8:kpmf=15:
#parameters = -MI:somrnl=0
parameters = -NW:cmrnl=no
parameters = -NW:cac=warn
parameters = -NW:cnfs=warn
parameters = PCBIOHQ_SETTINGS -CL:pec=yes

# The second part defines the sequencing data MIRA should load and assemble
# The data is logically divided into "readgroups": this reflects the
# ... that read sequences ...

readgroup = Ecoli_nanopore_2D
data = Ecoli_nanopore_2D_29kb.fastq
technology = pcbiohq

Please let me know if there is anything I can do to get an assembly with
nanopore reads alone.

Thanks,
Adrian
