[mira_talk] High coverage data set, RAM limitations, subsampling.

From: Felipe Gajardo <felipe.gajardo.e@xxxxxxxxx>
To: mira_talk <mira_talk@xxxxxxxxxxxxx>
Date: Fri, 10 May 2013 12:02:18 -0400

Hi Bastien and everybody,

I have a data set from IonTorrent sequencing of a bacterial genome
(target size ~2.5 Mb). Throughput is 757406080 bp, so I have ~300X
coverage. The data set contains two merged libraries: a mate-paired library
with an avg. read size of ~50 bp (I have already removed the internal adaptor
sequence), and a fragments library with an avg. read size of ~200 bp. I
tried to assemble this data, but I only have 8 GB of RAM, so MIRA crashes
after a few minutes of work. I then decided to take a subset of the reads
amounting to 100X coverage and assemble that (this time without a
traceinfo file, because I did not generate one for the subset). I took the
whole mate-paired library (~20% of the data set) and part of the fragments
library.
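
The coverage arithmetic and the subsampling step described above can be sketched roughly as follows. This is only an illustration, not the procedure actually used for this data set: the function names are hypothetical, the "~20%" mate-paired share is assumed to refer to bases rather than read counts, and reads are assumed to be kept independently at random.

```python
import random


def coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def fragment_fraction_needed(total_bases, genome_size, mate_pair_share, target_cov):
    """Fraction of the fragments library to keep so that, together with the
    complete mate-paired library, the subset reaches target_cov.

    mate_pair_share is the share of all bases that belongs to the
    mate-paired library (assumption: 0.20 for the data set above).
    """
    mate_bases = total_bases * mate_pair_share
    frag_bases = total_bases - mate_bases
    needed = target_cov * genome_size - mate_bases
    # Clamp to [0, 1]: we cannot keep less than nothing or more than everything.
    return max(0.0, min(1.0, needed / frag_bases))


def subsample(read_ids, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    A fixed seed makes the selection reproducible.
    """
    rng = random.Random(seed)
    return [r for r in read_ids if rng.random() < fraction]


if __name__ == "__main__":
    total, genome = 757406080, 2_500_000
    print("coverage: %.0fX" % coverage(total, genome))
    frac = fragment_fraction_needed(total, genome, 0.20, 100)
    print("fraction of fragment reads to keep: %.2f" % frac)
```

Under these assumptions the full data set comes out at roughly 303X, and keeping about 16% of the fragment reads in addition to the whole mate-paired library gives about 100X.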

$ mira --project=B0P1-8 --job=denovo,genome,accurate,iontor --notraceinfo

MIRA successfully assembled the data, obtaining:

Large contigs
===========
  Length assessment:
  ------------------
  Number of contigs:    154
  Total consensus:      2466152
  Largest contig:       325074
  N50 contig size:      55150
  N90 contig size:      9797
  N95 contig size:      3942

All contigs:
============
  Length assessment:
  ------------------
  Number of contigs:    1375
  Total consensus:      2810830
  Largest contig:       325074
  N50 contig size:      44588
  N90 contig size:      365
  N95 contig size:      283
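
For readers unfamiliar with the N50/N90/N95 figures in the report above, here is a minimal sketch of how such values are conventionally computed from a list of contig lengths. It assumes the usual definition (the length of the contig at which the running total of lengths, sorted from largest to smallest, first reaches the given fraction of the total consensus); whether MIRA computes its figures in exactly this way is an assumption. The function name `nxx` is hypothetical.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx contig size for the given fraction (0.5 gives N50).

    Contigs are sorted from largest to smallest; the result is the length
    of the contig at which the cumulative length first reaches
    `fraction` of the total length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    # Toy contig lengths, not taken from the assembly above.
    toy = [10, 8, 6, 4, 2]
    print("N50:", nxx(toy, 0.5))
    print("N90:", nxx(toy, 0.9))
```

With this definition, many short contigs add little to the total but pull N90 and N95 down sharply, which is consistent with the gap between the "Large contigs" and "All contigs" figures.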

Now I have some questions:
Is there a way to include the reads I left out of the assembly in order to
complete it (considering my RAM limitations)?
Does MIRA know that some reads are mate-paired if there is no traceinfo
file?
Would it be a better approach to assemble a subset consisting
exclusively of reads from the fragments library and then use the
mate-pair information to order the resulting contigs?

Greetings and thanks in advance!

Follow-Ups:
- [mira_talk] Re: High coverage data set, RAM limitations, subsampling.
  - From: Bastien Chevreux
- [mira_talk] Re: High coverage data set, RAM limitations, subsampling.
  - From: John Nash
