[mira_talk] Mapping assembly problems: autorefine and low coverage

From: Elizabeth Latham <ealatham@xxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Sun, 7 Jun 2015 11:41:41 -0500

Greetings,

This is my first time posting, so apologies for any faux pas. I'm
trying to map more than 35 million Illumina reads to the genome of my
organism's closest relative, as identified by 16S Sanger sequencing.
However, I'm having problems with the autorefine option: the manifest
below results in the attached error. FYI, I'm running this on a
computing cluster.

# First part: defining some basic things
project = Paeni_unpaired2
job = genome,mapping,accurate

parameters = -GE:not=8 -DI:trt=/state/partition1/tmp/ealatham -NW:cmrnl=no
#Reference sequence
readgroup
is_reference
data = ./paeni_r.fa
technology = text
strain = Paeni_r_genome

# The second part defines the sequencing data MIRA should load and assemble
readgroup = Newimport_unpaired
data = ./R1.fastq ./R2.fastq
technology = solexa
template_size = 50 1000 autorefine
segment_placement = ---> <---
segment_naming = solexa

Second part of my question: if I remove the template_size,
segment_placement, and segment_naming lines, I can map the reads
against the reference genome, but I get really low coverage (<10% of
the reads map to the reference) even though the two organisms are in
the same genus. I'm not sure whether this is caused by the missing
template size information or by something else missing from the
manifest. If I assemble the reads in CLC Genomics Workbench, I get
around five contigs of >300,000 bp, so I think there is nothing
intrinsically wrong with the data.
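To pin down a mapping-rate figure like the one above, the fraction can be computed directly from the raw read count. This is a minimal Python sketch, assuming plain 4-lines-per-record FASTQ files (optionally gzipped); the function names are hypothetical, and the number of mapped reads has to be taken from the MIRA results, which this sketch does not parse.

```python
import gzip


def count_fastq_records(lines) -> int:
    """Count FASTQ records in an iterable of lines.

    Assumes the common 4-lines-per-record layout (header, sequence,
    '+' separator, qualities). Blank lines, such as a trailing newline,
    are ignored.
    """
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} non-blank lines is not a multiple of 4")
    return n_lines // 4


def count_fastq_file(path: str) -> int:
    """Count reads in a FASTQ file, opening .gz files transparently."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)


def mapped_fraction(mapped_reads: int, total_reads: int) -> float:
    """Fraction of input reads that were placed on the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return mapped_reads / total_reads
```

For the two files in the manifest, the total would be `count_fastq_file("R1.fastq") + count_fastq_file("R2.fastq")`, which is then passed to `mapped_fraction` together with the mapped-read count.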

Thanks for helping!
Elizabeth

Attached error output:

This is MIRA 4.0rc4 .

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence
Assembly Using Trace Signals and Additional Sequence Information.
Computer Science and Biology: Proceedings of the German Conference on
Bioinformatics (GCB) 99, pp. 45-56.

To (un-)subscribe the MIRA mailing lists, see:
http://www.chevreux.org/mira_mailinglists.html

After subscribing, mail general questions to the MIRA talk mailing list:
mira_talk@xxxxxxxxxxxxx

To report bugs or ask for features, please use the SourceForge ticketing
system at:
http://sourceforge.net/p/mira-assembler/tickets/
This ensures that requests do not get lost.

Compiled by: bach
Mon Oct 14 17:32:36 CEST 2013
On: Linux vk10464 2.6.32-41-generic #94-Ubuntu SMP Fri Jul 6 18:00:34 UTC 2012 x86_64 GNU/Linux
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compiled with ENABLE64 activated.
Runtime settings (sorry, for debug):
Size of size_t : 8
Size of uint32 : 4
Size of uint32_t: 4
Size of uint64 : 8
Size of uint64_t: 8
Current system: Linux compute-1-13.local 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

Fatal error (may be due to problems of the input data or parameters):

*******************************************************************************
* In file /data/ealatham/manifest.conf: line *
*
* template_size = 50 1000 autorefine *
*
* Keyword 'template_size' expects exactly 'infoonly' or 'exclusion_criterion' *
* as third value, found 'autorefine' *
*******************************************************************************

->Thrown: void Manifest::slurpInManifest(stringstream & mfin, const string & origsource, bool resume))
->Caught: main

Aborting process, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.

Subscribing / unsubscribing to mira talk, see:
//www.freelists.org/list/mira_talk

CWD: /data/ealatham
Thank you for noticing that this is *NOT* a crash, but a
controlled program stop.
Failure, wrapped MIRA process aborted.
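To catch this kind of problem before submitting a cluster job, the rule stated in the error message can be checked up front. This is a minimal Python sketch, assuming `key = value` manifest lines and `#` comments as in the manifest above. The set of accepted third values is copied from the MIRA 4.0rc4 message and may differ in other MIRA versions. Treating the third value as optional is an assumption, and the function names are hypothetical.

```python
# Third values accepted by MIRA 4.0rc4, copied from its error message.
# Other MIRA versions may accept different values; treat this set as
# version-specific.
ACCEPTED_THIRD_VALUES = {"infoonly", "exclusion_criterion"}


def check_template_size(line: str) -> list:
    """Return the problems found in one manifest line (empty list if none).

    Only 'template_size = ...' lines are checked; any other line passes.
    """
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "template_size":
        return []
    fields = value.split()
    if len(fields) < 2:
        return ["expected a minimum and a maximum template size"]
    try:
        int(fields[0])
        int(fields[1])
    except ValueError:
        return ["minimum and maximum template size must be integers"]
    # Assumption: the third value is optional. The error text only says
    # what it must be when it is present.
    if len(fields) >= 3 and fields[2] not in ACCEPTED_THIRD_VALUES:
        allowed = sorted(ACCEPTED_THIRD_VALUES)
        return [f"third value {fields[2]!r} is not one of {allowed}"]
    return []


def scan_manifest(text: str) -> list:
    """Check every non-comment line; return (line number, problem) pairs."""
    report = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        for problem in check_template_size(stripped):
            report.append((lineno, problem))
    return report
```

Running `scan_manifest` on the manifest text reports the offending line by number, which also helps when the error box itself does not show one.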

