Hi Elizabeth,
I think you should use 'infoonly' in mapping, especially if the
reference is another species.
Even better, just use 'autopairing' and forget about the template size
or placement, like this:
# The second part defines the sequencing data MIRA should load and assemble
readgroup = Newimport_unpaired
autopairing
data = ./R1.fastq ./R2.fastq
technology = solexa
MIRA will default to 'infoonly' because it's a mapping project, and will
infer the insert size and placement automatically, for informational
purposes only.
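In case it helps, here is a rough sketch of what the whole manifest
would look like with that change. It assumes the first part and the
reference readgroup stay exactly as you posted them; only the second
readgroup differs. I have left out 'segment_naming' to match the
snippet above, so treat that as an assumption and not a requirement:

```
# First part: defining some basic things
project = Paeni_unpaired2
job = genome,mapping,accurate
parameters = -GE:not=8 -DI:trt=/state/partition1/tmp/ealatham -NW:cmrnl=no

# Reference sequence (unchanged)
readgroup
is_reference
data = ./paeni_r.fa
technology = text
strain = Paeni_r_genome

# Sequencing data: 'autopairing' takes the place of the
# template_size and segment_placement lines
readgroup = Newimport_unpaired
autopairing
data = ./R1.fastq ./R2.fastq
technology = solexa
```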
Does the assembly improve with that?
Cheers,
Andrej
BTW, your fastq files don't need to be uncompressed; uncompressed files
take up too much space. MIRA can read compressed files directly.
For example:
data = ./R1.fastq.gz ./R2.fastq.gz
On 06/07/2015 06:41 PM, Elizabeth Latham wrote:
Greetings,
This is my first time posting, so apologies for any faux pas. I'm
trying to map >35 million Illumina reads to the genome of the closest
relative, identified by 16S Sanger sequencing. However, I'm having
problems with the autorefine option. The following manifest results
in the attached error. FYI, I'm running this on a compute cluster.
# First part: defining some basic things
project = Paeni_unpaired2
job = genome,mapping,accurate
parameters = -GE:not=8 -DI:trt=/state/partition1/tmp/ealatham -NW:cmrnl=no
#Reference sequence
readgroup
is_reference
data = ./paeni_r.fa
technology = text
strain = Paeni_r_genome
# The second part defines the sequencing data MIRA should load and assemble
readgroup = Newimport_unpaired
data = ./R1.fastq ./R2.fastq
technology = solexa
template_size = 50 1000 autorefine
segment_placement = ---> <---
segment_naming = solexa
Second part of my question: if I eliminate the template size, segment
placement, and naming, I can map the reads against the reference genome,
but I get really low coverage (<10% of the reads map to the reference)
despite both being in the same genus. I'm not sure if this is related to
the lack of template size info or if there is something else that I am
missing in the manifest that would cause this problem. If I assemble
the reads in CLC Genomics Workbench I get around five >300,000 bp
contigs, so I don't think there is anything intrinsically wrong with
the data.
Thanks for helping!
Elizabeth