[mira_talk] Re: from + len > size of contig?

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 1 Jul 2015 17:17:02 +0200

My apologies for responding late in this thread; I have a couple of things to
juggle at the moment.

1. The message MIRA spits out is a real error which should be fixed.
2. If you were working with MIRA 4.0.2 or an earlier version, could you please
test the current development version 4.9.5? I *think* I fixed at least one
bug which could cause this message, but I’m unsure whether it applies to
your scenario.
3. If you still see the error popping up: if I could get the data, I should be
able to reproduce it and hunt down the problem. Though, depending on timing, it
may take until the end of August.

B.

On 01 Jul 2015, at 16:40, abenjak . <abenjak@xxxxxxxxx> wrote:

Could the problem be that you set the Illumina contigs as "technology =
solexa"?
If these are assembled contigs, then the technology should be set to "text".
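
Something like the following for the Illumina readgroup, as an untested sketch.
It assumes the rest of the manifest stays exactly as originally posted and
changes only the technology line. Whether the SOLEXA_SETTINGS parameter line
and default_qual are still needed with "text" is not settled in this thread.

'''
# Illumina contigs readgroup from the original manifest
# (untested sketch: only the technology line differs, everything else assumed unchanged)
readgroup = GlyvomyceseIllumina
data = /path/to/file/glycomyces_illumina.fasta.fna
technology = text
default_qual = 30 # fake quality value, kept from the original
'''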

Andrej

On Wed, Jul 1, 2015 at 4:23 PM, Bleker, Carissa R. <blekercr@xxxxxxxx> wrote:
We've asked, but there is no chance of getting the fastq reads. The PacBio data
was raw, and I assembled it using MIRA as well. Attempting to map the
Illumina reads to the PacBio MIRA assembly is what threw up the error below.


From: mira_talk-bounce@xxxxxxxxxxxxx on behalf of John Nash <john.he.nash@xxxxxxxxx>
Sent: Tuesday, June 30, 2015 12:46 PM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: from + len > size of contig?

I agree with Chris. I have found using illumina reads to correct PacBio
assemblies a great tool BUT it is only useful if the reads are fastq reads.

How was your PacBio data assembled? Was it using the PacBio tools (HGAP, etc.)?

Is there any way that you can chase up the person/lab who did the illumina
sequencing and hunt down the fastq reads? So what you need to do is not a
hybrid de novo assembly of pacbio and illumina reads but a reference assembly
of illumina reads to a pacbio reference.

0.02
John


On Jun 29, 2015, at 12:40 PM, Chris Hoefler <hoeflerb@xxxxxxxxx> wrote:

That's not going to work very well. What are you trying to achieve with the
hybrid assembly? Is the PacBio assembly not good enough for what you need?
Without Illumina reads, you won't be able to do much to improve it. If you
just want to order the Illumina contigs using the PacBio reference, you can
use Mauve. I'm assuming that since Mira was able to take your contigs as
reads that they aren't very long (< 20 kb)?

On Mon, Jun 29, 2015 at 11:06 AM, Bleker, Carissa R. <blekercr@xxxxxxxx> wrote:
Nope, I only have the fasta file. They are from the same strain, I'm trying
to do a hybrid assembly with the PacBio and Illumina data.

From: mira_talk-bounce@xxxxxxxxxxxxx on behalf of Chris Hoefler <hoeflerb@xxxxxxxxx>
Sent: Monday, June 29, 2015 10:42 AM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: from + len > size of contig?

Do you have the Illumina reads? You can just map those directly to the
reference instead of the contigs. Are you mapping two different
strains...what are you trying to do?

On Mon, Jun 29, 2015 at 8:22 AM, Bleker, Carissa R. <blekercr@xxxxxxxx> wrote:

Hi,

I was trying to map Illumina contigs to a MIRA-assembled PacBio reference.
My config looks like:

'''
project = glycomyces_mapping_try1
job = genome,mapping,accurate

# parameter settings
parameters = COMMON_SETTINGS -GE:not=8, -DI:trt=/tmp/
parameters = -NW:cmrnl=no

# since no fasta quality file for illumina
parameters = SOLEXA_SETTINGS --noqualities

# reference sequence
readgroup = GlycomycesPacbio
is_reference
data = /path/to/file/glycomyces_assembly_pacbio_try1_out.caf

# illumina sequences
readgroup = GlyvomyceseIllumina
data = /path/to/file/glycomyces_illumina.fasta.fna
technology = solexa
default_qual = 30 # fake quality value
'''

After running for a few hours I get the error:

'''
Internal logic/programming/debugging error (*sigh* this should not have
happened)

********************************************************************************
* from + len > size of contig?
*
********************************************************************************
->Thrown: void Contig::updateCountVectors(const int32 from, const int32 len,
vector<char>::const_iterator updateI, const uint32 seqtype, const bool
addiftrue, int32 coveragemultiplier)
->Caught: void Contig::stripToBackbone()

Aborting process, probably due to an internal error.
'''

I noticed a previous problem like this in the mailing list and a
recommendation was to use only one thread, however this gave exactly the
same error at the same point in the mapping. I also tried both the CAF and
MAF files from the initial PacBio de novo assembly.

This is my first time doing an assembly, so any and all advice is welcome!




--
Chris Hoefler, PhD
Postdoctoral Research Associate
Straight Lab
Texas A&M University
2128 TAMU
College Station, TX 77843-2128





