[mira_talk] Re: Internal logic/programming/debugging error during mapping assembly with MAF reference

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 12 Aug 2014 10:08:37 +0200

On 12 Aug 2014, at 1:06 , John Eppley <jmeppley@xxxxxxxxxx> wrote:
> I get the following message from Mira 4.0.2:
> […]
> I'm trying to do something like an iterative assembly using two pairs of 
> Illumina (MiSeq) fastq files. The plan is to do a denovo assembly with one 
> pair of files, and then a mapping assembly using the first assembly (as a MAF 
> file) as the reference. My eventual goal is to be able to assemble something 
> that might not otherwise fit into RAM.

Good thinking, but that approach will unfortunately backfire in terms of RAM 
usage: contigs are incredibly RAM-expensive beasts, and I suspect you will end 
up using more RAM this way than with a full de-novo assembly. I’m sorry.

> […]
> The data are randomly fragmented transcripts from a mixed population, hence 
> the est approach.
> My first question is: is this a reasonable thing to attempt? Can Mira pull 
> off this sort of iterative assembly?

The approach will also not work for other reasons: if a transcript is broken 
into two (or more) parts in the first assembly because of missing data, these 
parts will not be joined in the subsequent mapping. There are a couple of other 
reasons (those can be worked around, though), but I think that this one alone 
is already a no-go and cannot be worked around.

> […]
> If so, then what is there to do about this error?
> In the meantime, I'll try to reproduce with a smaller set of reads.

You cannot do anything yourself, as this looks like a programming error. It 
must be something weird though, as it is deep within the core contig routines. 
It may already be fixed (I’m not sure); if you wish, you can try my current 
development branch and report whether the error still appears: 
   
http://www.chevreux.org/tmp/mira_develop-0-g81642a1_linux-gnu_x86_64_static.tar.bz2

If it is still present, any small data set that reproduces the error is 
welcome so that I can go on a bug hunt.
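
One way to cut a paired data set down to something small enough to share is to 
subsample both fastq files in lockstep, so that read pairs stay together. The 
following is only a minimal sketch, not part of MIRA: it assumes plain 
four-line fastq records and that both files list the pairs in the same order. 
The function names are made up for illustration.

```python
import random


def read_fastq(lines):
    """Yield 4-line FASTQ records (header, sequence, '+', qualities) as tuples.

    Assumes strict 4-line records; multi-line sequences are not handled.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def subsample_pairs(fwd_lines, rev_lines, fraction, seed=0):
    """Keep roughly `fraction` of the read pairs, always keeping mates together.

    Assumes record i in the forward input is the mate of record i in the
    reverse input. Returns two lists of records of identical length.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept_fwd, kept_rev = [], []
    for fwd, rev in zip(read_fastq(fwd_lines), read_fastq(rev_lines)):
        if rng.random() < fraction:
            kept_fwd.append(fwd)
            kept_rev.append(rev)
    return kept_fwd, kept_rev
```

The two arguments can be open file handles, and each kept record can be 
written back with "\n".join(record). Whether a given subset still triggers the 
error has to be checked by rerunning the assembly, so a few fractions or seeds 
may be needed.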

Best,
  Bastien

