On 12 Aug 2014, at 1:06, John Eppley <jmeppley@xxxxxxxxxx> wrote:
> I get the following message from Mira 4.0.2:
> […]
> I'm trying to do something like an iterative assembly using two pairs of
> Illumina (MiSeq) fastq files. The plan is to do a denovo assembly with one
> pair of files, and then a mapping assembly using the first assembly (as a MAF
> file) as the reference. My eventual goal is to be able to assemble something
> that might not otherwise fit into RAM.

Good thinking, but that approach will unfortunately backfire in terms of RAM usage: contigs are incredibly RAM-expensive beasts, and I suspect you will end up using more RAM this way than with a full de-novo assembly. I’m sorry.

> […]
> The data are randomly fragmented transcripts from a mixed population, hence
> the est approach.
> My first question is: is this a reasonable thing to attempt? Can Mira pull
> off this sort of iterative assembly?

The approach will also not work for other reasons: if a transcript is broken into two (or more) parts in the first assembly because of missing data, these parts will not be joined in the subsequent mapping. There are a couple of other reasons (those can be worked around though), but I think this one is already a no-go and cannot be worked around.

> […]
> If so, then what is there to do about this error?
> In the meantime, I'll try to reproduce with a smaller set of reads.

You cannot do anything yourself, as this looks like a programming error. It must be something weird though: it’s deep within the core contig routines.

It may already be fixed (I’m not sure). If you wish, you can try my current development branch and report whether the error still appears:

http://www.chevreux.org/tmp/mira_develop-0-g81642a1_linux-gnu_x86_64_static.tar.bz2

If it is still present, any small data set you can give me to reproduce the problem and go on a bug hunt is welcome.

Best,
Bastien