Hi Bastien,

I was afraid you would answer something like that :D Thank you very much for your fast answer. I will consider other options for my mapping, but will stay aware of your next developments; maybe your 'radar' will 'blip' soon on this issue!

Cheers,
Magalie

From: Bastien Chevreux <bach@xxxxxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: 18/10/2011 22:15
Subject: [mira_talk] Re: Mapping on a large and repeated genome
Sent by: mira_talk-bounce@xxxxxxxxxxxxx

On Oct 18, 2011, at 18:23, Magalie.LEVEUGLE@xxxxxxxxxxxx wrote:

> I am trying to use MIRA 3.4 to map 1.5 million 454 Titanium reads on a large (2 Gb) and mostly repeat-containing plant genome. I know that MIRA is not currently optimised for this type of genome, ...

It indeed is not.

{...}

> So I suspect the delay to be caused by those "a", and by the large numbers in //. I've noticed a few lines with a very large number in the first position too:
>
> [410891] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 300239183 38304 / 9405835 / 0

The problem is not the 'a' per se, but rather the pretty large numbers beside the line (which are timing information).

> As each trial takes at least a few days, I would like to know what I could change in my parameters now, or maybe try the development version which was given in the other thread? I can send my log file if it helps.

No, it would not ... the above lines told me everything I needed to know.

MIRA is indeed currently not really suited for this use case. The problem lies in the rather simplistic data structures with which contigs are represented internally. They're pretty good for de-novo assembly, where contigs reach a couple of hundred kilobases, but they start to fail in the megabase range, and certainly so when contigs go well above 10 megabases.

Actually, I have to rephrase that: they fail big time as soon as insertions or deletions need to be done, which is certainly the case when mapping 454 data. For Illumina data one does not really feel the problem, as there are not too many indels.

This is a weakness which has bitten me in the past few months, especially with mapping of IonTorrent data ... or de-novo assembly of 454 and Illumina hybrids. So I have that on my radar (and it's quite high on the priority list).

Back to your problem: sorry, there is absolutely nothing you can do from the parameter side. The only work-around I can propose to you at the moment is to subdivide your reference genome into chunks of 10 to 20 megabases. Not really an option, I know.

Best,
Bastien
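
The data-structure problem Bastien describes is easy to demonstrate. Below is a minimal Python sketch (an illustration only, not MIRA's actual code) of why a contig held in one contiguous array degrades once indels force gap columns to be inserted: every insertion shifts all bytes behind the insertion point, so the cost per edit grows with contig length.

# Illustration only, not MIRA's actual internals: timing how gap-column
# insertion into an array-backed contig scales with contig length.
import time

def time_gap_inserts(contig_len, n_inserts):
    """Insert n_inserts gap columns near the middle of an array-backed
    contig; return the elapsed wall-clock time in seconds."""
    contig = bytearray(b"A" * contig_len)
    start = time.perf_counter()
    for _ in range(n_inserts):
        # Each insert shifts every byte after the position: O(contig_len).
        contig.insert(len(contig) // 2, ord("-"))
    return time.perf_counter() - start

for length in (100_000, 1_000_000, 10_000_000):
    secs = time_gap_inserts(length, 1_000)
    print(f"{length:>11,} bp contig: 1,000 gap insertions in {secs:.2f} s")

The 100 kb case finishes almost instantly, while each insertion into the 10 Mb contig costs roughly a hundred times more, which matches the behaviour described above for contigs well beyond 10 megabases.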
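For the proposed work-around, here is a minimal sketch of splitting a reference into chunks of the suggested size. The file names (reference.fasta, reference_chunks.fasta) and the 15 Mb chunk size are assumptions for illustration; the thread only specifies "10 to 20 megabases".

# A minimal sketch of the suggested work-around: cut the reference
# into ~15 Mb pieces so each mapping target stays in the range MIRA's
# contig structures still handle well. File names are assumed.

CHUNK = 15_000_000  # target chunk size in bases (within the 10-20 Mb range)

def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            else:
                seq.append(line)
        if header is not None:
            yield header, "".join(seq)

with open("reference_chunks.fasta", "w") as out:
    for header, seq in read_fasta("reference.fasta"):
        if len(seq) <= CHUNK:
            out.write(f">{header}\n{seq}\n")
            continue
        for i in range(0, len(seq), CHUNK):
            piece = seq[i:i + CHUNK]
            # Record the offset so hits can be translated back to the
            # coordinates of the original, unsplit sequence.
            out.write(f">{header}_chunk{i // CHUNK} offset={i}\n{piece}\n")

One caveat, not from the thread but worth noting: reads spanning a chunk boundary cannot map across it, so overlapping adjacent chunks by roughly a read length and removing duplicate hits afterwards would reduce the loss.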