Hi,

I am trying to use mira 3.4 to map 1.5 million 454 Titanium reads onto a large (2 Gb), mostly repetitive plant genome. I know that mira is not currently optimised for this type of genome, but I have tried a few combinations of parameters, and although the mapping finally seems to run, it is taking forever, so I would like to know how I could improve my settings.

At first I tried mapping against the "normal" genome, but memory usage reached a very high level (around 330 GB), and after 5 days the mapping was still stuck in the first contig. As it was not possible to keep this going, I modified my command line to include the --highlyrepetitive option as shown below, but mira stopped because of megahubs. I changed the nrr parameter from 10 to 5 and reduced mhpr to 100, without more success.

    mira -project=m04 -job=mapping,genome,accurate,454 --notraceinfo --highlyrepetitive -GE:not=12 -SB:lsd=yes:bsn=m_v1:bft=fasta:bbq=30 -SK:pr=95:mhpr=100:not=12:nrr=5 454_SETTINGS -LR:ft=fasta -AL:mo=20:mrs=95 1>> m04.log

I have not increased the mmhr parameter for now, but tried another approach to avoid the large portion of repetitive sequences. As my reads are theoretically targeted at low-frequency sequences, I decided to do the mapping on a masked version of my genome. This time it worked, but after 6 days of computing it is only starting to map the second chromosome.

Here is my command line:

    mira -project=m04_wgm -job=mapping,genome,accurate,454 --notraceinfo -GE:not=12 -SB:lsd=yes:bsn=m_v1:bft=fasta:bbq=30 -SK:pr=95:not=12 454_SETTINGS -LR:ft=fasta -AL:mo=20:mrs=95 1>> m04_wgm.log

I searched the mailing list archive and found a thread about a similar problem ("where is my assembly at?"), so I looked at my log file and found this:

    [522402] +++a++a++a+aa++a+++a+a++a+aa+++++a+a++aa++++aa+++++++++++a++ 300282317 4 / 1345519 / 29
    [522444] a+++++a+a++++++++a+++++++++aaa+a+++++a+a++aaa+a+aa+++aaa++++ 300282320 3 / 844126 / 30
    [522485] a++++a+aa+aa++++++a+++++++aa+++++++++a+++++aaaa+++a+++a+++++ 300282325 8 / 714580 / 35
    [522529] ++aa+++a++a+++++++aa+++a+++++++a+aa+a++++++++++++a++++aa++++ 300282332 4 / 1190728 / 32
    [522575] a++a++a+aa++a+++aa+a+++a+++a+++++a+a+aa+a+++++a+++++++a+++++ 300282338 4 / 630878 / 37
    [522617] +a+aa+a++++aaa++a++a++aaaa++a+a++aa++a++a++a++++++a+a+++a+++ 300282342 3 / 522819 / 28
    [522654] ++++++++++++++++a++a+++a+++aa+++a+aa++++++++aaa+a++aa+++a+++ 300282344 3 / 218685 / 28

So I suspect the delay is caused by those "a" characters and by the large numbers in the slash-separated fields. I have also noticed a few lines with a very large number in the first position:

    [410891] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 300239183 38304 / 9405835 / 0

As each trial takes at least a few days, I would like to know what I could change in my parameters now, or whether I should try the development version that was given in the other thread. I can send my log file if it helps.

Thank you very much for reading,
Magalie

-- 
Magalie Leveugle, PhD
Research Scientist
Bioinformatics Team - Upstream Genomics Group
BIOGEMMA
Site de La Garenne
CS 90126
63720 CHAPPES, FRANCE
Tel : 04 73 67 88 57
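
P.S. In case it helps when comparing runs, here is a minimal Python sketch for tallying the progress lines quoted above. It only assumes the layout visible in my excerpt: a bracketed number, a run of "+" and "a" characters, one number, then three numbers separated by " / ". I do not know what mira means by each field, so the names used below (index, position, fields) are placeholders of my own, not mira terminology.

```python
import re

# Layout assumed from the log excerpt; the field names are placeholders,
# not MIRA terminology.
LINE_RE = re.compile(
    r"\[(\d+)\]\s+([+a]+)\s+(\d+)\s+(\d+)\s*/\s*(\d+)\s*/\s*(\d+)"
)


def parse_progress_line(line):
    """Return the fields of one progress line as a dict, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    marks = m.group(2)
    return {
        "index": int(m.group(1)),
        "plus": marks.count("+"),   # number of '+' characters
        "a": marks.count("a"),      # number of 'a' characters
        "position": int(m.group(3)),
        "fields": (int(m.group(4)), int(m.group(5)), int(m.group(6))),
    }


def summarise(lines):
    """Count matching lines, the share of 'a' marks, and the largest first field."""
    rows = [r for r in map(parse_progress_line, lines) if r is not None]
    total_marks = sum(r["plus"] + r["a"] for r in rows)
    total_a = sum(r["a"] for r in rows)
    return {
        "lines": len(rows),
        "a_fraction": total_a / total_marks if total_marks else 0.0,
        "max_first_field": max((r["fields"][0] for r in rows), default=0),
    }
```

For example, `summarise(open("m04_wgm.log"))` would give the number of progress lines, the overall fraction of "a" marks, and the largest value seen in the first slash-separated field, which might make it easier to see whether a parameter change actually reduces them.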