OK, a couple of points have come up in this thread on which I'd like to say a few things:

Re read name length: MIRA itself has no limit on the length of a read name, but some (older) downstream tools may have one. This is why MIRA, by default, stops with a warning when it encounters long read names. There is, of course, a switch to tell it to ignore this and continue. What I often do, though, is rename the reads to shorter names anyway ... long names eat RAM unnecessarily.

Re paired end naming scheme (pre-CASAVA 1.8): True, for MIRA 3.4 you need to convert the read names so that MIRA recognizes that they form pairs. One solution was posted, but there are others, like a simple sed one-liner. Alternatively, you can use the development version of MIRA (3.9.x), which knows how to deal with CASAVA 1.8 style naming.

Re mapping against the human genome: very good idea, I recommend doing that. MIRA would probably be able to deal with these reads, but the mapping step reduces the amount of data which needs to be looked at (every bit counts). You might want to be strict in the mapping though: not more than two mismatches or similar.

Re misassemblies: I suspect there will be some. However, for a first rough gene catalogue I would not bother too much: most things will be right.

One more thing I thought about today: MIRA has a pretty good clipping of bad quality data based on kmers (called "proposed end clipping", -CL:pec). However, this might interfere with metagenome assemblies, where I suspect that many of the organisms will have only low coverage. You might want to test with and without -CL:pec.

B.
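
PS: For the read renaming mentioned above, here is a rough sketch in Python rather than sed. It assumes CASAVA 1.8 style headers of the form "@name mate:filtered:control:index", plain four-line FASTQ records (no wrapped sequences), and that the target is the older "/1" and "/2" suffix convention. The function names convert_header and convert_fastq are made up for illustration, so please check the result against what your MIRA version actually expects.

```python
import re

# Assumed CASAVA 1.8 header layout:
#   @<instrument:run:flowcell:lane:tile:x:y> <mate>:<filtered Y/N>:<control>:<index>
CASAVA18 = re.compile(r"^@(\S+)\s+([12]):[YN]:\d+:\S*$")


def convert_header(line):
    """Turn a CASAVA 1.8 header into '@name/1' or '@name/2'.

    Lines that do not look like a CASAVA 1.8 header are returned unchanged.
    """
    m = CASAVA18.match(line.rstrip())
    if m is None:
        return line
    return "@{}/{}\n".format(m.group(1), m.group(2))


def convert_fastq(src, dst):
    """Copy FASTQ from src to dst, rewriting only the header lines.

    Only every fourth line (the first of each record) is touched, because
    quality lines may legitimately start with '@' as well.
    """
    for i, line in enumerate(src):
        dst.write(convert_header(line) if i % 4 == 0 else line)
```

To use it, open the input and output files yourself and call convert_fastq(infile, outfile). Shortening the names at the same time would be a small extension of convert_header.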