On Friday 14 August 2009 Marcin Swiatek wrote:
> I seem to have difficulties getting good results from Mira. Or perhaps
> 'expected' would be a better word. Here is my story: I am trying to
> assemble the genome of a strain of a Lactobacillus bacteria. It is a
> naughty little microbe [...]

Hi Marcin,

welcome to the club. Lactobacillus has been a nightmare project for me too, especially as I had no paired-end data at the time.

> I got decently looking results, but there is one thing I
> don't understand: where all these paired ends went? They are in the input
> files I think, I saw these reads in the generated traceinfo file...

My first guess would be: in the contigs, where they belong.

> However, while both Celera and Newbler produced contigs *and* scaffolds, in
> Mira's output I find contigs only.

In the beginning, the users I know liked to combine MIRA with dedicated scaffolders (BAMBUS, their own scripts etc.), so I never really felt the urge to implement a scaffolder of my own. This has changed considerably, as inquiries for a scaffolder have noticeably increased in the past year. I think I'll have to cave in at some point: not for the 3.0 version which I'm finalising at the moment, but it's now pretty high on the TODO list.

In the meantime: some time ago I asked a few people who I know use the AMOS scaffolder to write a short HOWTO for data coming from MIRA, but I haven't heard back from any of them so far.

> Contigs computed by Mira (using
> 'accurate') are quite similar in number and size distribution to what I get
> with other assemblers, but I see no scaffolds and no evidence of use of
> paired end data.
> [...]
> Now the questions. Firstly, how do I tell if paired ends were indeed used
> or not. Secondly, if they weren't, how do I go about putting them to use.

MIRA uses them without making too much noise about it. One way for you to check: in the output, there's a line saying

  Generated XXX unique template ids for YYY valid reads.

If XXX is smaller than YYY, then MIRA has assigned read pairs to templates and uses that information later on in the assembly:

  [1321] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  [1381] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++s+
  [1440] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  [1500] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  [1560] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  [1620] ++++++++++++++++++++++++++++++++++++++++++++++++++t+t+t+++++

A "+" shows a read assembled without problems, "s" means a read has been rejected at a given contig position because of a template size violation, and "t" because of a template direction violation. So you'll see the template usage only when there's a (temporary) problem during construction; all other reads are assembled without further notice.

> And if they were, why don't I see scaffolds (or longer contigs with little
> gaps in them).

Because there's no scaffolder. As I wrote, it's on the TODO list. In the meantime, I'd suggest using the one from AMOS, as I heard it works quite well (I have never used it myself, though).

> I will have another query, but I think I will try that one by one.

No problem.

Best,
  Bastien
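
The check described above (comparing XXX with YYY in the "unique template ids" line, and looking for "s" and "t" marks in the progress lines) could be scripted roughly as follows. This is only a sketch: the function name summarize_template_usage is made up for illustration, the regular expressions assume the log looks exactly like the lines quoted in this mail, and progress lines containing any mark other than "+", "s" or "t" are simply skipped.

```python
import re

# Summary line as quoted in the mail; exact spacing in a real log is an assumption.
TEMPLATE_LINE = re.compile(
    r"Generated\s+(\d+)\s+unique template ids for\s+(\d+)\s+valid reads"
)
# Progress lines as quoted in the mail, e.g. "[1381] ++++s+".
# Assumption: only the marks "+", "s" and "t" are handled here.
PROGRESS_LINE = re.compile(r"^\s*\[(\d+)\]\s+([+st]+)\s*$")


def summarize_template_usage(log_text):
    """Collect hints about paired-end (template) usage from assembler log text."""
    summary = {
        "templates": None,       # XXX in the summary line
        "reads": None,           # YYY in the summary line
        "pairs_used": None,      # True if XXX < YYY
        "size_rejects": 0,       # number of "s" marks
        "direction_rejects": 0,  # number of "t" marks
    }
    for line in log_text.splitlines():
        match = TEMPLATE_LINE.search(line)
        if match:
            templates, reads = int(match.group(1)), int(match.group(2))
            summary["templates"] = templates
            summary["reads"] = reads
            summary["pairs_used"] = templates < reads
            continue
        match = PROGRESS_LINE.match(line)
        if match:
            marks = match.group(2)
            summary["size_rejects"] += marks.count("s")
            summary["direction_rejects"] += marks.count("t")
    return summary


if __name__ == "__main__":
    import sys
    # Usage: python check_templates.py < assembly_log.txt  (file names are examples)
    print(summarize_template_usage(sys.stdin.read()))
```

If pairs_used comes back True, read pairs were grouped into templates; zero "s" and "t" counts then just mean no read was rejected for template reasons during contig construction, not that the pairs were ignored.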