On Wednesday 24 June 2009 Björn Nystedt wrote:
> running MIRA (V2.9.45x1) I get a report of 1 megahub in my data (it might be
> a problem with my vector clipping and I am still investigating that).
> Anyway, I had trouble finding info in the manuals about the log file:
> *posmatch_megahubs_preassembly.0.lst
> In my case, I have a single read name in this file. Does anyone know how to
> interpret that?
> Björn Nystedt

Hi Björn,

basically, having one read as a 'megahub' means that you seem to have a number of reads which are quite repetitive, and one of them (by chance) gets over the threshold of 'being a megahub'.

Could you please have a look at the new manual in the *46 distribution, which is a first draft on how to assemble 'nasty' data? There's a section which deals with how to find out which parts are causing problems (it's now also available online: Finding out repetitive parts in reads http://chevreux.org/uploads/media/mirav2946_hard.html#section_6).

Basically, I'd propose you have a look at the hash statistics of your project (described in the help file). Then restart the assembly with -SK:mnr=yes and -SK:nrr=XXX; for XXX I'd suggest a rather high number, which you determine from the point in the hash statistics where things 'look funny'. During that run a file will be created that contains both the read names and the masked parts of the reads, so you will be able to quickly find out what is causing havoc in your data.

Don't go too low with -SK:nrr, as you might then also find legitimate repetitive sequence (rRNAs come to mind in bacteria) and not only the contaminants. Guessing a bit, I'd say that nrr=20 is a good first start.

I would be interested to see the hash statistics of your project; could you please send them to me to have a look at? Thanks.

Regards,
  Bastien

--
You have received this mail because you are subscribed to the mira_talk mailing list.
For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html