[mira_talk] Re: Megahub info

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 24 Jun 2009 18:43:04 +0200

On Wednesday 24 June 2009 Björn Nystedt wrote:
> running MIRA (V2.9.45x1) I get report on 1 megahub in my data (it might be
> a problem with my vector clipping and I am still investigating that).
> Anyway, I had trouble finding info in the manuals about the log-file:
> *posmatch_megahubs_preassembly.0.lst
> In my case, I have a single read name in this file. Does anyone know how to
> interpret that? Björn Nystedt

Hi Björn,

basically, having one read as 'megahub' means that you seem to have a number
of reads which are quite repetitive, and one of them (by chance) gets over the
threshold for 'being a megahub'.

Could you please have a look at the new manual in the *46 distribution, which
is a first draft on how to assemble 'nasty' data? There's a section which deals
with how to find out which parts are causing problems (it's now also available
online: Finding out repetitive parts in reads
http://chevreux.org/uploads/media/mirav2946_hard.html#section_6).

Basically, I'd propose you have a look at the hash statistics of your project
(described in the help file). Then restart the assembly with -SK:mnr=yes and
-SK:nrr=XXX. For XXX I'd suggest a rather high number, which you determine
from the point in the hash statistics where things 'look funny'. During that
run a file will be created that contains both the read names and the masked
parts of the reads, so you will be able to quickly find out what is causing
havoc in your data. Don't go too low with -SK:nrr, as you might then also find
legitimate repetitive sequence (rRNAs come to mind in bacteria) and not only
the contaminants. Guessing a bit, I'd say that choosing nrr=20 is a good
start.
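
As a rough illustration of what such a ratio threshold does, here is a toy
sketch in Python. It is not MIRA's code, and it assumes that the ratio is
measured against the average occurrence count of all k-mers in the data set;
the function names, the read names and the tiny k are made up for the example.
Subsequences that occur more often than nrr times the average are reported
per read, which is the kind of "read name plus masked part" information
described above.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every subsequence of length k across all reads."""
    counts = Counter()
    for seq in reads.values():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def nasty_regions(reads, k, nrr):
    """Return {read_name: [(start, end), ...]} for stretches covered by
    k-mers that occur more than nrr times the average k-mer count.

    Assumption (not taken from MIRA): the average is computed over all
    distinct k-mers in the data set.
    """
    counts = kmer_counts(reads, k)
    if not counts:
        return {}
    average = sum(counts.values()) / len(counts)
    threshold = nrr * average

    regions = {}
    for name, seq in reads.items():
        spans = []
        for i in range(len(seq) - k + 1):
            if counts[seq[i:i + k]] > threshold:
                if spans and i <= spans[-1][1]:
                    # Overlaps or touches the previous span: extend it.
                    spans[-1] = (spans[-1][0], i + k)
                else:
                    spans.append((i, i + k))
        if spans:
            regions[name] = spans
    return regions


# Hypothetical toy data: one ordinary read and one poly-A read.
toy_reads = {
    "unique1": "ACGTTGCAGGCTAATCGGAT",
    "poly": "A" * 30,
}

print(nasty_regions(toy_reads, k=4, nrr=5))   # {'poly': [(0, 30)]}
print(nasty_regions(toy_reads, k=4, nrr=20))  # {}
```

With the lower ratio the poly-A read is flagged over its whole length, while
the higher ratio flags nothing in this toy set. On real data a lower ratio
flags more sequence, including legitimate repeats.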

I would be interested to see the hash statistics of your project. Could you
please send them to me so I can have a look? Thanks.

Regards,
  Bastien


