[mira_talk] Re: Megahub info

  • From: Björn Nystedt <bjorn.nystedt@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 25 Jun 2009 18:42:36 +0200

Hi, 
a bit short on time to analyse this (this run is somewhat of a sidetrack at the 
moment). Anyway:

I think the megahub is actually an artefact; the single read in the megahub 
logfile is a long (15 kb) fake read comprising a complete PCR product. (The 
complete run is 300,000 GS20 reads, 16,000 Sanger reads and ~20 fake PCR reads.)
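
For anyone wanting to run the same check, here is a minimal sketch of how the 
reads named in the megahub list could be matched against their lengths in the 
input FASTA, so that an unusually long read stands out. It assumes the .lst 
file has one read per line with the read name as the first whitespace-separated 
token, which is a guess based on the single read name seen here and not 
documented MIRA behaviour. The function names and the file names in the usage 
comment are hypothetical.

```python
def read_names(lst_lines):
    """Collect read names from the lines of a megahub list file.

    Assumption: one read per line, name is the first whitespace-separated token.
    """
    names = []
    for line in lst_lines:
        fields = line.split()
        if fields:  # skip blank lines
            names.append(fields[0])
    return names


def fasta_lengths(fasta_lines):
    """Map each FASTA record name (first token after '>') to its sequence length."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def megahub_report(lst_lines, fasta_lines):
    """Return (read name, length) pairs; length is None if the read is not in the FASTA."""
    lengths = fasta_lengths(fasta_lines)
    return [(name, lengths.get(name)) for name in read_names(lst_lines)]


# Usage with hypothetical file names:
# with open("megahubs.lst") as lst, open("reads.fasta") as fasta:
#     for name, length in megahub_report(lst, fasta):
#         print(name, length)
```

A read that is far longer than the typical GS20 or Sanger read in that report 
would point to a fake read as the megahub, not to a clipping problem.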

I don't know if this is a problem; I simply allowed for the megahub (mmhr=1) 
and the run seems to be ok. (Using the PCR fake reads as reference contigs in a 
mapping assembly would be possible too, but that requires some manual joining 
in the end, and I want to avoid that as far as possible.)

Thanks for great software.
B



On Wed, 24 Jun 2009 18:43:04 +0200
Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Mittwoch 24 Juni 2009 Björn Nystedt wrote:
> > running MIRA (V2.9.45x1) I get a report of 1 megahub in my data (it might be
> > a problem with my vector clipping and I am still investigating that).
> > Anyway, I had trouble finding info in the manuals about the log file:
> > *posmatch_megahubs_preassembly.0.lst
> > In my case, I have a single read name in this file. Does anyone know how to
> > interpret that? Björn Nystedt
> 
> Hi Björn,
> 
> basically, having one read as 'megahub' means that you seem to have a number 
> of reads which are quite repetitive and one of them (by chance) gets over the 
> threshold of 'being a megahub'.
> 
> Could you please have a look at the new manual in the *46 distribution, which 
> is a first draft on how to assemble 'nasty' data? There's a section which 
> deals with how to find out which parts are causing problems (it's now also 
> available online: Finding out repetitive parts in reads 
> http://chevreux.org/uploads/media/mirav2946_hard.html#section_6).
> 
> Basically, I'd propose you have a look at the hash statistics of your project 
> (described in the help file). Then, restart the assembly with -SK:mnr=yes and 
> -SK:nrr=XXX; for choosing XXX I'd suggest a rather high number that you 
> determine from the hash statistics where things 'look funny'. During that run 
> a file will be created that contains both the read names and the masked 
> parts of the reads, so you will be able to quickly find out what is 
> causing havoc in your data. Don't go too low with -SK:nrr as you might then 
> also find legitimate repetitive sequence (rRNAs come to mind in bacteria) and 
> not only the contaminants. Guessing a bit, I'd say that choosing nrr=20 is a 
> good first start.
> 
> I would be interested to see the hash statistics of your project, could you 
> please send it to me to have a look at? Thanks.
> 
> Regards,
>   Bastien
> 
> 
> -- 
> You have received this mail because you are subscribed to the mira_talk 
> mailing list. For information on how to subscribe or unsubscribe, please 
> visit http://www.chevreux.org/mira_mailinglists.html


-- 
====================================
Björn Nystedt (Sällström)
PhD Student
Molecular Evolution
EBC, Uppsala University
Norbyv. 18C, 752 36  Uppsala
Sweden
phone: +46 (0)18-471 45 88
email: Bjorn.Nystedt@xxxxxxxxx
====================================
