[mira_talk] Re: Megahub info
- From: Björn Nystedt <bjorn.nystedt@xxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Thu, 25 Jun 2009 18:42:36 +0200
Hi,
I'm a bit short on time to analyse this (this run is somewhat of a sidetrack
at the moment). Anyway:
I think the megahub is actually an artefact; the single read in the megahub
logfile is a long (15 kb) fake read comprising a complete PCR product. (The
complete run is 300,000 GS20, 16,000 Sanger and ~20 PCR fake reads.)
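
For anyone wanting to run the same check on their own data, here is a minimal Python sketch that looks up the length of every read named in the megahub list. It assumes the *posmatch_megahubs_preassembly.0.lst file holds one read name per line (the first whitespace-separated token is used) and that the input reads are available as FASTA; the function names and the command-line usage are hypothetical and not part of MIRA.

```python
import sys
from pathlib import Path


def read_lengths(fasta_text):
    """Return {read_name: sequence_length} for FASTA-formatted text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Read name = first token after '>' (assumption about the headers).
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def megahub_report(megahub_text, fasta_text):
    """Pair each read name in the megahub list with its length (None if absent)."""
    lengths = read_lengths(fasta_text)
    names = [ln.split()[0] for ln in megahub_text.splitlines() if ln.strip()]
    return [(n, lengths.get(n)) for n in names]


# Hypothetical usage: python megahub_lengths.py megahubs.lst reads.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    lst_text = Path(sys.argv[1]).read_text()
    fasta_text = Path(sys.argv[2]).read_text()
    for read_name, length in megahub_report(lst_text, fasta_text):
        print(read_name, length)
```

A megahub read that is far longer than a typical GS20 or Sanger read, like the 15 kb fake read here, points to an artificial read rather than a clipping problem.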
I don't know if this is a problem; I simply allowed for the megahub (mmhr=1)
and the run seems to be ok. (Using the PCR fake reads as reference contigs in a
mapping assembly would be possible too, but that requires some manual joining
in the end, and I want to avoid that as far as possible.)
Thanks for great software.
B
On Wed, 24 Jun 2009 18:43:04 +0200
Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
> On Mittwoch 24 Juni 2009 Björn Nystedt wrote:
> > running MIRA (V2.9.45x1) I get a report of 1 megahub in my data (it might
> > be a problem with my vector clipping and I am still investigating that).
> > Anyway, I had trouble finding info in the manuals about the log-file:
> > *posmatch_megahubs_preassembly.0.lst
> > In my case, I have a single read name in this file. Does anyone know how
> > to interpret that? Björn Nystedt
>
> Hi Björn,
>
> basically, having one read as 'megahub' means that you seem to have a number
> of reads which are quite repetitive and one of them (by chance) gets over the
> threshold of 'being a megahub'.
>
> Could you please have a look at the new manual in the *46 distribution, which
> is a first draft on how to assemble 'nasty' data? There's a section which
> deals with how to find out which parts are causing problems (it's now also
> available online: Finding out repetitive parts in reads
> http://chevreux.org/uploads/media/mirav2946_hard.html#section_6).
>
> Basically, I'd propose you have a look at the hash statistics of your project
> (described in the help file). Then restart the assembly with -SK:mnr=yes and
> -SK:nrr=XXX; for XXX I'd suggest a rather high number that you determine from
> the hash statistics, where things 'look funny'. During that run a file will
> be created that contains both the read names and the masked parts of the
> reads, so you will be able to quickly find out what is causing havoc in your
> data. Don't go too low with -SK:nrr, as you might then also find legitimate
> repetitive sequence (rRNAs come to mind in bacteria) and not only the
> contaminants. Guessing a bit, I'd say that choosing nrr=20 is a good first
> start.
>
> I would be interested to see the hash statistics of your project, could you
> please send it to me to have a look at? Thanks.
>
> Regards,
> Bastien
>
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
--
====================================
Björn Nystedt (Sällström)
PhD Student
Molecular Evolution
EBC, Uppsala University
Norbyv. 18C, 752 36 Uppsala
Sweden
phone: +46 (0)18-471 45 88
email: Bjorn.Nystedt@xxxxxxxxx
====================================