[mira_talk] Re: mmhr problem

Thanks for the quick reply!  I'll try that and let you know how it works!

I really like this software BTW, thanks so much for making it.  I am not a
big bioinformatics person and I feel like I am getting along with it very
well.  I can tell it was made by someone who needed to use it, and not just
someone who wanted to sell it.

Cheers,
C

On Thu, Nov 20, 2008 at 5:54 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Thursday 20 November 2008 23:59, Clayton Coffman wrote:
> > I am having a problem running a de novo 454 EST assembly.  Essentially it
> > tells me I have lots of megahubs, and I am trying to track down what these
> > are.  To do that I am trying to force it to go ahead with the assembly by
> > setting mmhr=1, but it always aborts anyway, saying the ratio is greater
> > than 0, even though I set it to the maximum of 1.  I could be doing it
> > wrong; here's what I do:
> >
> > mira  -fasta -project=Px -SK:mmhr=1  -job=denovo,est,draft,454
>
> Hello Clayton,
>
> I need to clarify in my docs that the quick switches (--job=... and friends)
> should be used towards the front of the command line, as they overwrite
> almost every other option of MIRA.
>
> So, use:  "mira  -fasta -project=Px  -job=denovo,est,draft,454 -SK:mmhr=1"
>
> and you're good to go.
>
> > Is there a better way to find out what the megahub is?  My sequences aren't
> > paired-end and I set sff_extract to trim an appropriate number of bases on
> > the left to account for an adapter which I know is supposed to be there.
>
> To see whether the adaptor is consistently on the left side of your reads,
> use sff_extract once without a left clip. If the adaptor sequence is
> consistently there, sff_extract will report that.
>
> The following is a short guide on how to find out the really nasty repeats
> in your reads. I admittedly need to smooth out a few things with MIRA, but
> at least it works :-)
>
> Run the assembly in a separate directory once with
>   "-SK:mnr=yes:rt=<some-int-between-5-and-10>"
>
> MIRA will then mask the <int>-% highest occurring k-mers in your reads and
> report these to a file in the log directory.
>
> This happens almost directly after loading, so you can CTRL-C the program
> once you've seen these lines:
>
>
> ---------------------------------------------------------------------------
>
> Skimming for repeats (1/3)
>  [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
> Compressing hash histogram ... done. Sorting ... (this may take a while) ... done.
> Used hashes: 6105868
> Unused hashes: 262329588
> Median hashes: 14
> Alternative median hashes: 21
> Max hashes: 265
> Masking starts at: 210
>
> Skimming for repeats (2/3)
>  [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
>
> Skimming for repeats (3/3)
>  [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
> Localtime: Fri Nov 21 00:48:02 2008
>
> ---------------------------------------------------------------------------
>
> The file you should look for is named
> "*_int_skimmarknastyrepeats_nastyseq_preassembly.0.lst" and is in
> tab-delimited format (name, masked sequence):
>
> nGGMAW54TR
> GGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGG
> nGGMAY80TF
> GGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGG
> nGGMB067TR     GGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGG
> nGGMBD71TR     GGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGG
>
> Hope it helps,
>  Bastien
>

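The tab-delimited repeat list described in the quoted reply lends itself to a quick tally of which masked sequences turn up most often. What follows is a minimal Python sketch and not part of MIRA: the function names parse_nasty_repeats and most_common_masked are made up for illustration, and it assumes each line holds a read name and one masked sequence separated by a tab (or by spaces, as in the mail above).

```python
import sys
from collections import Counter


def parse_nasty_repeats(lines):
    """Yield (read_name, masked_sequence) pairs from the repeat list.

    Assumption: two fields per line. Splitting on any whitespace run
    also copes with copies of the file where tabs became spaces.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        yield fields[0], fields[1]


def most_common_masked(lines, top=10):
    """Return the `top` most frequent masked sequences with their counts."""
    counts = Counter(seq for _name, seq in parse_nasty_repeats(lines))
    return counts.most_common(top)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_repeats.py <path to the *.lst file>
    with open(sys.argv[1]) as handle:
        for seq, count in most_common_masked(handle):
            print(f"{count}\t{seq}")
```

Saved under a hypothetical name such as tally_repeats.py and pointed at the .lst file in the log directory, this prints one count and sequence per line. A sequence that dominates the list, like the GGCTTC repeat in the example above, may be worth checking as a source of the megahubs.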