[mira_talk] Re: mmhr problem
- From: Bastien Chevreux <bach@xxxxxxxxxxxx>
- To: mira_talk@xxxxxxxxxxxxx
- Date: Fri, 21 Nov 2008 00:54:20 +0100
On Thursday 20 November 2008 23:59, Clayton Coffman wrote:
> I am having a problem running a de novo 454 EST assembly. Essentially it
> tells me I have lots of megahubs, and I am trying to track down what these
> are. To do that I am trying to force it to go ahead with the assembly by
> setting mmhr=1, but it always aborts anyway, saying the ratio is greater
> than 0, even though I set the maximum to 1. I could be doing it wrong;
> here's what I do:
>
> mira -fasta -project=Px -SK:mmhr=1 -job=denovo,est,draft,454
Hello Clayton,
I need to clarify in my docs that the quick switches (--job=... and friends)
should be placed towards the front of the command line, as they override almost
every other option of MIRA.
So, use: "mira -fasta -project=Px -job=denovo,est,draft,454 -SK:mmhr=1"
and you're good to go.
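To illustrate why the order matters, here is a toy model of a parser that applies options strictly left to right, so a preset given later silently resets a value set earlier. This is not MIRA's real option parser, and the preset contents (mmhr reset to 0) are invented for the example.

```python
# Hypothetical model only: MIRA's real parser is not reproduced here.
# Options are applied left to right, so later settings win.

# Invented preset contents, for illustration only.
JOB_PRESETS = {
    "denovo,est,draft,454": {"SK:mmhr": 0},
}


def apply_options(argv):
    """Apply command-line options in order; a later option overrides an earlier one."""
    settings = {}
    for arg in argv:
        if arg.startswith("-job="):
            # A quick switch rewrites every parameter its preset knows about.
            settings.update(JOB_PRESETS[arg[len("-job="):]])
        elif arg.startswith("-SK:"):
            key, value = arg[1:].split("=", 1)
            settings[key] = int(value) if value.isdigit() else value
        # Other options (-fasta, -project=...) are ignored in this toy model.
    return settings


wrong = apply_options(["-SK:mmhr=1", "-job=denovo,est,draft,454"])
right = apply_options(["-job=denovo,est,draft,454", "-SK:mmhr=1"])
print(wrong["SK:mmhr"], right["SK:mmhr"])  # 0 1
```

In the first call the preset comes last and wipes out mmhr=1; in the second, the explicit setting comes last and survives.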
> Is there a better way to find out what the megahub is? My sequences aren't
> paired-end and I set sff_extract to trim an appropriate number of bases on
> the left to account for an adapter which I know is supposed to be there.
To see whether the adaptor is consistently on the left side of your reads, use
sff_extract once without a left clip. If the adaptor sequence is consistently
there, sff_extract will report that.
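For a quick independent check on reads you already have as FASTA, a sketch like the following tallies the most common read prefixes: if one prefix accounts for a large share of the reads, an adaptor is probably still there. The prefix length and the FASTA input are assumptions, and this is not how sff_extract detects adaptors internally.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def prefix_counts(records, length=10):
    """Count the first `length` bases of every read that is long enough."""
    counts = Counter()
    for _, seq in records:
        if len(seq) >= length:
            counts[seq[:length].upper()] += 1
    return counts


# Tiny in-memory example; for real data pass an open FASTA file instead.
example = """>r1
ACGTACGTAAGGCC
>r2
ACGTACGTAATTTT
>r3
GGGGACGTAC
""".splitlines()
print(prefix_counts(read_fasta(example), length=8).most_common(1))  # [('ACGTACGT', 2)]
```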
The following is a short guide on how to find out the really nasty repeats in
your reads. I admittedly need to smooth out a few things with MIRA, but at
least it works :-)
Run the assembly in a separate directory once with
"-SK:mnr=yes:rt=<some-int-between-5-and-10>"
MIRA will then mask the <int>% most frequently occurring k-mers in your reads
and report these to a file in the log directory.
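The idea behind this step can be sketched as a toy model: count every k-mer, flag the most frequent <int>% of the distinct k-mers, and report the stretches of each read they cover. The k-mer size, the tie handling and the exact meaning of the cut-off are assumptions; MIRA's own hash statistics (the "Masking starts at" line in the log) may follow a different rule.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads (forward strand only)."""
    counts = Counter()
    for seq in reads:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def frequent_kmers(counts, percent):
    """Return the `percent`% most frequent distinct k-mers (at least one)."""
    ranked = [kmer for kmer, _ in counts.most_common()]
    n = max(1, len(ranked) * percent // 100)
    return set(ranked[:n])


def masked_regions(seq, k, nasty):
    """Merge overlapping or adjacent hits of `nasty` k-mers into masked substrings."""
    regions, start, end = [], None, None
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in nasty:
            if start is not None and i <= end:
                end = i + k  # extend the current masked stretch
            else:
                if start is not None:
                    regions.append(seq[start:end])
                start, end = i, i + k
    if start is not None:
        regions.append(seq[start:end])
    return regions


reads = ["GGCTTCGGCTTCGGCTTC", "ACGTAGCTAGGA"]
nasty = frequent_kmers(kmer_counts(reads, 6), 10)
for read in reads:
    print(masked_regions(read, 6, nasty))
# ['GGCTTCGGCTTCGGCTTC']
# []
```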
This happens almost directly after loading, so you can CTRL-C the program once
you've seen these lines:
---------------------------------------------------------------------------
Skimming for repeats (1/3)
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|....
[40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|....
[80%] ....|.... [90%] ....|.... [100%]
Compressing hash histogram ... done. Sorting ... (this may take a while) ...
done.
Used hashes: 6105868
Unused hashes: 262329588
Median hashes: 14
Alternative median hashes: 21
Max hashes: 265
Masking starts at: 210
Skimming for repeats (2/3)
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|....
[40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|....
[80%] ....|.... [90%] ....|.... [100%]
Skimming for repeats (3/3)
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|....
[40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|....
[80%] ....|.... [90%] ....|.... [100%]
Localtime: Fri Nov 21 00:48:02 2008
---------------------------------------------------------------------------
The file you should look for is named
"*_int_skimmarknastyrepeats_nastyseq_preassembly.0.lst" and is in tab-delimited
format (read name, masked sequence):
nGGMAW54TR GGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGG
nGGMAY80TF GGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGG
nGGMB067TR GGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGG
nGGMBD71TR GGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGGCTTCGG
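With thousands of lines in that file, a small script helps to see which repeat dominates. This sketch assumes it is run from inside the log directory and uses a k-mer size of 6, which is an arbitrary choice; because tandem repeats like the ones above vary in length from read to read, it counts how many entries contain each k-mer rather than comparing whole sequences.

```python
import glob
from collections import Counter


def parse_nastyseq(lines):
    """Yield (read_name, masked_sequence) pairs from the .lst file."""
    for line in lines:
        fields = line.split()  # the file is tab-delimited; spaces are tolerated too
        if len(fields) >= 2:
            yield fields[0], fields[1]


def reads_per_kmer(records, k=6):
    """Count, for each k-mer, how many masked entries contain it at least once."""
    counts = Counter()
    for _, seq in records:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def main(pattern="*_int_skimmarknastyrepeats_nastyseq_preassembly.0.lst"):
    # Assumption: the current directory is the MIRA log directory.
    for path in glob.glob(pattern):
        with open(path) as fh:
            records = list(parse_nastyseq(fh))
        print(f"{path}: {len(records)} masked entries")
        for kmer, n in reads_per_kmer(records).most_common(10):
            print(f"{n}\t{kmer}")


if __name__ == "__main__":
    main()
```

For a simple repeat like GGCTTC, expect all rotations of the unit (GGCTTC, GCTTCG, CTTCGG, ...) near the top with similar counts.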
Hope it helps,
Bastien