[mira_talk] Re: Hash frequency specification (precomputation)

  • From: Robert Bruccoleri <bruc@xxxxxxxxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 23 Oct 2011 16:44:10 -0400

Dear Bastien,
   No, that's not what I'm asking.
   In ASCII art form:

Read: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Word1:     [-----------n1 times----------]
Word2:        [-----------n2 times----------]
Word3:           [-----------n3 times----------]
Word4:              [-----------n4 times----------]
Word5:                 [-----------n5 times----------]
etc.

For each 31-mer, I have precomputed the number of times it appears in my entire data set, but I don't want to assemble the entire data set at once. I want to feed MIRA subsets of reads to assemble, along with the word counts, so that it can apply its algorithms correctly based on the entire data set rather than just the subset used for one small assembly. BTW, all this information is in a 3 TB database. With the indexing that I have used, I can get values out quickly.
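
To make the picture above concrete, here is a minimal Python sketch of the per-read lookup. It is only an illustration under stated assumptions, not an existing MIRA input format or API: `iter_kmers`, `annotate_read`, and `global_counts` are hypothetical names, a plain dict stands in for the indexed database, and strand handling (reverse complements) is ignored.

```python
from typing import Dict, Iterator, List, Tuple

K = 31  # word (hash) length used in the message above


def iter_kmers(read: str, k: int = K) -> Iterator[Tuple[int, str]]:
    """Yield (offset, word) for every overlapping k-mer in the read."""
    for offset in range(len(read) - k + 1):
        yield offset, read[offset:offset + k]


def annotate_read(read: str, global_counts: Dict[str, int], k: int = K) -> List[int]:
    """Return one whole-data-set count per word position (n1, n2, n3, ...).

    `global_counts` is a hypothetical stand-in for the precomputed
    database; words missing from it are reported as 0.
    """
    return [global_counts.get(word, 0) for _, word in iter_kmers(read, k)]
```

A read of length L yields L - 30 counts for 31-mers, one per word in the diagram, and these per-read lists are what I would like to hand to the assembler together with each subset of reads.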

Is this clear?

Cheers,
Bob

Bastien Chevreux wrote:

On Sunday 23 October 2011 22:14:03 Robert Bruccoleri wrote:

> I am working on a problem where I can precompute hash frequencies in
> advance of an assembly. I want to use this information to assemble short
> regions of my genome. Is it possible to specify the hash frequencies on
> a set of reads as input?

Ummm ... say again?

The only thing I could think of would be to give MIRA the average expected hash frequency instead of computing it. Is this what you were asking? (and then the answer would be "no, not at the moment")


B.



