[mira_talk] Re: RFC: bundling or not bundling rRNA databases with MIRA

  • From: Chris Hoefler <hoeflerb@xxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Thu, 17 Dec 2015 17:04:41 -0600

Another consideration if you separate out the rRNA db from the binary is
the local vs. system installation issue. If I install my own local copy of
Mira, will it know where to find the db if a system-wide Mira (possibly
with a conflicting version) is installed on the same machine? Not hard to
work around, but has to be kept in mind.
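
One possible way to handle the local vs. system case is a fixed lookup order: explicit override first, then the directory belonging to the running install, then the system-wide location. The sketch below is only an illustration under assumptions, not how MIRA actually behaves: `find_rrna_db` is a hypothetical helper, the `<prefix>/share/mira/db` layout is assumed, and `MIRADB` and `/usr/share/mira/db` are simply the names floated in this thread. It does not address the version-conflict part of the problem.

```python
import os
from pathlib import Path


def find_rrna_db(binary_path, env=None, system_dir="/usr/share/mira/db"):
    """Return the first existing db directory, or None if none is found.

    Hypothetical helper: the names and the directory layout are assumptions.
    """
    env = os.environ if env is None else env
    candidates = []

    # 1. An explicit override wins, so a user can always point at a specific db.
    if env.get("MIRADB"):
        candidates.append(Path(env["MIRADB"]))

    # 2. The db that ships with *this* install:
    #    <prefix>/bin/mira -> <prefix>/share/mira/db (assumed layout).
    prefix = Path(binary_path).resolve().parent.parent
    candidates.append(prefix / "share" / "mira" / "db")

    # 3. The system-wide location, as a last resort.
    candidates.append(Path(system_dir))

    # Candidates that do not exist are skipped, including a stale MIRADB.
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

With this order, a local copy in `~/opt/mira/bin/mira` would pick up `~/opt/mira/share/mira/db` before ever looking at the system-wide directory, unless `MIRADB` says otherwise.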

On Thu, Dec 17, 2015 at 4:49 PM, Chris Hoefler <hoeflerb@xxxxxxxxx> wrote:

What is wrong with a separate tar.gz file that gets extracted into, say,
/usr/share/mira/db/ (controlled via a MIRADB env variable)?



Just a matter of "design", not really a matter of disk or download
capacity


Erm, re-reading Bastien's question, are we talking about compiling the
rRNA data into the mira binary, or are we talking about packaging the rRNA
data with the source/binary tarball?

If the former, the question amounts to convenience vs. flexibility.
Compiling it in is convenient for users and package maintainers, who can
just pick up the binary and move it around. It is less flexible, though:
you can't opt out (unless there is a compile-time switch); you can't update
one without updating the other (i.e., a minor revision to the rRNA data
means re-compiling and redistributing the whole mira binary); and you can't
package them separately (which you might want to do if you are a package
maintainer).

Since it's only 10 MB, it's a bit of a tough call. I guess I would ask:

1) Is the data useful to any other program or utility outside of Mira?
2) Is it likely to be enlarged or augmented in some way in the future?
3) Is it likely to require updates or changes on a semi-recurrent basis,
   either by Bastien or by end-users?

If the answer to any of those is "yes", I would opt for the flexibility
approach over the convenience approach. Sven is technically correct about
"design", but if the data is useless to anything outside of Mira, and it's
not going to be changed or updated (or if a change in the data would
require a corresponding change within Mira), and it's only 10 MB (which
costs 0.03 US cents at current disk prices, btw), then separating it out
doesn't accomplish much from a usefulness perspective.

If we are talking about bundling as a separate file/directory within the
source or pre-compiled binary tarball, then the question amounts to
installation complexity vs. space requirements. Space is really a moot
issue in my opinion, so I would go with bundling to make installation
easier for novice users. Package maintainers can still separate it out into
a different package if they wish. Mira would have to be able to function
without it, though, in the event that it does not get installed (or gets
deleted by the user), which might be more work for Bastien....
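
To illustrate the "function without it" requirement, here is a minimal sketch of treating the rRNA step as optional. It is an assumption about how such a check could look, not MIRA's actual code; `rrna_screening_enabled` is a hypothetical name, and the caller is assumed to pass in whatever db directory it located (or `None`).

```python
import sys
from pathlib import Path


def rrna_screening_enabled(db_dir, warn=sys.stderr):
    """Decide whether the optional rRNA step can run.

    Hypothetical helper: a missing db produces a warning, never an error,
    so the rest of the run can proceed without it.
    """
    if db_dir is None or not Path(db_dir).is_dir():
        print("warning: rRNA db not found, skipping rRNA screening", file=warn)
        return False
    return True
```

The design choice here is that a missing or deleted db downgrades one feature instead of aborting the whole run.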

On Thu, Dec 17, 2015 at 2:08 PM, Sven Klages <sir.svencelot@xxxxxxxxx>
wrote:


2015-12-17 20:48 GMT+01:00 Peter Stockwell <peter.stockwell@xxxxxxxxxxx>:

At 10 MB, why not just include it and be done with it?



- because it's data, it is static, no software
- because in future there might be some other (bigger?) dataset, enabling
MIRA to screen out XYZ and/or ABC ...

Just a matter of "design", not really a matter of disk or download
capacity .. but I am a lone wolf here ;-)

best,
Sven

