[mira_talk] Re: RFC: bundling or not bundling rRNA databases with MIRA

  • From: Martin MOKREJŠ <mmokrejs@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 17 Dec 2015 17:40:00 +0100

Hi Bastien et al.,
well, it seems I am the only one on this list who cares about the extra
size. Please also consider the capacity of the many mirror servers of the
various Linux distros; they are short on space.
I would propose keeping a separate bundle or, if you prefer, 'make fetch' could
fetch the files for the user. But that would likely result in users re-fetching
the same data every time they upgrade mira, so I wouldn't like it. What is
wrong with a separate tar.gz file that gets extracted into, say,
/usr/share/mira/db/ (controlled via a MIRADB env variable)? It would have its
own version numbers and a release cycle independent of mira's.
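
A minimal Python sketch of the lookup proposed above, assuming the database
directory defaults to /usr/share/mira/db/ and can be overridden through a
MIRADB environment variable. Both the path and the variable name are taken from
the proposal, not from any released MIRA behaviour, and resolve_db_dir is a
hypothetical helper used only for illustration.

```python
import os
from pathlib import Path

# Assumed default location, as suggested in the proposal above.
DEFAULT_DB_DIR = Path("/usr/share/mira/db")


def resolve_db_dir(environ=None):
    """Return the database directory, honouring a MIRADB override.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    override = env.get("MIRADB", "").strip()
    # An unset or empty MIRADB falls back to the packaged default.
    return Path(override) if override else DEFAULT_DB_DIR


if __name__ == "__main__":
    print(resolve_db_dir())
```

With a scheme like this, a distro package would install the tarball contents
into the default directory, while a user without root access could unpack them
anywhere and point MIRADB at that location.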

Just my 2c. I contribute a lot of packages to the Gentoo Linux science overlay,
at least. ;-)
Martin

Bastien Chevreux wrote:

Dear all,

I plan to release MIRA 4.9.6 soon, either shortly before Christmas or by mid
January. While the bump in version number is small, a lot has happened behind
the scenes.

One feature I have added is the ability of mira/mirabait to directly fish for
or fish out rRNA sequences, something extremely useful in EST/RNASeq
assemblies. There’s just a slight problem: the dataset for this functionality
is ~10 MB. Not several gigabytes like RFAM, Silva or other rRNA databases, just
10 megabytes … and with that one should be able to recognise rRNA reads for the
vast majority of sequenced organisms on this planet.

The question I currently have: do I bundle this together with the MIRA binaries
or not?

Pro:
- easy install for novices (and forgetful ppl)
- easy for package and system maintainers

Con:
- the size of the binary distributable package doubles from 10 MB to 20 MB

I’m strongly leaning towards bundling, as in today’s world 10 MB or 20 MB are
more or less negligible sizes. However, I would like to have feedback on this
just in case someone sees a larger inconvenience.

Bastien



--
Martin Mokrejs, Ph.D.
Adapter/artefact removal from datasets based on the following technologies:
454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
http://www.bioinformatics.cz/software/supported-protocols/

