[contestms-dev] Re: Fwd: Better Public Cloud support for CMS

  • From: William Di Luigi <williamdiluigi@xxxxxxxxx>
  • To: contestms-dev@xxxxxxxxxxxxx, contestms@xxxxxxxxxxxxx
  • Date: Thu, 01 Oct 2015 13:46:58 +0000

Hi,
I wonder if this can already be done using docker-gen [1].

Basically, docker-gen uses the Docker API to find out which containers are
running, and then renders a template file, injecting the correct IP
addresses into it. To assign "roles" to containers you use environment
variables (e.g. docker run --env CMS_ROLE=worker ...), which are available
inside the template. For example, we recently worked on a containers.tmpl
[2] template which produces an nginx configuration file. I think it's
possible to create, in the same fashion, a cms.tmpl file which renders to
cms.conf, and then share that cms.conf between the Docker containers.
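
To make the idea concrete, here is a rough sketch in Python (not an actual
docker-gen template, which would be a Go template). It assumes cms.conf keeps
its service addresses in a "core_services" mapping of service name to a list
of [host, port] pairs; the role names, the ROLE_TO_SERVICE table, the port
numbers and the shape of the container records are all hypothetical.

```python
import json

# Hypothetical mapping from a CMS_ROLE value to (service name, port).
# The port numbers are placeholders, not taken from a real deployment.
ROLE_TO_SERVICE = {
    "worker": ("Worker", 26000),
    "cws": ("ContestWebServer", 21000),
}


def render_core_services(containers):
    """Group container IPs by role into a core_services-style mapping."""
    services = {}
    # Sort by IP string so the rendered file is stable between runs.
    for container in sorted(containers, key=lambda c: c["ip"]):
        role = container["env"].get("CMS_ROLE")
        if role not in ROLE_TO_SERVICE:
            continue  # container has no role we know about
        name, port = ROLE_TO_SERVICE[role]
        services.setdefault(name, []).append([container["ip"], port])
    return services


def render_conf(containers):
    """Render the (partial) configuration as JSON text."""
    conf = {"core_services": render_core_services(containers)}
    return json.dumps(conf, indent=4, sort_keys=True)


if __name__ == "__main__":
    running = [
        {"ip": "10.0.0.2", "env": {"CMS_ROLE": "worker"}},
        {"ip": "10.0.0.1", "env": {"CMS_ROLE": "worker"}},
        {"ip": "10.0.0.3", "env": {"CMS_ROLE": "cws"}},
    ]
    print(render_conf(running))
```

A real setup would let docker-gen do the discovery and re-render the file
whenever a container starts or stops; the grouping by role is the only part
shown here.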

[1]: https://github.com/jwilder/docker-gen#docker-gen
[2]:
https://github.com/veluca93/cms/blob/ec9a338fb22977d55d5247a052925166beee49a4/docker/containers.tmpl

On Thu, Oct 1, 2015 at 11:08 AM Motiejus Jakštys <desired.mta@xxxxxxxxx>
wrote:

+contestms-dev@

Please reply to this email (not the first one).

---------- Forwarded message ----------
Hi all,

Since last year, we in Lithuania have been running CMS on a public cloud
for everything except the finals. This year we want to include the
on-site finals as well. There are a few inconveniences that, if solved,
would make our lives easier. First, here is how we use CMS.

* We bootstrap PostgreSQL on a hosted database (AWS RDS). A few hours
before the contest, we increase the size of the database to something
expensive and powerful. After the contest stops, we downgrade the DB
back to the minimal one. So the database endpoint is always static and
taken care of.
* Workers and CWSs are separate clusters in different availability
zones, also spawned a few hours before the contest. We have two age
groups (two contest IDs), which requires two sets of CWSs. Workers go
into a common pool spread across different AZs.
* The rest of the services (admin, queue, grading) always run on a
single, hand-configured server with medium performance. We call it
"management".

Reliability-wise, we are quite happy. The only SPOF is the
"management" server, but, since we can tolerate a few minutes of
downtime to allow it to re-bootstrap, it's kind of OK. The most heavily
loaded machines are the CWSs. During the contest, we check their CPU
usage and, when needed, add extra hosts. Here's a 15x over-provisioned
cluster from last year, the first time we ran the contest on AWS [1].

As you could guess, the most annoying inconvenience is that the
servers change during the contest. Their IPs are hard-coded into
cms.conf, so adding a new machine requires a change to cms.conf and a
restart of the whole cluster. We have some primitive reconfiguration
automation now: servers periodically download cms.conf from a central
location and, if changes are detected, restart the service locally.
That still requires hand-modifying cms.conf while being extra careful,
and requires knowing the server IPs. We'd love to avoid manually
changing cms.conf when servers are added or removed.
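
The periodic check described above can be sketched in Python as follows.
This is only an illustration, not the script actually in use: `fetch` and
`restart` are hypothetical callables (one returning the central cms.conf as
bytes, the other restarting the local service), and how they are implemented
is left open.

```python
import os
import tempfile


def sync_once(fetch, path, restart):
    """Fetch the central config and restart only if it changed.

    Returns True if the local file was replaced and restart() was called.
    """
    new = fetch()
    try:
        with open(path, "rb") as f:
            old = f.read()
    except FileNotFoundError:
        old = None  # first run: no local copy yet

    if old == new:
        return False  # nothing changed, leave the service alone

    # Write to a temporary file in the same directory, then rename it over
    # the old one, so the service never reads a half-written config.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(new)
    os.replace(tmp, path)

    restart()
    return True
```

A cron job or a small loop with a sleep could call sync_once() every minute
or so on each server.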

Ideally, the servers could discover each other automatically. You start
a server (or a Docker container), and it knows its role. It should be
able to add itself to the cluster automatically without further
handholding.

The solution would be to set up a separate etcd/consul cluster to deal
with this stuff: it could generate cms.conf and restart the server
when needed. Before I do this, I have a few questions:

1. Does this mechanism sound reasonable? Maybe, given some engineering
time, there is a better way to deal with this problem?
2. Is there anybody else interested in dynamic CMS clusters? Having
someone else to talk to would already be a benefit. I am thinking of
writing a design document. Besides the CMS core developers, are there
any users that would like to contribute/participate?
3. Maybe someone wants to volunteer an implementation? It is a cool
project, and I wouldn't be surprised if someone did. My time is very
limited, otherwise I would do it myself. I could happily consult,
review and provide feedback. If no one wants to do it, I'll look for
an interested high-school student to take it on as a small final-year
project (or something like that).
4. Is this something upstream is interested in merging? This could be
anywhere between a completely separate project that just generates
cms.conf and generically restarts cmsResourceService, and a built-in
service in the CMS code base. The latter would mean a dependency on one
of consul/etcd/zookeeper/etc. What do the core developers think?

... not fully related ...
5. Is it possible to have cmsChecker, cmsEvaluationService and
cmsScoringService on >1 node at a time?

Thanks,
Motiejus Jakštys

[1]:
https://scontent-ams3-1.xx.fbcdn.net/hphotos-xtp1/t31.0-8/10712382_1582190968693324_5182991449106511357_o.png


--
Motiejus Jakštys

