[contestms-dev] Re: Multiple processes vs multiple services in CMS

  • From: Luca Wehrstedt <luca.wehrstedt@xxxxxxxxx>
  • To: contestms-dev@xxxxxxxxxxxxx
  • Date: Sun, 17 Nov 2013 10:47:01 +0100

Some random remarks, as they come to my mind.

The "natural" way to write an importer is to subclass BaseLoader [1] and
add support for your format: this allows it to be plugged into the generic
importer [2] and reimporter [3]. See YamlLoader [4] as an example (for the
Italian format).
Unfortunately this has been designed as a "one-shot" process: it doesn't
support the continuous polling you'd like to have. On the other hand, a
service that rebuilds and reimports test data on new commits may be useful
for any task format. Therefore I think that:
- the second of your "three main parts" should be written as a Loader:
given a directory, it should build the data it finds there (if needed) and
construct an in-memory Contest object (with all its children objects) that
Reimporter will then sync into the database;
- the first part should become a new separate service, that watches a git
repository and, on any new commit, fires a reimport (this will be generic
and will support all available loaders, including the one of the point
above).
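To make the Loader idea above more concrete, here is a minimal sketch of
what such a class could look like. This is only an illustration: the
GitLoader name, the stand-in Contest/Task dataclasses, and the get_contest
method body are all my assumptions, not the actual BaseLoader interface.

```python
# Hypothetical sketch of a git-directory Loader; class and method names
# other than the general Loader idea are assumptions for illustration.
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str


@dataclass
class Contest:
    name: str
    tasks: list = field(default_factory=list)


class GitLoader:
    """Builds an in-memory Contest from a checked-out repository directory."""

    def __init__(self, path):
        self.path = path

    def get_contest(self):
        # A real Loader would run the test case generators and parse the
        # task configuration found under self.path; here we fake two tasks.
        contest = Contest(name="example")
        for name in ("task1", "task2"):
            contest.tasks.append(Task(name=name))
        return contest


loader = GitLoader("/path/to/repo")
contest = loader.get_contest()
print([t.name for t in contest.tasks])
```

The Reimporter would then take the returned Contest object and sync it
into the database, as described above.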
We can discuss if this really needs to be a full-featured Service
(specified in the configuration, with the ability to send and receive RPC
calls, started by ResourceService, etc.) or just a long-running
command-line script.
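For the "long-running command-line script" option, the core loop could be
as simple as the following sketch. The repository path, the reimport hook,
and the helper names are placeholders (my assumptions), not real CMS APIs;
the `get_head` and `cycles` parameters only exist so the loop can be
exercised without a real repository or an infinite loop.

```python
# Hedged sketch of a git watcher that fires a reimport whenever HEAD moves.
import subprocess
import time


def current_head(repo):
    # Ask git which commit the repository is currently at.
    return subprocess.check_output(
        ["git", "-C", repo, "rev-parse", "HEAD"], text=True).strip()


def watch(repo, on_new_commit, interval=60, get_head=current_head, cycles=None):
    """Call on_new_commit(head) every time the repository head changes."""
    last = None
    n = 0
    while cycles is None or n < cycles:
        head = get_head(repo)
        if head != last:
            on_new_commit(head)  # e.g. trigger the generic reimport
            last = head
        n += 1
        if cycles is None:
            time.sleep(interval)


# Simulated run: three polls, two distinct heads, so the hook fires twice.
seen = []
heads = iter(["abc", "abc", "def"])
watch("/path/to/repo", seen.append, get_head=lambda _repo: next(heads), cycles=3)
print(seen)
```

Whether this deserves RPC plumbing or stays a plain script is exactly the
question raised above; the loop itself is the same either way.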

As for the monitoring web interface, I think that if we put it in AWS then
it has to be available for all importers, which means it has to be handled
by the (Re)Importer, and not by the Loader. Because of the "one-shot"
design I don't think it's easy to provide a per-Loader monitoring web
server.

These are my personal thoughts, not necessarily shared by the other senior
developers (who, I hope, will soon comment on their own).

Luca

PS: We try to avoid the need for multiprocessing and multithreading by
using gevent, a coroutine-based green-threading library that simulates
threads inside an async event loop.
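To show what that style looks like in practice (this tiny example is mine,
not from CMS, and requires gevent to be installed):

```python
# Green threads with gevent: greenlets cooperatively yield to an event
# loop, so no OS threads or processes are needed.
import gevent


def worker(n):
    gevent.sleep(0)  # yields control to the loop instead of blocking
    return n * n


jobs = [gevent.spawn(worker, n) for n in range(4)]
gevent.joinall(jobs)
print([job.value for job in jobs])
```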

[1] https://github.com/cms-dev/cms/blob/master/cmscontrib/BaseLoader.py
[2] https://github.com/cms-dev/cms/blob/master/cmscontrib/Importer.py
[3] https://github.com/cms-dev/cms/blob/master/cmscontrib/Reimporter.py
[4] https://github.com/cms-dev/cms/blob/master/cmscontrib/YamlLoader.py


On Sun, Nov 17, 2013 at 12:34 AM, Ludwig Schmidt
<ludwigschmidt2@xxxxxxxxx> wrote:

> Hi all,
>
> Fabian and I are currently working on a new service for CMS that
> automatically syncs the task data from a git repository into CMS (more on
> this later). The service consists of three main parts:
>
> - One part polls the given git repository regularly and inserts new work
> entries into the database if it finds new commits.
>
> - Another part is responsible for actually building the tasks in the git
> repository (running test case generators etc.) and syncing them into CMS.
>
> - The third part is a simple web interface showing the state of the
> syncing process (with history etc.).
>
> Currently we are debating how we should implement these three parts in
> CMS. The two main questions are:
>
> - Should the git polling process and the sync process be separate services
> communicating via RPCs? Or should they be two processes in the same
> program, using Python's multiprocessing library?
>
> - Should we add the web UI into the admin web server or write a separate
> web service for it?
>
> Ideally, we would eventually like to merge these changes back into CMS as
> a new task importer. So getting your feedback on the overall design would
> be very helpful.
>
> Best,
> Ludwig
>
