[nanomsg] Re: FYI: SmartStack, Airbnb's thinking on service discovery

  • From: Paul Colomiets <paul@xxxxxxxxxxxxxx>
  • To: "nanomsg@xxxxxxxxxxxxx" <nanomsg@xxxxxxxxxxxxx>
  • Date: Fri, 25 Oct 2013 00:20:55 +0300

Hi Dirkjan,

On Thu, Oct 24, 2013 at 9:50 AM, Dirkjan Ochtman <dirkjan@xxxxxxxxxx> wrote:
> This might provide helpful input to our discovery stuff:
>
> http://nerds.airbnb.com/smartstack-service-discovery-cloud/
>

Nice article. I have many similar ideas about a configuration service.
Here are a few controversial points:

1. They use HAProxy everywhere. At our company we also use a zeromq
device that is local to every node. I still don't know whether it
should be:

a) A special pattern in the configuration service

b) Configured with the generic config service rules (too verbose, I think)

c) Hard-coded into our apps, so the config service sees only this
per-box device and not the individual processes

2. The name/configuration service is tightly coupled with monitoring.
I'm thinking about how to do the integration without too much coupling,
because many useful health checks are very application-specific.
However, this is a smaller problem for nanomsg (see below).

3. Many projects strive toward the idea that a service registers its
presence when starting up, rather than having a map of the services in
the central config. All my config service prototypes provide the
latter. I'm still not convinced either way. Although nanoconfig can
handle both, it's hard to write a name service that can. And nanoconfig
can handle only a single name service per process (should that be
fixed?)

4. In the real world, a name service should not assume there are only
nanomsg services, so it should probably handle others too. However, if
Airbnb uses HAProxy for access to redis (for example), then the idea of
proxying redis over the nanomsg protocol is probably worth trying.
Looking into ZooKeeper integration makes sense too.
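
To make the trade-off in point 3 concrete, here is a rough Python sketch
of the two models: a central map declared up front versus services
registering themselves on startup. Every class and method name here
(CentralMapNameService, SelfRegisteringNameService, register, resolve)
is hypothetical and is not part of nanoconfig or nanomsg; the addresses
are made up.

```python
# Hypothetical sketch: two ways a name service can learn where services live.
# None of these classes exist in nanoconfig or nanomsg.


class CentralMapNameService:
    """The whole topology is declared up front in one central config."""

    def __init__(self, topology):
        # topology: {service_name: [address, ...]}
        self._topology = {name: list(addrs) for name, addrs in topology.items()}

    def resolve(self, service):
        # Return a copy so callers cannot mutate the central map.
        return list(self._topology.get(service, []))


class SelfRegisteringNameService:
    """Services announce themselves on startup and withdraw on shutdown."""

    def __init__(self):
        self._registry = {}

    def register(self, service, address):
        addrs = self._registry.setdefault(service, [])
        if address not in addrs:
            addrs.append(address)

    def unregister(self, service, address):
        addrs = self._registry.get(service, [])
        if address in addrs:
            addrs.remove(address)

    def resolve(self, service):
        return list(self._registry.get(service, []))


# Central map: the operator writes the topology once.
central = CentralMapNameService(
    {"redis": ["tcp://10.0.0.1:6379", "tcp://10.0.0.2:6379"]}
)

# Self-registration: the topology is whatever is currently announced.
dynamic = SelfRegisteringNameService()
dynamic.register("redis", "tcp://10.0.0.1:6379")
dynamic.register("redis", "tcp://10.0.0.2:6379")
dynamic.unregister("redis", "tcp://10.0.0.1:6379")
```

Both expose the same resolve() call to clients; the difference is only
in who owns the data, which is why a single name service supporting
both models is the hard part.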

There are also things where nanomsg-based services are already
superior (I think) to what Airbnb has:

(I) In nanomsg the health of the whole system is assumed to be kept by
load-balancing, priorities and TCP pushback, rather than by removing
dead nodes from the cluster (this makes point 2 above less of a concern)

(II) The Airbnb guys say they restart HAProxy for certain kinds of
configuration changes to be applied. nanoconfig can already apply
changes on the fly. Once graceful shutdown is implemented, nanomsg will
not even lose any messages on reconfig (unless a backend is failing, of
course).
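
The idea in (I) can be illustrated with a small Python model. This is
only a sketch under stated assumptions, not nanomsg's actual code or
API: Endpoint, PrioritySender and the writable flag are hypothetical
names, lower priority numbers are treated as preferred, and a blocked
or dead peer is modelled simply as "not writable".

```python
# Hypothetical model of priority-based sending; not nanomsg's real code or API.


class Endpoint:
    def __init__(self, address, priority):
        self.address = address
        self.priority = priority  # lower number = preferred in this sketch
        self.writable = True      # False models TCP pushback or a dead peer


class PrioritySender:
    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._turn = 0

    def pick(self):
        """Return the next endpoint to send to, or None if all are blocked."""
        ready = [e for e in self.endpoints if e.writable]
        if not ready:
            # A real sender would block here, pushing back on its caller.
            return None
        best = min(e.priority for e in ready)
        candidates = [e for e in ready if e.priority == best]
        # Round-robin among the preferred endpoints that can accept data.
        chosen = candidates[self._turn % len(candidates)]
        self._turn += 1
        return chosen


local_a = Endpoint("tcp://10.0.0.1:5555", priority=1)
local_b = Endpoint("tcp://10.0.0.2:5555", priority=1)
remote = Endpoint("tcp://10.1.0.1:5555", priority=2)
sender = PrioritySender([local_a, local_b, remote])

first = sender.pick().address    # one of the priority-1 peers
second = sender.pick().address   # the other priority-1 peer

# Both preferred peers stop accepting data; nobody removes them from config.
local_a.writable = False
local_b.writable = False
fallback = sender.pick().address  # traffic shifts to the priority-2 peer
```

No health check or name-service update is involved: unresponsive peers
stay configured and simply stop receiving traffic until they can accept
it again.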

Thoughts?

-- 
Paul
