[nanomsg] The monitoring for nanomsg

  • From: Paul Colomiets <paul@xxxxxxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Thu, 5 Sep 2013 01:29:01 +0300

Hi,

This thread tries to summarize what should be implemented in nanomsg
for state-of-the-art monitoring support. Everything here is IMO, so
feel free to discuss.

There are basically three separate tasks for monitoring:

1. Logging

2. Statistics

3. Topology info

I think they are different enough that it would be wrong to mix them
in a single protocol, so I'll discuss each separately.

Note that, as outlined in the logging ticket [1], all three interfaces
are targeted at administrators rather than programmers. So they should
be as transparent as possible for programmers.


Logging
==========

Logging should be used for erroneous situations that can't otherwise
be delivered through the API. Examples are:

1. DNS name can't be resolved

2. An extra connection attempt for NN_PAIR socket

But I think that logging should not cover situations that (a) the
library can fix without intervention, and (b) can be generated
thousands of times a second, for example:

1. Socket disconnect in the middle of the message

2. Connection limit reached

Point (a) is pretty weak, so common sense should be applied. Point
(b), however, is necessary to avoid filling the logs while being
DoS-attacked. E.g. if a socket disconnects in the middle of a message,
the network might be overloaded, and logging messages about that only
adds traffic and doesn't help to solve the problem. So this kind of
warning should be generated by the monitoring system itself, using the
statistical data discussed in the next section. In other words:
correctly delivered numeric values are a much more reliable proof of a
system's health than an empty log. So the logging subsystem should,
most of the time, give hints about misconfigurations of the system,
rather than identify transient errors or check the current health of
the system.

This limitation on what log messages are gives us two important
decisions:

1. There are no log levels. You may think of it as the log level
always being ERROR. This means less runtime configuration is needed,
which is A Good Thing.

2. Log messages are text. That's what all admin tools work with.

The next thing is the API. In [1] Luca proposes a programmatic API
for logging, which is (simplified):

typedef void (*nn_log_callback)(char *msg);

NN_EXPORT void nn_log_register(nn_log_callback cb);

It's very hard to write such a callback correctly, because of the
following requirements:

1. The callback must be thread-safe, as it can be called from a worker thread

2. The callback must be reentrant

3. The callback must be non-blocking

4. In general, the callback must not use any API from nanomsg (except
perhaps nn_strerror)

So it's unclear how to write a callback that sends messages to
syslog. Writing messages to a file is not safe either, because that
may block for some time too.

So I think it's better to define a socket-based API. It may look like:

int sock = nn_socket(AF_SP, NN_PUSH);
nn_connect(sock, "ipc://log.socket");
nn_log_register(sock);

There are several other possible variations of registering a socket,
but they can be discussed once it's clear that sockets should be used
for log messages. Similarly, the exact format of log messages should
be discussed. Probably it should be what rsyslog-zeromq uses, or
something similar.

The pros of the socket-based API:

* It allows log messages to be delivered to the correct thread (via
the inproc transport), to a local syslog-like daemon (via the ipc
transport), or to the network via any nanomsg transport

* Reuse of socket configuration (addresses, buffers, ...). An app that
uses an *.ini file to configure its other sockets can use the same
code to configure logging

* Reuse of infrastructure, e.g. devices for sending logs over the
network, the same encryption keys when encryption is eventually
implemented, any nanomsg transport, and so on

* It's easy to read the logs with nanocat, no matter how the logs are
actually stored
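
As a taste of that, a minimal log sink needs nothing beyond the
existing nanomsg API. The following sketch binds the ipc address from
the example above and prints whatever arrives:

#include <stdio.h>

#include <nanomsg/nn.h>
#include <nanomsg/pipeline.h>

/* Minimal log sink: bind where the applications connect and print
   whatever arrives. */
int main (void)
{
    int sock = nn_socket(AF_SP, NN_PULL);
    nn_bind(sock, "ipc://log.socket");
    for (;;) {
        void *msg = NULL;
        int n = nn_recv(sock, &msg, NN_MSG, 0);
        if (n < 0)
            break;
        printf("%.*s\n", n, (char *) msg);
        nn_freemsg(msg);
    }
    return 0;
}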


Statistics
===============

Examples of statistical data:

1. Number of messages per second sent/received through socket

2. Number of connections per socket/app

3. Number of connection attempts/rejected connections/dropped
connections over a period of time

Anything that might be helpful for knowing the health of the system
belongs here. To keep the subsystem simple, I assume that only
numerical data is collected.

Probably the same argument for a socket-based API holds here. So this
data should be delivered through a PUB socket at regular intervals,
using some simple-to-parse protocol, for example graphite [2] or ESTP
[3]. A graphite message looks like:

socket.1.messages 4 1378318683

The nanomsg pubsub protocol allows multiple monitoring services to
keep an eye on those values, filter messages, and so on, so it seems
to be a perfect fit. (Why not SNMP is discussed below.) For instance,
a monitoring service interested in a single socket can filter by
prefix, as sketched below.
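
A minimal sketch, assuming the statistics are published at a
hypothetical tcp://app-host:5670 address:

/* "socket.1." is 9 bytes; only metrics with that prefix are delivered */
int sub = nn_socket(AF_SP, NN_SUB);
nn_setsockopt(sub, NN_SUB, NN_SUB_SUBSCRIBE, "socket.1.", 9);
nn_connect(sub, "tcp://app-host:5670");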

Note: I don't propose using SURVEYOR here, because that would add
random jitter to the time when the value is reported. Also, using
multiple surveyors consumes more resources than using several
subscribers, and each surveyor would get slightly different data.

In case of a DoS attack or network overload, the port used for
statistics can be prioritized over other data. The traffic for
statistics is fairly low and, more importantly, the amount of traffic
is predictable (a fixed number of records per socket). Also, important
things like messages per second should be sent over the wire as
incrementing counters of messages sent since the creation of the
socket, rather than as plain messages per second, so that losing a
single message doesn't hurt the statistics (that's why I prefer ESTP
over the graphite protocol).
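
Putting the pieces together, a minimal sketch of such a publisher
follows; the address, the metric name and the exact (graphite-style)
wire format are illustrative assumptions, not a settled proposal:

#include <stdio.h>
#include <time.h>
#include <unistd.h>

#include <nanomsg/nn.h>
#include <nanomsg/pubsub.h>

/* Publish the absolute message counter once a second. */
int main (void)
{
    int pub = nn_socket(AF_SP, NN_PUB);
    nn_bind(pub, "tcp://*:5670");
    unsigned long messages = 0;  /* incremented by the real workload */
    for (;;) {
        char line[128];
        int len = snprintf(line, sizeof(line),
            "socket.1.messages %lu %ld", messages, (long) time(NULL));
        nn_send(pub, line, len, 0);
        sleep(1);
    }
}

Because the absolute counter is sent, the monitoring side computes
messages per second by differencing consecutive samples, and a lost
sample merely widens one interval.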


Topology Info
============

It would be nice to have the topology graph drawn for us. Building it
from the log (as proposed in [1]) is not feasible, because in case of
disaster the log may miss a few important records. Also, as outlined
above, I consider it wrong to log each connect and disconnect attempt.
The statistical data is not enough to build the graph either, because
it shows only the number of connections, not the actual peers. So the
collection of sockets and their peers should be sent separately.

Compared with the statistics data, the topology info:

1. Changes rarely and is not sensitive to survey time jitter

2. Potentially contains lots of data (e.g. lots of inactive connections)

3. Occasionally should be updated immediately (a button in a web
panel, or a command-line tool)

So I believe it fits the SURVEYOR pattern nicely. I.e. the monitoring
system sends a periodic (say, once a minute) survey, and every process
replies with a serialized list of endpoints and peers. If a user opens
the web panel and pushes the refresh button, the survey can be sent
right away.
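
As an illustration, the monitoring side might look like the following
sketch; the survey body, the address and the one-minute deadline are
assumptions, not a wire format proposal:

#include <nanomsg/nn.h>
#include <nanomsg/survey.h>

/* Broadcast a survey, then collect replies until the deadline. */
int main (void)
{
    int surv = nn_socket(AF_SP, NN_SURVEYOR);
    nn_bind(surv, "tcp://*:5671");
    int deadline = 60000;  /* milliseconds */
    nn_setsockopt(surv, NN_SURVEYOR, NN_SURVEYOR_DEADLINE,
        &deadline, sizeof(deadline));
    nn_send(surv, "TOPOLOGY", 8, 0);
    for (;;) {
        void *reply = NULL;
        int n = nn_recv(surv, &reply, NN_MSG, 0);
        if (n < 0)
            break;  /* ETIMEDOUT: the survey is over */
        /* parse the serialized endpoint/peer list here */
        nn_freemsg(reply);
    }
    return 0;
}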

In the future a delta protocol can be invented, so that the survey
contains the timestamp of the topology it last received and only the
changes are transferred.

The subtle things here are:

1. Naming the sockets

2. Devices

The naming of sockets will be discussed in a separate section, as it
is relevant for logging and statistics too.

Without info on devices, the graph may contain node hostnames and the
number of connections between each pair of nodes. That's not very
interesting; in my projects it would draw an almost complete graph.
Standard devices (those created by nn_device) are easy to report, but
there are plenty of use cases for other devices. So to correctly
display devices we need one of the following:

1. Name convention for sockets (see below)

2. Annotating the socket with details about device using setsockopt

3. API, for attaching arbitrary annotations to the topology info

4. Since a survey can be replied to any number of times, the
application can answer the survey itself, alongside the internal reply
by nanomsg. This effectively gives us #3, but seems quite ugly.

I'm not sure what to choose. Note, however, that a device can consist
of any number of sockets, not just two. E.g. it can be a device which
packs multiple topologies into a single cross-data-center connection.

Also, a serialization format has to be chosen. I'm pretty sure there
is no existing format suitable for topology data, so I think we should
make something up based on msgpack or JSON; a respondent answering a
survey with such a payload is sketched below.
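
Purely for illustration, the respondent embedded in an application
might look like this; the payload shape and every name in it are
hypothetical:

#include <string.h>

#include <nanomsg/nn.h>
#include <nanomsg/survey.h>

/* Answer each incoming survey with the (made-up) endpoint list. */
int main (void)
{
    int resp = nn_socket(AF_SP, NN_RESPONDENT);
    nn_connect(resp, "tcp://monitor:5671");  /* example address */
    for (;;) {
        void *survey = NULL;
        if (nn_recv(resp, &survey, NN_MSG, 0) < 0)
            break;
        nn_freemsg(survey);
        const char *reply =
            "{\"app\": \"myapp.1\", \"endpoints\": ["
            "{\"socket\": \"request_db\","
            " \"addr\": \"tcp://10.0.0.1:5555\", \"peers\": 2}]}";
        nn_send(resp, reply, strlen(reply), 0);
    }
    return 0;
}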


Identifying the Sockets
================

The data is only useful if all sockets can be identified easily in
log messages, statistical data and topology info. I propose to borrow
the format from ESTP. The latter basically declares the following
structure of the name:

<hostname>:<app_name>:<resource>

(the :<metric> part is skipped, as it's relevant only for
statistics). Basically, we can make names like this:

org.nanomsg.example:nanomsg.1234:socket.anonymous.7

Here 1234 is the pid of the process and 7 is the socket number. The
host name should probably be the (reversed) name returned by
gethostname() or `hostname --fqdn`, and doesn't need to be
configurable in nanomsg.

The app_name and resource generated automatically as above clearly
identify the socket at any given moment, but are garbage in the long
term. So it should be possible to override them with socket options:

nn_setsockopt(7, NN_SOL_SOCKET, NN_SOCK_NAME,
    "request_db", 10);
nn_setsockopt(7, NN_SOL_SOCKET, NN_SOCK_APP_NAME,
    "myapp.1", 7);

The above should turn the name into:

org.nanomsg.example:myapp.1:socket.request_db

I'm not sure about the app name. It may be set on the statistics
submitter socket, or via nn_setsockopt(-1, NN_GLOBAL, ...), or another
global API function may be invented.


Random Thoughts
=============

I use the term "monitoring software" but haven't described what it
is. I think it should be obvious. In the long term, special software
may be written for handling all the nanomsg-specific stuff. In the
short term, the existing solutions should work well. E.g. for logging,
rsyslog is the obvious candidate; a tiny plugin for submitting the
data must be written, however (there is one for zeromq). For
statistics, either graphite or collectd may be used, also with a tiny
plugin (there are ones for zeromq too). Or even nanocat can be used to
submit data to these or to a variety of other monitoring systems (e.g.
nagios mostly uses command-line utils to submit everything).

AFAIK, there is no existing software to draw the topology. But with
the proposed solution, a 15-minute command-line script in Python can
produce a *.dot file for the topology. In the long term some better
software will appear.

Between the application processes and the monitoring software, all
the standard devices can be used, e.g. to gather data from all
processes locally and send it through a single socket to the
monitoring host, as in the sketch below. No special software is
needed.
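
For the logging case such a device is only a few lines; the addresses
are examples, and the raw pipeline sockets make it a plain one-way
forwarder:

#include <nanomsg/nn.h>
#include <nanomsg/pipeline.h>

/* Gather messages pushed by local processes over ipc and forward
   them to the monitoring host over tcp. */
int main (void)
{
    int front = nn_socket(AF_SP_RAW, NN_PULL);
    nn_bind(front, "ipc://log.socket");
    int back = nn_socket(AF_SP_RAW, NN_PUSH);
    nn_connect(back, "tcp://monitor:5672");
    nn_device(front, back);  /* forwards messages forever */
    return 0;
}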

Why not SNMP? That's a very good question. Note that SNMP could only
be used for the statistical data, AFAICS. The reasons why I don't want
SNMP are:

1. It doesn't support all the infrastructure we have in nanomsg:
devices, transports (and the encryption that will eventually be
added), etc. It has its own devices and encryption; see #2.

2. It's another thing to know. It might be argued that SNMP is
already known by admins, but it's also another *complex* (see below)
thing for nanomsg developers to know. I would say it's more complex
than nanomsg itself, given the amount of legacy the protocol has
collected over the decades.

3. AFAIU, a separate daemon is needed on each node to answer SNMP requests

4. OID management is ugly. OIDs are long dotted integers that are
hard to read, hard to search for, and hard to create for your own
app.

5. The protocol is pretty incomprehensible to me, and probably to 99%
of nanomsg users too (subjective, yeah). The wikipedia page lists 27
RFCs for SNMP. Ah well, that's not the real reason. The real reason
is: it's hard to explain how it works and how to find those OIDs in
five minutes, unlike for ESTP.

Anyway, the proposed solution leaves a gap where one can write a
daemon that collects statistics locally via pubsub and turns them into
SNMP. Such a daemon would be required even if nanomsg implemented some
SNMP-related functionality, so nothing is lost.

I believe the patterns described here match their function well.
Maybe special pattern(s) for monitoring data could be invented, but I
see the current patterns as low-level building blocks, and it's a big
win that the whole monitoring system can be built on top of existing
patterns. It's also a good example of dissecting a task into small
patterns.

For logging and statistics, it's nice that the same socket can be
used for delivering application-specific data, so that nanomsg
establishes a standard for logging and statistics for applications
built on top of it. Nevertheless, the standard is fully optional to
use.

Thoughts?


[1] https://github.com/250bpm/nanomsg/issues/81
[2] http://graphite.readthedocs.org/en/1.0/feeding-carbon.html
[3] https://github.com/estp/estp

-- 
Paul
