[nanomsg] towards a more robust socket model

  • From: Drew Crawford <drew@xxxxxxxxxxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Sun, 16 Nov 2014 01:51:46 -0600

In my time on this project I have observed some disagreements that seem 
intractable:

There has been a lot of discussion about implementing 
security/encryption/cryptography in nanomsg, going back years.

There have been months of discussion and a failed patch regarding #324 
<https://github.com/nanomsg/nanomsg/issues/324>, which is a design error that 
creates undesirable coupling between certain protocols and transports, and is 
an issue that I think has been worked around in some fashion, in various 
incompatible ways, by every 3rd-party protocol/transport contributor that I 
know of.

There is now discussion about transports, whether they are “content aware” or 
“content dumb”, or “protocol aware” or “protocol dumb”, in the context of 
opcodes for WS.  The issue arises because WS as specified by W3C has elements 
of both a nanomsg protocol and a nanomsg transport, but so far has been 
implemented as a transport and not as a protocol.

I believe these problems (and maybe others) can be solved by introducing a more 
robust software architecture for sockets.  However, as far as I’m aware, nobody 
has proffered a particular alternative architecture or explained how it solves 
any problem.  I would like to proffer such an architecture and explain how it 
solves, or substantially improves, all three of these problems.

I am most familiar with the security/encryption problem because I have solved 
it.  I have worked on this for over a year and my solution is currently 
deployed to around 10k users as part of a commercial project.  For reasons that 
will become clear, my solution so far only works for REQ/REP, and next year I 
have a requirement to get it working for at least one other protocol family.  
So keep in mind that at the end of the tunnel for this architecture problem is 
potentially getting commercially-developed security contributed back to core.

The problem with doing cryptography really reduces to the following situation.  
You probably want to enforce message integrity across a complex and 
multi-hopped network.  This implies injecting some code to sign and verify 
messages *near the application layer, above any scalability protocols*.  
Meanwhile you want to encrypt and decrypt traffic on a hop-by-hop basis, to 
stop an NSA-level adversary from watching identical (though encrypted) packets 
move from hop to hop and thereby deducing who is talking to whom.  This 
implies injecting some code *right above the transport layer*, to handle the 
hop-by-hop situation.  It turns out it is also useful to inject code in more 
places, to handle sessions, and some other crypto-related problems.

However nanomsg sockets have only 2 customization slots: a protocol and a 
transport.  So if your integrity code lives in the protocol slot, and your 
hop-by-hop encryption lives in the transport slot, then you have no slot left 
for a scalability protocol or a transport!

The way I solve this at the moment is a terrible tower of hacks.  Some of those 
hacks have wandered upstream, for example my work on modular devices is at its 
core a way to inject code into more places as a packet traverses a network than 
the 2 customization slots that nanomsg recognizes.

What I’d like to propose is that, instead of having 2 fixed slots, a nanomsg 
socket be composed of a stack of components of arbitrary length.  Each 
component has its output piped to the input of the next component, like 
processes in a Unix shell.  We call the bottom-most component, which talks to 
the network, a transport, and we call every other component, each of which 
talks to the next component in the dataflow, a protocol.

[protocol]->[protocol]->[protocol]->[transport]  <------->  [transport]->[protocol]->[protocol]->[protocol]

                    socket 1                                                     socket 2


In this way what I am proposing is not a radical departure from the existing 
architecture, but rather a generalization of the existing architecture to 
sockets with arbitrary numbers of protocols; a way to extend the architecture 
to more kinds of sockets than can be realized today.  In particular, it’s 
backwards compatible; all existing sockets can be represented very naturally in 
this architecture, as sockets of length 2 with 1 protocol and 1 transport (or, 
in another way, that I propose later in this email).

This architecture solves the security problem in the following way.  It allows 
me to inject security code at arbitrary places in the network stack such as

[Integrity]  ->  [Scalability Protocol]  ->  [Hop-by-hop encryption]  ->  
[Transport]

where the first and third components would be protocols new to nanomsg, while 
the second and fourth are existing components.

This architecture solves several implementation problems for WebSockets in the 
following way.  It would be possible to build for example

[WS Opcodes Protocol]  -> [Scalability Protocol]  ->  [WS Protocol]  -> [TCP or 
TLS]

This splits the WS implementation into 2 protocols, one that can handle opcode 
switching above the scalability layer and another that handles the bulk of WS 
below the scalability layer.  These protocols could be used together, when 
opcode support is desired, or the main WS protocol could be used alone, when 
it is not.  Finally, the actual transport (TCP or TLS, which IIRC isn’t 
supported in the current WebSocket “transport”) is moved out into a proper 
transport, where it is pluggable and interchangeable, both for WS and also for 
any ordinary non-WS socket to use.  Under this scheme the *actual transport* 
really is just a dumb pipe, which has been one important philosophical 
objection to WS opcodes.  Nor is WS coupled to the *actual transport*, another 
philosophical objection that has been raised to coupling between protocols and 
transports.

This architecture also provides a clear path to more robust headers, whose 
handling I think at present is non-intuitive and leads to unexpected 
situations (like #324).  With this architecture, one would simply walk down a 
socket’s stack and 
ask each protocol/transport how many bytes of overhead for headers it would 
like to reserve [0].  Then nanomsg can do one up-front allocation of the 
correct size for the message.  We could even standardize on a struct describing 
the header format being declared by each protocol, [1]  so that for a 
well-known stack the header can be easily parsed, for debugging or any other 
purpose.

Finally, in addition to fixing those 3 problems, this proposal simplifies the 
implementation of at least one existing practice, that of SP vs RAW sockets.  
Currently SP vs RAW is implemented in a pseudo-OO fashion, where SP sockets are 
essentially a subclass of their RAW counterparts, selectively overriding 
certain methods and calling into the superclass as appropriate by the use of 
method tables.  Whereas under the proposed architecture, SP sockets have a 
natural dataflow representation:

[SP socket] -> [RAW socket] -> [Transport]

In conclusion I think this architecture makes significant progress on, if not 
completely solves, many problems that have been previously intractable, and 
also improves things not currently contemplated as problems.  It also manages 
to unify many concerns into a common design that previously we have been 
studying separately.  Finally as a generalization on the existing system, it is 
backwards compatible and not a radical departure from the previous design.  
Really this proposal is a very old proposal, as old as the OSI layer model 
itself.  We long ago decided that layering was fundamental to networking, and I 
think it is time we brought this philosophy into nanomsg itself.

Some problems remain.  One problem is how to create these many-protocoled 
sockets from an API perspective.  Another problem is how to implement ispeer(). 
 I’m sure there are other issues on both API and implementation levels.  
However I think that if we can reach a broad consensus on the architecture in 
the abstract, the API and implementation issues will prove much more tractable 
than the seemingly intractable problems we have been arguing over for long 
periods of time.

If on the other hand we don’t reach some consensus on improving the 
architecture, I think we should be concerned.  I’m already shipping in 
production an extensive cryptographic fork that cannot make it back into 
upstream because it depends on finding solutions to the architecture problems 
outlined here.  We now face a second decision point: another commercial user 
has appeared who cannot fit well into the existing architecture, and for the 
second time someone will work around the architecture and invest resources 
into yet another fork that cannot, for technical reasons, swim upstream.  If 
we fail to act now then I see no reason 
why this trend will not continue, and rob nanomsg of free contributions that 
would otherwise raise the tide for all boats.  I also see no reason why, once 
enough affected users have accumulated, they wouldn’t consolidate themselves 
into a unified fork that addresses their architecture concerns, and devote 
their resources to that.

I don’t mean to get apocalyptic, and I think we have some time to look at this 
problem and reach a carefully-weighed conclusion that balances a lot of 
competing concerns.  I think for the first time we have, if nothing else, a 
*specific* proposal that purports to address particular problems in a 
particular way, rather than general criticism that the existing architecture 
fails to contemplate some use case.  Historically I have been responsible for 
the latter; I hope now to be more responsible for the former.

I look forward to seeing if we can reach a consensus on this architecture or 
something like it that can unify many of our pet problems into a common purpose.

Drew

[0] As an optimization, each component in the stack could specify whether it 
is happy to work with headers in a separate buffer or requires headers to be 
included in the message body.  At send-time, the protocols would get as send() 
arguments a headers* and body* buffer that, if the entire stack agreed, happens 
to be 2 different buffers, or if one or more protocols disagreed, happens to be 
a contiguous buffer.  I think this resolves the inproc/ipc discrepancy, since 
inproc can declare its willingness to store headers in a noncontiguous buffer, 
while ipc can insist on a contiguous buffer, and under the C specification, 
code written for non-contiguous buffers always works in the contiguous case.  
And as a notable improvement against any competing proposal on the headers 
issue, security protocols could also force contiguous buffers, so they could 
insist on encrypting headers and bodies together, which is an important design 
issue for them.  The result is a fast, zero-copy implementation where that 
makes sense, and a single-copy implementation in all other cases.

[1]  Although this specific item would imply the use of #pragma pack, which is 
not strictly portable. I think most modern compilers support it, but it’s not a 
standard.




