[nanomsg] towards a more robust socket model

  • From: Drew Crawford <drew@xxxxxxxxxxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Sun, 16 Nov 2014 01:51:46 -0600

In my time on this project I have observed some disagreements that seem 
intractable:

There has been a lot of discussion about implementing 
security/encryption/cryptography in nanomsg, going back years.

There have been months of discussion and a failed patch regarding #324 
<https://github.com/nanomsg/nanomsg/issues/324>, which is a design error that 
creates undesirable coupling between certain protocols and transports, and is 
an issue that I think has been worked around in some fashion, in various 
incompatible ways, by every 3rd-party protocol/transport contributor that I 
know of.

There is now discussion about transports, whether they are “content aware” or 
“content dumb”, or “protocol aware” or “protocol dumb”, in the context of 
opcodes for WS.  The issue arises because WS as specified by W3C has elements 
of both a nanomsg protocol and a nanomsg transport, but so far has been 
implemented as a transport and not as a protocol.

I believe these problems (and maybe others) can be solved by introducing a more 
robust software architecture for sockets.  However, as far as I’m aware, nobody 
has proffered a particular alternative architecture or explained how it solves 
any problem.  I would like to proffer such an architecture and explain how it 
solves, or substantially improves, all three of these problems.

I am most familiar with the security/encryption problem because I have solved 
it.  I have worked on this for over a year and my solution is currently 
deployed to around 10k users as part of a commercial project.  For reasons that 
will become clear, my solution so far only works for REQ/REP, and next year I 
have a requirement to get it working for at least one other protocol family.  
So keep in mind that at the end of the tunnel for this architecture problem is 
potentially getting commercially-developed security contributed back to core.

The problem with doing cryptography really reduces to the following situation.  
You probably want to enforce message integrity across a complex and 
multi-hopped network.  This implies injecting some code to sign and verify 
messages *near the application layer, above any scalability protocols*.  
Meanwhile you want to encrypt and decrypt traffic on a hop-by-hop basis, to 
stop an NSA-level adversary from watching identical (though encrypted) packets 
move from hop to hop and thereby deducing who is talking to whom.  This 
implies injecting some code *right above the transport layer*, to handle the 
hop-by-hop situation.  It turns out it is also useful to inject code in more 
places, to handle sessions, and some other crypto-related problems.

However nanomsg sockets have only 2 customization slots: a protocol and a 
transport.  So if your integrity code lives in the protocol slot, and your 
hop-by-hop encryption lives in the transport slot, then you have no slot left 
for a scalability protocol or a transport!

The way I solve this at the moment is a terrible tower of hacks.  Some of those 
hacks have wandered upstream, for example my work on modular devices is at its 
core a way to inject code into more places as a packet traverses a network than 
the 2 customization slots that nanomsg recognizes.

What I’d like to propose is that, instead of having 2 fixed slots, a nanomsg 
socket be composed of a stack of components of arbitrary length.  Each 
component has its output piped to the input of the next component, like 
processes in a Unix shell.  We call the bottom-most component, which talks to 
the network, a transport, and we call every other component, each of which 
talks to the next component in the dataflow, a protocol.

[protocol]->[protocol]->[protocol]->[transport]  <------->  [transport]->[protocol]->[protocol]->[protocol]

                    socket 1                                                     socket 2


In this way what I am proposing is not a radical departure from the existing 
architecture, but rather a generalization of the existing architecture to 
sockets with arbitrary numbers of protocols; a way to extend the architecture 
to more kinds of sockets than can be realized today.  In particular, it’s 
backwards compatible; all existing sockets can be represented very naturally in 
this architecture, as sockets of length 2 with 1 protocol and 1 transport (or, 
in another way, that I propose later in this email).

This architecture solves the security problem in the following way.  It allows 
me to inject security code at arbitrary places in the network stack such as

[Integrity]  ->  [Scalability Protocol]  ->  [Hop-by-hop encryption]  ->  
[Transport]

where the first and third components would be protocols new to nanomsg, while 
the second and fourth are existing components.

This architecture solves several implementation problems for WebSockets in the 
following way.  It would be possible to build for example

[WS Opcodes Protocol]  -> [Scalability Protocol]  ->  [WS Protocol]  -> [TCP or 
TLS]

This splits the WS implementation into 2 protocols, one that can handle opcode 
switching above the scalability layer and another that handles the bulk of WS 
below the scalability layer.  These protocols could be used together, when 
opcode support is desired, or the main WS protocol could be used alone, when 
it is not.  Finally, the actual transport (TCP or TLS, which IIRC isn’t 
supported in the current WebSocket “transport”) is moved out into a proper 
transport, where it is pluggable and interchangeable, both for WS and also for 
any ordinary non-WS socket to use.  Under this scheme the *actual transport* 
really is just a dumb pipe, which has been one important philosophical 
objection to WS opcodes.  Nor is WS coupled to the *actual transport*, another 
philosophical objection that has been raised to coupling between protocols and 
transports.

This architecture also provides a clear path to more robust headers, whose 
handling I think at present is non-intuitive and leads to unexpected 
situations (like #324).  With this architecture, one would simply walk down a 
socket’s stack and 
ask each protocol/transport how many bytes of overhead for headers it would 
like to reserve [0].  Then nanomsg can do one up-front allocation of the 
correct size for the message.  We could even standardize on a struct describing 
the header format being declared by each protocol, [1]  so that for a 
well-known stack the header can be easily parsed, for debugging or any other 
purpose.

Finally, in addition to fixing those 3 problems, this proposal simplifies the 
implementation of at least one existing practice, that of SP vs RAW sockets.  
Currently SP vs RAW is implemented in a pseudo-OO fashion, where SP sockets are 
essentially a subclass of their RAW counterparts, selectively overriding 
certain methods and calling into the superclass as appropriate by the use of 
method tables.  Whereas under the proposed architecture, SP sockets have a 
natural dataflow representation:

[SP socket] -> [RAW socket] -> [Transport]

In conclusion I think this architecture makes significant progress on, if not 
completely solves, many problems that have been previously intractable, and 
also improves things not currently contemplated as problems.  It also manages 
to unify many concerns into a common design that previously we have been 
studying separately.  Finally as a generalization on the existing system, it is 
backwards compatible and not a radical departure from the previous design.  
Really this proposal is a very old proposal, as old as the OSI layer model 
itself.  We long ago decided that layering was fundamental to networking, and I 
think it is time we brought this philosophy into nanomsg itself.

Some problems remain.  One problem is how to create these many-protocoled 
sockets from an API perspective.  Another problem is how to implement ispeer(). 
 I’m sure there are other issues on both API and implementation levels.  
However I think that if we can reach a broad consensus on the architecture in 
the abstract, the API and implementation issues will prove much more tractable 
than the seemingly intractable problems we have been arguing over for long 
periods of time.

If on the other hand we don’t reach some consensus on improving the 
architecture, I think we should be concerned.  I’m already shipping in 
production an extensive cryptographic fork that cannot make it back into 
upstream because it depends on finding solutions to the architecture problems 
outlined here.  We now face a second decision point: another commercial user 
has appeared who cannot fit well into the existing architecture, and for the 
second time someone will work around the architecture and invest resources 
into yet another fork that cannot, for technical reasons, swim upstream.  If 
we fail to act now then I see no reason 
why this trend will not continue, and rob nanomsg of free contributions that 
would otherwise raise the tide for all boats.  I also see no reason why, once 
enough affected users have accumulated, they wouldn’t consolidate themselves 
into a unified fork that addresses their architecture concerns, and devote 
their resources to that.

I don’t mean to get apocalyptic, and I think we have some time to look at this 
problem and reach a carefully-weighed conclusion that balances a lot of 
competing concerns.  I think for the first time we have, if nothing else, a 
*specific* proposal that purports to address particular problems in a 
particular way, rather than general criticism that the existing architecture 
fails to contemplate some use case.  Historically I have been responsible for 
the latter; I hope now to be more responsible for the former.

I look forward to seeing if we can reach a consensus on this architecture or 
something like it that can unify many of our pet problems into a common purpose.

Drew

[0] As an optimization, each component in the stack could specify whether it 
is happy to work with headers in a separate buffer or requires headers to be 
included in the message body.  At send-time, the protocols would get as send() 
arguments a headers* and body* buffer that, if the entire stack agreed, happens 
to be 2 different buffers, or if one or more protocols disagreed, happens to be 
a contiguous buffer.  I think this resolves the inproc/ipc discrepancy, since 
inproc can declare its willingness to store headers in a noncontiguous buffer, 
while ipc can insist on a contiguous buffer, and under the C specification, 
code written for non-contiguous buffers always works in the contiguous case.  
And as a notable improvement against any competing proposal on the headers 
issue, security protocols could also force contiguous buffers, so they could 
insist on encrypting headers and bodies together, which is an important design 
issue for them.  The result is a fast, zero-copy implementation where that 
makes sense, and a single-copy implementation in all other cases.

[1]  Although this specific item would imply the use of #pragma pack, which is 
not strictly portable. I think most modern compilers support it, but it’s not a 
standard.




