In my time on this project I have observed some disagreements that seem intractable:

* There has been a lot of discussion about implementing security/encryption/cryptography in nanomsg, going back years.
* There have been months of discussion and a failed patch regarding #324 <https://github.com/nanomsg/nanomsg/issues/324>, which is a design error that creates undesirable coupling between certain protocols and transports. I think every 3rd-party protocol/transport contributor that I know of has worked around it in some fashion, in various incompatible ways.
* There is now discussion about transports, whether they are “content aware” or “content dumb”, or “protocol aware” or “protocol dumb”, in the context of opcodes for WS. The issue arises because WS as specified in RFC 6455 has elements of both a nanomsg protocol and a nanomsg transport, but so far has been implemented as a transport and not as a protocol.

I believe these problems (and maybe others) can be solved by introducing a more robust software architecture for sockets. However, as far as I’m aware nobody has proffered a particular alternative architecture or explained how it solves any problem. I would like to proffer such an architecture and explain how it solves, or substantially improves, all three of these problems.

I am most familiar with the security/encryption problem because I have solved it. I have worked on this for over a year and my solution is currently deployed to around 10k users as part of a commercial project. For reasons that will become clear, my solution so far only works for REQ/REP, and next year I have a requirement to get it working for at least one other protocol family. So keep in mind that at the end of the tunnel for this architecture problem is potentially getting commercially-developed security contributed back to core.

The problem with doing cryptography really reduces to the following situation.
You probably want to enforce message integrity across a complex, multi-hop network. This implies injecting some code to sign and verify messages *near the application layer, above any scalability protocols*. Meanwhile you want to encrypt and decrypt traffic on a hop-by-hop basis, to stop an NSA-level adversary from seeing (ciphertext, but identical) packets moving from hop to hop and thereby deducing who is talking to whom. This implies injecting some code *right above the transport layer*, to handle the hop-by-hop situation. It turns out it is also useful to inject code in more places, to handle sessions and some other crypto-related problems.

However, nanomsg sockets have only 2 customization slots: a protocol and a transport. So if your integrity code lives in the protocol slot, and your hop-by-hop encryption lives in the transport slot, then you have no slots left to specify a scalability protocol or a transport!

The way I solve this at the moment is a terrible tower of hacks. Some of those hacks have wandered upstream; for example, my work on modular devices is at its core a way to inject code into more places as a packet traverses a network than the 2 customization slots that nanomsg recognizes.

What I’d like to propose is that, instead of having 2 fixed slots, a nanomsg socket be composed of a stack of components of arbitrary length. Each component has its output piped to the input of the next component, like processes in a Unix shell. We call the bottom-most component, which talks to the network, a transport, and we call all other components, each of which talks to the next component in the dataflow, a protocol.
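
To make the pipeline idea concrete, here is a minimal sketch in C of a socket as a stack of components on the send path. Every name in it (struct component, stack_send, example_send, the one-byte tags) is hypothetical and invented for illustration; none of it is existing nanomsg API, and the "protocols" are toy stand-ins that only prepend a tag byte rather than sign, route, or encrypt anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one stage of a socket's stack. Not a nanomsg type. */
struct component {
    const char *name;
    /* Send-path hook: rewrites buf in place, returns the new length (0 = no room). */
    size_t (*send)(char *buf, size_t len, size_t cap);
};

/* Toy stand-in for real work: prepend a one-byte tag to the message. */
static size_t prepend_tag(char *buf, size_t len, size_t cap, char tag)
{
    if (len + 1 > cap)
        return 0;
    memmove(buf + 1, buf, len);
    buf[0] = tag;
    return len + 1;
}

static size_t integrity_send(char *b, size_t n, size_t c)   { return prepend_tag(b, n, c, 'I'); }
static size_t scalability_send(char *b, size_t n, size_t c) { return prepend_tag(b, n, c, 'S'); }
static size_t hop_crypto_send(char *b, size_t n, size_t c)  { return prepend_tag(b, n, c, 'E'); }

/* Bottom of the stack; a real transport would write the bytes to the network. */
static size_t transport_send(char *b, size_t n, size_t c)
{
    (void)b;
    (void)c;
    return n;
}

/* Pipe a message through every component, top of the stack first. */
static size_t stack_send(const struct component *stack, size_t count,
                         char *buf, size_t len, size_t cap)
{
    assert(stack != NULL && buf != NULL);
    for (size_t i = 0; i < count; i++) {
        len = stack[i].send(buf, len, cap);
        if (len == 0)
            return 0; /* a component ran out of room */
    }
    return len;
}

/* Integrity above the scalability protocol, hop-by-hop encryption right above the transport. */
static const struct component example_stack[] = {
    { "integrity",   integrity_send   },
    { "scalability", scalability_send },
    { "hop-crypto",  hop_crypto_send  },
    { "transport",   transport_send   },
};

size_t example_send(char *buf, size_t len, size_t cap)
{
    return stack_send(example_stack,
                      sizeof example_stack / sizeof example_stack[0],
                      buf, len, cap);
}
```

Calling example_send on a buffer holding "hi" leaves "ESIhi" in the buffer: each component wraps the output of the one above it, so the tag added last (closest to the transport) ends up outermost. A receive path would walk the same array in the opposite direction.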

    [protocol]->[protocol]->[protocol]->[transport] <-------> [transport]->[protocol]->[protocol]->[protocol]
                      socket 1                                                  socket 2

In this way what I am proposing is not a radical departure from the existing architecture, but rather a generalization of the existing architecture to sockets with arbitrary numbers of protocols; a way to extend the architecture to more kinds of sockets than can be realized today. In particular, it’s backwards compatible; all existing sockets can be represented very naturally in this architecture, as sockets of length 2 with 1 protocol and 1 transport (or in another way that I propose later in this email).

This architecture solves the security problem in the following way. It allows me to inject security code at arbitrary places in the network stack, such as

    [Integrity] -> [Scalability Protocol] -> [Hop-by-hop encryption] -> [Transport]

where the first and third components would be protocols new to nanomsg, while the second and fourth are existing components.

This architecture solves several implementation problems for WebSockets in the following way. It would be possible to build, for example,

    [WS Opcodes Protocol] -> [Scalability Protocol] -> [WS Protocol] -> [TCP or TLS]

This splits the WS implementation into 2 protocols, one that handles opcode switching above the scalability layer and another that handles the bulk of WS below the scalability layer. The two could be used together when opcode support is desired, or the main WS protocol could be used alone when it is not. Finally, the actual transport (TCP or TLS; the latter, IIRC, isn’t supported in the current WebSocket “transport”) is moved out into a proper transport, where it is pluggable and interchangeable, both for WS and for any ordinary non-WS socket to use. Under this scheme the *actual transport* really is just a dumb pipe, which has been one important philosophical objection to WS opcodes.
Nor is WS coupled to the *actual transport*, which addresses another philosophical objection that has been raised, namely to coupling between protocols and transports.

This architecture also provides a clear path to more robust header handling, which I think at present is non-intuitive and leads to unexpected situations (like #324). With this architecture, one would simply walk down a socket’s stack and ask each protocol/transport how many bytes of overhead for headers it would like to reserve [0]. Then nanomsg can do one up-front allocation of the correct size for the message. We could even standardize on a struct describing the header format declared by each protocol [1], so that for a well-known stack the header can be easily parsed, for debugging or any other purpose.

Finally, in addition to fixing those 3 problems, this proposal simplifies the implementation of at least one existing practice, that of SP vs RAW sockets. Currently SP vs RAW is implemented in a pseudo-OO fashion, where SP sockets are essentially a subclass of their RAW counterparts, selectively overriding certain methods and calling into the superclass as appropriate by the use of method tables. Under the proposed architecture, by contrast, SP sockets have a natural dataflow representation:

    [SP socket] -> [RAW socket] -> [Transport]

In conclusion, I think this architecture makes significant progress on, if not completely solves, many problems that have previously been intractable, and also improves things not currently contemplated as problems. It also manages to unify into a common design many concerns that we have previously been studying separately. Finally, as a generalization of the existing system, it is backwards compatible and not a radical departure from the previous design.

Really this proposal is a very old proposal, as old as the OSI layer model itself. We long ago decided that layering was fundamental to networking, and I think it is time we brought this philosophy into nanomsg itself.

Some problems remain.
One problem is how to create these many-protocoled sockets from an API perspective. Another problem is how to implement ispeer(). I’m sure there are other issues on both API and implementation levels. However, I think that if we can reach a broad consensus on architecture in the abstract, API and implementation issues will prove much more tractable than the problems we have been arguing about over long periods of time.

If on the other hand we don’t reach some consensus on improving the architecture, I think we should be concerned. I’m already shipping in production an extensive cryptographic fork that cannot make it back into upstream because it depends on finding solutions to the architecture problems outlined here. We are now facing a second decision point, in that a second commercial user has appeared who cannot fit well into the existing architecture. It will be the second time that someone works around the architecture and invests resources into yet another fork which, for technical reasons, cannot swim upstream. If we fail to act now then I see no reason why this trend will not continue, and rob nanomsg of free contributions that would otherwise raise the tide for all boats. I also see no reason why, once enough affected users have accumulated, they wouldn’t consolidate themselves into a unified fork that addresses their architecture concerns, and devote their resources to that.

I don’t mean to get apocalyptic, and I think we have some time to look at this problem and reach a carefully-weighed conclusion that balances a lot of competing concerns. I think for the first time we have, if nothing else, a *specific* proposal that purports to address particular problems in a particular way, rather than general criticism that the existing architecture fails to contemplate some use case. Historically I have been responsible for the latter; I hope now to be more responsible for the former.

I look forward to seeing if we can reach a consensus on this architecture, or something like it, that can unify many of our pet problems into a common purpose.

Drew

[0] As an optimization, each component in the stack could specify whether it is happy to work with headers in a separate buffer, or requires headers to be included in the message body. At send time, the protocols would get as send() arguments a headers* and a body* buffer which, if the entire stack agreed, happen to be 2 different buffers or, if one or more protocols disagreed, happen to point into one contiguous buffer. I think this resolves the inproc/ipc discrepancy, since inproc can declare its willingness to store headers in a non-contiguous buffer, while ipc can insist on a contiguous buffer, and under the C specification, code written for non-contiguous buffers always works in the contiguous case. And as a notable improvement over any competing proposal on the headers issue, security protocols could also force contiguous buffers, so they could insist on encrypting headers and bodies together, which is an important design issue for them. The result is a fast, zero-copy implementation where that makes sense, and a single-copy implementation in all other cases.

[1] Although this specific item would imply the use of #pragma pack, which is not strictly portable. I think most modern compilers support it, but it’s not a standard.
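
The header walk and the buffer negotiation from [0] can be sketched in C roughly as follows. This is only an illustration under assumptions: struct component_info, struct buffer_plan, plan_buffers and body_alloc_size are hypothetical names that do not exist in nanomsg, and any per-component byte counts fed to them would be made up by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-component declaration; nothing like it exists in nanomsg today. */
struct component_info {
    const char *name;
    size_t header_bytes;    /* header overhead this component asks to reserve */
    bool needs_contiguous;  /* true: headers must share one buffer with the body */
};

/* Result of walking a socket's stack once, e.g. at socket-setup time. */
struct buffer_plan {
    size_t header_bytes;    /* sum of every component's reservation */
    bool contiguous;        /* true if any component insisted on it */
};

/* Walk down the stack, summing header reservations and collecting vetoes. */
struct buffer_plan plan_buffers(const struct component_info *stack, size_t count)
{
    struct buffer_plan plan = { 0, false };
    assert(stack != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        plan.header_bytes += stack[i].header_bytes;
        if (stack[i].needs_contiguous)
            plan.contiguous = true;
    }
    return plan;
}

/* Size of the body allocation: headers are folded in only when the
   stack demands a single contiguous buffer. */
size_t body_alloc_size(struct buffer_plan plan, size_t body_len)
{
    return plan.contiguous ? plan.header_bytes + body_len : body_len;
}
```

With a stack in which every component accepts separate headers (the inproc case described in [0]), the body buffer stays exactly as large as the payload and headers can live elsewhere. As soon as one component, such as ipc or a security protocol, sets needs_contiguous, the plan switches to one allocation that covers headers and body together.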