[nanomsg] nanomsg: the big picture

  • From: Martin Sustrik <sustrik@xxxxxxxxxx>
  • To: nanomsg <nanomsg@xxxxxxxxxxxxx>
  • Date: Thu, 05 Sep 2013 09:07:50 +0200

Hi all,

There have been several discussions, both on the mailing list and on the IRC channel, that can't really be resolved without understanding the big picture of where nanomsg is going. There's a long email about monitoring from Paul from yesterday, but several discussions about service discovery, DNS and such are also relevant to the topic.

This should probably be written down as a more formal article, but let me just briefly describe the vision as it stands at the moment.

From the user's perspective nanomsg is basically done. Although the implementation is not perfect yet, the conceptual framework (sockets, scalability protocols, topologies, etc.) is unlikely to change in the future.

However, from the administrative perspective almost nothing has been done yet and even the basic concepts are not defined. Defining them is what I am trying to do here.

The main idea is to strictly separate the user API from the admin API, or mechanism from policy, if you will.

To get the idea, think of TCP. The user establishes the connection, then sends and receives data, but is completely ignorant of the underlying network infrastructure. Is there a simple cable between the two boxes? Is there a LAN? Are 10 IP hops involved? Has an intermediary IP router crashed somewhere on the path and has the traffic been routed around it? The user never knows.

Now there are admins who administer the network. They see all these components and issues and work hard to keep the whole thing working. However, the point is that they don't do that directly via IP or TCP. They use specialised administrative interfaces such as SNMP.

So, in the end you have two distinct APIs and two clearly delineated sets of users: programmers and admins. The former write business logic, the latter take care that the infrastructure is working.

Let's apply the above to nanomsg now.

The idea would be to shield the user from the details of the topology, the same way the TCP user doesn't see all the routers on the path. This can be done by having the user connect to a topology ("market data feed"), rather than to a specific endpoint ("129.168.0.111:5555"):

    nn_connect (s, "topology://market-data-feed");

I am not going to discuss implementation details here, but the idea is that admins store the actual info about the topology setup in a distributed database (such as DNS) and that the library translates "topology://market-data-feed" into the actual endpoint(s) to connect to by querying the database.

Thus, from the user's perspective, they are connecting to a "cloud" called "market-data-feed" without being aware of its internal structure:

[back1.png]

It's up to admins to define the internal structure. During the development phase it may be something simple like:

[back2.png]

When deployed to production it may be more complex to account for administrative and geographical boundaries:

[back3.png]

The main point is: There's no difference visible to the user between the cases. This allows admins to optimise and re-structure the topology without affecting the applications.
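To make the lookup step concrete, here is a minimal sketch in C. It is not how nanomsg does or will do it: the "topology://" scheme is only the proposal from this email, and the in-memory tables stand in for the distributed database. The names topology_record, dev_db, prod_db, resolve_topology and endpoint_count, as well as the example.com hostnames, are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one topology name mapped to the endpoints
   the admins want clients to connect to. */
struct topology_record {
    const char *name;
    const char *endpoints[4];   /* NULL-terminated list */
};

/* Stand-in for the distributed database during development. */
static const struct topology_record dev_db[] = {
    { "market-data-feed", { "tcp://127.0.0.1:5555", NULL } },
};

/* Stand-in for the same database in production (hostnames are invented). */
static const struct topology_record prod_db[] = {
    { "market-data-feed", { "tcp://feed-eu.example.com:5555",
                            "tcp://feed-us.example.com:5555", NULL } },
};

#define TOPOLOGY_SCHEME "topology://"

/* Resolve "topology://<name>" to its endpoint list.
   Returns a NULL-terminated array, or NULL if the URL does not use the
   topology scheme or the name is not in the database. */
static const char *const *resolve_topology(
    const struct topology_record *db, size_t db_len, const char *url)
{
    size_t scheme_len = strlen(TOPOLOGY_SCHEME);
    size_t i;

    assert(db != NULL && url != NULL);
    if (strncmp(url, TOPOLOGY_SCHEME, scheme_len) != 0)
        return NULL;
    for (i = 0; i < db_len; ++i) {
        if (strcmp(db[i].name, url + scheme_len) == 0)
            return db[i].endpoints;
    }
    return NULL;
}

/* Count the endpoints in a NULL-terminated list (0 for NULL). */
static size_t endpoint_count(const char *const *endpoints)
{
    size_t n = 0;
    while (endpoints && endpoints[n])
        ++n;
    return n;
}
```

The application passes the same string, "topology://market-data-feed", in both cases. Only the table the admins maintain differs, so the application code stays untouched when the topology is restructured.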

End of chapter one.

Now forget about users and imagine you are an admin tasked with maintaining a topology. What kind of tools do you need to do your job?

First, you need a way to update the distributed database (such as DNS) so that you can configure the topology according to your needs.

Second, you need a tool to check whether the topology is working as expected.

As for the latter, the statistics from the entire topology must be collected and presented to the admin in a comprehensive way, such as:

[back4.png]

So, the admin looks at the graph above, sees there's a disconnection between two intermediary devices and can actually do something about it. Is a network connection broken? Or maybe he just set the address wrong in the DNS? Etc.
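As a rough sketch of what sits behind such a graph, assume every device reports the state of its links to a central collector. The collector can then pick out the broken links across the whole topology. Nothing like this exists in nanomsg today: the link_report structure, its fields, the sample data and find_broken_links are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-link report, as a central collector might receive it
   from each device in the topology. */
struct link_report {
    const char *from;
    const char *to;
    int connected;              /* 1 if the link is established, 0 if not */
    unsigned long msgs_sent;    /* messages passed over the link so far */
};

/* Invented sample data: one link between two intermediary devices is down. */
static const struct link_report sample_reports[] = {
    { "publisher", "device-a",   1, 1200 },
    { "device-a",  "device-b",   0, 0 },
    { "device-b",  "subscriber", 1, 0 },
};

/* Store pointers to the broken links in `out` (at most out_len of them)
   and return the total number of broken links found. */
static size_t find_broken_links(const struct link_report *reports, size_t n,
                                const struct link_report **out, size_t out_len)
{
    size_t found = 0;
    size_t i;

    assert(reports != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        if (!reports[i].connected) {
            if (found < out_len)
                out[found] = &reports[i];
            ++found;
        }
    }
    return found;
}
```

Run over sample_reports, this flags the device-a to device-b link, which is the kind of thing the admin would then investigate further.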

As I want to keep this email as short as possible, I won't elaborate further. However, it's easy to see the implications of the conceptual model described above. For example, the admin wants to check the topology as a whole, so querying local logs on individual machines probably won't fly.

Martin

Attachment: back1.png
Description: PNG image

Attachment: back2.png
Description: PNG image

Attachment: back3.png
Description: PNG image

Attachment: back4.png
Description: PNG image
