[nanomsg] Re: accessing control IDs

From: Garrett D'Amore <garrett@xxxxxxxxxx>
To: "nanomsg@xxxxxxxxxxxxx" <nanomsg@xxxxxxxxxxxxx>
Date: Tue, 6 May 2014 18:23:08 -0700

Nanomsg is stateless from the point of view of the app.  Adding session state
would be inappropriate.  The pipe is almost certainly insufficient by itself;
bear in mind that it is subject to change as a result of connects or
disconnects in the underlying transport.

The information I would like to make available would allow for actual strong
authentication on those transports that can support it.  Admittedly it is
available only on receipt and is tied to a specific message, so at the
abstract level it is still stateless.

> On May 6, 2014, at 4:20 PM, Drew Crawford <drew@xxxxxxxxxxxxxxxxxx> wrote:
> 
> I don’t think putting it in the body is the right solution, or at least not 
> right for every case.
> 
> For one thing it requires allocating storage for this in the messages.  It
> is an interesting question how much storage is required, but the naive
> implementation would be a 128-bit GUID.  For a 4-byte message this bloats
> the network traffic significantly, particularly when there is a simple
> solution with zero overhead: pull the pipe key from nanomsg.
> 
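To put a number on the overhead argument above, here is a minimal sketch.  It
assumes the identifier is prepended to every message body and ignores whatever
framing the transport adds; the function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes sent per payload byte when a fixed-size
   identifier is prepended to each message body.  Transport framing is
   deliberately ignored. */
static double wire_bytes_per_payload_byte(size_t payload_len, size_t id_len)
{
    assert(payload_len > 0);  /* the ratio is undefined for an empty payload */
    return (double)(payload_len + id_len) / (double)payload_len;
}
```

For a 4-byte payload and a 16-byte (128-bit) GUID this is (4 + 16) / 4 = 5,
i.e. five times the traffic of the bare payload.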
> For another thing, ZeroMQ clearly and unambiguously supports this model
> (via ROUTER).  So anyone porting ZeroMQ code is in for a rough time
> implementing their own scheme atop nanomsg in every codebase to get the
> same behavior they had before.  There are of course legitimate reasons to be
> incompatible with ZeroMQ (such as achieving compatibility instead with BSD
> sockets), but I think this incompatibility is much more harmful than helpful,
> and the harm is not confined to one particular system or codebase.
> 
>> let’s think about cases where the client sending requests loses connection, 
>> then reconnects with a different address, or connects from multiple 
>> endpoints (mobile device, desktop computer, …), if remote endpoint 
>> information is provided by nanomsg all these endpoints will appear as 
>> different entities, whereas in your application logic they should be 
>> considered the same.
> 
> 
> You should probably not rely on transport-layer guarantees to authenticate a 
> user.  However *given* transport-layer information, it *becomes possible to 
> implement* many authentication schemes.
> 
> For example (and this is the problem that motivated this discussion) suppose 
> I have some decision oracle which can determine with complete certainty 
> whether a particular user intended to send a packet.  Then
> 
>> if (!oracle_user_sent_message(user,message)) {
>>   end_session();
>> }
> 
> Now if we know (or can guess) that John sent the message via transport-layer 
> information, the solution is straightforward.  But trying all the possible 
> users is impractical for a slow oracle.  So the transport-layer information 
> can comprise part of an application-level authentication scheme to identify 
> which user the oracle should be asked about.  Even for sockets that do not 
> have 1:1 fanout to users, the fanout may divide the users into enough buckets 
> that asking the oracle about each member in the bucket becomes practical.
> 
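The bucket argument above can be sketched as follows.  This is only an
illustration under stated assumptions: the user table, the `channel` field and
the oracle signature are all hypothetical, and nothing here is nanomsg API.
The toy oracle simply compares against a known answer so the sketch can be
exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: which transport-level channel a user was seen on. */
struct user {
    int id;
    unsigned channel;
};

/* Stand-in for the slow decision oracle. */
typedef bool (*oracle_fn)(const struct user *u, const char *msg, void *ctx);

/* Toy oracle for demonstration: ctx points at the id of the true sender. */
static bool toy_oracle(const struct user *u, const char *msg, void *ctx)
{
    (void)msg;  /* a real oracle would inspect the message */
    return u->id == *(const int *)ctx;
}

/* Ask the oracle only about users whose channel matches the one the message
   arrived on.  Returns the accepted user, or NULL if every candidate in the
   bucket is rejected.  If calls is non-NULL it receives the number of oracle
   invocations. */
static const struct user *identify_sender(const struct user *users, size_t n,
                                          unsigned channel, const char *msg,
                                          oracle_fn oracle, void *ctx,
                                          size_t *calls)
{
    assert(oracle != NULL);
    size_t asked = 0;
    const struct user *found = NULL;
    for (size_t i = 0; i < n && found == NULL; i++) {
        if (users[i].channel != channel)
            continue;  /* wrong bucket: skip without asking the oracle */
        asked++;
        if (oracle(&users[i], msg, ctx))
            found = &users[i];
    }
    if (calls != NULL)
        *calls = asked;
    return found;
}
```

With four users split evenly over two channels, a message arriving on one
channel costs at most two oracle calls instead of four.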
> Now of course we could prepend some session ID to the message rather than
> rely on transport-level data, and bloat the message accordingly.  However,
> anybody who gets hold of the session ID could spoof messages with that
> session ID from anywhere on the network.  These would be rejected by our
> perfect oracle, but not before ending the user’s session, which amounts to a
> denial-of-service attack against the legitimate user.  By contrast, relying
> on the TCP information significantly increases the difficulty of the attack:
> it requires a TCP MITM technique or some other advanced persistent threat
> capability to execute.  This is a major security and reliability advantage
> of relying on TCP data in my situation.
> 
> Drew
> 
> 
> 
> 
>> On May 6, 2014, at 12:13 PM, Achille Roussel <achille.roussel@xxxxxxxxx> 
>> wrote:
>> 
>> Could you put the state information in the body of your message instead of
>> attempting to get it from nanomsg?  HTTP is also stateless, but websites
>> maintain more or less state using cookies, session IDs or access tokens…
>> Maybe you can implement this at the application logic level.
>> 
>> I think it’s a sane design to have your transport protocol separated from
>> your application logic.  Let’s think about cases where the client sending
>> requests loses its connection, then reconnects with a different address, or
>> connects from multiple endpoints (mobile device, desktop computer, …).  If
>> remote endpoint information is provided by nanomsg, all these endpoints will
>> appear as different entities, whereas in your application logic they should
>> be considered the same.
>> 
>>> On May 6, 2014, at 1:23 AM, Drew Crawford <drew@xxxxxxxxxxxxxxxxxx> wrote:
>>> 
>>>> There are many cases that require stateful networking
>>>> 
>>> I’m in such a case.  The open question at this point is how to achieve it.
>>> 
>>> 
>>>> On May 5, 2014, at 10:52 PM, Apostolis Xekoukoulotakis 
>>>> <xekoukou@xxxxxxxxx> wrote:
>>>> 
>>>> REQ/REP was designed to be stateless by default, which is why the address
>>>> a message came from has been hidden on purpose.
>>>> 
>>>> There are many cases that require stateful networking, but stateful is
>>>> more difficult because it requires that you implement an update mechanism
>>>> for the routing information.
>>>> 
>>>>> On May 6, 2014 5:02 AM, "Drew Crawford" <drew@xxxxxxxxxxxxxxxxxx> wrote:
>>>>> I have dug a little deeper into this.  It appears that in global.c [1]
>>>>> msg_controllen is never set.  I’m not sure if that’s expected.
>>>>> 
>>>>> The attached patch sets controllen based on the size of the chunk.  
>>>>> Whether right or wrong, this seems to produce the behavior expected by 
>>>>> zerotacg and Achille, e.g., control bytes are emitted in the RAW case.  
>>>>> The 8 bytes are
>>>>> 
>>>>>> d0,4e,c0,00,c1,7f,00,00,
>>>>> 
>>>>> 
>>>>> Three of these (bytes[3], bytes[4], and bytes[5]) seem to change from
>>>>> run to run.  This is mildly surprising, because the RFC documents the
>>>>> control ID as being 32 bits, so one would expect four bytes to change
>>>>> from one execution to the next.  I’m also unable to account for the
>>>>> presence of the remaining bytes.  Something may be wrong with my patch,
>>>>> or with my understanding of the codebase or RFC.
>>>>> 
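One way to inspect the 8 bytes quoted above, offered as a sketch and not as a
diagnosis of the patch: decode them both as two 32-bit big-endian words and as
a single 64-bit little-endian word.  That the control header would use network
byte order is an assumption here, and the helper names are made up; none of
this is nanomsg API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 4 bytes as a big-endian (network byte order) 32-bit value. */
static uint32_t read_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read 8 bytes as a little-endian 64-bit value. */
static uint64_t read_le64(const uint8_t *p)
{
    assert(p != NULL);
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* The control bytes as reported in the message above. */
static const uint8_t control_bytes[8] = {
    0xd0, 0x4e, 0xc0, 0x00, 0xc1, 0x7f, 0x00, 0x00
};
```

Read as big-endian words the bytes are 0xd04ec000 and 0xc17f0000.  Read as one
little-endian 64-bit word they are 0x00007fc100c04ed0, which resembles a
user-space address on a 64-bit system.  Whether the patch is in fact emitting
a pointer instead of the header contents is not established in this thread.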
>>>>> This is an interesting line of inquiry, but since a solution along this
>>>>> line has the limitation of requiring me to implement my own end-to-end
>>>>> behaviors on top of a raw socket, I’m wondering if it would be desirable
>>>>> to introduce an API for this purpose:
>>>>> 
>>>>>> /* Returns an integer that uniquely identifies the immediate sender of 
>>>>>> the most-recently-received message.  Returns an error if no messages 
>>>>>> have ever been received on the socket */
>>>>> 
>>>>>> int nn_sender(int socket);
>>>>> 
>>>>> Such an API could work equally well for raw sockets and full sockets,
>>>>> could be implemented for different socket topologies, and does not
>>>>> introduce an application-layer dependency on parsing the control header
>>>>> format.
>>>>> 
>>>>> 
>>>>> [1] https://github.com/nanomsg/nanomsg/blob/master/src/core/global.c#L817
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On May 5, 2014, at 4:38 PM, Drew Crawford <drew@xxxxxxxxxxxxxxxxxx> 
>>>>>> wrote:
>>>>>> 
>>>>>> I thought about that; however, msg_controllen is still -1 when
>>>>>> using raw sockets, suggesting there is no control information
>>>>>> available, as the sample below illustrates.  Maybe something is wrong
>>>>>> with the code sample?
>>>>>> 
>>>>>> Another problem is that use of raw sockets would require me to roll my 
>>>>>> own end-to-end behavior which may be undesirable.
>>>>>> 
>>>>>> 
>>>>>>>     int client = nn_socket(AF_SP,NN_REQ);
>>>>>>>     int server = nn_socket(AF_SP_RAW,NN_REP);
>>>>>>>     nn_connect(client,"inproc://test");
>>>>>>>     nn_bind(server,"inproc://test");
>>>>>>>     nn_send(client,"A",1,0);
>>>>>>>     
>>>>>>>     int rc;
>>>>>>>     void *body;
>>>>>>>     void *control;
>>>>>>>     struct nn_iovec iov;
>>>>>>>     struct nn_msghdr hdr;
>>>>>>> 
>>>>>>>     iov.iov_base = &body;
>>>>>>>     iov.iov_len = NN_MSG;
>>>>>>>     memset (&hdr, 0, sizeof (hdr));
>>>>>>>     hdr.msg_iov = &iov;
>>>>>>>     hdr.msg_iovlen = 1;
>>>>>>>     hdr.msg_control = &control;
>>>>>>>     hdr.msg_controllen = NN_MSG;
>>>>>>>     rc = nn_recvmsg (server, &hdr, 0);
>>>>>>>     print_array(body,rc,"body"); //contains only A
>>>>>>> 
>>>>>>>     printf("msg_iovlen %d\n",hdr.msg_iovlen); // 1
>>>>>>>     printf("msg_controllen %d\n",hdr.msg_controllen); // -1
>>>>>> 
>>>>>> 
>>>>>>> On May 5, 2014, at 4:32 PM, Achille Roussel <achille.roussel@xxxxxxxxx> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> You have to use AF_SP_RAW sockets to get access to this information in
>>>>>>> the control header when receiving a message with nn_recvmsg.
>>>>>>> 
>>>>>>>> On May 5, 2014, at 2:27 PM, Drew Crawford <drew@xxxxxxxxxxxxxxxxxx> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I have a REP socket.  I’m trying to identify the channel (sender or
>>>>>>>> forwarder) on which some message has arrived at the socket.  A
>>>>>>>> transport-layer understanding of the sender is not required; any
>>>>>>>> identifying value, such as an integer, is sufficient.  Consulting the
>>>>>>>> REQREP spec suggests that the topmost “channel ID”, one of the
>>>>>>>> records in the “backtrace”, is the identifier I’m looking for.
>>>>>>>> 
>>>>>>>> Clearly this identifier is not exposed over the nn_recv interface.  I 
>>>>>>>> had some hopes that it would be accessible in the nn_recvmsg 
>>>>>>>> interface, possibly as control information, but it seems not to be the 
>>>>>>>> case:
>>>>>>>> 
>>>>>>>>>     int client = nn_socket(AF_SP,NN_REQ);
>>>>>>>>>     int server = nn_socket(AF_SP,NN_REP);
>>>>>>>>>     nn_connect(client,"inproc://test");
>>>>>>>>>     nn_bind(server,"inproc://test");
>>>>>>>>>     nn_send(client,"A",1,0);
>>>>>>>>>     
>>>>>>>>>     int rc;
>>>>>>>>>     void *body;
>>>>>>>>>     void *control;
>>>>>>>>>     struct nn_iovec iov;
>>>>>>>>>     struct nn_msghdr hdr;
>>>>>>>>> 
>>>>>>>>>     iov.iov_base = &body;
>>>>>>>>>     iov.iov_len = NN_MSG;
>>>>>>>>>     memset (&hdr, 0, sizeof (hdr));
>>>>>>>>>     hdr.msg_iov = &iov;
>>>>>>>>>     hdr.msg_iovlen = 1;
>>>>>>>>>     hdr.msg_control = &control;
>>>>>>>>>     hdr.msg_controllen = NN_MSG;
>>>>>>>>>     rc = nn_recvmsg (server, &hdr, 0);
>>>>>>>>>     print_array(body,rc,"body"); //contains only A
>>>>>>>>> 
>>>>>>>>>     printf("msg_iovlen %d\n",hdr.msg_iovlen); //1
>>>>>>>>>     printf("msg_controllen %d\n",hdr.msg_controllen); //-1
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I have consulted a previous mailing thread on this topic which 
>>>>>>>> suggests channel IDs are manipulated in rep.c.  Indeed, the 
>>>>>>>> information I’m looking for seems to be moved around between 
>>>>>>>> nn_sockbase, nn_msg, nn_rep, and similar structures.  However I cannot 
>>>>>>>> work out a sane way to get those structures from application code.  
>>>>>>>> 
>>>>>>>> Any suggestions on identifying the sender of a remote message?
>>>>>>>> 
>>>>>>>> Drew
>