Hi Tino,
(a) Is there documentation regarding thread management and message flow?
Unfortunately, not yet. In short, each socket is protected by a critical section. Each socket also has a condition variable, which blocking functions such as nn_recv() wait on, and a worker thread. When a message is sent from socket A to socket B, the following sequence of steps happens:
1. nn_recv() on B is called.
2. B's critical section is entered.
3. There are no messages waiting in B.
4. The call waits on B's condition variable.
5. nn_send() is called on A.
6. A's critical section is entered.
7. The message is written into the shared pipe.
8. An event is sent to B's worker thread (via eventfd).
9. A's critical section is exited.
10. B's worker thread gets the event.
11. It signals B's condition variable.
12. The application thread blocked in nn_recv() is unblocked.
13. It receives the message from the shared pipe.
14. It leaves B's critical section.
15. nn_recv() on B exits.

I would say all of the above is pretty straightforward except for the worker thread. Specifically, why doesn't the sender thread signal the receiver's condition variable directly, instead of going through the worker thread?
The reason is that if it were done that way, there would be a race that deadlocks when both sides of the inproc connection are sending at the same time:
1. nn_send(A)
2. lock mutex A
3. nn_send(B)
4. lock mutex B
5. message is written to a shared pipe
6. lock mutex A <-------- deadlock happens here
7. signal A's condition variable
etc.
(b) Have you implemented some sort of busy wait already (for the workers, at least)?
No, not yet. My guess is that a busy wait could reduce the latency of the "post" step from 7us to 0.5us.
If so, and if the "event" step can somehow be eliminated altogether, we could expect inproc latency below 5us. That would be really nice.
Martin