[nanomsg] Re: ReqRep high performance

  • From: Garrett D'Amore <garrett@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Mon, 19 Jan 2015 18:37:24 -0800

You’re calling nn_device() directly, and that is the wrong approach here.

The socket used by the worker should be opened with AF_SP_RAW.  That means 
you have to save the request header and restore it on the reply.  The 
nn_device() routine contains this logic, but you need to copy that logic as 
appropriate, rather than just calling nn_device() directly.

        - Garrett

> On Jan 19, 2015, at 6:11 PM, junyi sun <ccnusjy@xxxxxxxxx> wrote:
> 
> I wrote a simple multi-threaded REQ/REP server. It seems to work well, but 
> the performance is no better than a single-threaded server.
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <nanomsg/nn.h>
> #include <nanomsg/tcp.h>
> #include <nanomsg/reqrep.h>
> #include <pthread.h>
> #include <assert.h>
> #include <unistd.h>
> 
> #define WORKER_NUM 10
> 
> int main_sock;
> int device_sock;
> 
> void* work(void* param) {
>     char* addr = (char*)param;
>     int sock = nn_socket(AF_SP, NN_REP);
>     assert(sock>=0);
>     int opt = 1;
>     pthread_t cur_th = pthread_self();
>     printf("init[%p]\n", (void*)&cur_th);
> 
>     int ret = nn_connect(sock, addr);
>     assert(ret>=0);
>     while (1) {
>             char* buf = NULL;
>             int buf_len = nn_recv(sock, &buf, NN_MSG, 0);
>             printf("%.*s\n",  buf_len, buf);
> 
>             int written = nn_send(sock, buf, buf_len, 0); //echo back
>             if (buf_len != written) {
>                 abort();
>             }
>             nn_freemsg(buf);
>     }
> }
> 
> void* start_device(void* param) {
>     int c_ret = nn_device(device_sock, main_sock);
>     assert(c_ret >= 0);
>     return NULL;
> }
> 
> int main(int argc, char* argv[]) {
>     pthread_t thread_ary[WORKER_NUM];
> 
>     const char* addr="tcp://0.0.0.0:12345";
>     const char* addr_device="inproc://hub";
> 
>     main_sock = nn_socket(AF_SP_RAW, NN_REP);
>     int ret = nn_bind(main_sock, addr);
>     assert(ret >= 0);
>     device_sock = nn_socket(AF_SP_RAW, NN_REQ);
>     ret = nn_bind(device_sock, addr_device);
>     assert(ret >= 0);
> 
>     pthread_t de_th;
>     pthread_create(&de_th, NULL, start_device, NULL);
>     sleep(1);
> 
>     for (int i=0; i<WORKER_NUM; i++) {
>         pthread_t tid;
>         pthread_create(&tid, NULL, work, (void*)addr_device);
>         thread_ary[i] = tid;
>     }
> 
>     for (int i=0; i<WORKER_NUM; i++) {
>         pthread_join(thread_ary[i], NULL);
>     }
> 
>     return 0;
> }
> 
> On Tue, Jan 20, 2015 at 1:01 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:
> Look at the device framework.  You don't need parallel links, just parallel 
> processing.  I'm not sure that other examples exist.
> 
> Sent from my iPhone
> 
> > On Jan 18, 2015, at 11:55 PM, Pierre Salmon <pierre.salmon@xxxxxxxxxxxxx> 
> > wrote:
> >
> > Hi Garrett, thanks for your answers. I will parallelize my code to open 
> > multiple links. Where can I find an example of raw REQ/REP?
> >
> > Pierre
> >
> > On 01/16/2015 06:11 PM, Garrett D'Amore wrote:
> >>> On Jan 16, 2015, at 8:00 AM, Pierre Salmon <pierre.salmon@xxxxxxxxxxxxx> 
> >>> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have a quick question: what is the best architecture for a 
> >>> request/response system with high performance (300000 msg/s)?
> >>> Right now I use the REQ/REP socket pattern but, with a simple example, I 
> >>> get only ~30000 msg/s (1 thread with a REQ socket and 1 thread with a REP 
> >>> socket). If I add new threads (REP+REQ) to the app, I cannot improve this 
> >>> result (always 30000).
> >>> What am I doing wrong?
> >>>
> >>> Pierre
> >> This begs many, many questions.
> >>
> >> The code can probably achieve > 1M messages per second, but *not* if 
> >> you’re running a vanilla req/rep socket.  Those sockets are strictly 
> >> serialized, and you wind up losing performance because you can only have a 
> >> single message *outstanding*.  Networking latency thus becomes the limiter 
> >> in that situation.
> >>
> >> The solution to that problem is to make sure you’re using raw modes — 
> >> RREQ/RREP if I recall the code properly.  (In mangos you get this by 
> >> setting the socket option to Raw mode, but nanomsg instead makes you 
> >> select it during socket initialization.)
> >>
> >> Be aware that running in raw mode means that you have to take care to 
> >> match replies to requests, by looking at the header, and copying the 
> >> header from the request to the reply.
> >>
> >> There may be other factors limiting you too.   For example, do you have 
> >> enough resources; do you have other serialization points in your 
> >> application code; does your threading code properly engage multiple cores; 
> >> do you have enough bandwidth to serve the traffic; etc. etc.  But at 
> >> *first* guess, it’s probably the raw vs. cooked mode that is limiting you.  
> >> If you’re already in raw mode, you will need to do further analysis.
> >>
> >> If you have to run serialized, you won’t be able to get such high message 
> >> rates per second.  To get 300K messages per second you’d need to have a 
> >> round trip latency of only 3 usec.  I’m not aware of any commodity 
> >> transport that can do that, or even get close.  TCP transports over 
> >> ethernet are probably on the order of 10x that latency.   (Note that raw 
> >> ethernet, assuming 64-byte frames, can do about 14.88M packets per second 
> >> over 10GbE, or about 1.49M for 1GbE.   That’s running at wire rate with 
> >> zero interpacket latency.   At 1GbE even 1 usec latency cuts that rate by 
> >> more than *half*, so you *have* to get parallelization to achieve high 
> >> rates.)
> >>
> >>    - Garrett
> >
> >
> 
> 
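
The latency arithmetic in the oldest message of the thread can be checked
with a few lines of C.  This is only a back-of-the-envelope sketch under a
simplifying assumption that is not stated in the thread: the rate is taken
to be (requests in flight) / (round-trip time), ignoring CPU and bandwidth
limits.  The function names are invented for illustration.

```c
#include <assert.h>

/* Upper bound on request/reply rate (messages per second) when at most
 * `outstanding` requests are in flight and one round trip takes
 * `rtt_usec` microseconds. Assumes rate = outstanding / RTT. */
double max_rate_per_sec(double rtt_usec, unsigned outstanding) {
    assert(rtt_usec > 0.0);
    return outstanding * 1e6 / rtt_usec;
}

/* Smallest number of in-flight requests needed to reach `target_rate`
 * messages per second at the given round-trip time (rounded up, min 1). */
unsigned required_outstanding(double rtt_usec, double target_rate) {
    assert(rtt_usec > 0.0 && target_rate > 0.0);
    double needed = target_rate * rtt_usec / 1e6;
    unsigned n = (unsigned)needed;   /* truncates toward zero */
    if ((double)n < needed)
        n++;                         /* round up to a whole request */
    return n == 0 ? 1 : n;
}
```

With a strictly serialized REQ/REP pair (one request in flight), a round trip
of roughly 33 usec gives about 30000 msg/s, which matches the rate reported
in the original question.  Under the same assumption, reaching 300000 msg/s
at a 30 usec round trip needs about nine requests in flight at once, which is
why raw mode or some other form of parallelism is required.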
