[nanomsg] Re: ReqRep high performance

  • From: junyi sun <ccnusjy@xxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Tue, 20 Jan 2015 10:11:26 +0800

I wrote a simple multi-threaded REQ/REP server, and it seems to work well.
However, its performance is no better than that of a single-threaded server.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <nanomsg/nn.h>
#include <nanomsg/tcp.h>
#include <nanomsg/reqrep.h>
#include <pthread.h>
#include <assert.h>
#include <unistd.h>

#define WORKER_NUM 10

int main_sock;
int device_sock;

void* work(void* param) {
    char* addr = (char*)param;
    int sock = nn_socket(AF_SP, NN_REP);
    assert(sock>=0);
    pthread_t cur_th = pthread_self();
    /* pthread_t is opaque, so print the address of the local copy as a per-thread tag */
    printf("init[%p]\n", (void*)&cur_th);

    int ret = nn_connect(sock, addr);
    assert(ret>=0);
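    while (1) {
        char* buf = NULL;
        /* NN_MSG asks nanomsg to allocate the receive buffer */
        int buf_len = nn_recv(sock, &buf, NN_MSG, 0);
        if (buf_len < 0) {
            fprintf(stderr, "nn_recv: %s\n", nn_strerror(nn_errno()));
            break;
        }
        /* per-message printing adds overhead; drop it when measuring throughput */
        printf("%.*s\n", buf_len, buf);

        int written = nn_send(sock, buf, buf_len, 0); /* echo back (copies buf) */
        if (buf_len != written) {
            abort();
        }
        nn_freemsg(buf);
    }
    nn_close(sock);
    return NULL;
}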

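void* start_device(void* param) {
    (void)param;
    /* nn_device() blocks while forwarding and only returns (-1) on error */
    int c_ret = nn_device(device_sock, main_sock);
    fprintf(stderr, "nn_device exited (%d): %s\n", c_ret, nn_strerror(nn_errno()));
    return NULL;
}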

int main(int argc, char* argv[]) {
    pthread_t thread_ary[WORKER_NUM];

    const char* addr="tcp://0.0.0.0:12345";
    const char* addr_device="inproc://hub";

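    /* raw REP socket facing the TCP clients */
    main_sock = nn_socket(AF_SP_RAW, NN_REP);
    assert(main_sock >= 0);
    int ret = nn_bind(main_sock, addr);
    assert(ret >= 0);
    /* raw REQ socket facing the in-process workers */
    device_sock = nn_socket(AF_SP_RAW, NN_REQ);
    assert(device_sock >= 0);
    ret = nn_bind(device_sock, addr_device);
    assert(ret >= 0);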

    pthread_t de_th;
    ret = pthread_create(&de_th, NULL, start_device, NULL);
    assert(ret == 0);
    sleep(1); /* crude wait for the device thread to start forwarding */

    for (int i=0; i<WORKER_NUM; i++) {
        /* each worker connects its own REP socket to the inproc endpoint */
        ret = pthread_create(&thread_ary[i], NULL, work, (void*)addr_device);
        assert(ret == 0);
    }

    for (int i=0; i<WORKER_NUM; i++) {
        pthread_join(thread_ary[i], NULL);
    }

    return 0;
}

On Tue, Jan 20, 2015 at 1:01 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:

> Look at the device framework.  You don't need parallel links, just parallel
> processing.  I'm not sure that other examples exist.
>
> > On Jan 18, 2015, at 11:55 PM, Pierre Salmon <pierre.salmon@xxxxxxxxxxxxx> wrote:
> >
> > Hi Garrett, thanks for your answers. I will parallelize my code to open
> > multiple links. Where can I find an example of raw REQ/REP?
> >
> > Pierre
> >
> > On 01/16/2015 06:11 PM, Garrett D'Amore wrote:
> >>> On Jan 16, 2015, at 8:00 AM, Pierre Salmon <pierre.salmon@xxxxxxxxxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have a quick question: what is the best architecture for a
> >>> request/response system with high performance (300000 msg/s)?
> >>> Currently I use the REQ/REP socket pattern, but with a simple example I get
> >>> only ~30000 msg/s (1 thread with a REQ socket and 1 thread with a REP
> >>> socket). If I add more threads (REP+REQ) to the application, the result
> >>> does not improve (always 30000).
> >>> What am I doing wrong?
> >>>
> >>> Pierre
> >> This begs many, many questions.
> >>
> >> The code can probably achieve > 1M messages per second, but *not* if
> >> you’re running a vanilla req/rep socket.  Those sockets are strictly
> >> serialized, and you wind up losing performance because you can only have a
> >> single message *outstanding*.  Networking latency thus becomes the limiter
> >> in that situation.
> >>
> >> The solution to that problem is to make sure you’re using raw modes,
> >> RREQ/RREP if I recall the code properly.  (In mangos you get this by
> >> setting the socket option to Raw mode, but nanomsg instead makes you select
> >> it during socket initialization.)
> >>
> >> Be aware that running in raw mode means that you have to take care to
> >> match replies to requests, by looking at the header, and copying the header
> >> from the request to the reply.
> >>
> >> There may be other factors limiting you too.  For example, do you have
> >> enough resources; do you have other serialization points in your
> >> application code; does your threading code properly engage multiple cores;
> >> do you have enough bandwidth to serve the traffic; etc.  But at
> >> *first* guess, it’s probably the raw vs. cooked mode that is limiting you.
> >> If you’re already in raw mode, you will need to do further analysis.
> >>
> >> If you have to run serialized, you won’t be able to get such high
> >> message rates per second.  To get 300K messages per second you’d need to
> >> have a round trip latency of only 3 usec.  I’m not aware of any commodity
> >> transport that can do that, or even get close.  TCP transports over
> >> Ethernet are probably on the order of 10x that latency.  (Note that raw
> >> Ethernet, assuming 64-byte frames, can do about 9M packets per second over
> >> 10GbE, or just under 1M for 1GbE.  That’s running at wire rate with zero
> >> interpacket latency.  At 1GbE even 1 usec latency cuts that rate in
> >> *half*, so you *have* to get parallelization to achieve high rates.)
> >>
> >>    - Garrett
> >
> >
>
>
