I wrote a simple multi-threaded REQ/REP server, and it seems to work well. However, the performance is no better than the single-threaded server.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <nanomsg/nn.h>
#include <nanomsg/tcp.h>
#include <nanomsg/reqrep.h>
#include <pthread.h>
#include <assert.h>
#include <unistd.h>

#define WORKER_NUM 10

int main_sock;
int device_sock;

/* Worker: cooked REP socket connected to the inproc side of the device. */
void* work(void* param) {
    char* addr = (char*)param;
    int sock = nn_socket(AF_SP, NN_REP);
    assert(sock >= 0);
    printf("init[%lu]\n", (unsigned long)pthread_self());
    int ret = nn_connect(sock, addr);
    assert(ret >= 0);
    while (1) {
        char* buf = NULL;
        int buf_len = nn_recv(sock, &buf, NN_MSG, 0);
        printf("%.*s\n", buf_len, buf);
        int written = nn_send(sock, buf, buf_len, 0); /* echo back */
        if (buf_len != written) {
            abort();
        }
        nn_freemsg(buf);
    }
    return NULL;
}

/* Device thread: forwards between the raw TCP-facing REP socket and the
 * raw inproc-facing REQ socket. */
void* start_device(void* param) {
    (void)param;
    int c_ret = nn_device(device_sock, main_sock);
    assert(c_ret >= 0);
    return NULL;
}

int main(int argc, char* argv[]) {
    pthread_t thread_ary[WORKER_NUM];
    const char* addr = "tcp://0.0.0.0:12345";
    const char* addr_device = "inproc://hub";

    main_sock = nn_socket(AF_SP_RAW, NN_REP);
    int ret = nn_bind(main_sock, addr);
    assert(ret >= 0);

    device_sock = nn_socket(AF_SP_RAW, NN_REQ);
    ret = nn_bind(device_sock, addr_device);
    assert(ret >= 0);

    pthread_t de_th;
    pthread_create(&de_th, NULL, start_device, NULL);
    sleep(1);

    for (int i = 0; i < WORKER_NUM; i++) {
        pthread_create(&thread_ary[i], NULL, work, (void*)addr_device);
    }
    for (int i = 0; i < WORKER_NUM; i++) {
        pthread_join(thread_ary[i], NULL);
    }
    return 0;
}

On Tue, Jan 20, 2015 at 1:01 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:
> Look at the device framework. You don't need parallel links, just
> parallel processing. I'm not sure that other examples exist.
>
> Sent from my iPhone
>
> > On Jan 18, 2015, at 11:55 PM, Pierre Salmon <pierre.salmon@xxxxxxxxxxxxx> wrote:
> >
> > Hi Garrett, thanks for your answers. I will parallelize my code to
> > open multiple links.
> > Where can I find an example of raw REQ/REP?
> >
> > Pierre
> >
> > On 01/16/2015 06:11 PM, Garrett D'Amore wrote:
> >>> On Jan 16, 2015, at 8:00 AM, Pierre Salmon <pierre.salmon@xxxxxxxxxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have a little question: what is the best architecture for a
> >>> request/response system with high performance (300,000 msg/s)?
> >>> Right now I use the REQ/REP socket pattern, but with a simple example
> >>> I get only ~30,000 msg/s (1 thread with a REQ socket and 1 thread
> >>> with a REP socket). If I add new threads (REP+REQ) to the app, I
> >>> cannot improve this result (always 30,000).
> >>> What am I doing wrong?
> >>>
> >>> Pierre
> >> This begs many, many questions.
> >>
> >> The code can probably achieve > 1M messages per second, but *not* if
> >> you're running a vanilla req/rep socket. Those sockets are strictly
> >> serialized, and you wind up losing performance because you can only
> >> have a single message *outstanding*. Network latency thus becomes the
> >> limiter in that situation.
> >>
> >> The solution to that problem is to make sure you're using raw modes:
> >> RREQ/RREP if I recall the code properly. (In mangos you get this by
> >> setting the socket option to Raw mode, but nanomsg instead makes you
> >> select it during socket initialization.)
> >>
> >> Be aware that running in raw mode means that you have to take care to
> >> match replies to requests, by looking at the header and copying the
> >> header from the request to the reply.
> >>
> >> There may be other factors limiting you too. For example: do you have
> >> enough resources; do you have other serialization points in your
> >> application code; does your threading code properly engage multiple
> >> cores; do you have enough bandwidth to serve the traffic; etc. But at
> >> *first* guess, it's probably the raw vs. cooked mode that is limiting
> >> you. If you're already in raw mode, you will need to do further
> >> analysis.
> >>
> >> If you have to run serialized, you won't be able to get such high
> >> message rates per second. To get 300K messages per second you'd need
> >> a round-trip latency of only 3 usec. I'm not aware of any commodity
> >> transport that can do that, or even get close. TCP transports over
> >> Ethernet are probably on the order of 10x that latency. (Note that
> >> raw Ethernet, assuming 64-byte frames, can do about 9M packets per
> >> second over 10GbE, or just under 1M for 1GbE. That's running at wire
> >> rate with zero inter-packet latency. At 1GbE even 1 usec of latency
> >> cuts that rate in *half*, so you *have* to get parallelization to
> >> achieve high rates.)
> >>
> >> - Garrett