You’re using device() literally; that’s wrong. You want AF_SP_RAW for the
socket used by the worker. That means you have to save the header and
restore it. The device() routine has this logic, but you need to copy that
logic as appropriate, rather than just trying to call device() directly.

  - Garrett

> On Jan 19, 2015, at 6:11 PM, junyi sun <ccnusjy@xxxxxxxxx> wrote:
>
> I wrote a simple multi-thread REQ/REP server, and it seems to work well.
> But the performance is no better than the single-thread server.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <nanomsg/nn.h>
> #include <nanomsg/tcp.h>
> #include <nanomsg/reqrep.h>
> #include <pthread.h>
> #include <assert.h>
> #include <unistd.h>
>
> #define WORKER_NUM 10
>
> int main_sock;
> int device_sock;
>
> void* work(void* param) {
>     char* addr = (char*)param;
>     int sock = nn_socket(AF_SP, NN_REP);
>     assert(sock>=0);
>     pthread_t cur_th = pthread_self();
>     printf("init[%p]\n", (void*)&cur_th);
>
>     int ret = nn_connect(sock, addr);
>     assert(ret>=0);
>     while (1) {
>         char* buf = NULL;
>         int buf_len = nn_recv(sock, &buf, NN_MSG, 0);
>         assert(buf_len >= 0);
>         printf("%.*s\n", buf_len, buf);
>
>         int written = nn_send(sock, buf, buf_len, 0); //echo back
>         if (buf_len != written) {
>             abort();
>         }
>         nn_freemsg(buf);
>     }
> }
>
> void * start_device(void* param) {
>     int c_ret = nn_device(device_sock, main_sock);
>     assert(c_ret >= 0);
>     return NULL;
> }
>
> int main(int argc, char* argv[]) {
>     pthread_t thread_ary[WORKER_NUM];
>
>     const char* addr="tcp://0.0.0.0:12345";
>     const char* addr_device="inproc://hub";
>
>     main_sock = nn_socket(AF_SP_RAW, NN_REP);
>     int ret = nn_bind(main_sock, addr);
>     assert(ret >= 0);
>     device_sock = nn_socket(AF_SP_RAW, NN_REQ);
>     ret = nn_bind(device_sock, addr_device);
>     assert(ret >= 0);
>
>     pthread_t de_th;
>     pthread_create(&de_th, NULL, start_device, NULL);
>     sleep(1);
>
>     for (int i=0; i<WORKER_NUM; i++) {
>         pthread_t tid;
>         pthread_create(&tid, NULL, work, (void*)addr_device);
>         thread_ary[i] = tid;
>     }
>
>     for (int i=0; i<WORKER_NUM; i++) {
>         pthread_join(thread_ary[i], NULL);
>     }
>
>     return 0;
> }
>
> On Tue, Jan 20, 2015 at 1:01 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:
> Look at the device framework. You don't need parallel links, just parallel
> processing. I'm not sure that other examples exist.
>
> Sent from my iPhone
>
> > On Jan 18, 2015, at 11:55 PM, Pierre Salmon <pierre.salmon@xxxxxxxxxxxxx> wrote:
> >
> > Hi Garrett, thanks for your answers. I will parallelize my code to open
> > multiple links. Where can I find an example of Raw REQ/REP?
> >
> > Pierre
> >
> > On 01/16/2015 06:11 PM, Garrett D'Amore wrote:
> >>> On Jan 16, 2015, at 8:00 AM, Pierre Salmon <pierre.salmon@xxxxxxxxxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have a little question: what is the best architecture for a
> >>> request/response system with high performance (300000 msg/s)?
> >>> Right now I use the REQREP socket pattern but, with a simple example,
> >>> I get only ~30000 msg/s (1 thread with a REQ socket and 1 thread with
> >>> a REP socket). If I add new threads (REP+REQ) to the app, I cannot
> >>> increase this result (always 30000).
> >>> What am I doing wrong?
> >>>
> >>> Pierre
> >> This begs many, many questions.
> >>
> >> The code can probably achieve > 1M messages per second, but *not* if
> >> you’re running a vanilla req/rep socket. Those sockets are strictly
> >> serialized, and you wind up losing performance because you can only have a
> >> single message *outstanding*. Networking latency thus becomes the limiter
> >> in that situation.
> >>
> >> The solution to that problem is to make sure you’re using raw modes
> >> (RREQ/RREP if I recall the code properly). (In mangos you get this by
> >> setting the socket option to Raw mode, but nanomsg instead makes you
> >> select it during socket initialization.)
> >>
> >> Be aware that running in raw mode means that you have to take care to
> >> match replies to requests, by looking at the header, and copying the
> >> header from the request to the reply.
> >>
> >> There may be other factors limiting you too. For example, do you have
> >> enough resources; do you have other serialization points in your
> >> application code; does your threading code properly engage multiple cores;
> >> do you have enough bandwidth to serve the traffic; etc. etc. But at
> >> *first* guess, it's probably the raw vs. cooked mode that is limiting you.
> >> If you’re already in raw mode, you will need to do further analysis.
> >>
> >> If you have to run serialized, you won’t be able to get such high message
> >> rates per second. To get 300K messages per second you’d need to have a
> >> round trip latency of only 3 usec. I’m not aware of any commodity
> >> transport that can do that, or even get close. TCP transports over
> >> ethernet are probably on the order of 10x that latency. (Note that raw
> >> ethernet, assuming 64-byte frames, can do about 9M packets per second over
> >> 10GbE, or just under 1M for 1GbE. That’s running at wire rate with zero
> >> interpacket latency. At 1GbE even 1 usec latency cuts that rate in
> >> *half*, so you *have* to get parallelization to achieve high rates.)
> >>
> >> - Garrett