> In fact, I'm almost sure that if it isn't TCP itself, it's going to be the req
> timing out

This theory makes a lot of sense. The other sockets took significant time to
transfer data, and it neatly explains the results I got in the test.

> How on this green earth you'd expect to have functional networking with 25%
> of your packets falling on the floor is beyond me.

In my defense, I did no further investigation of the problem. 25% packet loss
was well outside my mission parameters. It's simply that, having done all the
work to build some really robust benchmarking, I figured I might as well
torture the subject a bit.
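For what it's worth, that resend timer is indeed tunable: nanomsg exposes it as
the NN_REQ_RESEND_IVL socket option, and the documented default is 60000 ms.
A minimal sketch of tuning it in C follows; the address is a placeholder and the
error handling is pared down to the essentials.

#include <nanomsg/nn.h>
#include <nanomsg/reqrep.h>
#include <stdio.h>

int main(void) {
    int sock = nn_socket(AF_SP, NN_REQ);
    if (sock < 0) {
        fprintf(stderr, "nn_socket: %s\n", nn_strerror(nn_errno()));
        return 1;
    }

    /* Resubmit the request if no reply arrives within 5 seconds
       (the documented default is 60000 ms). */
    int ivl = 5000;
    nn_setsockopt(sock, NN_REQ, NN_REQ_RESEND_IVL, &ivl, sizeof(ivl));

    nn_connect(sock, "tcp://192.0.2.1:5555"); /* placeholder address */
    nn_send(sock, "ping", 4, 0);

    char buf[64];
    int n = nn_recv(sock, buf, sizeof(buf), 0); /* blocks; REQ resends under the hood */
    if (n >= 0)
        printf("got %d bytes\n", n);

    nn_close(sock);
    return 0;
}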
> On Aug 7, 2014, at 2:19 AM, "Garrett D'Amore" <garrett@xxxxxxxxxx> wrote:
>
> Well, there is a brief handshake that nanomsg performs... but as it is just
> on top of TCP, that shouldn't matter. TCP guarantees a reliable connection
> *once the connection is established*. The performance may be in the toilet,
> and it's possible that it's bad enough that it takes longer to get a correct
> packet through than the req/rep timers.
>
> In fact, I'm almost sure that if it isn't TCP itself, it's going to be the
> req timing out -- if it doesn't get a reply in "n" milliseconds (don't recall
> what n is, I think it's tunable), it resubmits, which effectively "drops" the
> previous effort.
>
> How on this green earth you'd expect to have functional networking with 25%
> of your packets falling on the floor is beyond me. You'd have better
> reliability with carrier pigeons. :-) (There's an RFC for that, too! :-)
>
> - Garrett
>
>
>> On Wed, Aug 6, 2014 at 10:47 PM, Drew Crawford <drew@xxxxxxxxxxxxxxxxxx>
>> wrote:
>> It's not TCP. I benchmarked other TCP-based protocols that actually make
>> connections under those circumstances.
>>
>> My theory at the time was -- and again, I've never investigated this, it's
>> just a theory -- that something inside nanomsg's transport or protocol
>> layer has some extra steps in its handshake, and so the combined TCP +
>> nanomsg handshake somehow takes longer than the combined TCP +
>> other-protocol handshake for the various other protocols in my benchmark.
>>
>>
>>> On Aug 7, 2014, at 12:43 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:
>>>
>>> That's going to be TCP. TCP melts down badly under extreme pressure. At
>>> 25% packet loss, your sessions don't live long enough to survive to the
>>> point of actually exchanging user data. (The 3-way handshake probably
>>> doesn't complete a substantial amount of the time.)
>>>
>>> 1% packet loss is considered a very bad network. Well before 10% loss you
>>> start looking for people to fire or equipment to replace.
>>>
>>>
>>>> On Wed, Aug 6, 2014 at 9:30 PM, Drew Crawford <drew@xxxxxxxxxxxxxxxxxx>
>>>> wrote:
>>>> For whatever it's worth, I benchmarked nanomsg req/rep in my "very bad
>>>> network lab" and it did very poorly in a packet-loss scenario. I think
>>>> that when packet loss rose above 25% or so it was impossible to transmit
>>>> a single message.
>>>>
>>>> The problem wasn't critical enough at the time to merit any further
>>>> investigation from me, but if there's interest from somebody else in
>>>> submitting some patches, I'd be happy to benchmark them on what is a
>>>> pretty robust test environment.
>>>>
>>>> Drew
>>>>
>>>>
>>>> On Aug 6, 2014, at 3:09 PM, Alex Elsayed <eternaleye@xxxxxxxxx> wrote:
>>>>
>>>> > Name Withheld wrote:
>>>> >
>>>> >> I have two Linux machines (X and Y), with 2-20 extremely unreliable
>>>> >> IP connections between them. Even without its reliability control,
>>>> >> TCP is more dependable than UDP in this setting, because the various
>>>> >> ISPs along the way apparently drop UDP packets when they are
>>>> >> congested, but not TCP. The connections use different media (frame
>>>> >> relay, GPRS, 3G, 4G, WiFi, mesh networks, you name it), and are
>>>> >> mostly up, although each goes away for a few minutes to a few hours
>>>> >> every few days. A jungle, no doubt.
>>>> >>
>>>> >> I want to use all the connections available at a given moment to
>>>> >> increase the bandwidth, and since I can modify the applications
>>>> >> running on both machines, I wondered if I could use nanomsg for that.
>>>> >> I can deal with reordering and duplicate messages, but not with
>>>> >> missing messages ("at least once" delivery is needed).
>>>> >>
>>>> >> From reading the documentation, it sounds like two pipeline
>>>> >> connections (X push -> Y pull, X pull <- Y push) would give me the
>>>> >> load balancing, as long as I can use different IP addresses to
>>>> >> guarantee the connections go out through the different links (which
>>>> >> I can! each machine has 20 IP addresses). However, I can't figure
>>>> >> out from the docs whether there is a retransmit if a connection dies
>>>> >> while a message is "in flight". Also, I can't figure out from the
>>>> >> docs how a broken transport is detected -- will it have to wait
>>>> >> until the TCP connection dies (>1 minute), or is there an inner
>>>> >> timeout I can control?
>>>> >>
>>>> >> So, the question is:
>>>> >>
>>>> >> Is my understanding correct, and pipeline is the way to go? Or is
>>>> >> there a better solution? (Or is nanomsg totally not the right tool
>>>> >> for me?)
>>>> >>
>>>> >> I've considered using the Linux bonding interface and doing a TCP
>>>> >> connection above that; however, that would introduce crazy latencies
>>>> >> and retransmits because TCP tries to keep packet order, which I can
>>>> >> do without.
>>>> >>
>>>> >> Thanks in advance.
>>>> >
>>>> > One thing I'd suggest looking into is MPTCP[1] (Multipath TCP) - it's
>>>> > designed for basically this exact use case.
>>>> >
>>>> > If you can't (or don't want to) build kernel modules, then the MPTCP
>>>> > proxy[2] (which runs in userspace using netfilter/iptables) may be of
>>>> > interest.
>>>> >
>>>> > Either option would be essentially transparent underneath nanomsg,
>>>> > due to the design of MPTCP.
>>>> >
>>>> > Another option might be SCTP (since it supports multihoming), although
>>>> > that would likely require adding an SCTP transport to nanomsg.
>>>> >
>>>> > [1] http://www.multipath-tcp.org/
>>>> > [2] http://www.ietf.org/mail-archive/web/multipathtcp/current/msg01934.html
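On the pipeline question above: a single PUSH socket can nn_connect() to
several endpoints, and nanomsg load-balances outgoing messages across
whichever of those connections are currently up. A minimal sketch in C
follows; the addresses are placeholders, and the semicolon prefix for pinning
the local source address is the form described in nn_tcp(7). Note the caveat
in the comments: as far as the docs describe, nanomsg does not acknowledge or
retransmit a message that was in flight on a connection that died, so "at
least once" delivery still needs an application-level ack.

#include <nanomsg/nn.h>
#include <nanomsg/pipeline.h>
#include <stdio.h>

int main(void) {
    int push = nn_socket(AF_SP, NN_PUSH);
    if (push < 0) {
        fprintf(stderr, "nn_socket: %s\n", nn_strerror(nn_errno()));
        return 1;
    }

    /* How quickly nanomsg retries a broken endpoint: first after 1 s,
       backing off to at most 30 s (defaults: 100 ms, no backoff). */
    int ivl = 1000, ivl_max = 30000;
    nn_setsockopt(push, NN_SOL_SOCKET, NN_RECONNECT_IVL, &ivl, sizeof(ivl));
    nn_setsockopt(push, NN_SOL_SOCKET, NN_RECONNECT_IVL_MAX, &ivl_max,
                  sizeof(ivl_max));

    /* One endpoint per link; the local address before the semicolon pins
       the source IP so each connection leaves via a different link
       (all addresses here are placeholders). */
    nn_connect(push, "tcp://192.0.2.10;198.51.100.1:6000");
    nn_connect(push, "tcp://192.0.2.11;198.51.100.1:6000");

    /* nanomsg round-robins messages over the pipes that are up. Caveat:
       there is no delivery ack at this layer, so a message in flight on a
       dying connection can be lost. */
    nn_send(push, "chunk-0001", 10, 0);

    nn_close(push);
    return 0;
}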