[nanomsg] Re: Nanomsg noobie - how to use nanomsg to solve disparate database integration

From: Alex Elsayed <eternaleye@xxxxxxxxx>
To: nanomsg@xxxxxxxxxxxxx
Date: Thu, 12 Feb 2015 18:13:49 -0800

Marl wrote:

> We are planning a project that involves the use of multiple nosql
> databases.  These databases share some key pieces of information and so we
> are looking at using something like Mqtt or nanomsg/0mq to accomplish the
> integration.
> 
> Some questions that are running through my mind include:
> 1.  Are we even barking up the right tree as far as technology set is
> concerned?
> 2.  If tools like nanomsg are the right ones for the job, is it good
> practice to send large amounts of data across the wire?  For example, I
> can see sending some small amounts of data but what about blobs or even
> large chunks of tables?  Historically, I'm used to seeing something like
> this for data exchange:
> 
> Server A running database A is updated with new data via some mechanism
> like a web app. After the table in the db is changed, a trigger inserts
> the changed records into a shadow table that basically is used to hold all
> the deltas. A message broker of sorts notifies all listeners that there
> has been a change ... With some sort of a change id embedded in the
> message.
> This change id is associated with the relevant records in the shadow
> table.  Once the listener has grabbed the records it needs, the broker
> removes them from the shadow table.

One thing I'd worry about here - what if two databases get conflicting 
updates before the delta has propagated? That's the kind of problem 
("shared log"/"distributed state machine") that Paxos and Raft are intended 
to solve - just pushing the data around isn't enough; you need to enforce 
consistency as well.

Now, Paxos or Raft _over_ nanomsg might very well be a nice thing, because 
nanomsg's topologies make some of the tasks (broadcast especially) much 
easier.

> The problem with the above design is that the listeners need to know how
> to connect to the source db which creates tight coupling between the
> various systems.

There's a relatively simple solution.

Each database acts as a REP and a PUB. On the PUB, it just sends out 
changesets as they occur, each with a monotonically-increasing change id. On 
the REP, it replies to requests for ranges of change ids (in case someone 
falls behind).
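
As a rough sketch of the state each database would keep behind those two 
sockets, assuming change ids start at 1 and changesets are plain dicts. The 
ChangeLog class and its method names are made up for illustration, and no 
nanomsg calls are shown: append() returns what would go out on the PUB, and 
get_range() produces what the REP would send back.

```python
class ChangeLog:
    """In-memory stand-in for one database's changeset history (hypothetical)."""

    def __init__(self):
        self._changes = []  # index i holds the changeset with change id i + 1

    @property
    def last_id(self):
        return len(self._changes)

    def append(self, changeset):
        """Record a changeset and return the (id, changeset) pair to publish."""
        self._changes.append(changeset)
        return self.last_id, changeset

    def get_range(self, first_id, last_id):
        """Answer a "get range" request for ids first_id..last_id inclusive."""
        if first_id < 1 or last_id > self.last_id or first_id > last_id:
            raise ValueError("requested range is not in the log")
        return [(i, self._changes[i - 1]) for i in range(first_id, last_id + 1)]


log = ChangeLog()
for row in ({"user": 1, "name": "a"}, {"user": 2, "name": "b"}, {"user": 1, "name": "c"}):
    log.append(row)

# A subscriber that only saw change 1 would ask for the rest:
missed = log.get_range(2, 3)
```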

At this point, the protocol is: each database connects to each other 
database's PUB (as a SUB) and listens for changes, then connects to the 
adjoining REP. If, on the PUB/SUB, (received id) > (current id + 1), it 
requests the changesets in between via a REQ.
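
The gap check might look something like the following. This is only a 
sketch: Replica and fetch_range are hypothetical names, fetch_range stands in 
for the REQ round trip, and one Replica tracks a single upstream database 
(a real node would presumably keep one current id per peer).

```python
class Replica:
    """Tracks changes received from one upstream database (hypothetical)."""

    def __init__(self, fetch_range):
        self.current_id = 0
        self.applied = []
        # Assumed callable(first_id, last_id) -> list of (id, changeset);
        # in a real system this would be a REQ to the upstream's REP.
        self._fetch_range = fetch_range

    def _apply(self, change_id, changeset):
        self.applied.append((change_id, changeset))
        self.current_id = change_id

    def on_publish(self, change_id, changeset):
        """Handle one message arriving on the SUB side."""
        if change_id <= self.current_id:
            return  # already applied, e.g. recovered earlier via a range request
        if change_id > self.current_id + 1:
            # We fell behind: fetch the changesets in between before applying.
            for missed_id, missed in self._fetch_range(self.current_id + 1, change_id - 1):
                self._apply(missed_id, missed)
        self._apply(change_id, changeset)


source = {1: "a", 2: "b", 3: "c", 4: "d"}


def fetch_range(first_id, last_id):
    return [(i, source[i]) for i in range(first_id, last_id + 1)]


replica = Replica(fetch_range)
replica.on_publish(1, source[1])
replica.on_publish(4, source[4])  # ids 2 and 3 never arrived on the SUB
```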

At that point, all a new node needs to know in order to connect is the list 
of existing servers, which is information every server is already tracking. 
Since there's already a REP on each server, just have it accept two more 
types of messages other than just the "get range" - "get list of nodes" and 
"register peer". When a peer is joining, it needs the address of a single 
existing node. It connects, asks for a list, and then calls each one up and 
registers its own address. Then it does the SUB/REQ connections, and they 
connect to it.
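
To make the join sequence concrete, here is a small in-memory sketch. The 
Node class, the addresses, and the dict standing in for the network are all 
assumptions for illustration; in practice handle_request() would sit behind 
the REP socket and join() would issue REQs. The "get range" message and the 
SUB connections are left out.

```python
class Node:
    """One database in the cluster (hypothetical, no real sockets)."""

    def __init__(self, address, network):
        self.address = address
        self.peers = set()
        # Assumed dict of address -> Node; stands in for REQ/REP over the wire.
        self._network = network
        network[address] = self

    def handle_request(self, kind, payload=None):
        """What the REP side would do for the two extra message types."""
        if kind == "get list of nodes":
            return sorted(self.peers | {self.address})
        if kind == "register peer":
            self.peers.add(payload)
            return "ok"
        raise ValueError(f"unknown request: {kind}")

    def join(self, seed_address):
        """Bootstrap from a single known node."""
        known = self._network[seed_address].handle_request("get list of nodes")
        for address in known:
            # Call each existing node up and register our own address.
            self._network[address].handle_request("register peer", self.address)
            self.peers.add(address)


network = {}
a = Node("tcp://a:5555", network)
b = Node("tcp://b:5555", network)
b.join("tcp://a:5555")
c = Node("tcp://c:5555", network)
c.join("tcp://b:5555")  # c only needs to know about b to find a as well
```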

And there you have a bootstrappable cluster that replicates events to all 
nodes.

> Can you give me a 30,000 foot view of how nanomsg could be used to solve
> the above problem?  From what I read in the archives there doesn't seem to
> be a max size for messages in nano so theoretically I can send an entire
> table (not that I would actually do that.  But I can see sending several
> records worth of data in the case of a complex transaction...or a blob
> like an MP3 or some other audio file) across the wire. But just because I
> can doesn't mean I should.  Also what pattern would I use?  Are there any
> examples / documentation that I can check out to solidify my
> understanding?

While the above answers the question you /asked/, it's still incomplete for 
the /use case/ you laid out, because of the issue of consistency. If your 
databases are sharded such that all operations are commutative, then it's a 
non-issue; however that's generally not the case.
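
A toy illustration of the difference, with made-up operation names: 
increments commute, so two nodes that receive them in different orders still 
agree, while plain overwrites do not.

```python
def apply_all(state, operations):
    """Apply (op, key, value) tuples in order and return the new state."""
    state = dict(state)
    for op, key, value in operations:
        if op == "add":
            state[key] = state.get(key, 0) + value
        elif op == "set":
            state[key] = value
    return state


start = {"stock": 10}
adds = [("add", "stock", 5), ("add", "stock", -3)]
sets = [("set", "stock", 7), ("set", "stock", 2)]

# Two nodes that see the same updates in opposite orders:
adds_agree = apply_all(start, adds) == apply_all(start, adds[::-1])
sets_agree = apply_all(start, sets) == apply_all(start, sets[::-1])
```

Here adds_agree is True and sets_agree is False, and the second case is the 
one where something like Raft or Paxos has to pick an order.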

Because of that, I'd suggest looking into Raft or Paxos. You may wind up 
using a pre-existing library (and I don't believe any of them use nanomsg at 
this time), or you might decide to roll your own - and if you do roll your 
own, I suspect that nanomsg would be a good foundation to build upon.

> If you have any insights or suggestions for reading I'd appreciate it. 
> I'm leaning towards nano more than zero or mqtt just based on the little
> that I've read. But I'm not conversant enough to make a solid case to my
> team.

I hope I've helped, and if you have any further questions don't hesitate to 
ask!

> Thanks for reading this post...
> 
> Sent from my iPad