[mysql-dde] Re: Connection Fail

  • From: "Peter B. Volk" <peter.benjamin.volk@xxxxxxxxxxxxxxxxx>
  • To: <mysql-dde@xxxxxxxxxxxxx>
  • Date: Thu, 8 Dec 2005 22:37:51 +0100

  ----- Original Message ----- 
  From: Fabricio Mota 
  To: Peter B. Volk ; mim 
  Sent: Tuesday, December 06, 2005 2:44 PM
  Subject: Connection Fail

  Hey Peter,

  I was thinking a bit more about connection liveness and server liveness management, and I've got another problem: visibility is a bidirectional relation, and each direction can fail independently.
  That is: suppose a cluster with 3 servers, S1, S2 and S3. At first, everybody sees everybody.
  But later, due to a transient network failure, S1 stops seeing S2, although S2 still sees S1. That means only the connection from S1 (as client) to S2 (as server) has failed. So I think there are no islands here; we only end up with an incomplete graph.
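  To make the "no islands, only an incomplete graph" distinction concrete, here is a minimal Python sketch that models visibility as a directed graph. The names (`sees`, `missing_links`, `islands`) are hypothetical, and treating an island as a group of servers linked by at least one working direction is my assumption, not a definition from this thread.

```python
# Hypothetical model: sees[a] is the set of servers that a currently reaches
# as a client. An edge a -> b can fail while b -> a keeps working.
Visibility = dict[str, set[str]]


def missing_links(sees: Visibility) -> list[tuple[str, str]]:
    """Every (client, server) pair whose connection is currently down."""
    nodes = sorted(sees)
    return [(a, b) for a in nodes for b in nodes if a != b and b not in sees[a]]


def is_complete(sees: Visibility) -> bool:
    """True when every server sees every other server."""
    return not missing_links(sees)


def islands(sees: Visibility) -> list[set[str]]:
    """Group servers that are linked by at least one working direction."""
    remaining = set(sees)
    result = []
    while remaining:
        start = remaining.pop()
        island, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for other in list(remaining):
                if other in sees[node] or node in sees[other]:
                    remaining.discard(other)
                    island.add(other)
                    stack.append(other)
        result.append(island)
    return result


# The scenario above: only the S1 -> S2 connection has failed.
sees = {"S1": {"S3"}, "S2": {"S1", "S3"}, "S3": {"S1", "S2"}}
print(missing_links(sees))  # [('S1', 'S2')]
print(is_complete(sees))    # False
print(len(islands(sees)))   # 1
```

  The graph is not complete, but all three servers still form a single island.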

  Of course, S3 will not be prevented from performing any operation, because it still sees everybody.

  But what about S1 and S2?

  Well, if we eventually decide to use some kind of bilateral management, S2 will also be able to perform the operations it originates, because it also sees everybody.
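  A minimal sketch of the resulting split, assuming (hypothetically) that a server which reaches every peer can work directly while any other server falls back to late synchronization. `operating_mode` and the mode names are illustrative only.

```python
Visibility = dict[str, set[str]]


def operating_mode(server: str, sees: Visibility) -> str:
    """'direct' if the server reaches every peer, otherwise 'late-sync'."""
    peers = set(sees) - {server}
    return "direct" if peers <= sees[server] else "late-sync"


# Only the S1 -> S2 connection is down.
sees = {"S1": {"S3"}, "S2": {"S1", "S3"}, "S3": {"S1", "S2"}}
for name in sorted(sees):
    print(name, operating_mode(name, sees))
# S1 late-sync
# S2 direct
# S3 direct
```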

  But S1 will try to perform active-cluster operations under the late-synchronization procedure. So it will do whatever is needed by communicating with S3, and could keep a buffer to send its modifications to S2 later.
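  The late-synchronization path S1 would follow might look roughly like this sketch. `LateSyncBuffer`, `replicate` and the `deliver` callback are hypothetical names; the one design choice worth noting is a FIFO queue per peer, so buffered commands reach S2 in the order S1 issued them.

```python
from collections import defaultdict, deque
from typing import Callable, Iterator

Visibility = dict[str, set[str]]


class LateSyncBuffer:
    """Hypothetical per-peer FIFO of commands that could not be delivered."""

    def __init__(self) -> None:
        self._queues = defaultdict(deque)

    def hold(self, peer: str, command: tuple) -> None:
        self._queues[peer].append(command)

    def drain(self, peer: str) -> Iterator[tuple]:
        """Yield the commands buffered for peer, oldest first."""
        queue = self._queues.pop(peer, deque())
        while queue:
            yield queue.popleft()


def replicate(origin: str, command: tuple, sees: Visibility,
              deliver: Callable[[str, tuple], None],
              buffer: LateSyncBuffer) -> None:
    """Send command to every visible peer and buffer it for the others."""
    for peer in sorted(set(sees) - {origin}):
        if peer in sees[origin]:
            deliver(peer, command)
        else:
            buffer.hold(peer, command)


sees = {"S1": {"S3"}, "S2": {"S1", "S3"}, "S3": {"S1", "S2"}}
command = ("MyRDDTable", "row", "v1")
delivered = []
buf = LateSyncBuffer()
replicate("S1", command, sees, lambda peer, cmd: delivered.append((peer, cmd)), buf)
held = list(buf.drain("S2"))  # what S2 receives once it is visible again
print(delivered)  # [('S3', ('MyRDDTable', 'row', 'v1'))]
print(held)       # [('MyRDDTable', 'row', 'v1')]
```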

  This could be dangerous, as the following RDD update example shows:

  1. S1 updates MyRDDTable, because the update is protected under the majority-criteria law.
    1.1 S1 updates itself;
    1.2 S1 sends the update command to S3;
    1.3 S1 creates a buffer (late sync) for when S2 becomes reachable again;

  2. A little later, S2 tries to update MyRDDTable, since the update is also protected under the majority-criteria law.
    2.1 S2 updates itself before receiving the buffered command from S1 (very bad!);
    2.2 S2 sends the update command to S3;
    2.3 S2 sends the update command to S1 (uh-oh!);
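  A small simulation of the two steps above shows the divergence. Assumptions (all hypothetical): a write is allowed when the writer plus the peers it sees form a strict majority, commands carry no version or timestamp, and S1's buffer is flushed to S2 only after S2's own update has been applied.

```python
SEES = {"S1": {"S3"}, "S2": {"S1", "S3"}, "S3": {"S1", "S2"}}


def has_majority(server: str) -> bool:
    """Hypothetical majority-criteria check: the server plus the peers it sees."""
    reachable = 1 + len(SEES[server])
    return 2 * reachable > len(SEES)


def write(server, key, value, replicas, late_sync):
    if not has_majority(server):
        raise RuntimeError(f"{server} cannot reach a majority")
    for target in replicas:
        if target == server or target in SEES[server]:
            replicas[target][key] = value           # applied right away
        else:
            late_sync.append((target, key, value))  # held for late sync


def flush(late_sync, replicas):
    """Deliver buffered commands once the missing link comes back."""
    for target, key, value in late_sync:
        replicas[target][key] = value
    late_sync.clear()


replicas = {name: {} for name in SEES}  # each dict stands in for MyRDDTable
late_sync = []
write("S1", "row", "v1", replicas, late_sync)  # step 1
write("S2", "row", "v2", replicas, late_sync)  # step 2
flush(late_sync, replicas)                     # S1 -> S2 is back
print({name: table["row"] for name, table in replicas.items()})
# {'S1': 'v2', 'S2': 'v1', 'S3': 'v2'}
```

  Both writers passed the majority check, yet S2 ends up with the older value, because nothing orders S1's buffered command against S2's later write.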

  How would you suggest solving it?

  [Peter] Hmm, good question. Once S1 no longer sees S2, S1 suspects that S2 is down and reports this to S3. Since S3 can see all of them, the error can be classified as a network error. Then S1, S2 and S3 should decide which server, S1 or S2, should be taken out of the cluster.

  How does that sound?
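  The proposal could be sketched as follows. The classification rule (a suspicion is a network error if some third server still sees the suspect) follows the description above; the eviction criterion in `choose_eviction` (remove the server that sees fewer peers, ties broken by name) is purely a hypothetical example, since the decision rule is left open here.

```python
Visibility = dict[str, set[str]]


def classify_suspicion(suspect: str, reporter: str, sees: Visibility) -> str:
    """Reporter can no longer reach suspect; check the remaining servers."""
    witnesses = [
        s for s in sees
        if s not in (suspect, reporter) and suspect in sees[s]
    ]
    # If someone else still reaches the suspect, it is alive and the problem
    # lies on the reporter -> suspect link.
    return "network-error" if witnesses else "server-down"


def choose_eviction(a: str, b: str, sees: Visibility) -> str:
    """Hypothetical rule: evict whichever server sees fewer peers.
    Ties are broken by name so every server reaches the same decision."""
    return min((a, b), key=lambda s: (len(sees[s]), s))


sees = {"S1": {"S3"}, "S2": {"S1", "S3"}, "S3": {"S1", "S2"}}
print(classify_suspicion("S2", "S1", sees))  # network-error
print(choose_eviction("S1", "S2", sees))     # S1
```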

