[mysql-dde] Re: Adicionar na Especificação (Add to the Specification)

  • From: Fabricio Mota <fabricio.mota@xxxxxxxxx>
  • To: Mim <fabricio.mota@xxxxxxxxx>, mysql-dde@xxxxxxxxxxxxx
  • Date: Thu, 8 Dec 2005 22:31:54 -0300

 Peter,

Boa noite (= good evening)

I intend to include the following in the spec, as a way to generalize late
sync for other applications. Tell me what you think.

*Late Synchronization:*
Consider as *faulty* any server that is:
  - Crashed (down);
  - Byzantine (having unexpected behavior);
  - Out of communication with the other servers.

Consider a faulty server *S*, member of a cluster *C*, leaving behind an
active cluster *C - S*. Consider also an operation *O* involving a given
resource *R*, within a transaction *T*. To make late synchronization
possible, the active cluster *C - S* always maintains a buffer *B* that
keeps every operation *O* targeted to *S*.
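
As a rough illustration only, buffer *B* could be sketched in Python as
below. The names `Operation` and `LateSyncBuffer`, and the choice of one FIFO
queue per faulty server, are assumptions made for this example and are not
part of the spec.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """One operation O on resource R inside transaction T (hypothetical shape)."""
    transaction: str
    resource: str
    action: str


class LateSyncBuffer:
    """Buffer B kept by the active cluster C - S (illustrative only)."""

    def __init__(self):
        # One FIFO queue per faulty server, so operations replay in order.
        self._pending = defaultdict(deque)

    def defer(self, server, op):
        """Keep operation `op` for the faulty server `server`."""
        self._pending[server].append(op)

    def pending(self, server):
        """Return the operations still owed to `server`, oldest first."""
        return list(self._pending.get(server, ()))


buffer_b = LateSyncBuffer()
buffer_b.defer("S", Operation("T1", "R", "UPDATE"))
buffer_b.defer("S", Operation("T1", "R", "DELETE"))
```

Keeping the queue ordered matters because *S* is expected to perform the
deferred operations later in the same order the active cluster did.
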
*Lemma 1 - agreement:* Late sync in a cluster *C* will only be allowed for a
server *S* if:
    *1.1:* The active cluster *C - S* agrees that *S* is faulty;
    *1.2:* Server *S*, if it is alive, agrees that it is faulty.

Note that 1.1 is important because it justifies the use of late
synchronization: if there is no reason for S to defer the operation, it
should not be deferred. On the other hand, 1.2 is needed because, once the
active cluster has decided to alter the distributed resource, it is very
important that server S performs no operation that could take it to a state
different from that of the group.
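
A minimal sketch of the two conditions, assuming that "agree" means a
unanimous vote among the members of *C - S* (the lemma does not say how
agreement is reached). The function name and its parameters are hypothetical.

```python
def late_sync_allowed(active_votes, s_alive, s_agrees):
    """Return True when both conditions of Lemma 1 hold.

    active_votes: dict mapping each member of C - S to True if it
                  considers S faulty (unanimity is an assumption here).
    s_alive:      whether S is alive.
    s_agrees:     whether S, when alive, acknowledges its own fault.
    """
    cluster_agrees = bool(active_votes) and all(active_votes.values())  # 1.1
    s_consents = (not s_alive) or s_agrees                              # 1.2
    return cluster_agrees and s_consents
```

For example, `late_sync_allowed({"A": True, "B": True}, s_alive=True,
s_agrees=False)` is rejected, since a live *S* that does not accept its fault
could still change its own state.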

*Lemma 2 - accuracy:* In a cluster *C*, if a server *S* is faulty, and
resource *R* is considered replicated, then the current state of *R* in *S*
must not be published as the up-to-date state until *S* is reintegrated
into *C*.

That's important because, if the server is faulty, there is no guarantee
of communication between it and the other members of the cluster. Since
it may be unreachable, it may not hold the correct state, given that changes
may occur in the active cluster. So the correct action is to block reads of
the resource until the server comes back to *C*.

*Lemma 3 - integrity:* In a cluster *C*, if a server *S* is faulty, and
the state of a resource *R* is considered dependent on the state of
the cluster *C*, then the current state of *R* in *S* must not be
modified until *S* is reintegrated into *C*.

In a faulty server, if a given distributed resource depends on the other
servers, then its state may not reflect the current state held by the
active cluster. Updating the state of a resource may depend on its previous
state, which may not match the current state defined in the active cluster.
On the other hand, the new change may not be reflected in the active
cluster either. So data integrity may be violated.
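
Lemmas 2 and 3 together amount to blocking both reads and writes on the
faulty server. The sketch below is only an illustration: `LocalResource`,
`ResourceUnavailable` and the single `distributed` flag (standing for both
"replicated" and "dependent on the cluster state") are assumptions of this
example.

```python
class ResourceUnavailable(Exception):
    """Raised when a faulty server is asked to read or modify a distributed resource."""


class LocalResource:
    """The copy of resource R held by server S (illustrative only)."""

    def __init__(self, value, distributed=True):
        self._value = value
        self.distributed = distributed
        self.server_faulty = False  # set to True once S is considered faulty

    def _check(self, what):
        # Purely local resources stay usable; distributed ones are blocked.
        if self.server_faulty and self.distributed:
            raise ResourceUnavailable(f"{what} blocked until reintegration")

    def read(self):
        self._check("read")    # Lemma 2: do not publish a possibly stale state
        return self._value

    def write(self, value):
        self._check("write")   # Lemma 3: do not modify a possibly stale state
        self._value = value
```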


*Lemma 4 - autonomy:* Late sync in a cluster *C*, offered to a
server *S*, will only be allowed for an operation
*O* over a distributed resource *R* if *S* is not able, by local means, to
decide individually whether *O* over *R* will be performed or aborted.

*Corollary of Lemma 4:* In a late-synchronizable operation *O*, only the
remaining group *C - S* has the autonomy to decide whether *O* over *R* will
be performed or aborted.

This set of operations includes those that affect only distributed
resources (such as RDD alterations) and that no specific local condition can
interrupt. Thus, making it impossible to establish foreign keys between DDE
tables and non-DDE tables is a strong means of ensuring agreement among all
servers.
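
One possible reading of this criterion, as a sketch: an operation qualifies
when it touches only distributed resources and no local condition could
abort it. The function and parameter names are hypothetical, and how a
"local abort condition" is detected is left open here.

```python
def is_late_synchronizable(resources_touched, distributed_resources,
                           has_local_abort_condition):
    """Return True if S has no local grounds to decide the outcome of O."""
    # Every touched resource must be a distributed one (e.g. a DDE table).
    touches_only_distributed = set(resources_touched) <= set(distributed_resources)
    return touches_only_distributed and not has_local_abort_condition
```

A foreign key from a DDE table to a non-DDE table would show up here as a
local abort condition, which is why forbidding such keys helps.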

Some special cases could cause the transaction to fail (if the faulty
server were alive), such as an operating system or disk failure.
However, this kind of failure should not affect global consistency. With
late sync, it would be possible to repair the server before reintegrating it
into the cluster, and everything would return to normal operation.

*Lemma 5 - atomicity:* If a server *S* fails within a transaction *T*, then:
    *5.1* all operations from *T* already performed by *S* must be undone
before it is reintegrated into cluster *C*;
    *5.2* all operations from *T* known by *C - S* to have been already
performed by *S* must be kept in buffer *B*, to be performed by *S* again in
the future.

Consider that, once S fails, a consistent state must be ensured both for
the server and for the remaining cluster.
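
The two parts of Lemma 5 can be sketched together as below. This is a
single-process simplification: in practice the undo (5.1) would run on *S*
and the buffering (5.2) on *C - S*. The function name, the plain list used
as buffer *B*, and undoing in reverse order are assumptions of this example.

```python
def handle_failure_in_transaction(done_by_s, buffer_b):
    """Apply 5.1 and 5.2 to the operations of T that S had already performed.

    done_by_s: operations of T that S performed before failing, in order.
    buffer_b:  list acting as buffer B for S on the active cluster.
    Returns the order in which S should undo those operations.
    """
    undo_order = list(reversed(done_by_s))  # 5.1: undo the newest operation first
    buffer_b.extend(done_by_s)              # 5.2: keep them for replay, original order
    return undo_order
```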

*Lemma 6 - acceptance:* A server *S* must only be accepted as an active
member of a cluster *C* after performing all pending operations in
buffer *B* of *C* targeted to *S*.

If any operation, for any reason, cannot be performed by the
reintegrating server, then its membership cannot be resumed, because its
state may not match the state expected by the group, and global
consistency may be violated.
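
The acceptance rule could look like the sketch below, assuming a
hypothetical `apply_op` callable that performs one operation on *S* and
raises an exception on failure.

```python
def reintegrate(pending_ops, apply_op):
    """Replay buffer B on S; accept S only if every operation succeeds.

    pending_ops: operations from buffer B targeted to S, oldest first.
    apply_op:    callable that performs one operation on S, raising on failure.
    Returns True if S may rejoin C as an active member.
    """
    for op in pending_ops:
        try:
            apply_op(op)
        except Exception:
            return False  # any failure keeps S out of the cluster
    return True
```

What happens to a server that is refused (repair, full resync, and so on) is
not covered by this sketch.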


Fabricio Mota
Systems Analyst
FCPC - Divisão de Planejamento do Cadastro Comercial
