On 01/06/2018 21:35, Ilya Markov wrote:
This is an automated email from the git hooks/post-receive script.
IlyaMarkovMipt pushed a commit to branch gh-3098-remapping-replicas
in repository tarantool.
commit 54600183d2a6e9d2ff44c92284259b50a4bc3d46
Author: Ilya Markov <imarkov@xxxxxxxxxxxxx>
AuthorDate: Fri Jun 1 13:08:28 2018 +0300
Add mapping rfc
---
doc/rfc/3098-replicas-id-remapping.md | 120 ++++++++++++++++++++++++++++++++++
1 file changed, 120 insertions(+)
diff --git a/doc/rfc/3098-replicas-id-remapping.md b/doc/rfc/3098-replicas-id-remapping.md
new file mode 100644
index 0000000..3ae1254
--- /dev/null
+++ b/doc/rfc/3098-replicas-id-remapping.md
@@ -0,0 +1,120 @@
+## Problems and ways to overcome them
+
+1. Problem with the primary key in _cluster. The primary key in _cluster is currently replica_id.
+But since we want before triggers to rewrite incoming tuples according to the locally assigned replica ids,
+ we need to update this field. However, updating a primary key field inside a before trigger is prohibited.
+
+*Solution*:
+ We therefore alter the primary index to index the uuid field, and alter the second index to index replica_id.
+
+
+2. Problem with simultaneous appliers. When several appliers exist at the same time, several triggers
+are set and each of them is called. When a new tuple is delivered,
+we want to handle it only once, namely in the trigger set by the applier
+ that received the tuple.
+Therefore, we need to map tuples to appliers inside the triggers.
+
+*Solution*:
+The idea we decided to implement is to add a third field to the tuples that holds the uuid of the replica that sent the tuple.
+With it, a trigger can decide whether the tuple arrived through its own applier
+by simply comparing the third field of the tuple with applier->uuid.
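A minimal sketch of this check, written in Python for illustration rather than in Tarantool's C. The tuple layout `(replica_id, replica_uuid, source_uuid)`, the `Applier` class and `should_handle` are hypothetical names; only the comparison of the third field with the applier's uuid comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applier:
    # Assumption: uuid of the remote replica this applier reads from.
    uuid: str


def should_handle(applier, tup):
    """Return True only in the trigger whose applier delivered the tuple."""
    # Hypothetical layout: (replica_id, replica_uuid, source_uuid).
    return tup[2] == applier.uuid


# Two appliers exist at once, so two triggers fire for every tuple.
appliers = [Applier("uuid-a"), Applier("uuid-b")]
incoming = (3, "uuid-c", "uuid-b")  # delivered through the applier for uuid-b

# Exactly one trigger accepts the tuple; the other ignores it.
handlers = [a.uuid for a in appliers if should_handle(a, incoming)]
```

Every trigger still runs, but all except one return early, so the tuple is processed once.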
+
+3. Before triggers are not called during the join operation, so some of our _cluster metadata is not updated.
+ This is not a problem for the mappings, because the joining replica has no _cluster at all.
+ But it is a problem for the local replica id counter, which should be updated each time a new replica is added.
+
+ *Solution*: When the _cluster trigger (the one not assigned to any applier) is called,
+ we check whether the local replica id counter has already been updated.
+ If so, we use its value.
+ Otherwise, we use the maximum replica id found in _cluster.
+
+ Another problem is that the third field is not updated on join.
+ But since such non-updated tuples are written to snapshots and can later be handled only by another join,
+ the field is not needed for them.
+
+4. When should we set up the triggers? The initial data reception in the join phase does not require mapping
+ because during that phase the node does not yet have a populated _cluster. But on subscribe or on recovery the triggers are required.
+
+*Solution*: The trigger used for the global counter is set on bootstrap;
+the others are set either in join, after the initial data has been received, or in the subscribe phase.
+
+5. How to handle the global counter? The global counter is used to assign ids to new replicas.
+Each assigned id must be unique so that it does not overlap with the id of any other replica, whether alive or disabled.
+
+*Solution*: Let's denote the replica counter as `RC`.
+On each new replica registration we calculate `RC = max(max_id(_cluster), RC) + 1`.
+This formula takes into account that triggers are not called on initial data reception during the join phase,
+and that replicas may be deleted.
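The formula can be checked on the two cases it is meant to cover. This is only a sketch: `next_replica_id` is a hypothetical helper, and it assumes that `RC` holds the last id handed out and starts at 0.

```python
def next_replica_id(cluster_ids, rc):
    """Return the new counter value: max(max_id(_cluster), RC) + 1."""
    # max_id(_cluster); an empty _cluster is treated as 0 (assumption).
    max_id = max(cluster_ids, default=0)
    return max(max_id, rc) + 1


# Case 1: the join bypassed the triggers, so RC was never updated.
# The ids already present in _cluster decide the result.
after_join = next_replica_id({1, 2, 3}, rc=0)

# Case 2: replicas 4 and 5 were registered and then deleted.
# RC remembers them, so their ids are not handed out again.
after_delete = next_replica_id({1, 2, 3}, rc=5)
```

Taking the maximum of both sources is what keeps the new id above every id that is in use or was used before.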
+
+6. Another issue is tuples whose third field (source uuid) is unknown to the replica.
+
+In this case we would corrupt _cluster, because we have no trigger to handle such a tuple.
+
+*Solution*: Skip such tuples. We need the tuples from _cluster mostly for tracking vclocks.
+But if the replica has no applier for the replica with that uuid, then that replica should not be represented in the vclock.
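A small sketch of the skipping rule, under the same hypothetical tuple layout `(replica_id, replica_uuid, source_uuid)`. `split_cluster_tuples` and the set of applier uuids are illustrative names, not part of the patch.

```python
def split_cluster_tuples(tuples, applier_uuids):
    """Separate tuples with a known source uuid from those to be skipped."""
    accepted, skipped = [], []
    for tup in tuples:
        # The third field is the source uuid; without a matching applier
        # there is no trigger that could remap the tuple, so it is dropped.
        if tup[2] in applier_uuids:
            accepted.append(tup)
        else:
            skipped.append(tup)
    return accepted, skipped


known = {"uuid-a", "uuid-b"}
stream = [
    (2, "uuid-x", "uuid-a"),
    (7, "uuid-y", "uuid-z"),  # source uuid-z has no applier here
]
accepted, skipped = split_cluster_tuples(stream, known)
```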
+
+## Alternatives
+
+A possible alternative is to rely on the uniqueness of UUIDs and
+ store the uuid instead of the replica id in vclocks and xrows. Then there would be no need for remapping, since replicas could be distinguished directly.
+ But this approach uses much more memory and produces larger messages than the previous one:
+ a uuid takes 16 bytes, several times more than a simple integer identifier.
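A rough size comparison, as a sketch only. It assumes a replica id stored as a 32-bit integer and a hypothetical vclock of 32 components; the encoding Tarantool actually uses on the wire may differ.

```python
import struct
import uuid

replica_uuid = uuid.uuid4()

# A UUID is 128 bits, so its binary form is always 16 bytes.
uuid_size = len(replica_uuid.bytes)

# Assumption: the replica id fits in an unsigned 32-bit integer.
id_size = len(struct.pack("<I", 7))

# Hypothetical vclock with 32 components, each keyed by replica.
components = 32
extra_bytes = components * (uuid_size - id_size)
```

Under these assumptions each vclock grows by a few hundred bytes, and the same per-row cost is added to every xrow that carries a replica identifier.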