On Thu, Sep 27, 2018 at 01:27:23PM +0300, Olga Arkhangelskaia wrote:
> 27/09/2018 12:04, Vladimir Davydov wrote:
> > On Thu, Sep 27, 2018 at 10:37:57AM +0300, Olga Arkhangelskaia wrote:
> > > 26/09/2018 17:46, Vladimir Davydov wrote:
> > > > On Fri, Sep 21, 2018 at 09:25:03PM +0300, Olga Arkhangelskaia wrote:
> > > > > Adds the possibility to get a list of alive replicas in a replicaset,
> > > > > prune from box.space._cluster those that are not considered alive,
> > > > > and, if one has doubts, see the state of the replicaset.
> > > > >
> > > > > A replica is considered alive if it has just been added or its status
> > > > > after the timeout period is not stopped or disconnected. However, if it
> > > > > has both roles (master and replica), we consider such an instance dead
> > > > > only if its upstream and downstream statuses are stopped or disconnected.
> > > > >
> > > > > If a replica is considered dead, we can prune its uuid from the _cluster
> > > > > space.
> > > > >
> > > > > If one is not sure whether a replica is dead or whether there is any
> > > > > activity on it, it is possible to list replicas with their role, status
> > > > > and lsn statistics.
> > > > >
> > > > > If you have ideas about how else we can/should decide whether a replica
> > > > > is dead, please share.
> > > > >
> > > > > Closes #3110
> > > > > ---
> > > > > https://github.com/tarantool/tarantool/issues/3110
> > > > > https://github.com/tarantool/tarantool/tree/OKriw/gh-3110-prune-dead-replica-from-replicaset-1.10
> > > >
> > > > A documentation request with the new API description is missing.
> > > >
> > > > Tests don't pass on Travis CI.
> > > >
> > > > Regarding the code:
> > > >
> > > > 1. Why do you add a function that lists *alive* replicas? The issue
> > > >    author didn't ask for that. He asked for a script that would delete
> > > >    dead replicas from the _cluster system space. We might want to add a
> > > >    function that would list *dead* replicas so that he/she could check
> > > >    what replicas would be deleted (aka "dry run"), but it doesn't make
> > > >    sense to list alive replicas.
> > >
> > > It is easy to change, but as I understood it, we need to throw away the
> > > replica.
> >
> > How does it contradict what I said?
>
> It does not. I was just explaining why I did it that way.
>
> > > > 2. Dead replica detection is utterly ridiculous: the function sleeps
> > > >    for the given amount of time and then deletes inactive replicas.
> > > >    As a user, I'd want to have an ability to delete replicas that have
> > > >    been inactive for, say, a day. Does this mean that I have to wait
> > > >    for a whole day before this function completes? Obviously, no.
> > > >
> > > >    I guess tarantool core should keep track of the time each replica
> > > >    was active
> > >
> > > So we need changes in core code? About the last time of activity, what
> > > do you mean? LSN change, vclock, status?
> >
> > Real time when the replica was last active.
>
> I still did not get "active". Is it when writes occur on the specified
> replica? And the last time it got updates from the master?
> I think I will write to you in private to discuss.
>
> > > If a replica is dead for a long period of time, we can see its status.
> > > And as I understand it, we have a heartbeat to monitor the connection,
> > > so if there are problems with it, we see the status.
> >
> > Disconnected status only means that the replica is not available
> > right now. We want to delete replicas that haven't been active for
> > the specified amount of time, say a day or even a week.
>
> Good point. I will try to do that.
>
> > BTW, forgot to mention: this function should probably be defined in
> > box.ctl.
>
> Why ctl?
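
As a rough illustration of the approach discussed in this thread (keep the real time each replica was last active, list the replicas that have been inactive longer than a given threshold as a "dry run", then prune them without sleeping), a minimal sketch might look like the one below. It is plain Python, not Tarantool code: the `last_active` mapping and the names `list_dead` and `prune_dead` are hypothetical and do not correspond to any existing Tarantool API.

```python
import time
from typing import Dict, List, Optional

# Hypothetical bookkeeping: replica uuid -> wall-clock time (in seconds)
# at which the replica was last seen active. In the thread this is the
# information the core would have to track; here it is just a dict.
LastActive = Dict[str, float]


def list_dead(last_active: LastActive, max_idle: float,
              now: Optional[float] = None) -> List[str]:
    """Dry run: return uuids inactive for more than max_idle seconds."""
    now = time.time() if now is None else now
    return [uuid for uuid, seen in last_active.items()
            if now - seen > max_idle]


def prune_dead(last_active: LastActive, max_idle: float,
               now: Optional[float] = None) -> List[str]:
    """Remove replicas inactive for more than max_idle seconds.

    Returns the uuids that were removed. The function never sleeps:
    it only compares stored timestamps with the current time.
    """
    dead = list_dead(last_active, max_idle, now)
    for uuid in dead:
        del last_active[uuid]
    return dead
```

With timestamps stored this way, deleting replicas that have been inactive for a day is a single call such as `prune_dead(last_active, max_idle=24 * 3600)`, and `list_dead` with the same arguments shows beforehand which replicas would be removed.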