[tarantool-patches] Re: [PATCH 4/4] Introduce storage reload evolution

  • From: Vladislav Shpilevoy <v.shpilevoy@xxxxxxxxxxxxx>
  • To: tarantool-patches@xxxxxxxxxxxxx, AKhatskevich <avkhatskevich@xxxxxxxxxxxxx>
  • Date: Mon, 23 Jul 2018 17:44:26 +0300

Thanks for the patch! See 4 comments below.

On 23/07/2018 14:14, AKhatskevich wrote:

Changes:
1. Introduce storage reload evolution.
2. Set up cross-version reload testing.

1:
This mechanism updates Lua objects on reload if they have
changed in a new vshard.storage version.

From this commit on, any change in vshard.storage.M has to be
reflected in vshard.storage.reload_evolution to guarantee a
correct reload.
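A minimal sketch of how a version-indexed upgrade of this kind could work, assuming the module keeps its state in a table `M` and each new module version appends exactly one migration function. The `migrations` table and `example_new_field` are hypothetical names, not the actual contents of vshard/storage/reload_evolution.lua; only the downgrade error text mirrors the one checked in test/unit/reload_evolution.result.

```lua
-- Hypothetical sketch of a reload-evolution module, not vshard code.
-- migrations[i] upgrades the state table M from version i - 1 to i.
local migrations = {}

-- Version 1: add a field that only the new module version knows
-- about (hypothetical field name).
migrations[#migrations + 1] = function(M)
    M.example_new_field = M.example_new_field or 0
end

-- The latest version this code can upgrade to.
local version = #migrations

local function upgrade(M)
    -- State created before the mechanism existed carries no version.
    local start_version = M.reload_evolution_version or 0
    if start_version > version then
        error('auto-downgrade is not implemented')
    end
    -- Run every missing migration in order, recording progress.
    for i = start_version + 1, version do
        migrations[i](M)
        M.reload_evolution_version = i
        print(('reload_evolution: upgraded to version %d'):format(i))
    end
end

local reload_evolution = {version = version, upgrade = upgrade}
```

On reload, the new module code would call `reload_evolution.upgrade(M)` on the state table inherited from the previous load, so each migration between the old and the new version runs exactly once, and calling it again on already-current state is a no-op.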

2:
The testing uses git infrastructure and is performed in the
following way:
1. Copy an old version of vshard to a temporary folder.
2. Run vshard on this code.
3. Check out the latest version of the vshard sources.
4. Reload vshard storage.
5. Make sure it works (perform simple tests).
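In the storage test, the "old" and "latest" versions from steps 1 and 3 are picked from the history of vshard/storage/reload_evolution.lua. The sketch below shows that choice, assuming `git_util.log_hashes` returns hashes newest first, as `git log` prints them. `reload_test_targets` and `checkout_cmd` are hypothetical helpers, the hashes are placeholders, and `git -C <dir> checkout <rev>` is only a plain git equivalent, since the internals of `git_util.exec_cmd` are not shown here.

```lua
-- Hypothetical helper, not part of the patch: compute the two
-- revisions the cross-version reload test checks out.
-- `evolution_log` holds hashes of commits that touched
-- vshard/storage/reload_evolution.lua, newest first.
local function reload_test_targets(evolution_log)
    assert(#evolution_log > 0, 'no commit touches reload_evolution.lua')
    return {
        -- Parent of the oldest such commit: a vshard version that
        -- predates the reload_evolution mechanism (steps 1-2).
        old = evolution_log[#evolution_log] .. '~1',
        -- Newest commit touching reload_evolution.lua (step 3).
        new = evolution_log[1],
    }
end

-- Plain git equivalent of a checkout inside the copied tree.
local function checkout_cmd(dir, rev)
    return ('git -C %s checkout %s'):format(dir, rev)
end

local targets = reload_test_targets({'c3c3c3', 'b2b2b2', 'a1a1a1'})
print(checkout_cmd('/tmp/vshard_git_tree_copy', targets.old))
print(checkout_cmd('/tmp/vshard_git_tree_copy', targets.new))
```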

Notes:
* This patch contains some legacy-driven decisions:
   1. The SOURCEDIR path is retrieved differently in the case
      of a packpack build.
   2. The git directory in the `reload_evolution/storage` test
      is copied in a way that accounts for CentOS 7 and the
      read-only (`ro`) mode of SOURCEDIR.

diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
new file mode 100644
index 0000000..54ff6b7
--- /dev/null
+++ b/test/reload_evolution/storage.result
@@ -0,0 +1,248 @@
+test_run = require('test_run').new()
+---
+...
+git_util = require('lua_libs.git_util')
+---
+...
+util = require('lua_libs.util')
+---
+...
+vshard_copy_path = util.BUILDDIR .. '/test/var/vshard_git_tree_copy'
+---
+...
+evolution_log = git_util.log_hashes({args='vshard/storage/reload_evolution.lua', dir=util.SOURCEDIR})
+---
+...
+-- Cleanup the directory after a previous build.
+_ = os.execute('rm -rf ' .. vshard_copy_path)
+---
+...
+-- 1. `git worktree` cannot be used because PACKPACK mounts
+-- `/source/` in `ro` mode.
+-- 2. Just `cp -rf` cannot be used due to slightly different
+-- behavior in CentOS 7.
+_ = os.execute('mkdir ' .. vshard_copy_path)
+---
+...
+_ = os.execute("cd " .. util.SOURCEDIR .. ' && cp -rf `ls -A --ignore=build` ' .. vshard_copy_path)
+---
+...
+-- Checkout the first commit with a reload_evolution mechanism.
+git_util.exec_cmd({cmd='checkout', args='-f', dir=vshard_copy_path})
+---
+...
+git_util.exec_cmd({cmd='checkout', args=evolution_log[#evolution_log] .. '~1', dir=vshard_copy_path})
+---
+...
+REPLICASET_1 = { 'storage_1_a', 'storage_1_b' }
+---
+...
+REPLICASET_2 = { 'storage_2_a', 'storage_2_b' }
+---
+...
+test_run:create_cluster(REPLICASET_1, 'reload_evolution')
+---
+...
+test_run:create_cluster(REPLICASET_2, 'reload_evolution')
+---
+...
+util = require('lua_libs.util')
+---
+...
+util.wait_master(test_run, REPLICASET_1, 'storage_1_a')
+---
+...
+util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
+---
+...
+test_run:switch('storage_1_a')
+---
+- true
+...
+vshard.storage.bucket_force_create(1, vshard.consts.DEFAULT_BUCKET_COUNT / 2)
+---
+- true
+...
+bucket_id_to_move = vshard.consts.DEFAULT_BUCKET_COUNT
+---
+...
+test_run:switch('storage_2_a')
+---
+- true
+...
+fiber = require('fiber')
+---
+...
+vshard.storage.bucket_force_create(vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1, vshard.consts.DEFAULT_BUCKET_COUNT / 2)
+---
+- true
+...
+bucket_id_to_move = vshard.consts.DEFAULT_BUCKET_COUNT
+---
+...
+vshard.storage.internal.reload_evolution_version
+---
+- null
+...
+box.space.test:insert({42, bucket_id_to_move})
+---
+- [42, 3000]
+...
+while test_run:grep_log('storage_2_a', 'The cluster is balanced ok') == nil do vshard.storage.rebalancer_wakeup() fiber.sleep(0.1) end

1. Now you have the wait_rebalancer_state util from the previous
commit. Use it here instead of this loop.

+---
+...
+test_run:switch('default')
+---
+- true
+...
+git_util.exec_cmd({cmd='checkout', args=evolution_log[1], dir=vshard_copy_path})
+---
+...
+test_run:switch('storage_2_a')
+---
+- true
+...
+package.loaded["vshard.storage"] = nil
+---
+...
+vshard.storage = require("vshard.storage")
+---
+...
+test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to') ~= nil
+---
+- true
+...
+vshard.storage.internal.reload_evolution_version
+---
+- 1
+...
+-- Make sure storage operates well.
+vshard.storage.bucket_force_drop(2000)
+---
+- true
+...
+vshard.storage.bucket_force_create(2000)
+---
+- true
+...
+vshard.storage.buckets_info()[2000]
+---
+- status: active
+  id: 2000
+...
+vshard.storage.call(bucket_id_to_move, 'read', 'do_select', {42})
+---
+- true
+- - [42, 3000]
+...
+vshard.storage.bucket_send(bucket_id_to_move, replicaset1_uuid)
+---
+- true
+...
+vshard.storage.garbage_collector_wakeup()
+---
+...
+fiber = require('fiber')
+---
+...
+while box.space._bucket:get({bucket_id_to_move}) do fiber.sleep(0.01) end
+---
+...
+test_run:switch('storage_1_a')
+---
+- true
+...
+vshard.storage.bucket_send(bucket_id_to_move, replicaset2_uuid)
+---
+- true
+...
+test_run:switch('storage_2_a')
+---
+- true
+...
+vshard.storage.call(bucket_id_to_move, 'read', 'do_select', {42})
+---
+- true
+- - [42, 3000]
+...
+-- Check info() does not fail.
+vshard.storage.info() ~= nil
+---
+- true
+...
+--
+-- Send buckets to create a disbalance. Wait until the rebalancer
+-- repairs it. Similar to `tests/rebalancer/rebalancer.test.lua`.
+--
+vshard.storage.rebalancer_disable()
+---
+...
+move_start = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1
+---
+...
+move_cnt = 100
+---
+...
+assert(move_start + move_cnt < vshard.consts.DEFAULT_BUCKET_COUNT)
+---
+- true
+...
+for i = move_start, move_start + move_cnt - 1 do box.space._bucket:delete{i} end
+---
+...
+box.space._bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
+---
+- 1400
+...
+test_run:switch('storage_1_a')
+---
+- true
+...
+move_start = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1
+---
+...
+move_cnt = 100
+---
+...
+vshard.storage.bucket_force_create(move_start, move_cnt)
+---
+- true
+...
+box.space._bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
+---
+- 1600
+...
+test_run:switch('storage_2_a')
+---
+- true
+...
+vshard.storage.rebalancer_enable()
+---
+...
+vshard.storage.rebalancer_wakeup()

2. You do not need an explicit rebalancer_wakeup() call:
wait_rebalancer_state calls it.

+---
+...
+wait_rebalancer_state("Rebalance routes are sent", test_run)
+---
+...
+wait_rebalancer_state('The cluster is balanced ok', test_run)
+---
+...
+box.space._bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
+---
+- 1500
+...
+test_run:switch('default')
+---
+- true
+...
+test_run:drop_cluster(REPLICASET_2)
+---
+...
+test_run:drop_cluster(REPLICASET_1)
+---
+...
+test_run:cmd('clear filter')
+---
+- true
+...
diff --git a/test/unit/reload_evolution.result b/test/unit/reload_evolution.result
new file mode 100644
index 0000000..342ac24
--- /dev/null
+++ b/test/unit/reload_evolution.result
@@ -0,0 +1,45 @@
+test_run = require('test_run').new()
+---
+...
+fiber = require('fiber')
+---
+...
+log = require('log')
+---
+...
+util = require('util')
+---
+...
+reload_evolution = require('vshard.storage.reload_evolution')
+---
+...
+-- Init with the latest version.
+fake_M = { reload_evolution_version = reload_evolution.version }
+---
+...
+-- Test reload to the same version.
+reload_evolution.upgrade(fake_M)
+---
+...
+test_run:grep_log('default', 'vshard.storage.evolution') == nil
+---
+- true
+...
+-- Test version downgrade.
+log.info(string.rep('a', 1000))
+---
+...
+fake_M.reload_evolution_version = fake_M.reload_evolution_version + 1
+---
+...
+err = util.check_error(reload_evolution.upgrade, fake_M)
+---
+...
+err:match('auto%-downgrade is not implemented')
+---
+- auto-downgrade is not implemented

3. Why do you need match()? The check_error output is already
fine. And what is 'auto%'? As far as I can see, you always print
exactly "auto-downgrade" in reload_evolution.lua.
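For context on the `%-` in that pattern: in Lua patterns `-` is a magic character (the lazy "zero or more" quantifier), and `%` escapes it to a literal hyphen. A plain search with `string.find(s, sub, 1, true)` avoids patterns altogether. The snippet below only illustrates standard Lua string functions on the error text from this test.

```lua
local msg = 'auto-downgrade is not implemented'

-- Unescaped: 'o-' means "zero or more 'o', as few as possible",
-- so this pattern never matches the literal text 'auto-downgrade'.
local unescaped = string.match(msg, 'auto-downgrade')

-- Escaped: '%-' matches a literal hyphen.
local escaped = string.match(msg, 'auto%-downgrade')

-- A plain (non-pattern) search needs no escaping at all.
local plain = string.find(msg, 'auto-downgrade', 1, true)

print(unescaped, escaped, plain)
```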

+...
+test_run:grep_log('default', 'vshard.storage.evolution', 1000) ~= nil
+---
+- false
+...
@@ -105,6 +110,11 @@ if not M then
          -- a destination replicaset must drop already received
          -- data.
          rebalancer_sending_bucket = 0,
+
+        ------------------------- Reload -------------------------
+        -- Version of the loaded module. This number is used on
+        -- reload to determine which upgrade scripts to run.
+        reload_evolution_version = reload_evolution.version,

4. Please rename it to just 'version', 'reload_version', or
'module_version'. 'reload_evolution_version' is too long and
complex.

      }
  end
