[tarantool-patches] Re: [PATCH v8 1/3] box: factor fiber_gc out of txn_commit

From: Vladislav Shpilevoy <v.shpilevoy@xxxxxxxxxxxxx>
To: "n.pettik" <korablev@xxxxxxxxxxxxx>, tarantool-patches@xxxxxxxxxxxxx
Date: Wed, 31 Oct 2018 00:32:38 +0300

On 30/10/2018 23:06, Vladislav Shpilevoy wrote:

On 30/10/2018 23:03, Vladislav Shpilevoy wrote:

Thanks for the review!

On 30/10/2018 17:30, n.pettik wrote:

On 29 Oct 2018, at 20:33, imeevma@xxxxxxxxxxxxx wrote:

Now txn_commit is judge, jury and executioner. It both
commits or rollbacks data, and collects it calling fiber_gc,
which destroys the region.

Nit: both commits and rollbacks.

Fixed.

But SQL wants to use some transactional data after commit. It is
autogenerated identifiers - a list of sequence values generated
for autoincrement columns and explicit sequence:next() calls.

It is possible to store the list on malloced mem inside Vdbe, but
it complicates deallocation.

What is the problem with deallocation? AFAIU it is enough to
simply iterate over the list and release each element - not a big deal.

If you want to use a region, maybe it is worth keeping a separate region
specifically for VDBE? We already have one in the parser, so what prevents
us from adding the same thing to VDBE? I guess we could store many
things there, not only the list of ids. I understand that the parser in its turn
has nothing in common (at least it shouldn't, except for the analyze machinery)
with transaction routines, so a separate region is likely to be more
reasonable for the parser, but anyway...

I've decided to give more details. The parser never yields. This is why it
can waste any resources here, rack and ruin everything, as long as at the
end of parsing they are returned.

Vdbe, on the contrary, yields. So it holds some system resources while
other fibers cannot use them. If we added a special region to Vdbe, it
would steal slabs from the thread's slab cache, while other fibers may
want to use them. Hence, when we use one region for all transactional data,
including language-specific data, allocations are much less fragmented over
different slabs.

Is this explanation decent?
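
The fragmentation argument above can be put into numbers with a rough model.
This is only a sketch: it assumes every region rounds its usage up to whole
slabs and ignores slab headers and alignment. The names slabs_for,
slabs_separate and slabs_shared are hypothetical and are not part of
Tarantool or the small library.

```c
#include <assert.h>
#include <stddef.h>

/* Slabs needed to hold `bytes`, ignoring headers and alignment. */
static size_t
slabs_for(size_t bytes, size_t slab_size)
{
	assert(slab_size > 0);
	return (bytes + slab_size - 1) / slab_size;
}

/* Slabs pinned when every consumer keeps its own region. */
static size_t
slabs_separate(const size_t *bytes, size_t n, size_t slab_size)
{
	size_t total = 0;
	for (size_t i = 0; i < n; i++)
		total += slabs_for(bytes[i], slab_size);
	return total;
}

/* Slabs pinned when all consumers allocate from one shared region. */
static size_t
slabs_shared(const size_t *bytes, size_t n, size_t slab_size)
{
	size_t sum = 0;
	for (size_t i = 0; i < n; i++)
		sum += bytes[i];
	return slabs_for(sum, slab_size);
}
```

Under this model, two consumers holding 1 KB and 2 KB pin two 64 KB slabs
with separate regions, but only one slab when they share a region.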

Also, I do not agree that 'deallocation is just iteration and it is
ok'. It is O(n) iteration and freeing of heap objects. If someone inserted
10k rows with autogenerated ids, it would waste 10k heap fragments and
10k calls of malloc/free - in my opinion it is an abysmal overhead, and
what is more, it can be avoided for free. Instead of 10k free() it boils
down to deallocation of N slabs, where N = slab_size / (10k * 8); 8 - size
of an autogenerated id; slab size is at least 64Kb, so N = 64*1024/80000 < 1.
It takes 1 deallocation vs 10k deallocations. So I think this refactoring
is worth it.

Sorry, an error. N = 10Kb * 8 / slab_size ~= 2. Versus 10k it is still
significant.

10Kb -> 10k, sorry again. I should go sleep ...
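
The corrected estimate (10k ids * 8 bytes over 64 KB slabs, so about 2 slab
deallocations instead of 10k free() calls) can be checked with a toy bump
allocator. This is a minimal sketch, not the real region from the small
library: toy_slab, toy_region and the toy_* functions are hypothetical, the
slab size is fixed at 64 KB, and slabs come straight from malloc rather than
from a slab cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { TOY_SLAB_SIZE = 64 * 1024 };

/* Hypothetical slab: a fixed-size buffer chained into a list. */
struct toy_slab {
	struct toy_slab *next;
	size_t used;
	char data[TOY_SLAB_SIZE];
};

/* Hypothetical region: bump allocation from the newest slab. */
struct toy_region {
	struct toy_slab *head;
};

static void *
toy_region_alloc(struct toy_region *r, size_t size)
{
	assert(size <= TOY_SLAB_SIZE);
	if (r->head == NULL || r->head->used + size > TOY_SLAB_SIZE) {
		/* Current slab is missing or full: chain a new one. */
		struct toy_slab *s = malloc(sizeof(*s));
		if (s == NULL)
			return NULL;
		s->next = r->head;
		s->used = 0;
		r->head = s;
	}
	void *p = r->head->data + r->head->used;
	r->head->used += size;
	return p;
}

/* Release every slab at once; return how many free() calls it took. */
static size_t
toy_region_free(struct toy_region *r)
{
	size_t n = 0;
	while (r->head != NULL) {
		struct toy_slab *s = r->head;
		r->head = s->next;
		free(s);
		n++;
	}
	return n;
}

/* Store `count` 8-byte ids in a fresh region, report the free() count. */
static size_t
toy_ids_free_count(size_t count)
{
	struct toy_region r = { NULL };
	for (size_t i = 0; i < count; i++) {
		uint64_t *id = toy_region_alloc(&r, sizeof(*id));
		if (id == NULL)
			break;
		*id = i;
	}
	return toy_region_free(&r);
}
```

With these assumptions toy_ids_free_count(10000) takes 2 free() calls: the
first slab holds 8192 ids and the second holds the remaining 1808. A list of
individually malloced ids would need 10000 calls.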
