[procps] Re: [OmegaPhil@xxxxxxxxxxxxx: Bug#799716: free considers 'cached' to include SUnreclaim]

From: OmegaPhil <OmegaPhil@xxxxxxxxxxxxx>
To: procps@xxxxxxxxxxxxx
Date: Thu, 31 Mar 2016 20:42:36 +0100

On 30/01/16 20:16, OmegaPhil wrote:

On 14/01/16 19:23, OmegaPhil wrote:

On 13/01/16 16:10, Jaromir Capik wrote:

I'm going into this without re-familiarising and looking at some ~v4.1
kernel source, but looking at fs/proc/meminfo.c:meminfo_proc_show, I
think the reason I didn't jump on Available as The Solution previously was:

o It mentions the amount of available RAM before the system starts to
swap - that's not really the question I'm interested in, otherwise you'd
need to factor in vm swappiness - the question is literally, 'what
free memory is available', so attempting to allocate 1MB with <1MB free
would lead to OOM killing etc.

You're right, the description is not accurate. I believe it is valid for
the worst case scenario with vm swappiness 0. The description says it's
just an estimation, but it's more useful than the pervasive math with
cached, that was used by the users for years :] It's also the reason
why the "-/+ buffers/cache" line was removed from the 'free' tool.

o The pagecache calculation is extremely vague, it's literally guessing
here:

'Assume at least half of the page cache, or the
* low watermark worth of cache, needs to stay'

And this goes back to the previous problem - if lots of shmem is used
and is not reclaimable, this is in the page cache and is therefore
invalidly regarded as 'available' - am I reading this wrong?

Yes, you're reading it wrong. Try to create a big file in tmpfs
and then check the MemAvailable. You'll see it decreasing.
AFAIK the unreclaimable parts of cache are excluded and the
MemAvailable only considers parts of the cache which can be
released by the system automatically when the amount of free memory
gets low.

Regards,
Jaromir.

Thanks - I am queuing up another look at this so I can answer stuff
properly - the tmpfs/shmem stuff is based off this blog article:

http://calimeroteknik.free.fr/blag/?article20/really-used-memory-on-gnu-linux

When I reach this I'll recreate what he's doing there.

Right, finally been able to concentrate on this (didn't need to test
with postgres in the end, just diving in the code and playing with
writing files in tmpfs from /dev/urandom etc).

Looking back at meminfo_proc_show, line ~74 of meminfo.c - the comment
says 'at least half of the page cache, or the low watermark worth of
cache, needs to stay' - this naturally made me think that the max of
those two values - so half the page cache - would 'stay', which is a
large value. However the code uses min - so wmark_low will win out in
pretty much any case. I patched meminfo.c to report on this value, and
it is very low - 7157KB currently after 7.5h uptime (32GB RAM), and I
don't expect it to change - so even though wmark_low is removed from
available three times (line ~67, ~76, ~83), that's still a trivial
discrepancy.
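
The min-versus-max point can be sketched as below. This is only a rough
model of the ~v4.1 calculation as I read it, not the kernel code itself:
the kernel works in pages while this sketch uses kB, the function and
parameter names are made up for illustration, and every input figure
except the 7157KB wmark_low is invented.

```python
def estimate_mem_available_kb(free, active_file, inactive_file,
                              slab_reclaimable, wmark_low):
    """Rough model of the ~v4.1 MemAvailable estimate; all values in kB."""
    # Start from free memory, keeping the low watermark in reserve.
    available = free - wmark_low

    # Evictable page cache comes from the file LRU lists, not from Cached.
    pagecache = active_file + inactive_file
    # min(), not max(): the smaller of half the cache and wmark_low is held back.
    pagecache -= min(pagecache // 2, wmark_low)
    available += pagecache

    # Reclaimable slab gets the same treatment.
    available += slab_reclaimable - min(slab_reclaimable // 2, wmark_low)

    return max(available, 0)


# Illustrative figures only, apart from wmark_low = 7157 kB.
estimate = estimate_mem_available_kb(
    free=1_000_000, active_file=400_000, inactive_file=600_000,
    slab_reclaimable=100_000, wmark_low=7157)
# Total held back relative to simply summing free + file LRU + reclaimable slab.
held_back = (1_000_000 + 1_000_000 + 100_000) - estimate
print(estimate, held_back)
```

With a wmark_low this small, the amount held back is at most three
times wmark_low, which is why it barely moves the result.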

On the 'shmem is in cached and therefore is in available' point,
MemAvailable doesn't actually care about cached - it determines
pagecache from:

==================================================

pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE]

==================================================

This is the real cache (i.e. evictable file-backed stuff, from what I
can tell) - and therefore shmem doesn't come into it, since it's nothing
to do with real files.

Looking back at cached:

==================================================

global_page_state(NR_FILE_PAGES)

==================================================

Since the 'zone_stat_item' NR_FILE_PAGES includes the word FILE, I
originally thought that this only included file-backed stuff, however
that is wrong. tmpfs is implemented in 'mm/shmem.c' - when tmpfs wants
to fetch a page from swap or allocate a page, it calls
shmem_getpage_gfp, which in turn calls shmem_add_to_page_cache.
Critically this function increments two vmstat counters when it succeeds
in getting a page:

==================================================

__inc_zone_page_state(page, NR_FILE_PAGES);
__inc_zone_page_state(page, NR_SHMEM);

==================================================

The top one therefore includes the new page in 'Cached', and the bottom
one tracks it separately in the usual 'Shmem'. So this proves that
Cached includes pages that are not evictable data, and therefore
shouldn't be counted as 'cache'.
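
To see this from userspace, a small sketch along these lines compares
Cached, Cached minus Shmem, and the file LRU total that MemAvailable
actually uses. The helper names and the sample figures are invented for
illustration; on a real system the text would come from /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])  # kB for most fields
    return info


def file_lru_kb(info):
    """Page cache as MemAvailable sees it: the file LRU lists only."""
    return info["Active(file)"] + info["Inactive(file)"]


def cached_without_shmem_kb(info):
    """Cached with the tmpfs/shmem pages taken back out."""
    return info["Cached"] - info["Shmem"]


# Invented sample; replace with open("/proc/meminfo").read() on Linux.
SAMPLE = """\
MemTotal:       32000000 kB
MemFree:         1000000 kB
Buffers:          200000 kB
Cached:          5000000 kB
Shmem:            180000 kB
Active(file):    2500000 kB
Inactive(file):  2300000 kB
"""

info = parse_meminfo(SAMPLE)
print("Cached:          ", info["Cached"])
print("Cached - Shmem:  ", cached_without_shmem_kb(info))
print("File LRU total:  ", file_lru_kb(info))
```

Writing a large file into a tmpfs mount and rerunning this against the
real /proc/meminfo should show Cached and Shmem rising together.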

Using the traditional free memory calculation, even taking Shmem away
from Cached in the usual meminfo output does not get a result that
agrees with MemAvailable, although since the discrepancy is below 100MB
it's small enough for me:

==================================================

Used memory: 5650.8MB
Based off MemAvailable: 5734.73MB
Discrepancy: 83.9219MB
shmem: 180.969MB
wmark_low: 6.98926MB, 3 wmark_lows are added in MemAvailable: 20.9678MB

==================================================

See attached trivial awk script for quick calculation of this, although
normal people won't have wmark_low available - run it with:

==================================================

awk -f <script path> /proc/meminfo

==================================================

I'm guessing that's demonstrating ~60MB of other stuff that might be
'genuinely used' memory and therefore not cache, but I'm leaving that alone.
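
For anyone without the attachment, the comparison can be sketched
roughly as follows. This is not the attached awk script: the formula for
'traditional' used memory (total minus free, buffers and cache) is an
assumption about what is meant, the function names are made up, and the
sample values are invented rather than taken from my machine.

```python
def used_traditional_kb(info, subtract_shmem=True):
    """Used = total - free - buffers - cache, optionally removing Shmem from cache."""
    cache = info["Cached"]
    if subtract_shmem:
        cache -= info["Shmem"]  # shmem is not evictable, so do not count it as cache
    return info["MemTotal"] - info["MemFree"] - info["Buffers"] - cache


def used_from_available_kb(info):
    """Used memory implied by the kernel's MemAvailable estimate."""
    return info["MemTotal"] - info["MemAvailable"]


# Invented values in kB, shaped like /proc/meminfo fields.
sample = {
    "MemTotal": 32_000_000,
    "MemFree": 20_000_000,
    "Buffers": 500_000,
    "Cached": 6_000_000,
    "Shmem": 180_000,
    "MemAvailable": 26_250_000,
}

traditional = used_traditional_kb(sample)
from_available = used_from_available_kb(sample)
print("Used memory (kB):          ", traditional)
print("Based off MemAvailable (kB):", from_available)
print("Discrepancy (kB):          ", from_available - traditional)
```

Passing subtract_shmem=False gives the figure you get when Cached is
trusted as-is, which is lower than the adjusted one by exactly Shmem.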

Back to my earlier comment of Cached being 'irretrievably broken' - what
I have found backs this up for me (why introduce MemAvailable when you
can fix Cached etc?) - Jaromir, asking you since you've said the bug is
private, firstly can you confirm the definition of cache for computing
as a store of data that is evictable because it can be trivially
recalculated/read in again
(https://en.wikipedia.org/wiki/Cache_%28computing%29)? This is the
'design problem', which probably hinges on the definition of a cache.

The problem with introducing MemAvailable is that nothing knows about
it, and therefore everything has to be changed - everything trusts that
Cached etc is correct.

Thanks

This seems to have been ignored - but finishing off regardless - after a
month of uptime, wmark_low does not change, so the conclusion is it only
results in a very small discrepancy.

To recap, shmem is included in Cached but is 'genuinely used memory',
which is not what cache means. free (in sysinfo.c:meminfo) then uses
Cached as kb_page_cache, and takes it away from kb_main_used -
kb_main_shared should be taken away from kb_page_cache first.

MemAvailable: Still waiting on a response to what I've said before etc.

Attachment: signature.asc
Description: OpenPGP digital signature
