[haiku-bugs] Re: [Haiku] #5816: Kernel out of memory

  • From: "bonefish" <trac@xxxxxxxxxxxx>
  • Date: Wed, 16 Jun 2010 23:40:00 -0000

#5816: Kernel out of memory
---------------------------+------------------------------------------------
 Reporter:  axeld          |       Owner:  bonefish      
     Type:  bug            |      Status:  assigned      
 Priority:  normal         |   Milestone:  R1/alpha3     
Component:  System/Kernel  |     Version:  R1/Development
 Keywords:                 |   Blockedby:                
    Patch:  0              |    Platform:  All           
 Blocking:  4963           |  
---------------------------+------------------------------------------------

Comment(by bonefish):

 Just finished running the test again with some more debug output. The low
 resource state transitions are logged and block_cache::_LowMemoryHandler()
 prints how the RemoveUnusedBlocks() changed the number of unused blocks.
 The test started again at about 2-3 MB/s and -- I hadn't noticed this
 before -- after a few seconds reaches even > 10 MB/s for a short time.
 Afterwards it varies from 1 to 10 MB/s until the address space "note" low
 resource state is reached. Then the throughput drops immediately to < 1
 MB, varying from 10 KB/s to a few 100 KB/s, only from time to time
 recovering to 1 or 2 MB/s (sometimes even more) for a few seconds.

 Here's the output from around the point where the low resource limit is
 hit:
 {{{
 heap_add_area: area 3649 added to small heap 0xd090c000 - usable range
 0xc2c07000 - 0xc3000000
 slab memory manager: created area 0xc5800000 (3650)
 slab memory manager: created area 0xc6000000 (3651)
 low resource address space: normal -> note
 block_cache::_LowMemoryHandler(): 0xcd92a198: unused: 25997 -> 25997
 block_cache::_LowMemoryHandler(): 0xcd92a7f8: unused: 3 -> 0
 block_cache::_LowMemoryHandler(): 0xccd27b28: unused: 313434 -> 313384
 block_cache::_LowMemoryHandler(): 0xcfd1f660: unused: 13 -> 0
 block_cache::_LowMemoryHandler(): 0xcd92a198: unused: 2598 -> 2598
 block_cache::_LowMemoryHandler(): 0xcd92a7f8: unused: 0 -> 0
 block_cache::_LowMemoryHandler(): 0xccd27b28: unused: 311848 -> 311798
 block_cache::_LowMemoryHandler(): 0xcfd1f660: unused: 0 -> 0
 block_cache::_LowMemoryHandler(): 0xcd92a198: unused: 22 -> 22
 block_cache::_LowMemoryHandler(): 0xcd92a7f8: unused: 0 -> 0
 block_cache::_LowMemoryHandler(): 0xccd27b28: unused: 311378 -> 311328
 block_cache::_LowMemoryHandler(): 0xcfd1f660: unused: 0 -> 0
 block_cache::_LowMemoryHandler(): 0xcd92a198: unused: 0 -> 0
 block_cache::_LowMemoryHandler(): 0xcd92a7f8: unused: 0 -> 0
 block_cache::_LowMemoryHandler(): 0xccd27b28: unused: 311244 -> 311194
 block_cache::_LowMemoryHandler(): 0xcfd1f660: unused: 0 -> 0
 }}}
 The first block cache (0xcd92a198) is the one of the volume I'm copying,
 the third one (0xccd27b28) from the target volume (both on the same disk).
 As can be seen Tracker's pre-flight phase has let the number of unused
 blocks of the source volume grow to > 300000. The 25997 blocks of the
 target volume are probably mostly the ones that have already been written.

 After only three seconds the target cache has been drained completely.
 RemoveUnusedBlocks() didn't remove a single block, though, since at level
 "note" only blocks older than two minutes are removed. The reason that all
 blocks are removed nonetheless is block_cache::Allocate(), which removes 2
 unused blocks each time it is called. And since the target volume is quite
 active -- lots of writing and transactions going on -- that happens quite
 often. This would also explain the abysmal performance from that point on,
 since all blocks that become unused even for a short time are thrown away
 and have to be re-read from disk when they are needed again. The low
 resource behavior definitely needs to be rethought.

 The source volume cache lost at max a few hundred blocks per second. It
 took about 1:45 h to drop to 0. Interestingly the whole time not a single
 slab are was released -- which must be a bug/leak somewhere -- and the
 heap kept growing -- likely for vnodes, file caches and BFS data
 structures, which don't care about the address space state. After about
 2:20 h (80 % copied) the heap grower failed to allocate a heap area.
 {{{
 kdebug> low_resource
 current state: ---a
   pages:  normal (723271)
   memory: normal (3025899520)
   sems:   normal (64352)
   aspace: note (28450816)

 function    data         resources  prio  function-name
 0x800df218  0x00000000   p m   a         5
 object_cache_low_memory__FPvUll
 0x800bad8c  0x00000000   p m             0
 vnode_low_resource_handler__FPvUll
 0x8003d044  0xcd92a198   p m   a         0
 _LowMemoryHandler__11block_cachePvUll
 0x8003d044  0xcd92a7f8   p m   a         0
 _LowMemoryHandler__11block_cachePvUll
 0x8003d044  0xccd27b28   p m   a         0
 _LowMemoryHandler__11block_cachePvUll
 0x8003d044  0xcfd1f660   p m   a         0
 _LowMemoryHandler__11block_cachePvUll
 }}}
 Since there are still 28 MB address space left (probably fragmented or
 partially reserved), the address space low resource state is still "note".
 "Warning" state is currently reached at 8 MB, "critical" at utterly
 useless 32 KB. This is obviously something that needs to be changed.

 As for the block caches:
 {{{
 kdebug> block_cache 0xcd92a198
 BLOCK CACHE: 0xcd92a198
  fd:           0
  max_blocks:   26214048
  block_size:   2048
  next_transaction_id: 198
  buffer_cache: 0x828a1dc0
  busy_reading: 0, no waiters
  busy_writing: 0, no waiters
  797 blocks total, 311 dirty, 0 discarded, 0 referenced, 0 busy, 486 in
 unused.
 kdebug> block_cache 0xccd27b28
 BLOCK CACHE: 0xccd27b28
  fd:           7
  max_blocks:   26214048
  block_size:   2048
  next_transaction_id: 4
  buffer_cache: 0x828a6a00
  busy_reading: 0, no waiters
  busy_writing: 0, no waiters
  4 blocks total, 3 dirty, 0 discarded, 0 referenced, 0 busy, 1 in unused.
 }}}
 They don't know a lot of blocks.

 The slabs:
 {{{
 kdebug> slabs
    address                   name  objsize    usage  empty  usedobj
 total    flags
 [...]
 0x82808d00          cached blocks       64 12124160    178      814
 189255 20000000
 0x828a1dc0    block cache buffers     2048 122683392    192     1622
 59904 20000000
 0x828a5000    block cache buffers     4096   524288      0        2
 128 20000000
 0x828a6a00    block cache buffers     2048 642777088   1225       11
 313856 20000000
 0x828a6280    block cache buffers     2048   524288      1        0
 256 20000000
 kdebug> slab_cache 0x82808d00
 name:              cached blocks
 lock:              0x82808d28
 object_size:       64
 cache_color_cycle: 0
 total_objects:     189255
 used_count:        814
 empty_count:       178
 pressure:          354
 slab_size:         65536
 usage:             12124160
 maximum:           0
 flags:             0x20000000
 cookie:            0x00000000
 resize entry don't wait: 0x00000000
 resize entry can wait:   0x00000000
   slab        chunk         size   used offset  free
 empty:
   0xe902ffe4  0xe9020000    1023   1023      0  0xe90217fc
   0xe804ffe4  0xe8040000    1023   1023      0  0xe8044afc
   0x82efffe4  0x82ef0000    1023   1023      0  0x82efbebc
 [...]
 partial:
   0xda05ffe4  0xda050000    1023    491      0  0xda057cbc
   0xe085ffe4  0xe0850000    1023   1011      0  0xe085733c
   0x8291ffe4  0x82910000    1023   1010      0  0x8291927c
   0xdf04ffe4  0xdf040000    1023   1018      0  0xdf047ebc
   0xdd81ffe4  0xdd810000    1023    926      0  0xdd8151bc
   0xd905ffe4  0xd9050000    1023    944      0  0xd905eafc
   0xe105ffe4  0xe1050000    1023    947      0  0xe105773c
 depot:
   full:     0x00000000, count 16
   empty:    0x00000000, count 0
   max full: 16
   capacity: 32
   stores:
   [0] loaded:   0x00000000
       previous: 0x00000000
   [1] loaded:   0x00000000
       previous: 0x00000000
   [2] loaded:   0x828ce140
       previous: 0x00000000
   [3] loaded:   0x00000000
       previous: 0x00000000
   [4] loaded:   0x828ce780
       previous: 0x00000000
   [5] loaded:   0x828ceaa0
       previous: 0x00000000
   [6] loaded:   0x00000000
       previous: 0x00000000
   [7] loaded:   0x828ce8c0
       previous: 0x00000000
 kdebug> slab_cache 0x828a6a00
 name:              block cache buffers
 lock:              0x828a6a28
 object_size:       2048
 cache_color_cycle: 0
 total_objects:     313856
 used_count:        11
 empty_count:       1225
 pressure:          2449
 slab_size:         524288
 usage:             642777088
 maximum:           0
 flags:             0x20000000
 cookie:            0x00000000
 resize entry don't wait: 0x00000000
 resize entry can wait:   0x00000000
   slab        chunk         size   used offset  free
 empty:
   0x828af300  0xde200000     256    256      0  0xde27e7fc
   0x828af240  0xde500000     256    256      0  0xde559ffc
   0x828af1c0  0xde700000     256    256      0  0xde730ffc
 [...]
 partial:
   0x828059e0  0x82d00000     256    245      0  0x82d16ffc
 depot:
   full:     0x00000000, count 4
   empty:    0x00000000, count 0
   max full: 4
   capacity: 8
   stores:
   [0] loaded:   0x00000000
       previous: 0x00000000
   [1] loaded:   0x00000000
       previous: 0x00000000
   [2] loaded:   0x00000000
       previous: 0x00000000
   [3] loaded:   0x00000000
       previous: 0x00000000
   [4] loaded:   0x00000000
       previous: 0x00000000
   [5] loaded:   0x00000000
       previous: 0x00000000
   [6] loaded:   0x00000000
       previous: 0x00000000
   [7] loaded:   0x00000000
       previous: 0x00000000
 }}}
 Truncated the free slabs lists -- there are loads of them. Apparently
 something goes wrong in the slab allocator's low resource handler. Will
 look into this tomorrow.

-- 
Ticket URL: <http://dev.haiku-os.org/ticket/5816#comment:5>
Haiku <http://dev.haiku-os.org>
Haiku - the operating system.

Other related posts: