[haiku-gsoc] bfs deadlock in different path of bfs_free_cookie

  • From: "Salvatore Benedetto" <emitrax@xxxxxxxxx>
  • To: haiku-gsoc@xxxxxxxxxxxxx
  • Date: Mon, 18 Aug 2008 17:21:49 +0000

Hi there,

it seems like I discovered a new deadlock in bfs_free_cookie.
Here is the debug output which is better then many words ;-)

kdebug> bt 199
stack trace for thread 199 "tar"
    kernel stack: 0x92111000 to 0x92115000
      user stack: 0x7efef000 to 0x7ffef000
frame            caller     <image>:function + offset
 0 92114bd4 (+  32) 800417ae   <kernel>:context_switch__FP6threadT0 + 0x0026
 1 92114bf4 (+  64) 80041a18   <kernel>:scheduler_reschedule + 0x0248
 2 92114c34 (+  48) 80038ae8   <kernel>:_mutex_lock + 0x0194
 3 92114c64 (+  48) 80037dfb   <kernel>:recursive_lock_lock + 0x0053
 4 92114c94 (+  48) 80597e4b   <bfs>:Lock__7JournalP11Transaction + 0x0023
 5 92114cc4 (+  48) 80598211   <bfs>:Start__11TransactionP6Volumex + 0x0035
 6 92114cf4 (+  64) 8059d1ff
<bfs>:bfs_put_vnode__FP9fs_volumeP8fs_vnodeb + 0x0067
 7 92114d34 (+  48) 80087367   <kernel>:free_vnode__FP5vnodeb + 0x00af
 8 92114d64 (+  48) 80087523
<kernel>:dec_vnode_ref_count__FP5vnodebT1 + 0x013f
 9 92114d94 (+  48) 8008ad5d   <kernel>:put_vnode + 0x0069
10 92114dc4 (+  48) 80598372   <bfs>:_UnlockInodes__11Transaction + 0x008e
11 92114df4 (+ 112) 8059faae
<bfs>:bfs_free_cookie__FP9fs_volumeP8fs_vnodePv + 0x02ea
12 92114e64 (+  48) 8008d3b8
<kernel>:file_free_fd__FP15file_descriptor + 0x0028
13 92114e94 (+  48) 8007cff4   <kernel>:put_fd + 0x0038
14 92114ec4 (+  64) 8007d0de   <kernel>:close_fd_index + 0x0032
15 92114f04 (+  32) 8007dbd9   <kernel>:common_close__Fib + 0x0031
16 92114f24 (+  32) 8007e4b0   <kernel>:_user_close + 0x0018
17 92114f44 (+ 100) 800bae62   <kernel>:pre_syscall_debug_done +
0x0002 (nearest)
user iframe at 0x92114fa8 (end = 0x92115000)
 eax 0x85           ebx 0x2f0d40        ecx 0x7ffeee10   edx 0xffff0104
 esi 0x3cc          edi 0x1802e000      ebp 0x7ffeee3c   esp 0x92114fdc
 eip 0xffff0104  eflags 0x207      user esp 0x7ffeee10
 vector: 0x63, error code: 0x0
18 92114fa8 (+   0) ffff0104
19 7ffeee3c (+  80) 00211f20   <_APP_>:extr_init + 0x0f58 (nearest)
20 7ffeee8c (+  48) 00212c16   <_APP_>:extract_archive + 0x0176
21 7ffeeebc (+ 144) 0021851d   <_APP_>:read_and + 0x01a1
22 7ffeef4c (+  48) 002238c8   <_APP_>:main + 0x0178
23 7ffeef7c (+  48) 00208df7   <_APP_>:_start + 0x005b
24 7ffeefac (+  48) 001008ea   4033:runtime_loader_seg0ro@0x00100000 + 0x8ea
25 7ffeefdc (+   0) 7ffeefec   4032:tar_main_stack@0x7efef000 + 0xffffec

kdebug> thread 199
THREAD: 0x914f0000
id:                 199 (0xc7)
name:               "tar"
all_next:           0x914b0800
team_next:          0x00000000
q_next:             0x80103c40
priority:           10 (next 10)
state:              waiting
next_state:         waiting
cpu:                0x00000000
sig_pending:        0x0 (blocked: 0x0)
in_kernel:          1
waiting for:        mutex 0x90b9734c
fault_handler:      0x00000000
args:               0x90c5bd20 0x00000000
entry:              0x800491e0
team:               0x90b7d174, "tar"
  exit.sem:         2398
  exit.status:      0x0 (No error)
  exit.reason:      0x0
  exit.signal:      0x0
  exit.waiters:
kernel_stack_area:  4030
kernel_stack_base:  0x92111000
user_stack_area:    4032
user_stack_base:    0x7efef000
user_local_storage: 0x7ffef000
kernel_errno:       0x0 (No error)
kernel_time:        25460415
user_time:          5181774
flags:              0x0
architecture dependant section:
    esp: 0x92114b78
    ss: 0x00000010
    fpu_state at 0x914f0380

mutex 0x90b9734c:
  name:            bfs journal
  flags:           0x0
  holder:          226
  waiting threads: 199 5 8 228


kdebug> thread 226
THREAD: 0x91539000
id:                 226 (0xe2)
name:               "mv"
all_next:           0x91515800
team_next:          0x00000000
q_next:             0x914f0000
priority:           10 (next 10)
state:              waiting
next_state:         waiting
cpu:                0x00000000
sig_pending:        0x0 (blocked: 0x0)
in_kernel:          1
waiting for:        rwlock 0x90bd07fc
fault_handler:      0x00000000
args:               0x90cbd528 0x00000000
entry:              0x800491e0
team:               0x90c0d5d0, "mv"
  exit.sem:         2784
  exit.status:      0x0 (No error)
  exit.reason:      0x0
  exit.signal:      0x0
  exit.waiters:
kernel_stack_area:  4549
kernel_stack_base:  0x95131000
user_stack_area:    4551
user_stack_base:    0x7efe7000
user_local_storage: 0x7ffe7000
kernel_errno:       0x0 (No error)
kernel_time:        218515
user_time:          71234
flags:              0x0
architecture dependant section:
    esp: 0x95134bf8
    ss: 0x00000010
    fpu_state at 0x91539380


kdebug> bt 226
stack trace for thread 226 "mv"
    kernel stack: 0x95131000 to 0x95135000
      user stack: 0x7efe7000 to 0x7ffe7000
frame            caller     <image>:function + offset
 0 95134c54 (+  32) 800417ae   <kernel>:context_switch__FP6threadT0 + 0x0026
 1 95134c74 (+  64) 80041a18   <kernel>:scheduler_reschedule + 0x0248
 2 95134cb4 (+  48) 80037f78   <kernel>:rw_lock_wait__FP7rw_lockb + 0x00c4
 3 95134ce4 (+  64) 800384da   <kernel>:rw_lock_write_lock + 0x00b6
 4 95134d24 (+  64) 805982cd   <bfs>:AddInode__11TransactionP5Inode + 0x009d
 5 95134d64 (+  64) 8058ee6f
<bfs>:Update__5IndexR11TransactionPCclPCUcUsT4UsP5Inode + 0x01af
 6 95134da4 (+  96) 8058f245
<bfs>:UpdateSize__5IndexR11TransactionP5Inode + 0x00a1
 7 95134e04 (+  96) 8059f9d1
<bfs>:bfs_free_cookie__FP9fs_volumeP8fs_vnodePv + 0x020d
 8 95134e64 (+  48) 8008d3b8
<kernel>:file_free_fd__FP15file_descriptor + 0x0028
 9 95134e94 (+  48) 8007cff4   <kernel>:put_fd + 0x0038
10 95134ec4 (+  64) 8007d0de   <kernel>:close_fd_index + 0x0032
11 95134f04 (+  32) 8007dbd9   <kernel>:common_close__Fib + 0x0031
12 95134f24 (+  32) 8007e4b0   <kernel>:_user_close + 0x0018
13 95134f44 (+ 100) 800bae62   <kernel>:pre_syscall_debug_done +
0x0002 (nearest)
user iframe at 0x95134fa8 (end = 0x95135000)
 eax 0x85           ebx 0x2bfd40        ecx 0x7ffe6b10   edx 0xffff0104
 esi 0x20           edi 0x0             ebp 0x7ffe6b3c   esp 0x95134fdc
 eip 0xffff0104  eflags 0x207      user esp 0x7ffe6b10
 vector: 0x63, error code: 0x0
14 95134fa8 (+   0) ffff0104
15 7ffe6b3c (+ 256) 00204787   <_APP_>:main + 0x11b7 (nearest)
16 7ffe6c3c (+ 400) 00206047   <_APP_>:src_info_init + 0x12db (nearest)
17 7ffe6dcc (+  80) 002067c8   <_APP_>:copy + 0x0084
18 7ffe6e1c (+  96) 00203357   <_APP_>:_term_after + 0x01d7 (nearest)
19 7ffe6e7c (+  64) 00203465   <_APP_>:_term_after + 0x02e5 (nearest)
20 7ffe6ebc (+ 192) 00203af3   <_APP_>:main + 0x0523
21 7ffe6f7c (+  48) 0020309f   <_APP_>:_start + 0x005b
22 7ffe6fac (+  48) 001008ea   4552:runtime_loader_seg0ro@0x00100000 + 0x8ea
23 7ffe6fdc (+   0) 7ffe6fec   4551:mv_main_stack@0x7efe7000 + 0xffffec

kdebug> rwlock 0x90bd07fc
rw lock 0x90bd07fc:
  name:            bfs inode 0.4260
  holder:          199
  reader count:    0
  writer count:    2
  owner count:      1
  flags:           0x1
  waiting threads: 226/w


By browsing the code the easy way to fix this seems to be
inverting the two lines below in the Transaction destructor.
 _UnlockInodes, in fact, starts a new transaction, thus
trying to reacquire the journal lock after locking the inodes in
write mode.

    ~Transaction()
    {
        if (fJournal != NULL) {
            fJournal->Unlock(this, false);
            _UnlockInodes();
        }
    }

Regards,
-- 
Salvatore Benedetto (a.k.a. emitrax)
Student of Computer Engineer
University of Pisa
www.haiku-os.it

Other related posts: