[haiku-development] Bug in KDL stack trace output (?)

  • From: Julian Harnath <julian.harnath@xxxxxxxxxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Fri, 01 Jun 2012 22:04:12 +0200

Hello!

While doing some kernel debugging, I was inspecting a memory dump of 
the call stack, but I couldn't match what I saw with the stack trace 
output of KDL. I'm pretty sure this is a bug in the output of KDL stack 
traces: the stack frame and return addresses columns are shifted by one 
line against the function name/parameter column. Since I'm not entirely 
sure that this is not somehow intentional, I write here instead of 
directly filing a bug report.

Let's look at a KDL stack trace. Here's an excerpt of one:
---
15 818a0544 (+  64) 818e3640   <usb_disk> usb_disk_read(0x8281ac00, 
int64: 196102144, 0x81e79000, 0x818a0688) + 0x0048
16 818a0584 (+  64) 800a5266   <kernel_x86> BPrivate::
AbstractModuleDevice<0x8281ac40>::Read(0x8281ac00, int64: 196102144, 
int64: -9112463712666021888, 0x818a0604) + 0x002a
17 818a05c4 (+  64) 800a97f9   <kernel_x86> device_read(0x818a070c, 
int64: 196102144, 0x81e79000, 0x818a0688) + 0x0035
18 818a0604 (+  48) 800e9ec5   <kernel_x86> CallbackIO<0x818a06c4>::
IO(int64: 196102144, 0x81e79000, 0x818a0688) + 0x0025
---

And here a memory dump showing a part of the same stack:
---
[0x818a0540]  ........@xxxxxxx   00000001 818a0584 818e3640 829bfc70
[0x818a0550]  .........6......   00000000 818a0584 818e3606 8281ac00
[0x818a0560]  ....@...F2......   8281ad00 8281ac40 00003246 00000000
[0x818a0570]  `............A..   829bfc60 80f71de8 800fe1b5 82804108
[0x818a0580]  ........fR......   8281b0f0 818a05c4 800a5266 8281ac00
[0x818a0590]  .H..............   0bb04800 00000000 81e79000 818a0688
[0x818a05a0]  ................   00000001 00000000 81e79000 81e79000
[0x818a05b0]  ......../p...A..   801a11a0 818a05e4 800f702f 82804108
[0x818a05c0]  .-..........@...   82002dac 818a0604 800a97f9 8281ac40
[0x818a05d0]  .....H..........   8281ac00 0bb04800 00000000 81e79000
[0x818a05e0]  .........r......   818a0688 818a0604 800f72cd 00001000
[0x818a05f0]  .....!..X.......   00000000 801a21e0 8006c058 00000000
[0x818a0600]  ....4...........   818a0608 818a0634 800e9ec5 818a070c
[0x818a0610]  .H..............   0bb04800 00000000 81e79000 818a0688
[0x818a0620]  ................   00000000 818a068c 8006ba99 8019b3f0
[0x818a0630]  ........Uy......   00000000 818a0694 800e7955 818a06c4
[0x818a0640]  .H..............   0bb04800 00000000 81e79000 818a0688
---

For reference, here is a nice graphic I found on the web showing the 
x86 call stack layout:
http://unixwiz.net/images/stackframe-cdecl.gif
As can be seen, a function's parameters should be found starting at 
EBP+8. Now take a look at the frame pointer and function parameter 
values given by the KDL trace:

 * It says stackframe #15 is at $818a0544. So the first parameter value 
($8281ac00) should be at $818a054c. But it's not there, at its place we 
find a different value: $829bfc70.

 * Now take a look at the address KDL assigns to stackframe #16, and 
add +8 to it: $818a0584 + $8 = $818a058c. And right there at that 
address we find all the parameters of function #15! (warning: looking 
at it is quickly confusing because some parameter values of functions 
15 and 16 are identical.. but other parameter values are different).

* With the address KDL says is for #17: $818a05c4 + $8 = $818a05cc -- 
there are the parameters of #16.

* This "shifted by one" behaviour goes on, we can find the parameters 
of #15 in frame #16, the ones from #16 in #17 etc. So the "wrong" 
function parameter value in #15 we found above is actually the first 
parameter of the function from frame #14. The frame pointer column is 
shifted by one line against the function name/parameter column. I made 
various tests, it's always the same.

The return address column is off by one line as well.. I also verified 
that it really is wrong the same way by writing a simple program calling 
a function, invoking KDL manually and inspecting the stack and 
disassembly.

Here's a tiny patch (just 3 lines changed) that fixes the problem:
http://www.orangejua.de/haiku/fix_kdl_stacktrace.patch

The cause is that print_stack_frame() prints "ebp" as framepointer but 
(correctly) calls print_demangled_call() using "nextEbp + 8". So the 
value it prints on screen is always from one stackframe earlier. For 
the return address it prints "eip" (which is the *current* eip in that 
stackframe, not the return address) instead of looking up the return 
address. It could probably be fixed in a nicer way by refactoring some 
of the functions (the way print_stack_frame() only really uses nextEbp 
could be made nicer, it only uses ebp to calculate the diff value...), 
but that is probably not necessary and the tiny patch works fine here...

So, I wonder, is this a bug like I suspect, or somehow intential 
behaviour? (if yes, why?)

Greets,
Julian

Other related posts: