[haiku-development] Re: debugging memory allocations

  • From: Lucian Adrian Grijincu <lucian.grijincu@xxxxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Mon, 5 Jul 2010 19:19:17 +0300

On Mon, Jul 5, 2010 at 6:46 PM, Ingo Weinhold <ingo_weinhold@xxxxxx> wrote:
>> == Problem ==
>> LKL boots the Linux kernel and is able to correctly free its
>> resources when it is unloaded.
>>
>> I managed to mount/unmount an EXT4 partition and list the files in
>> its root directory (just a silly test to see that things really
>> work).
>>
>> However I've reached a problem: after loading/unloading the Linux
>> kernel and mounting/unmounting the same partition a few times
>> during the Haiku boot process, I sometimes get a page fault that
>> says that the instruction at address 0x80234e09 (an example) cannot
>> access the memory at address 0x80234e09 (the kernel wants to run some
>> code, but cannot execute that code).
>>
>> I guess this can happen in cases where the code lacks execute
>> permission (as I've said in a previous email I had to hack into
>> Haiku's add-on loader to accommodate a combined .text+.data section as
>> with LKL) or when the code was unmapped from memory.
>
> More likely the latter, since we don't support non-executable mapping on
> any hardware yet.


It was caused by my LKL-based file system being unloaded while it
still had a pending timer registered.
At some point the timer interrupt fired and tried to run my timer_hook
function, which was no longer mapped, so it failed.

At least I hope this was it: since fixing the responsible part of the
code I haven't seen the problem again.


>> The instruction is only sometimes from LKL. It happened a few times
>> while in what should be normally working Haiku code.
>>
>> An illustrative example: it once happened while executing the x86
>> 'halt' asm instruction (on the idle cpu thread).


I was misinterpreting the debug message.

What happened was this:
* I added a timer (add_timer) with a callback from my add-on
* before the timer fired I unloaded the add-on
* some time later, while some unrelated code was executing, the timer
interrupt arrived
* it tried to run the routine I had registered, but failed and hit a
page fault with interrupts disabled
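
The sequence above, and the fix of cancelling the timer on unload, can
be sketched as a small user-space simulation. This is only an
illustration under stated assumptions: every sim_* and addon_* name is
hypothetical and none of it is Haiku kernel code. In the real kernel
the corresponding calls are add_timer() and cancel_timer() from
KernelExport.h, with different signatures.

```c
/* User-space simulation only; sim_* and addon_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*sim_hook)(void *cookie);

/* Stand-in for a kernel timer entry. */
struct sim_timer {
    sim_hook hook;
    void *cookie;
    bool pending;
};

#define SIM_MAX_TIMERS 8
static struct sim_timer *sim_queue[SIM_MAX_TIMERS];

/* Register a one-shot timer; returns false if the queue is full. */
bool sim_add_timer(struct sim_timer *t, sim_hook hook, void *cookie)
{
    assert(t != NULL && hook != NULL);
    for (size_t i = 0; i < SIM_MAX_TIMERS; i++) {
        if (sim_queue[i] == NULL) {
            t->hook = hook;
            t->cookie = cookie;
            t->pending = true;
            sim_queue[i] = t;
            return true;
        }
    }
    return false;
}

/* Remove a timer from the queue; returns true if it was still pending. */
bool sim_cancel_timer(struct sim_timer *t)
{
    for (size_t i = 0; i < SIM_MAX_TIMERS; i++) {
        if (sim_queue[i] == t) {
            sim_queue[i] = NULL;
            t->pending = false;
            return true;
        }
    }
    return false;
}

/* Simulated timer interrupt: runs every queued hook, returns the count. */
int sim_timer_interrupt(void)
{
    int fired = 0;
    for (size_t i = 0; i < SIM_MAX_TIMERS; i++) {
        struct sim_timer *t = sim_queue[i];
        if (t == NULL)
            continue;
        sim_queue[i] = NULL;
        t->pending = false;
        t->hook(t->cookie); /* in the kernel this jump faults if the code is gone */
        fired++;
    }
    return fired;
}

/* The "add-on": one timer and one hook that lives in the add-on's code. */
static struct sim_timer addon_timer;
int addon_hook_calls;

static void addon_timer_hook(void *cookie)
{
    (void)cookie;
    addon_hook_calls++;
}

/* Load path: register the timer. */
bool addon_init(void)
{
    addon_hook_calls = 0;
    return sim_add_timer(&addon_timer, addon_timer_hook, NULL);
}

/* Unload path: cancel the timer BEFORE the add-on's code is unmapped. */
void addon_uninit(void)
{
    sim_cancel_timer(&addon_timer);
}
```

Calling addon_init(), then addon_uninit(), then sim_timer_interrupt()
runs no hook at all, because the queue no longer holds a pointer into
the add-on. Leaving out the cancel in addon_uninit() is the simulated
equivalent of the bug: the queue would still point at a hook whose code
has been unloaded.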

KDL correctly said that it couldn't execute code (code at address
0x8012121212 cannot access address 0x8012121212), but it didn't say
this was the address of my unloaded timer hook; what it reported was
the address of the random code that was running when the timer
interrupt hit.


> That sounds somewhat weird. Particularly that you know what instruction it
> was while the kernel failed to execute it.

KDL told me the address of the instruction (e.g. some_function() +
0x3f) and I objdump-ed the .o that defines some_function and saw which
instruction was there.
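
As an illustration of how a "some_function() + 0x3f" style location is
derived from a raw address, here is a minimal sketch of a symbol
lookup. The table, the addresses and the sim_* names are all made up
for the example; this is not how KDL is actually implemented.

```c
/* Illustrative only: hypothetical symbol table and made-up addresses. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct sim_symbol {
    uintptr_t start;   /* address of the first byte of the function */
    size_t size;       /* size of the function in bytes */
    const char *name;
};

static const struct sim_symbol sim_symbols[] = {
    { 0x80001000u, 0x80, "some_function" },
    { 0x80001080u, 0x40, "other_function" },
};

/* Return the symbol containing addr, or NULL if no symbol covers it. */
const struct sim_symbol *sim_lookup(uintptr_t addr)
{
    size_t count = sizeof(sim_symbols) / sizeof(sim_symbols[0]);
    for (size_t i = 0; i < count; i++) {
        const struct sim_symbol *s = &sim_symbols[i];
        if (addr >= s->start && addr - s->start < s->size)
            return s;
    }
    return NULL;
}

/* Write "name + 0xoff" into buf; returns the offset, or -1 if unknown. */
long sim_describe(uintptr_t addr, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    const struct sim_symbol *s = sim_lookup(addr);
    if (s == NULL) {
        snprintf(buf, len, "<unknown> 0x%lx", (unsigned long)addr);
        return -1;
    }
    unsigned long off = (unsigned long)(addr - s->start);
    snprintf(buf, len, "%s + 0x%lx", s->name, off);
    return (long)off;
}
```

With this table, the address 0x8000103f resolves to some_function at
offset 0x3f. That offset is what gets matched against the disassembly
of the object file (for example from objdump -d) to find the
instruction. An address inside an unloaded image would match no symbol
and come back as unknown.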


Anyway, thanks for the info.

-- 
 .
..: Lucian
