[ktap] Re: "BUG: unable to handle kernel paging request" after ktap script to trace SyS_write

  • From: Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx>
  • To: ktap@xxxxxxxxxxxxx
  • Date: Mon, 21 Oct 2013 15:31:01 +0800

On Mon, Oct 21, 2013 at 1:15 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
> On Mon, Oct 21, 2013 at 7:55 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> 
> wrote:
>> On Mon, Oct 21, 2013 at 12:01 AM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
>>> On Sun, Oct 20, 2013 at 7:27 PM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> 
>>> wrote:
>>>> On Sun, Oct 20, 2013 at 10:22 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
>>>>> On Sun, Oct 20, 2013 at 12:50 PM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> 
>>>>> wrote:
>>>>>> On Sat, Oct 19, 2013 at 5:06 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> 
>>>>>> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I used next script, and after a few seconds of working, I have all ssh
>>>>>>> sessions killed.
>>>>>>> #! /usr/bin/env ktap
>>>>>>> trace probe:SyS_write {
>>>>>>>         print(backtrace(0, 100))
>>>>>>> }
>>>>>>>
>>>>>>> However, I can login after this, and have next messages in dmesg:
>>>>>>>
>>>>>> Hi Azat,
>>>>>
>>>>> Hi Jovi,
>>>>>
>>>>>>
>>>>>> Thanks for your reporting, could you help to check if below two
>>>>>> scripts works correctly?
>>>>>>
>>>>>> #! /usr/bin/env ktap
>>>>>> trace probe:SyS_write {
>>>>>>         print_backtrace(0, 100)
>>>>>> }
>>>>>
>>>>> [  107.416689] BUG: unable to handle kernel paging request at 
>>>>> ffffffff621ee2b5
>>>>> [  107.417774] IP: [<ffffffff621ee2b5>] 0xffffffff621ee2b4
>>>>> [  107.418569] PGD 160c067 PUD 0
>>>>> [  107.419409] Oops: 0010 [#3] PREEMPT SMP
>>>>> [  107.420038] Modules linked in: ktapvm(O) nfsd auth_rpcgss
>>>>> oid_registry nfs_acl nfs lockd fscache sunrpc loop fuse joydev
>>>>> hid_generic usbhid hid snd_pcm microcode snd_page_alloc snd_timer snd
>>>>> soundcore ohci_pci ohci_hcd ehci_hcd psmouse usbcore i2c_piix4
>>>>> processor ac ther
>>>>> mal_sys serio_raw pcspkr parport_pc usb_common evdev parport button
>>>>> i2c_core ext4 crc16 jbd2 mbcache sd_mod sg crc_t10dif sr_mod cdrom
>>>>> crct10dif_common ata_generic ahci libahci ata_piix libata e1000
>>>>> scsi_mod
>>>>> [  107.420038] CPU: 1 PID: 2654 Comm: less Tainted: G      D    O
>>>>> 3.12.0-rc5+ #40
>>>>> [  107.420038] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
>>>>> VirtualBox 12/01/2006
>>>>> [  107.420038] task: ffff88007a911790 ti: ffff88007a05c000 task.ti:
>>>>> ffff88007a05c000
>>>>> [  107.420038] RIP: 0010:[<ffffffff621ee2b5>]  [<ffffffff621ee2b5>]
>>>>> 0xffffffff621ee2b4
>>>>> [  107.420038] RSP: 0018:ffff88007a05df78  EFLAGS: 00010293
>>>>> [  107.420038] RAX: 0000000000000001 RBX: 00000000000003ff RCX: 
>>>>> 0000000000000001
>>>>> [  107.420038] RDX: 00000000000003ff RSI: 0000000000628940 RDI: 
>>>>> 0000000000000001
>>>>> [  107.420038] RBP: 000000000000005f R08: 0000000001e49c20 R09: 
>>>>> 0000000000000000
>>>>> [  107.420038] R10: 0000000000000001 R11: 0000000000000246 R12: 
>>>>> 0000000000000000
>>>>> [  107.420038] R13: 0000000000000043 R14: 0000000000000000 R15: 
>>>>> 0000000000000000
>>>>> [  107.420038] FS:  00007ff4a1bda700(0000) GS:ffff88007fd00000(0000)
>>>>> knlGS:0000000000000000
>>>>> [  107.420038] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>> [  107.420038] CR2: ffffffff621ee2b5 CR3: 000000007b91a000 CR4: 
>>>>> 00000000000006e0
>>>>> [  107.420038] Stack:
>>>>> [  107.420038]  000000000000005f ffffffff813b2b22 0000000000000246
>>>>> 0000000000000001
>>>>> [  107.420038]  0000000000000000 0000000001e49c20 0000000000000001
>>>>> 0000000000000023
>>>>> [  107.420038]  00000000000003ff 0000000000628940 0000000000000001
>>>>> 0000000000000001
>>>>> [  107.420038] Call Trace:
>>>>> [  107.420038]  [<ffffffff813b2b22>] ? system_call_fastpath+0x16/0x1b
>>>>> [  107.420038] Code:  Bad RIP value.
>>>>> [  107.420038] RIP  [<ffffffff621ee2b5>] 0xffffffff621ee2b4
>>>>> [  107.420038]  RSP <ffff88007a05df78>
>>>>> [  107.420038] CR2: ffffffff621ee2b5
>>>>> [  107.420038] ---[ end trace fe1cc133ae0119a3 ]---
>>>>>
>>>>> And others.
>>>>>
>>>>>>
>>>>>>
>>>>>> #! /usr/bin/env ktap
>>>>>> trace probe:schedule {
>>>>>>         print(backtrace(0, 100))
>>>>>> }
>>>>>
>>>>> This one printed message about 10 tines
>>>>> and after this system hangs.
>>>>>
>>>>> In debugger attached to kernel:
>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>
>>>>>
>>>> Thanks. It seems the kernel crash doesn't caused by backtrace(), but
>>>> ktapvm core or kernel side.
>>>
>>> Yes, I already test another script (without backtrace), that also have
>>> some problems.
>>> Do you have this issue?
>>>
>>>>
>>>> Would you please check below scripts again?
>>>>
>>>> -------------------------------------------------
>>>> #!/usr/bin/env ktap
>>>>
>>>> trace probe:SyS_write {
>>>>     print("testing")
>>>> }
>>>
>>> page fault.
>>>
>>>> --------------------------------------------------
>>>> #!/usr/bin/env ktap
>>>>
>>>> trace probe:SyS_write {
>>>> }
>>>
>>> page fault.
>>>
>>>> --------------------------------------------------
>>>> #!/usr/bin/env ktap
>>>>
>>>> trace probe:schedule {
>>>> }
>>>
>>> Just after this command, I ran dmesg, and received page fault.
>>> And system hanged.
>>>
>>>> -------------------------------------------------
>>>> #!/usr/bin/env ktap
>>>>
>>>> trace syscalls:sys_write_enter {
>>>> }
>>>
>>> "s/sys_write_enter/sys_enter_write/"
>>> Seems works fine.
>>>
>>>> -------------------------------------------------
>>>> ./ktap test/arith.kp
>>>
>>> No output. No errors in kern.log
>>>
>>>> -------------------------------------------------
>>>>
>>>> And your hardware is x86-64 or ARM-64?
>>>
>>> x86-64
>>>
>> Unfortunately I cannot reproduce the crash in my local x86-64
>> box(whatever any kernel version, include 3.12-rc6).
>>
>> Things to be clear now, kernel will crash only using ktap kprobe,
>> tracepoint and other ktap basic operation is fine.
>> It have a little strange that actually ktapvm doesn't know tracepoint
>> or kprobe in kernel side, ktapvm just treat them as a id,
>> so it's weird that tracepoint is fine but kprobe is not working.
>>
>> I guess you ran vanilla kernel (3.12-rc6) with no additional patch,
>> could you post your kernel config file?
>
> Yes, it is vanilla kernel, but with some debug patches.
> (printk's, and one patch to btrfs, I don't think that this actually
> play some role)
> I will try to reproduce this issue using the latest tar balls version
> tomorrow (I hope).
>
> Here is my kernel config
> https://gist.github.com/azat/7078890
>
Still cannot reproduce kernel crash when using your kernel config.
What's confused me much is that tracepoint is working but kprobe not working.

Can you apply below patch then run script again?
In below patch, code will not jump into ktapvm core when kprobe event fired.

If there have no crash at this time, then it means ktapvm is the problem area;
if there have crash, it means kernel kprobe is the root cause.

Thank you.

diff --git a/interpreter/library/kdebug.c b/interpreter/library/kdebug.c
index c51e7f0..8185c64 100644
--- a/interpreter/library/kdebug.c
+++ b/interpreter/library/kdebug.c
@@ -154,6 +154,8 @@ static void ktap_overflow_callback(struct perf_event *event,
        ktap_state  *ks;
        int rctx;

+       return;
+
        ktap_pevent = event->overflow_handler_context;
        ks = ktap_pevent->ks;

Other related posts: