[ktap] Re: "BUG: unable to handle kernel paging request" after ktap script to trace SyS_write

  • From: Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx>
  • To: ktap@xxxxxxxxxxxxx
  • Date: Mon, 21 Oct 2013 16:50:27 +0800

On Mon, Oct 21, 2013 at 4:13 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
> On Mon, Oct 21, 2013 at 11:31 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> 
> wrote:
>> On Mon, Oct 21, 2013 at 1:15 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
>>> On Mon, Oct 21, 2013 at 7:55 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> 
>>> wrote:
>>>> On Mon, Oct 21, 2013 at 12:01 AM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
>>>>> On Sun, Oct 20, 2013 at 7:27 PM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> 
>>>>> wrote:
>>>>>> On Sun, Oct 20, 2013 at 10:22 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> 
>>>>>> wrote:
>>>>>>> On Sun, Oct 20, 2013 at 12:50 PM, Jovi Zhangwei 
>>>>>>> <jovi.zhangwei@xxxxxxxxx> wrote:
>>>>>>>> On Sat, Oct 19, 2013 at 5:06 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> 
>>>>>>>> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I used next script, and after a few seconds of working, I have all ssh
>>>>>>>>> sessions killed.
>>>>>>>>> #! /usr/bin/env ktap
>>>>>>>>> trace probe:SyS_write {
>>>>>>>>>         print(backtrace(0, 100))
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> However, I can login after this, and have next messages in dmesg:
>>>>>>>>>
>>>>>>>> Hi Azat,
>>>>>>>
>>>>>>> Hi Jovi,
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for your reporting, could you help to check if below two
>>>>>>>> scripts works correctly?
>>>>>>>>
>>>>>>>> #! /usr/bin/env ktap
>>>>>>>> trace probe:SyS_write {
>>>>>>>>         print_backtrace(0, 100)
>>>>>>>> }
>>>>>>>
>>>>>>> [  107.416689] BUG: unable to handle kernel paging request at 
>>>>>>> ffffffff621ee2b5
>>>>>>> [  107.417774] IP: [<ffffffff621ee2b5>] 0xffffffff621ee2b4
>>>>>>> [  107.418569] PGD 160c067 PUD 0
>>>>>>> [  107.419409] Oops: 0010 [#3] PREEMPT SMP
>>>>>>> [  107.420038] Modules linked in: ktapvm(O) nfsd auth_rpcgss
>>>>>>> oid_registry nfs_acl nfs lockd fscache sunrpc loop fuse joydev
>>>>>>> hid_generic usbhid hid snd_pcm microcode snd_page_alloc snd_timer snd
>>>>>>> soundcore ohci_pci ohci_hcd ehci_hcd psmouse usbcore i2c_piix4
>>>>>>> processor ac ther
>>>>>>> mal_sys serio_raw pcspkr parport_pc usb_common evdev parport button
>>>>>>> i2c_core ext4 crc16 jbd2 mbcache sd_mod sg crc_t10dif sr_mod cdrom
>>>>>>> crct10dif_common ata_generic ahci libahci ata_piix libata e1000
>>>>>>> scsi_mod
>>>>>>> [  107.420038] CPU: 1 PID: 2654 Comm: less Tainted: G      D    O
>>>>>>> 3.12.0-rc5+ #40
>>>>>>> [  107.420038] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
>>>>>>> VirtualBox 12/01/2006
>>>>>>> [  107.420038] task: ffff88007a911790 ti: ffff88007a05c000 task.ti:
>>>>>>> ffff88007a05c000
>>>>>>> [  107.420038] RIP: 0010:[<ffffffff621ee2b5>]  [<ffffffff621ee2b5>]
>>>>>>> 0xffffffff621ee2b4
>>>>>>> [  107.420038] RSP: 0018:ffff88007a05df78  EFLAGS: 00010293
>>>>>>> [  107.420038] RAX: 0000000000000001 RBX: 00000000000003ff RCX: 
>>>>>>> 0000000000000001
>>>>>>> [  107.420038] RDX: 00000000000003ff RSI: 0000000000628940 RDI: 
>>>>>>> 0000000000000001
>>>>>>> [  107.420038] RBP: 000000000000005f R08: 0000000001e49c20 R09: 
>>>>>>> 0000000000000000
>>>>>>> [  107.420038] R10: 0000000000000001 R11: 0000000000000246 R12: 
>>>>>>> 0000000000000000
>>>>>>> [  107.420038] R13: 0000000000000043 R14: 0000000000000000 R15: 
>>>>>>> 0000000000000000
>>>>>>> [  107.420038] FS:  00007ff4a1bda700(0000) GS:ffff88007fd00000(0000)
>>>>>>> knlGS:0000000000000000
>>>>>>> [  107.420038] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>>> [  107.420038] CR2: ffffffff621ee2b5 CR3: 000000007b91a000 CR4: 
>>>>>>> 00000000000006e0
>>>>>>> [  107.420038] Stack:
>>>>>>> [  107.420038]  000000000000005f ffffffff813b2b22 0000000000000246
>>>>>>> 0000000000000001
>>>>>>> [  107.420038]  0000000000000000 0000000001e49c20 0000000000000001
>>>>>>> 0000000000000023
>>>>>>> [  107.420038]  00000000000003ff 0000000000628940 0000000000000001
>>>>>>> 0000000000000001
>>>>>>> [  107.420038] Call Trace:
>>>>>>> [  107.420038]  [<ffffffff813b2b22>] ? system_call_fastpath+0x16/0x1b
>>>>>>> [  107.420038] Code:  Bad RIP value.
>>>>>>> [  107.420038] RIP  [<ffffffff621ee2b5>] 0xffffffff621ee2b4
>>>>>>> [  107.420038]  RSP <ffff88007a05df78>
>>>>>>> [  107.420038] CR2: ffffffff621ee2b5
>>>>>>> [  107.420038] ---[ end trace fe1cc133ae0119a3 ]---
>>>>>>>
>>>>>>> And others.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> #! /usr/bin/env ktap
>>>>>>>> trace probe:schedule {
>>>>>>>>         print(backtrace(0, 100))
>>>>>>>> }
>>>>>>>
>>>>>>> This one printed message about 10 tines
>>>>>>> and after this system hangs.
>>>>>>>
>>>>>>> In debugger attached to kernel:
>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>>
>>>>>>>
>>>>>> Thanks. It seems the kernel crash doesn't caused by backtrace(), but
>>>>>> ktapvm core or kernel side.
>>>>>
>>>>> Yes, I already test another script (without backtrace), that also have
>>>>> some problems.
>>>>> Do you have this issue?
>>>>>
>>>>>>
>>>>>> Would you please check below scripts again?
>>>>>>
>>>>>> -------------------------------------------------
>>>>>> #!/usr/bin/env ktap
>>>>>>
>>>>>> trace probe:SyS_write {
>>>>>>     print("testing")
>>>>>> }
>>>>>
>>>>> page fault.
>>>>>
>>>>>> --------------------------------------------------
>>>>>> #!/usr/bin/env ktap
>>>>>>
>>>>>> trace probe:SyS_write {
>>>>>> }
>>>>>
>>>>> page fault.
>>>>>
>>>>>> --------------------------------------------------
>>>>>> #!/usr/bin/env ktap
>>>>>>
>>>>>> trace probe:schedule {
>>>>>> }
>>>>>
>>>>> Just after this command, I ran dmesg, and received page fault.
>>>>> And system hanged.
>>>>>
>>>>>> -------------------------------------------------
>>>>>> #!/usr/bin/env ktap
>>>>>>
>>>>>> trace syscalls:sys_write_enter {
>>>>>> }
>>>>>
>>>>> "s/sys_write_enter/sys_enter_write/"
>>>>> Seems works fine.
>>>>>
>>>>>> -------------------------------------------------
>>>>>> ./ktap test/arith.kp
>>>>>
>>>>> No output. No errors in kern.log
>>>>>
>>>>>> -------------------------------------------------
>>>>>>
>>>>>> And your hardware is x86-64 or ARM-64?
>>>>>
>>>>> x86-64
>>>>>
>>>> Unfortunately I cannot reproduce the crash in my local x86-64
>>>> box(whatever any kernel version, include 3.12-rc6).
>>>>
>>>> Things to be clear now, kernel will crash only using ktap kprobe,
>>>> tracepoint and other ktap basic operation is fine.
>>>> It have a little strange that actually ktapvm doesn't know tracepoint
>>>> or kprobe in kernel side, ktapvm just treat them as a id,
>>>> so it's weird that tracepoint is fine but kprobe is not working.
>>>>
>>>> I guess you ran vanilla kernel (3.12-rc6) with no additional patch,
>>>> could you post your kernel config file?
>>>
>>> Yes, it is vanilla kernel, but with some debug patches.
>>> (printk's, and one patch to btrfs, I don't think that this actually
>>> play some role)
>>> I will try to reproduce this issue using the latest tar balls version
>>> tomorrow (I hope).
>>>
>>> Here is my kernel config
>>> https://gist.github.com/azat/7078890
>>>
>> Still cannot reproduce kernel crash when using your kernel config.
>
> Did you try to run something while that ktap script is running?
> (another login, dmesg without head/tail)
>
Yes, whatever I do to system, nothing happened.

>> What's confused me much is that tracepoint is working but kprobe not working.
>
> It works first N times however.
>
>>
>> Can you apply below patch then run script again?
>
> I've just ran it, and I also have page fault.
>
Then it's interesting now, it seems caused by kernel kprobe, not ktapvm,
because it never execute into kprobe perf callback in below code.

static void ktap_overflow_callback(struct perf_event *event,
                                   struct perf_sample_data *data,
                                   struct pt_regs *regs)
{
                return;
                ...
}

void kp_perf_event_register(ktap_state *ks, struct perf_event_attr *attr,
                            struct task_struct *task, char *filter,
                            ktap_closure *cl)
{
                ...
                event = perf_event_create_kernel_counter(attr, cpu, task,
                                                         ktap_overflow_callback,
                                                         ktap_pevent);
}

So let's ignore ktap for a while, just check if kprobe works fine or not.
Please try below command:

#echo 'p:kprobes/kp0 SyS_write' > /sys/kernel/debug/tracing/kprobe_events

#echo 1 > /sys/kernel/debug/tracing/events/kprobes/kp0/enable

#cat /sys/kernel/debug/tracing/trace

Other related posts: