[ktap] Re: "BUG: unable to handle kernel paging request" after ktap script to trace SyS_write

  • From: Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx>
  • To: ktap@xxxxxxxxxxxxx
  • Date: Tue, 22 Oct 2013 12:40:51 +0800

On Tue, Oct 22, 2013 at 12:10 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
> On Tue, Oct 22, 2013 at 1:00 AM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
>> On Mon, Oct 21, 2013 at 1:31 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
>>> On Mon, Oct 21, 2013 at 12:50 PM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> 
>>> wrote:
>>>> On Mon, Oct 21, 2013 at 4:13 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
>>>>> On Mon, Oct 21, 2013 at 11:31 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> 
>>>>> wrote:
>>>>>> On Mon, Oct 21, 2013 at 1:15 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> 
>>>>>> wrote:
>>>>>>> On Mon, Oct 21, 2013 at 7:55 AM, Jovi Zhangwei 
>>>>>>> <jovi.zhangwei@xxxxxxxxx> wrote:
>>>>>>>> On Mon, Oct 21, 2013 at 12:01 AM, Azat Khuzhin <a3at.mail@xxxxxxxxx> 
>>>>>>>> wrote:
>>>>>>>>> On Sun, Oct 20, 2013 at 7:27 PM, Jovi Zhangwei 
>>>>>>>>> <jovi.zhangwei@xxxxxxxxx> wrote:
>>>>>>>>>> On Sun, Oct 20, 2013 at 10:22 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> 
>>>>>>>>>> wrote:
>>>>>>>>>>> On Sun, Oct 20, 2013 at 12:50 PM, Jovi Zhangwei 
>>>>>>>>>>> <jovi.zhangwei@xxxxxxxxx> wrote:
>>>>>>>>>>>> On Sat, Oct 19, 2013 at 5:06 PM, Azat Khuzhin 
>>>>>>>>>>>> <a3at.mail@xxxxxxxxx> wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I used next script, and after a few seconds of working, I have 
>>>>>>>>>>>>> all ssh
>>>>>>>>>>>>> sessions killed.
>>>>>>>>>>>>> #! /usr/bin/env ktap
>>>>>>>>>>>>> trace probe:SyS_write {
>>>>>>>>>>>>>         print(backtrace(0, 100))
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, I can login after this, and have next messages in dmesg:
>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Azat,
>>>>>>>>>>>
>>>>>>>>>>> Hi Jovi,
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your reporting, could you help to check if below two
>>>>>>>>>>>> scripts works correctly?
>>>>>>>>>>>>
>>>>>>>>>>>> #! /usr/bin/env ktap
>>>>>>>>>>>> trace probe:SyS_write {
>>>>>>>>>>>>         print_backtrace(0, 100)
>>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> [  107.416689] BUG: unable to handle kernel paging request at 
>>>>>>>>>>> ffffffff621ee2b5
>>>>>>>>>>> [  107.417774] IP: [<ffffffff621ee2b5>] 0xffffffff621ee2b4
>>>>>>>>>>> [  107.418569] PGD 160c067 PUD 0
>>>>>>>>>>> [  107.419409] Oops: 0010 [#3] PREEMPT SMP
>>>>>>>>>>> [  107.420038] Modules linked in: ktapvm(O) nfsd auth_rpcgss
>>>>>>>>>>> oid_registry nfs_acl nfs lockd fscache sunrpc loop fuse joydev
>>>>>>>>>>> hid_generic usbhid hid snd_pcm microcode snd_page_alloc snd_timer 
>>>>>>>>>>> snd
>>>>>>>>>>> soundcore ohci_pci ohci_hcd ehci_hcd psmouse usbcore i2c_piix4
>>>>>>>>>>> processor ac ther
>>>>>>>>>>> mal_sys serio_raw pcspkr parport_pc usb_common evdev parport button
>>>>>>>>>>> i2c_core ext4 crc16 jbd2 mbcache sd_mod sg crc_t10dif sr_mod cdrom
>>>>>>>>>>> crct10dif_common ata_generic ahci libahci ata_piix libata e1000
>>>>>>>>>>> scsi_mod
>>>>>>>>>>> [  107.420038] CPU: 1 PID: 2654 Comm: less Tainted: G      D    O
>>>>>>>>>>> 3.12.0-rc5+ #40
>>>>>>>>>>> [  107.420038] Hardware name: innotek GmbH VirtualBox/VirtualBox, 
>>>>>>>>>>> BIOS
>>>>>>>>>>> VirtualBox 12/01/2006
>>>>>>>>>>> [  107.420038] task: ffff88007a911790 ti: ffff88007a05c000 task.ti:
>>>>>>>>>>> ffff88007a05c000
>>>>>>>>>>> [  107.420038] RIP: 0010:[<ffffffff621ee2b5>]  [<ffffffff621ee2b5>]
>>>>>>>>>>> 0xffffffff621ee2b4
>>>>>>>>>>> [  107.420038] RSP: 0018:ffff88007a05df78  EFLAGS: 00010293
>>>>>>>>>>> [  107.420038] RAX: 0000000000000001 RBX: 00000000000003ff RCX: 
>>>>>>>>>>> 0000000000000001
>>>>>>>>>>> [  107.420038] RDX: 00000000000003ff RSI: 0000000000628940 RDI: 
>>>>>>>>>>> 0000000000000001
>>>>>>>>>>> [  107.420038] RBP: 000000000000005f R08: 0000000001e49c20 R09: 
>>>>>>>>>>> 0000000000000000
>>>>>>>>>>> [  107.420038] R10: 0000000000000001 R11: 0000000000000246 R12: 
>>>>>>>>>>> 0000000000000000
>>>>>>>>>>> [  107.420038] R13: 0000000000000043 R14: 0000000000000000 R15: 
>>>>>>>>>>> 0000000000000000
>>>>>>>>>>> [  107.420038] FS:  00007ff4a1bda700(0000) GS:ffff88007fd00000(0000)
>>>>>>>>>>> knlGS:0000000000000000
>>>>>>>>>>> [  107.420038] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>>>>>>> [  107.420038] CR2: ffffffff621ee2b5 CR3: 000000007b91a000 CR4: 
>>>>>>>>>>> 00000000000006e0
>>>>>>>>>>> [  107.420038] Stack:
>>>>>>>>>>> [  107.420038]  000000000000005f ffffffff813b2b22 0000000000000246
>>>>>>>>>>> 0000000000000001
>>>>>>>>>>> [  107.420038]  0000000000000000 0000000001e49c20 0000000000000001
>>>>>>>>>>> 0000000000000023
>>>>>>>>>>> [  107.420038]  00000000000003ff 0000000000628940 0000000000000001
>>>>>>>>>>> 0000000000000001
>>>>>>>>>>> [  107.420038] Call Trace:
>>>>>>>>>>> [  107.420038]  [<ffffffff813b2b22>] ? 
>>>>>>>>>>> system_call_fastpath+0x16/0x1b
>>>>>>>>>>> [  107.420038] Code:  Bad RIP value.
>>>>>>>>>>> [  107.420038] RIP  [<ffffffff621ee2b5>] 0xffffffff621ee2b4
>>>>>>>>>>> [  107.420038]  RSP <ffff88007a05df78>
>>>>>>>>>>> [  107.420038] CR2: ffffffff621ee2b5
>>>>>>>>>>> [  107.420038] ---[ end trace fe1cc133ae0119a3 ]---
>>>>>>>>>>>
>>>>>>>>>>> And others.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> #! /usr/bin/env ktap
>>>>>>>>>>>> trace probe:schedule {
>>>>>>>>>>>>         print(backtrace(0, 100))
>>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> This one printed message about 10 tines
>>>>>>>>>>> and after this system hangs.
>>>>>>>>>>>
>>>>>>>>>>> In debugger attached to kernel:
>>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Thanks. It seems the kernel crash doesn't caused by backtrace(), but
>>>>>>>>>> ktapvm core or kernel side.
>>>>>>>>>
>>>>>>>>> Yes, I already test another script (without backtrace), that also have
>>>>>>>>> some problems.
>>>>>>>>> Do you have this issue?
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Would you please check below scripts again?
>>>>>>>>>>
>>>>>>>>>> -------------------------------------------------
>>>>>>>>>> #!/usr/bin/env ktap
>>>>>>>>>>
>>>>>>>>>> trace probe:SyS_write {
>>>>>>>>>>     print("testing")
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> page fault.
>>>>>>>>>
>>>>>>>>>> --------------------------------------------------
>>>>>>>>>> #!/usr/bin/env ktap
>>>>>>>>>>
>>>>>>>>>> trace probe:SyS_write {
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> page fault.
>>>>>>>>>
>>>>>>>>>> --------------------------------------------------
>>>>>>>>>> #!/usr/bin/env ktap
>>>>>>>>>>
>>>>>>>>>> trace probe:schedule {
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Just after this command, I ran dmesg, and received page fault.
>>>>>>>>> And system hanged.
>>>>>>>>>
>>>>>>>>>> -------------------------------------------------
>>>>>>>>>> #!/usr/bin/env ktap
>>>>>>>>>>
>>>>>>>>>> trace syscalls:sys_write_enter {
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> "s/sys_write_enter/sys_enter_write/"
>>>>>>>>> Seems works fine.
>>>>>>>>>
>>>>>>>>>> -------------------------------------------------
>>>>>>>>>> ./ktap test/arith.kp
>>>>>>>>>
>>>>>>>>> No output. No errors in kern.log
>>>>>>>>>
>>>>>>>>>> -------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> And your hardware is x86-64 or ARM-64?
>>>>>>>>>
>>>>>>>>> x86-64
>>>>>>>>>
>>>>>>>> Unfortunately I cannot reproduce the crash in my local x86-64
>>>>>>>> box(whatever any kernel version, include 3.12-rc6).
>>>>>>>>
>>>>>>>> Things to be clear now, kernel will crash only using ktap kprobe,
>>>>>>>> tracepoint and other ktap basic operation is fine.
>>>>>>>> It have a little strange that actually ktapvm doesn't know tracepoint
>>>>>>>> or kprobe in kernel side, ktapvm just treat them as a id,
>>>>>>>> so it's weird that tracepoint is fine but kprobe is not working.
>>>>>>>>
>>>>>>>> I guess you ran vanilla kernel (3.12-rc6) with no additional patch,
>>>>>>>> could you post your kernel config file?
>>>>>>>
>>>>>>> Yes, it is vanilla kernel, but with some debug patches.
>>>>>>> (printk's, and one patch to btrfs, I don't think that this actually
>>>>>>> play some role)
>>>>>>> I will try to reproduce this issue using the latest tar balls version
>>>>>>> tomorrow (I hope).
>>>>>>>
>>>>>>> Here is my kernel config
>>>>>>> https://gist.github.com/azat/7078890
>>>>>>>
>>>>>> Still cannot reproduce kernel crash when using your kernel config.
>>>>>
>>>>> Did you try to run something while that ktap script is running?
>>>>> (another login, dmesg without head/tail)
>>>>>
>>>> Yes, whatever I do to system, nothing happened.
>>>>
>>>>>> What's confused me much is that tracepoint is working but kprobe not 
>>>>>> working.
>>>>>
>>>>> It works first N times however.
>>>>>
>>>>>>
>>>>>> Can you apply below patch then run script again?
>>>>>
>>>>> I've just ran it, and I also have page fault.
>>>>>
>>>> Then it's interesting now, it seems caused by kernel kprobe, not ktapvm,
>>>> because it never execute into kprobe perf callback in below code.
>>>>
>>>> static void ktap_overflow_callback(struct perf_event *event,
>>>>                                    struct perf_sample_data *data,
>>>>                                    struct pt_regs *regs)
>>>> {
>>>>                 return;
>>>>                 ...
>>>> }
>>>>
>>>> void kp_perf_event_register(ktap_state *ks, struct perf_event_attr *attr,
>>>>                             struct task_struct *task, char *filter,
>>>>                             ktap_closure *cl)
>>>> {
>>>>                 ...
>>>>                 event = perf_event_create_kernel_counter(attr, cpu, task,
>>>>                                                          
>>>> ktap_overflow_callback,
>>>>                                                          ktap_pevent);
>>>> }
>>>>
>>>> So let's ignore ktap for a while, just check if kprobe works fine or not.
>>>> Please try below command:
>>>>
>>>> #echo 'p:kprobes/kp0 SyS_write' > /sys/kernel/debug/tracing/kprobe_events
>>>>
>>>> #echo 1 > /sys/kernel/debug/tracing/events/kprobes/kp0/enable
>>>>
>>>> #cat /sys/kernel/debug/tracing/trace
>>>
>>> root:~# mount -t debugfs debugfs /sys/kernel/debug/
>>> root:~# echo 'p:kprobes/kp0 SyS_write' >|
>>> /sys/kernel/debug/tracing/kprobe_events
>>> root:~# cat /sys/kernel/debug/tracing/kprobe_events
>>> p:kprobes/kp0 SyS_write
>>> root:~# cat /sys/kernel/debug/tracing/kprobe_profile
>>>   kp0                                                        0              
>>>  0
>>> root:~# echo 1 >| /sys/kernel/debug/tracing/events/kprobes/kp0/enable
>>> root:~# cat /sys/kernel/debug/tracing/kprobe_profile
>>>   kp0                                                       82              
>>>  0
>>> root:~# cat /sys/kernel/debug/tracing/kprobe_profile
>>>   kp0                                                       97              
>>>  0
>>> root:~# cat /sys/kernel/debug/tracing/trace
>>> root:~# cat /sys/kernel/debug/tracing/trace
>>> # tracer: nop
>>> #
>>> # entries-in-buffer/entries-written: 175/175   #P:1
>>> #
>>> #                              _-----=> irqs-off
>>> #                             / _----=> need-resched
>>> #                            | / _---=> hardirq/softirq
>>> #                            || / _--=> preempt-depth
>>> #                            ||| /     delay
>>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>> #              | |       |   ||||       |         |
>>>             bash-2209  [000] d..1   121.488682: kp0: (SyS_write+0x0/0x74)
>>>            which-2475  [000] d..1   121.491036: kp0: (SyS_write+0x0/0x74)
>>>              git-2478  [000] d..1   121.494698: kp0: (SyS_write+0x0/0x74)
>>>              git-2481  [000] d..1   121.497838: kp0: (SyS_write+0x0/0x74)
>>>               wc-2486  [000] d..1   121.501778: kp0: (SyS_write+0x0/0x74)
>>>             bash-2209  [000] d..1   121.502462: kp0: (SyS_write+0x0/0x74)
>>>             sshd-2067  [000] d..1   121.502516: kp0: (SyS_write+0x0/0x74)
>>>             sshd-2067  [000] d..1   126.008965: kp0: (SyS_write+0x0/0x74)
>>>             bash-2209  [000] d..1   126.009187: kp0: (SyS_write+0x0/0x74)
>>>             sshd-2067  [000] d..1   126.009238: kp0: (SyS_write+0x0/0x74)
>>>             sshd-2067  [000] d..1   126.240849: kp0: (SyS_write+0x0/0x74)
>>> .................
>>>
>>> In another console:
>>> dmesg > /dev/null # fine
>>> dmesg # all sessions logged out
>>>
>>> azat$ ssh -p2204 root@localhost # vm
>>> root@localhost's password:
>>> .....
>>> Connection to localhost closed.
>>>
>>> Hm.. that is very interesting. I will try to debug this in evening/tomorrow.
>>> Thanks!
>>
>> I've just tested on _another_ vm (fedora 19)  with
>> 3.11.4-201.fc19.x86_64.debug and ktap works correct.
>>
>> Right now I'm building kernel from -
>> https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.11.6.tar.xz
>> I will let you know.
>
> This one also trigger page fault.
>
Is your environment could test mainline kernel (3.12-rc6) without any
addition patch?
if it's doesn't work on your hardware, then we can report this to LKML,
if it works, perhaps then we can use git-bisect to find root cause.

(kprobe of mainline kernel 3.12-rc6 works fine in my box)

Jovi

Other related posts: