On Tue, Oct 22, 2013 at 12:10 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: > On Tue, Oct 22, 2013 at 1:00 AM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: >> On Mon, Oct 21, 2013 at 1:31 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: >>> On Mon, Oct 21, 2013 at 12:50 PM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> >>> wrote: >>>> On Mon, Oct 21, 2013 at 4:13 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: >>>>> On Mon, Oct 21, 2013 at 11:31 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> >>>>> wrote: >>>>>> On Mon, Oct 21, 2013 at 1:15 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> >>>>>> wrote: >>>>>>> On Mon, Oct 21, 2013 at 7:55 AM, Jovi Zhangwei >>>>>>> <jovi.zhangwei@xxxxxxxxx> wrote: >>>>>>>> On Mon, Oct 21, 2013 at 12:01 AM, Azat Khuzhin <a3at.mail@xxxxxxxxx> >>>>>>>> wrote: >>>>>>>>> On Sun, Oct 20, 2013 at 7:27 PM, Jovi Zhangwei >>>>>>>>> <jovi.zhangwei@xxxxxxxxx> wrote: >>>>>>>>>> On Sun, Oct 20, 2013 at 10:22 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> >>>>>>>>>> wrote: >>>>>>>>>>> On Sun, Oct 20, 2013 at 12:50 PM, Jovi Zhangwei >>>>>>>>>>> <jovi.zhangwei@xxxxxxxxx> wrote: >>>>>>>>>>>> On Sat, Oct 19, 2013 at 5:06 PM, Azat Khuzhin >>>>>>>>>>>> <a3at.mail@xxxxxxxxx> wrote: >>>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> I used next script, and after a few seconds of working, I have >>>>>>>>>>>>> all ssh >>>>>>>>>>>>> sessions killed. >>>>>>>>>>>>> #! /usr/bin/env ktap >>>>>>>>>>>>> trace probe:SyS_write { >>>>>>>>>>>>> print(backtrace(0, 100)) >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> However, I can login after this, and have next messages in dmesg: >>>>>>>>>>>>> >>>>>>>>>>>> Hi Azat, >>>>>>>>>>> >>>>>>>>>>> Hi Jovi, >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks for your reporting, could you help to check if below two >>>>>>>>>>>> scripts works correctly? >>>>>>>>>>>> >>>>>>>>>>>> #! /usr/bin/env ktap >>>>>>>>>>>> trace probe:SyS_write { >>>>>>>>>>>> print_backtrace(0, 100) >>>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> [ 107.416689] BUG: unable to handle kernel paging request at >>>>>>>>>>> ffffffff621ee2b5 >>>>>>>>>>> [ 107.417774] IP: [<ffffffff621ee2b5>] 0xffffffff621ee2b4 >>>>>>>>>>> [ 107.418569] PGD 160c067 PUD 0 >>>>>>>>>>> [ 107.419409] Oops: 0010 [#3] PREEMPT SMP >>>>>>>>>>> [ 107.420038] Modules linked in: ktapvm(O) nfsd auth_rpcgss >>>>>>>>>>> oid_registry nfs_acl nfs lockd fscache sunrpc loop fuse joydev >>>>>>>>>>> hid_generic usbhid hid snd_pcm microcode snd_page_alloc snd_timer >>>>>>>>>>> snd >>>>>>>>>>> soundcore ohci_pci ohci_hcd ehci_hcd psmouse usbcore i2c_piix4 >>>>>>>>>>> processor ac ther >>>>>>>>>>> mal_sys serio_raw pcspkr parport_pc usb_common evdev parport button >>>>>>>>>>> i2c_core ext4 crc16 jbd2 mbcache sd_mod sg crc_t10dif sr_mod cdrom >>>>>>>>>>> crct10dif_common ata_generic ahci libahci ata_piix libata e1000 >>>>>>>>>>> scsi_mod >>>>>>>>>>> [ 107.420038] CPU: 1 PID: 2654 Comm: less Tainted: G D O >>>>>>>>>>> 3.12.0-rc5+ #40 >>>>>>>>>>> [ 107.420038] Hardware name: innotek GmbH VirtualBox/VirtualBox, >>>>>>>>>>> BIOS >>>>>>>>>>> VirtualBox 12/01/2006 >>>>>>>>>>> [ 107.420038] task: ffff88007a911790 ti: ffff88007a05c000 task.ti: >>>>>>>>>>> ffff88007a05c000 >>>>>>>>>>> [ 107.420038] RIP: 0010:[<ffffffff621ee2b5>] [<ffffffff621ee2b5>] >>>>>>>>>>> 0xffffffff621ee2b4 >>>>>>>>>>> [ 107.420038] RSP: 0018:ffff88007a05df78 EFLAGS: 00010293 >>>>>>>>>>> [ 107.420038] RAX: 0000000000000001 RBX: 00000000000003ff RCX: >>>>>>>>>>> 0000000000000001 >>>>>>>>>>> [ 107.420038] RDX: 00000000000003ff RSI: 0000000000628940 RDI: >>>>>>>>>>> 0000000000000001 >>>>>>>>>>> [ 107.420038] RBP: 000000000000005f R08: 0000000001e49c20 R09: >>>>>>>>>>> 0000000000000000 >>>>>>>>>>> [ 107.420038] R10: 0000000000000001 R11: 0000000000000246 R12: >>>>>>>>>>> 0000000000000000 >>>>>>>>>>> [ 107.420038] R13: 0000000000000043 R14: 0000000000000000 R15: >>>>>>>>>>> 0000000000000000 >>>>>>>>>>> [ 107.420038] FS: 00007ff4a1bda700(0000) GS:ffff88007fd00000(0000) >>>>>>>>>>> knlGS:0000000000000000 >>>>>>>>>>> [ 107.420038] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>>>>> [ 107.420038] CR2: ffffffff621ee2b5 CR3: 000000007b91a000 CR4: >>>>>>>>>>> 00000000000006e0 >>>>>>>>>>> [ 107.420038] Stack: >>>>>>>>>>> [ 107.420038] 000000000000005f ffffffff813b2b22 0000000000000246 >>>>>>>>>>> 0000000000000001 >>>>>>>>>>> [ 107.420038] 0000000000000000 0000000001e49c20 0000000000000001 >>>>>>>>>>> 0000000000000023 >>>>>>>>>>> [ 107.420038] 00000000000003ff 0000000000628940 0000000000000001 >>>>>>>>>>> 0000000000000001 >>>>>>>>>>> [ 107.420038] Call Trace: >>>>>>>>>>> [ 107.420038] [<ffffffff813b2b22>] ? >>>>>>>>>>> system_call_fastpath+0x16/0x1b >>>>>>>>>>> [ 107.420038] Code: Bad RIP value. >>>>>>>>>>> [ 107.420038] RIP [<ffffffff621ee2b5>] 0xffffffff621ee2b4 >>>>>>>>>>> [ 107.420038] RSP <ffff88007a05df78> >>>>>>>>>>> [ 107.420038] CR2: ffffffff621ee2b5 >>>>>>>>>>> [ 107.420038] ---[ end trace fe1cc133ae0119a3 ]--- >>>>>>>>>>> >>>>>>>>>>> And others. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> #! /usr/bin/env ktap >>>>>>>>>>>> trace probe:schedule { >>>>>>>>>>>> print(backtrace(0, 100)) >>>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> This one printed message about 10 tines >>>>>>>>>>> and after this system hangs. >>>>>>>>>>> >>>>>>>>>>> In debugger attached to kernel: >>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Thanks. It seems the kernel crash doesn't caused by backtrace(), but >>>>>>>>>> ktapvm core or kernel side. >>>>>>>>> >>>>>>>>> Yes, I already test another script (without backtrace), that also have >>>>>>>>> some problems. >>>>>>>>> Do you have this issue? >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Would you please check below scripts again? >>>>>>>>>> >>>>>>>>>> ------------------------------------------------- >>>>>>>>>> #!/usr/bin/env ktap >>>>>>>>>> >>>>>>>>>> trace probe:SyS_write { >>>>>>>>>> print("testing") >>>>>>>>>> } >>>>>>>>> >>>>>>>>> page fault. >>>>>>>>> >>>>>>>>>> -------------------------------------------------- >>>>>>>>>> #!/usr/bin/env ktap >>>>>>>>>> >>>>>>>>>> trace probe:SyS_write { >>>>>>>>>> } >>>>>>>>> >>>>>>>>> page fault. >>>>>>>>> >>>>>>>>>> -------------------------------------------------- >>>>>>>>>> #!/usr/bin/env ktap >>>>>>>>>> >>>>>>>>>> trace probe:schedule { >>>>>>>>>> } >>>>>>>>> >>>>>>>>> Just after this command, I ran dmesg, and received page fault. >>>>>>>>> And system hanged. >>>>>>>>> >>>>>>>>>> ------------------------------------------------- >>>>>>>>>> #!/usr/bin/env ktap >>>>>>>>>> >>>>>>>>>> trace syscalls:sys_write_enter { >>>>>>>>>> } >>>>>>>>> >>>>>>>>> "s/sys_write_enter/sys_enter_write/" >>>>>>>>> Seems works fine. >>>>>>>>> >>>>>>>>>> ------------------------------------------------- >>>>>>>>>> ./ktap test/arith.kp >>>>>>>>> >>>>>>>>> No output. No errors in kern.log >>>>>>>>> >>>>>>>>>> ------------------------------------------------- >>>>>>>>>> >>>>>>>>>> And your hardware is x86-64 or ARM-64? >>>>>>>>> >>>>>>>>> x86-64 >>>>>>>>> >>>>>>>> Unfortunately I cannot reproduce the crash in my local x86-64 >>>>>>>> box(whatever any kernel version, include 3.12-rc6). >>>>>>>> >>>>>>>> Things to be clear now, kernel will crash only using ktap kprobe, >>>>>>>> tracepoint and other ktap basic operation is fine. >>>>>>>> It have a little strange that actually ktapvm doesn't know tracepoint >>>>>>>> or kprobe in kernel side, ktapvm just treat them as a id, >>>>>>>> so it's weird that tracepoint is fine but kprobe is not working. >>>>>>>> >>>>>>>> I guess you ran vanilla kernel (3.12-rc6) with no additional patch, >>>>>>>> could you post your kernel config file? >>>>>>> >>>>>>> Yes, it is vanilla kernel, but with some debug patches. >>>>>>> (printk's, and one patch to btrfs, I don't think that this actually >>>>>>> play some role) >>>>>>> I will try to reproduce this issue using the latest tar balls version >>>>>>> tomorrow (I hope). >>>>>>> >>>>>>> Here is my kernel config >>>>>>> https://gist.github.com/azat/7078890 >>>>>>> >>>>>> Still cannot reproduce kernel crash when using your kernel config. >>>>> >>>>> Did you try to run something while that ktap script is running? >>>>> (another login, dmesg without head/tail) >>>>> >>>> Yes, whatever I do to system, nothing happened. >>>> >>>>>> What's confused me much is that tracepoint is working but kprobe not >>>>>> working. >>>>> >>>>> It works first N times however. >>>>> >>>>>> >>>>>> Can you apply below patch then run script again? >>>>> >>>>> I've just ran it, and I also have page fault. >>>>> >>>> Then it's interesting now, it seems caused by kernel kprobe, not ktapvm, >>>> because it never execute into kprobe perf callback in below code. >>>> >>>> static void ktap_overflow_callback(struct perf_event *event, >>>> struct perf_sample_data *data, >>>> struct pt_regs *regs) >>>> { >>>> return; >>>> ... >>>> } >>>> >>>> void kp_perf_event_register(ktap_state *ks, struct perf_event_attr *attr, >>>> struct task_struct *task, char *filter, >>>> ktap_closure *cl) >>>> { >>>> ... >>>> event = perf_event_create_kernel_counter(attr, cpu, task, >>>> >>>> ktap_overflow_callback, >>>> ktap_pevent); >>>> } >>>> >>>> So let's ignore ktap for a while, just check if kprobe works fine or not. >>>> Please try below command: >>>> >>>> #echo 'p:kprobes/kp0 SyS_write' > /sys/kernel/debug/tracing/kprobe_events >>>> >>>> #echo 1 > /sys/kernel/debug/tracing/events/kprobes/kp0/enable >>>> >>>> #cat /sys/kernel/debug/tracing/trace >>> >>> root:~# mount -t debugfs debugfs /sys/kernel/debug/ >>> root:~# echo 'p:kprobes/kp0 SyS_write' >| >>> /sys/kernel/debug/tracing/kprobe_events >>> root:~# cat /sys/kernel/debug/tracing/kprobe_events >>> p:kprobes/kp0 SyS_write >>> root:~# cat /sys/kernel/debug/tracing/kprobe_profile >>> kp0 0 >>> 0 >>> root:~# echo 1 >| /sys/kernel/debug/tracing/events/kprobes/kp0/enable >>> root:~# cat /sys/kernel/debug/tracing/kprobe_profile >>> kp0 82 >>> 0 >>> root:~# cat /sys/kernel/debug/tracing/kprobe_profile >>> kp0 97 >>> 0 >>> root:~# cat /sys/kernel/debug/tracing/trace >>> root:~# cat /sys/kernel/debug/tracing/trace >>> # tracer: nop >>> # >>> # entries-in-buffer/entries-written: 175/175 #P:1 >>> # >>> # _-----=> irqs-off >>> # / _----=> need-resched >>> # | / _---=> hardirq/softirq >>> # || / _--=> preempt-depth >>> # ||| / delay >>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION >>> # | | | |||| | | >>> bash-2209 [000] d..1 121.488682: kp0: (SyS_write+0x0/0x74) >>> which-2475 [000] d..1 121.491036: kp0: (SyS_write+0x0/0x74) >>> git-2478 [000] d..1 121.494698: kp0: (SyS_write+0x0/0x74) >>> git-2481 [000] d..1 121.497838: kp0: (SyS_write+0x0/0x74) >>> wc-2486 [000] d..1 121.501778: kp0: (SyS_write+0x0/0x74) >>> bash-2209 [000] d..1 121.502462: kp0: (SyS_write+0x0/0x74) >>> sshd-2067 [000] d..1 121.502516: kp0: (SyS_write+0x0/0x74) >>> sshd-2067 [000] d..1 126.008965: kp0: (SyS_write+0x0/0x74) >>> bash-2209 [000] d..1 126.009187: kp0: (SyS_write+0x0/0x74) >>> sshd-2067 [000] d..1 126.009238: kp0: (SyS_write+0x0/0x74) >>> sshd-2067 [000] d..1 126.240849: kp0: (SyS_write+0x0/0x74) >>> ................. >>> >>> In another console: >>> dmesg > /dev/null # fine >>> dmesg # all sessions logged out >>> >>> azat$ ssh -p2204 root@localhost # vm >>> root@localhost's password: >>> ..... >>> Connection to localhost closed. >>> >>> Hm.. that is very interesting. I will try to debug this in evening/tomorrow. >>> Thanks! >> >> I've just tested on _another_ vm (fedora 19) with >> 3.11.4-201.fc19.x86_64.debug and ktap works correct. >> >> Right now I'm building kernel from - >> https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.11.6.tar.xz >> I will let you know. > > This one also trigger page fault. > Is your environment could test mainline kernel (3.12-rc6) without any addition patch? if it's doesn't work on your hardware, then we can report this to LKML, if it works, perhaps then we can use git-bisect to find root cause. (kprobe of mainline kernel 3.12-rc6 works fine in my box) Jovi