On Mon, Oct 21, 2013 at 4:13 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: > On Mon, Oct 21, 2013 at 11:31 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> > wrote: >> On Mon, Oct 21, 2013 at 1:15 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: >>> On Mon, Oct 21, 2013 at 7:55 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> >>> wrote: >>>> On Mon, Oct 21, 2013 at 12:01 AM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: >>>>> On Sun, Oct 20, 2013 at 7:27 PM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> >>>>> wrote: >>>>>> On Sun, Oct 20, 2013 at 10:22 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> >>>>>> wrote: >>>>>>> On Sun, Oct 20, 2013 at 12:50 PM, Jovi Zhangwei >>>>>>> <jovi.zhangwei@xxxxxxxxx> wrote: >>>>>>>> On Sat, Oct 19, 2013 at 5:06 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> >>>>>>>> wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I used next script, and after a few seconds of working, I have all ssh >>>>>>>>> sessions killed. >>>>>>>>> #! /usr/bin/env ktap >>>>>>>>> trace probe:SyS_write { >>>>>>>>> print(backtrace(0, 100)) >>>>>>>>> } >>>>>>>>> >>>>>>>>> However, I can login after this, and have next messages in dmesg: >>>>>>>>> >>>>>>>> Hi Azat, >>>>>>> >>>>>>> Hi Jovi, >>>>>>> >>>>>>>> >>>>>>>> Thanks for your reporting, could you help to check if below two >>>>>>>> scripts works correctly? >>>>>>>> >>>>>>>> #! /usr/bin/env ktap >>>>>>>> trace probe:SyS_write { >>>>>>>> print_backtrace(0, 100) >>>>>>>> } >>>>>>> >>>>>>> [ 107.416689] BUG: unable to handle kernel paging request at >>>>>>> ffffffff621ee2b5 >>>>>>> [ 107.417774] IP: [<ffffffff621ee2b5>] 0xffffffff621ee2b4 >>>>>>> [ 107.418569] PGD 160c067 PUD 0 >>>>>>> [ 107.419409] Oops: 0010 [#3] PREEMPT SMP >>>>>>> [ 107.420038] Modules linked in: ktapvm(O) nfsd auth_rpcgss >>>>>>> oid_registry nfs_acl nfs lockd fscache sunrpc loop fuse joydev >>>>>>> hid_generic usbhid hid snd_pcm microcode snd_page_alloc snd_timer snd >>>>>>> soundcore ohci_pci ohci_hcd ehci_hcd psmouse usbcore i2c_piix4 >>>>>>> processor ac ther >>>>>>> mal_sys serio_raw pcspkr parport_pc usb_common evdev parport button >>>>>>> i2c_core ext4 crc16 jbd2 mbcache sd_mod sg crc_t10dif sr_mod cdrom >>>>>>> crct10dif_common ata_generic ahci libahci ata_piix libata e1000 >>>>>>> scsi_mod >>>>>>> [ 107.420038] CPU: 1 PID: 2654 Comm: less Tainted: G D O >>>>>>> 3.12.0-rc5+ #40 >>>>>>> [ 107.420038] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS >>>>>>> VirtualBox 12/01/2006 >>>>>>> [ 107.420038] task: ffff88007a911790 ti: ffff88007a05c000 task.ti: >>>>>>> ffff88007a05c000 >>>>>>> [ 107.420038] RIP: 0010:[<ffffffff621ee2b5>] [<ffffffff621ee2b5>] >>>>>>> 0xffffffff621ee2b4 >>>>>>> [ 107.420038] RSP: 0018:ffff88007a05df78 EFLAGS: 00010293 >>>>>>> [ 107.420038] RAX: 0000000000000001 RBX: 00000000000003ff RCX: >>>>>>> 0000000000000001 >>>>>>> [ 107.420038] RDX: 00000000000003ff RSI: 0000000000628940 RDI: >>>>>>> 0000000000000001 >>>>>>> [ 107.420038] RBP: 000000000000005f R08: 0000000001e49c20 R09: >>>>>>> 0000000000000000 >>>>>>> [ 107.420038] R10: 0000000000000001 R11: 0000000000000246 R12: >>>>>>> 0000000000000000 >>>>>>> [ 107.420038] R13: 0000000000000043 R14: 0000000000000000 R15: >>>>>>> 0000000000000000 >>>>>>> [ 107.420038] FS: 00007ff4a1bda700(0000) GS:ffff88007fd00000(0000) >>>>>>> knlGS:0000000000000000 >>>>>>> [ 107.420038] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>> [ 107.420038] CR2: ffffffff621ee2b5 CR3: 000000007b91a000 CR4: >>>>>>> 00000000000006e0 >>>>>>> [ 107.420038] Stack: >>>>>>> [ 107.420038] 000000000000005f ffffffff813b2b22 0000000000000246 >>>>>>> 0000000000000001 >>>>>>> [ 107.420038] 0000000000000000 0000000001e49c20 0000000000000001 >>>>>>> 0000000000000023 >>>>>>> [ 107.420038] 00000000000003ff 0000000000628940 0000000000000001 >>>>>>> 0000000000000001 >>>>>>> [ 107.420038] Call Trace: >>>>>>> [ 107.420038] [<ffffffff813b2b22>] ? system_call_fastpath+0x16/0x1b >>>>>>> [ 107.420038] Code: Bad RIP value. >>>>>>> [ 107.420038] RIP [<ffffffff621ee2b5>] 0xffffffff621ee2b4 >>>>>>> [ 107.420038] RSP <ffff88007a05df78> >>>>>>> [ 107.420038] CR2: ffffffff621ee2b5 >>>>>>> [ 107.420038] ---[ end trace fe1cc133ae0119a3 ]--- >>>>>>> >>>>>>> And others. >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> #! /usr/bin/env ktap >>>>>>>> trace probe:schedule { >>>>>>>> print(backtrace(0, 100)) >>>>>>>> } >>>>>>> >>>>>>> This one printed message about 10 tines >>>>>>> and after this system hangs. >>>>>>> >>>>>>> In debugger attached to kernel: >>>>>>> Program received signal SIGSEGV, Segmentation fault. >>>>>>> >>>>>>> >>>>>> Thanks. It seems the kernel crash doesn't caused by backtrace(), but >>>>>> ktapvm core or kernel side. >>>>> >>>>> Yes, I already test another script (without backtrace), that also have >>>>> some problems. >>>>> Do you have this issue? >>>>> >>>>>> >>>>>> Would you please check below scripts again? >>>>>> >>>>>> ------------------------------------------------- >>>>>> #!/usr/bin/env ktap >>>>>> >>>>>> trace probe:SyS_write { >>>>>> print("testing") >>>>>> } >>>>> >>>>> page fault. >>>>> >>>>>> -------------------------------------------------- >>>>>> #!/usr/bin/env ktap >>>>>> >>>>>> trace probe:SyS_write { >>>>>> } >>>>> >>>>> page fault. >>>>> >>>>>> -------------------------------------------------- >>>>>> #!/usr/bin/env ktap >>>>>> >>>>>> trace probe:schedule { >>>>>> } >>>>> >>>>> Just after this command, I ran dmesg, and received page fault. >>>>> And system hanged. >>>>> >>>>>> ------------------------------------------------- >>>>>> #!/usr/bin/env ktap >>>>>> >>>>>> trace syscalls:sys_write_enter { >>>>>> } >>>>> >>>>> "s/sys_write_enter/sys_enter_write/" >>>>> Seems works fine. >>>>> >>>>>> ------------------------------------------------- >>>>>> ./ktap test/arith.kp >>>>> >>>>> No output. No errors in kern.log >>>>> >>>>>> ------------------------------------------------- >>>>>> >>>>>> And your hardware is x86-64 or ARM-64? >>>>> >>>>> x86-64 >>>>> >>>> Unfortunately I cannot reproduce the crash in my local x86-64 >>>> box(whatever any kernel version, include 3.12-rc6). >>>> >>>> Things to be clear now, kernel will crash only using ktap kprobe, >>>> tracepoint and other ktap basic operation is fine. >>>> It have a little strange that actually ktapvm doesn't know tracepoint >>>> or kprobe in kernel side, ktapvm just treat them as a id, >>>> so it's weird that tracepoint is fine but kprobe is not working. >>>> >>>> I guess you ran vanilla kernel (3.12-rc6) with no additional patch, >>>> could you post your kernel config file? >>> >>> Yes, it is vanilla kernel, but with some debug patches. >>> (printk's, and one patch to btrfs, I don't think that this actually >>> play some role) >>> I will try to reproduce this issue using the latest tar balls version >>> tomorrow (I hope). >>> >>> Here is my kernel config >>> https://gist.github.com/azat/7078890 >>> >> Still cannot reproduce kernel crash when using your kernel config. > > Did you try to run something while that ktap script is running? > (another login, dmesg without head/tail) > Yes, whatever I do to system, nothing happened. >> What's confused me much is that tracepoint is working but kprobe not working. > > It works first N times however. > >> >> Can you apply below patch then run script again? > > I've just ran it, and I also have page fault. > Then it's interesting now, it seems caused by kernel kprobe, not ktapvm, because it never execute into kprobe perf callback in below code. static void ktap_overflow_callback(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs) { return; ... } void kp_perf_event_register(ktap_state *ks, struct perf_event_attr *attr, struct task_struct *task, char *filter, ktap_closure *cl) { ... event = perf_event_create_kernel_counter(attr, cpu, task, ktap_overflow_callback, ktap_pevent); } So let's ignore ktap for a while, just check if kprobe works fine or not. Please try below command: #echo 'p:kprobes/kp0 SyS_write' > /sys/kernel/debug/tracing/kprobe_events #echo 1 > /sys/kernel/debug/tracing/events/kprobes/kp0/enable #cat /sys/kernel/debug/tracing/trace