On Mon, Oct 21, 2013 at 11:31 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> wrote: > On Mon, Oct 21, 2013 at 1:15 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: >> On Mon, Oct 21, 2013 at 7:55 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> >> wrote: >>> On Mon, Oct 21, 2013 at 12:01 AM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: >>>> On Sun, Oct 20, 2013 at 7:27 PM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> >>>> wrote: >>>>> On Sun, Oct 20, 2013 at 10:22 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> >>>>> wrote: >>>>>> On Sun, Oct 20, 2013 at 12:50 PM, Jovi Zhangwei >>>>>> <jovi.zhangwei@xxxxxxxxx> wrote: >>>>>>> On Sat, Oct 19, 2013 at 5:06 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> >>>>>>> wrote: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I used next script, and after a few seconds of working, I have all ssh >>>>>>>> sessions killed. >>>>>>>> #! /usr/bin/env ktap >>>>>>>> trace probe:SyS_write { >>>>>>>> print(backtrace(0, 100)) >>>>>>>> } >>>>>>>> >>>>>>>> However, I can login after this, and have next messages in dmesg: >>>>>>>> >>>>>>> Hi Azat, >>>>>> >>>>>> Hi Jovi, >>>>>> >>>>>>> >>>>>>> Thanks for your reporting, could you help to check if below two >>>>>>> scripts works correctly? >>>>>>> >>>>>>> #! /usr/bin/env ktap >>>>>>> trace probe:SyS_write { >>>>>>> print_backtrace(0, 100) >>>>>>> } >>>>>> >>>>>> [ 107.416689] BUG: unable to handle kernel paging request at >>>>>> ffffffff621ee2b5 >>>>>> [ 107.417774] IP: [<ffffffff621ee2b5>] 0xffffffff621ee2b4 >>>>>> [ 107.418569] PGD 160c067 PUD 0 >>>>>> [ 107.419409] Oops: 0010 [#3] PREEMPT SMP >>>>>> [ 107.420038] Modules linked in: ktapvm(O) nfsd auth_rpcgss >>>>>> oid_registry nfs_acl nfs lockd fscache sunrpc loop fuse joydev >>>>>> hid_generic usbhid hid snd_pcm microcode snd_page_alloc snd_timer snd >>>>>> soundcore ohci_pci ohci_hcd ehci_hcd psmouse usbcore i2c_piix4 >>>>>> processor ac ther >>>>>> mal_sys serio_raw pcspkr parport_pc usb_common evdev parport button >>>>>> i2c_core ext4 crc16 jbd2 mbcache sd_mod sg crc_t10dif sr_mod cdrom >>>>>> crct10dif_common ata_generic ahci libahci ata_piix libata e1000 >>>>>> scsi_mod >>>>>> [ 107.420038] CPU: 1 PID: 2654 Comm: less Tainted: G D O >>>>>> 3.12.0-rc5+ #40 >>>>>> [ 107.420038] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS >>>>>> VirtualBox 12/01/2006 >>>>>> [ 107.420038] task: ffff88007a911790 ti: ffff88007a05c000 task.ti: >>>>>> ffff88007a05c000 >>>>>> [ 107.420038] RIP: 0010:[<ffffffff621ee2b5>] [<ffffffff621ee2b5>] >>>>>> 0xffffffff621ee2b4 >>>>>> [ 107.420038] RSP: 0018:ffff88007a05df78 EFLAGS: 00010293 >>>>>> [ 107.420038] RAX: 0000000000000001 RBX: 00000000000003ff RCX: >>>>>> 0000000000000001 >>>>>> [ 107.420038] RDX: 00000000000003ff RSI: 0000000000628940 RDI: >>>>>> 0000000000000001 >>>>>> [ 107.420038] RBP: 000000000000005f R08: 0000000001e49c20 R09: >>>>>> 0000000000000000 >>>>>> [ 107.420038] R10: 0000000000000001 R11: 0000000000000246 R12: >>>>>> 0000000000000000 >>>>>> [ 107.420038] R13: 0000000000000043 R14: 0000000000000000 R15: >>>>>> 0000000000000000 >>>>>> [ 107.420038] FS: 00007ff4a1bda700(0000) GS:ffff88007fd00000(0000) >>>>>> knlGS:0000000000000000 >>>>>> [ 107.420038] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>> [ 107.420038] CR2: ffffffff621ee2b5 CR3: 000000007b91a000 CR4: >>>>>> 00000000000006e0 >>>>>> [ 107.420038] Stack: >>>>>> [ 107.420038] 000000000000005f ffffffff813b2b22 0000000000000246 >>>>>> 0000000000000001 >>>>>> [ 107.420038] 0000000000000000 0000000001e49c20 0000000000000001 >>>>>> 0000000000000023 >>>>>> [ 107.420038] 00000000000003ff 0000000000628940 0000000000000001 >>>>>> 0000000000000001 >>>>>> [ 107.420038] Call Trace: >>>>>> [ 107.420038] [<ffffffff813b2b22>] ? system_call_fastpath+0x16/0x1b >>>>>> [ 107.420038] Code: Bad RIP value. >>>>>> [ 107.420038] RIP [<ffffffff621ee2b5>] 0xffffffff621ee2b4 >>>>>> [ 107.420038] RSP <ffff88007a05df78> >>>>>> [ 107.420038] CR2: ffffffff621ee2b5 >>>>>> [ 107.420038] ---[ end trace fe1cc133ae0119a3 ]--- >>>>>> >>>>>> And others. >>>>>> >>>>>>> >>>>>>> >>>>>>> #! /usr/bin/env ktap >>>>>>> trace probe:schedule { >>>>>>> print(backtrace(0, 100)) >>>>>>> } >>>>>> >>>>>> This one printed message about 10 tines >>>>>> and after this system hangs. >>>>>> >>>>>> In debugger attached to kernel: >>>>>> Program received signal SIGSEGV, Segmentation fault. >>>>>> >>>>>> >>>>> Thanks. It seems the kernel crash doesn't caused by backtrace(), but >>>>> ktapvm core or kernel side. >>>> >>>> Yes, I already test another script (without backtrace), that also have >>>> some problems. >>>> Do you have this issue? >>>> >>>>> >>>>> Would you please check below scripts again? >>>>> >>>>> ------------------------------------------------- >>>>> #!/usr/bin/env ktap >>>>> >>>>> trace probe:SyS_write { >>>>> print("testing") >>>>> } >>>> >>>> page fault. >>>> >>>>> -------------------------------------------------- >>>>> #!/usr/bin/env ktap >>>>> >>>>> trace probe:SyS_write { >>>>> } >>>> >>>> page fault. >>>> >>>>> -------------------------------------------------- >>>>> #!/usr/bin/env ktap >>>>> >>>>> trace probe:schedule { >>>>> } >>>> >>>> Just after this command, I ran dmesg, and received page fault. >>>> And system hanged. >>>> >>>>> ------------------------------------------------- >>>>> #!/usr/bin/env ktap >>>>> >>>>> trace syscalls:sys_write_enter { >>>>> } >>>> >>>> "s/sys_write_enter/sys_enter_write/" >>>> Seems works fine. >>>> >>>>> ------------------------------------------------- >>>>> ./ktap test/arith.kp >>>> >>>> No output. No errors in kern.log >>>> >>>>> ------------------------------------------------- >>>>> >>>>> And your hardware is x86-64 or ARM-64? >>>> >>>> x86-64 >>>> >>> Unfortunately I cannot reproduce the crash in my local x86-64 >>> box(whatever any kernel version, include 3.12-rc6). >>> >>> Things to be clear now, kernel will crash only using ktap kprobe, >>> tracepoint and other ktap basic operation is fine. >>> It have a little strange that actually ktapvm doesn't know tracepoint >>> or kprobe in kernel side, ktapvm just treat them as a id, >>> so it's weird that tracepoint is fine but kprobe is not working. >>> >>> I guess you ran vanilla kernel (3.12-rc6) with no additional patch, >>> could you post your kernel config file? >> >> Yes, it is vanilla kernel, but with some debug patches. >> (printk's, and one patch to btrfs, I don't think that this actually >> play some role) >> I will try to reproduce this issue using the latest tar balls version >> tomorrow (I hope). >> >> Here is my kernel config >> https://gist.github.com/azat/7078890 >> > Still cannot reproduce kernel crash when using your kernel config. Did you try to run something while that ktap script is running? (another login, dmesg without head/tail) > What's confused me much is that tracepoint is working but kprobe not working. It works first N times however. > > Can you apply below patch then run script again? I've just ran it, and I also have page fault. > In below patch, code will not jump into ktapvm core when kprobe event fired. > > If there have no crash at this time, then it means ktapvm is the problem area; > if there have crash, it means kernel kprobe is the root cause. > > Thank you. > > diff --git a/interpreter/library/kdebug.c b/interpreter/library/kdebug.c > index c51e7f0..8185c64 100644 > --- a/interpreter/library/kdebug.c > +++ b/interpreter/library/kdebug.c > @@ -154,6 +154,8 @@ static void ktap_overflow_callback(struct perf_event > *event, > ktap_state *ks; > int rctx; > > + return; > + > ktap_pevent = event->overflow_handler_context; > ks = ktap_pevent->ks; > -- Respectfully Azat Khuzhin