On 22 Oct 2013 08:40, "Jovi Zhangwei" <jovi.zhangwei@xxxxxxxxx> wrote: > > On Tue, Oct 22, 2013 at 12:10 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: > > On Tue, Oct 22, 2013 at 1:00 AM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: > >> On Mon, Oct 21, 2013 at 1:31 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: > >>> On Mon, Oct 21, 2013 at 12:50 PM, Jovi Zhangwei < jovi.zhangwei@xxxxxxxxx> wrote: > >>>> On Mon, Oct 21, 2013 at 4:13 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: > >>>>> On Mon, Oct 21, 2013 at 11:31 AM, Jovi Zhangwei < jovi.zhangwei@xxxxxxxxx> wrote: > >>>>>> On Mon, Oct 21, 2013 at 1:15 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote: > >>>>>>> On Mon, Oct 21, 2013 at 7:55 AM, Jovi Zhangwei < jovi.zhangwei@xxxxxxxxx> wrote: > >>>>>>>> On Mon, Oct 21, 2013 at 12:01 AM, Azat Khuzhin < a3at.mail@xxxxxxxxx> wrote: > >>>>>>>>> On Sun, Oct 20, 2013 at 7:27 PM, Jovi Zhangwei < jovi.zhangwei@xxxxxxxxx> wrote: > >>>>>>>>>> On Sun, Oct 20, 2013 at 10:22 PM, Azat Khuzhin < a3at.mail@xxxxxxxxx> wrote: > >>>>>>>>>>> On Sun, Oct 20, 2013 at 12:50 PM, Jovi Zhangwei < jovi.zhangwei@xxxxxxxxx> wrote: > >>>>>>>>>>>> On Sat, Oct 19, 2013 at 5:06 PM, Azat Khuzhin < a3at.mail@xxxxxxxxx> wrote: > >>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I used next script, and after a few seconds of working, I have all ssh > >>>>>>>>>>>>> sessions killed. > >>>>>>>>>>>>> #! /usr/bin/env ktap > >>>>>>>>>>>>> trace probe:SyS_write { > >>>>>>>>>>>>> print(backtrace(0, 100)) > >>>>>>>>>>>>> } > >>>>>>>>>>>>> > >>>>>>>>>>>>> However, I can login after this, and have next messages in dmesg: > >>>>>>>>>>>>> > >>>>>>>>>>>> Hi Azat, > >>>>>>>>>>> > >>>>>>>>>>> Hi Jovi, > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks for your reporting, could you help to check if below two > >>>>>>>>>>>> scripts works correctly? > >>>>>>>>>>>> > >>>>>>>>>>>> #! /usr/bin/env ktap > >>>>>>>>>>>> trace probe:SyS_write { > >>>>>>>>>>>> print_backtrace(0, 100) > >>>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> [ 107.416689] BUG: unable to handle kernel paging request at ffffffff621ee2b5 > >>>>>>>>>>> [ 107.417774] IP: [<ffffffff621ee2b5>] 0xffffffff621ee2b4 > >>>>>>>>>>> [ 107.418569] PGD 160c067 PUD 0 > >>>>>>>>>>> [ 107.419409] Oops: 0010 [#3] PREEMPT SMP > >>>>>>>>>>> [ 107.420038] Modules linked in: ktapvm(O) nfsd auth_rpcgss > >>>>>>>>>>> oid_registry nfs_acl nfs lockd fscache sunrpc loop fuse joydev > >>>>>>>>>>> hid_generic usbhid hid snd_pcm microcode snd_page_alloc snd_timer snd > >>>>>>>>>>> soundcore ohci_pci ohci_hcd ehci_hcd psmouse usbcore i2c_piix4 > >>>>>>>>>>> processor ac ther > >>>>>>>>>>> mal_sys serio_raw pcspkr parport_pc usb_common evdev parport button > >>>>>>>>>>> i2c_core ext4 crc16 jbd2 mbcache sd_mod sg crc_t10dif sr_mod cdrom > >>>>>>>>>>> crct10dif_common ata_generic ahci libahci ata_piix libata e1000 > >>>>>>>>>>> scsi_mod > >>>>>>>>>>> [ 107.420038] CPU: 1 PID: 2654 Comm: less Tainted: G D O > >>>>>>>>>>> 3.12.0-rc5+ #40 > >>>>>>>>>>> [ 107.420038] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS > >>>>>>>>>>> VirtualBox 12/01/2006 > >>>>>>>>>>> [ 107.420038] task: ffff88007a911790 ti: ffff88007a05c000 task.ti: > >>>>>>>>>>> ffff88007a05c000 > >>>>>>>>>>> [ 107.420038] RIP: 0010:[<ffffffff621ee2b5>] [<ffffffff621ee2b5>] > >>>>>>>>>>> 0xffffffff621ee2b4 > >>>>>>>>>>> [ 107.420038] RSP: 0018:ffff88007a05df78 EFLAGS: 00010293 > >>>>>>>>>>> [ 107.420038] RAX: 0000000000000001 RBX: 00000000000003ff RCX: 0000000000000001 > >>>>>>>>>>> [ 107.420038] RDX: 00000000000003ff RSI: 0000000000628940 RDI: 0000000000000001 > >>>>>>>>>>> [ 107.420038] RBP: 000000000000005f R08: 0000000001e49c20 R09: 0000000000000000 > >>>>>>>>>>> [ 107.420038] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000 > >>>>>>>>>>> [ 107.420038] R13: 0000000000000043 R14: 0000000000000000 R15: 0000000000000000 > >>>>>>>>>>> [ 107.420038] FS: 00007ff4a1bda700(0000) GS:ffff88007fd00000(0000) > >>>>>>>>>>> knlGS:0000000000000000 > >>>>>>>>>>> [ 107.420038] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > >>>>>>>>>>> [ 107.420038] CR2: ffffffff621ee2b5 CR3: 000000007b91a000 CR4: 00000000000006e0 > >>>>>>>>>>> [ 107.420038] Stack: > >>>>>>>>>>> [ 107.420038] 000000000000005f ffffffff813b2b22 0000000000000246 > >>>>>>>>>>> 0000000000000001 > >>>>>>>>>>> [ 107.420038] 0000000000000000 0000000001e49c20 0000000000000001 > >>>>>>>>>>> 0000000000000023 > >>>>>>>>>>> [ 107.420038] 00000000000003ff 0000000000628940 0000000000000001 > >>>>>>>>>>> 0000000000000001 > >>>>>>>>>>> [ 107.420038] Call Trace: > >>>>>>>>>>> [ 107.420038] [<ffffffff813b2b22>] ? system_call_fastpath+0x16/0x1b > >>>>>>>>>>> [ 107.420038] Code: Bad RIP value. > >>>>>>>>>>> [ 107.420038] RIP [<ffffffff621ee2b5>] 0xffffffff621ee2b4 > >>>>>>>>>>> [ 107.420038] RSP <ffff88007a05df78> > >>>>>>>>>>> [ 107.420038] CR2: ffffffff621ee2b5 > >>>>>>>>>>> [ 107.420038] ---[ end trace fe1cc133ae0119a3 ]--- > >>>>>>>>>>> > >>>>>>>>>>> And others. > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> #! /usr/bin/env ktap > >>>>>>>>>>>> trace probe:schedule { > >>>>>>>>>>>> print(backtrace(0, 100)) > >>>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> This one printed message about 10 tines > >>>>>>>>>>> and after this system hangs. > >>>>>>>>>>> > >>>>>>>>>>> In debugger attached to kernel: > >>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> Thanks. It seems the kernel crash doesn't caused by backtrace(), but > >>>>>>>>>> ktapvm core or kernel side. > >>>>>>>>> > >>>>>>>>> Yes, I already test another script (without backtrace), that also have > >>>>>>>>> some problems. > >>>>>>>>> Do you have this issue? > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Would you please check below scripts again? > >>>>>>>>>> > >>>>>>>>>> ------------------------------------------------- > >>>>>>>>>> #!/usr/bin/env ktap > >>>>>>>>>> > >>>>>>>>>> trace probe:SyS_write { > >>>>>>>>>> print("testing") > >>>>>>>>>> } > >>>>>>>>> > >>>>>>>>> page fault. > >>>>>>>>> > >>>>>>>>>> -------------------------------------------------- > >>>>>>>>>> #!/usr/bin/env ktap > >>>>>>>>>> > >>>>>>>>>> trace probe:SyS_write { > >>>>>>>>>> } > >>>>>>>>> > >>>>>>>>> page fault. > >>>>>>>>> > >>>>>>>>>> -------------------------------------------------- > >>>>>>>>>> #!/usr/bin/env ktap > >>>>>>>>>> > >>>>>>>>>> trace probe:schedule { > >>>>>>>>>> } > >>>>>>>>> > >>>>>>>>> Just after this command, I ran dmesg, and received page fault. > >>>>>>>>> And system hanged. > >>>>>>>>> > >>>>>>>>>> ------------------------------------------------- > >>>>>>>>>> #!/usr/bin/env ktap > >>>>>>>>>> > >>>>>>>>>> trace syscalls:sys_write_enter { > >>>>>>>>>> } > >>>>>>>>> > >>>>>>>>> "s/sys_write_enter/sys_enter_write/" > >>>>>>>>> Seems works fine. > >>>>>>>>> > >>>>>>>>>> ------------------------------------------------- > >>>>>>>>>> ./ktap test/arith.kp > >>>>>>>>> > >>>>>>>>> No output. No errors in kern.log > >>>>>>>>> > >>>>>>>>>> ------------------------------------------------- > >>>>>>>>>> > >>>>>>>>>> And your hardware is x86-64 or ARM-64? > >>>>>>>>> > >>>>>>>>> x86-64 > >>>>>>>>> > >>>>>>>> Unfortunately I cannot reproduce the crash in my local x86-64 > >>>>>>>> box(whatever any kernel version, include 3.12-rc6). > >>>>>>>> > >>>>>>>> Things to be clear now, kernel will crash only using ktap kprobe, > >>>>>>>> tracepoint and other ktap basic operation is fine. > >>>>>>>> It have a little strange that actually ktapvm doesn't know tracepoint > >>>>>>>> or kprobe in kernel side, ktapvm just treat them as a id, > >>>>>>>> so it's weird that tracepoint is fine but kprobe is not working. > >>>>>>>> > >>>>>>>> I guess you ran vanilla kernel (3.12-rc6) with no additional patch, > >>>>>>>> could you post your kernel config file? > >>>>>>> > >>>>>>> Yes, it is vanilla kernel, but with some debug patches. > >>>>>>> (printk's, and one patch to btrfs, I don't think that this actually > >>>>>>> play some role) > >>>>>>> I will try to reproduce this issue using the latest tar balls version > >>>>>>> tomorrow (I hope). > >>>>>>> > >>>>>>> Here is my kernel config > >>>>>>> https://gist.github.com/azat/7078890 > >>>>>>> > >>>>>> Still cannot reproduce kernel crash when using your kernel config. > >>>>> > >>>>> Did you try to run something while that ktap script is running? > >>>>> (another login, dmesg without head/tail) > >>>>> > >>>> Yes, whatever I do to system, nothing happened. > >>>> > >>>>>> What's confused me much is that tracepoint is working but kprobe not working. > >>>>> > >>>>> It works first N times however. > >>>>> > >>>>>> > >>>>>> Can you apply below patch then run script again? > >>>>> > >>>>> I've just ran it, and I also have page fault. > >>>>> > >>>> Then it's interesting now, it seems caused by kernel kprobe, not ktapvm, > >>>> because it never execute into kprobe perf callback in below code. > >>>> > >>>> static void ktap_overflow_callback(struct perf_event *event, > >>>> struct perf_sample_data *data, > >>>> struct pt_regs *regs) > >>>> { > >>>> return; > >>>> ... > >>>> } > >>>> > >>>> void kp_perf_event_register(ktap_state *ks, struct perf_event_attr *attr, > >>>> struct task_struct *task, char *filter, > >>>> ktap_closure *cl) > >>>> { > >>>> ... > >>>> event = perf_event_create_kernel_counter(attr, cpu, task, > >>>> ktap_overflow_callback, > >>>> ktap_pevent); > >>>> } > >>>> > >>>> So let's ignore ktap for a while, just check if kprobe works fine or not. > >>>> Please try below command: > >>>> > >>>> #echo 'p:kprobes/kp0 SyS_write' > /sys/kernel/debug/tracing/kprobe_events > >>>> > >>>> #echo 1 > /sys/kernel/debug/tracing/events/kprobes/kp0/enable > >>>> > >>>> #cat /sys/kernel/debug/tracing/trace > >>> > >>> root:~# mount -t debugfs debugfs /sys/kernel/debug/ > >>> root:~# echo 'p:kprobes/kp0 SyS_write' >| > >>> /sys/kernel/debug/tracing/kprobe_events > >>> root:~# cat /sys/kernel/debug/tracing/kprobe_events > >>> p:kprobes/kp0 SyS_write > >>> root:~# cat /sys/kernel/debug/tracing/kprobe_profile > >>> kp0 0 0 > >>> root:~# echo 1 >| /sys/kernel/debug/tracing/events/kprobes/kp0/enable > >>> root:~# cat /sys/kernel/debug/tracing/kprobe_profile > >>> kp0 82 0 > >>> root:~# cat /sys/kernel/debug/tracing/kprobe_profile > >>> kp0 97 0 > >>> root:~# cat /sys/kernel/debug/tracing/trace > >>> root:~# cat /sys/kernel/debug/tracing/trace > >>> # tracer: nop > >>> # > >>> # entries-in-buffer/entries-written: 175/175 #P:1 > >>> # > >>> # _-----=> irqs-off > >>> # / _----=> need-resched > >>> # | / _---=> hardirq/softirq > >>> # || / _--=> preempt-depth > >>> # ||| / delay > >>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION > >>> # | | | |||| | | > >>> bash-2209 [000] d..1 121.488682: kp0: (SyS_write+0x0/0x74) > >>> which-2475 [000] d..1 121.491036: kp0: (SyS_write+0x0/0x74) > >>> git-2478 [000] d..1 121.494698: kp0: (SyS_write+0x0/0x74) > >>> git-2481 [000] d..1 121.497838: kp0: (SyS_write+0x0/0x74) > >>> wc-2486 [000] d..1 121.501778: kp0: (SyS_write+0x0/0x74) > >>> bash-2209 [000] d..1 121.502462: kp0: (SyS_write+0x0/0x74) > >>> sshd-2067 [000] d..1 121.502516: kp0: (SyS_write+0x0/0x74) > >>> sshd-2067 [000] d..1 126.008965: kp0: (SyS_write+0x0/0x74) > >>> bash-2209 [000] d..1 126.009187: kp0: (SyS_write+0x0/0x74) > >>> sshd-2067 [000] d..1 126.009238: kp0: (SyS_write+0x0/0x74) > >>> sshd-2067 [000] d..1 126.240849: kp0: (SyS_write+0x0/0x74) > >>> ................. > >>> > >>> In another console: > >>> dmesg > /dev/null # fine > >>> dmesg # all sessions logged out > >>> > >>> azat$ ssh -p2204 root@localhost # vm > >>> root@localhost's password: > >>> ..... > >>> Connection to localhost closed. > >>> > >>> Hm.. that is very interesting. I will try to debug this in evening/tomorrow. > >>> Thanks! > >> > >> I've just tested on _another_ vm (fedora 19) with > >> 3.11.4-201.fc19.x86_64.debug and ktap works correct. > >> > >> Right now I'm building kernel from - > >> https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.11.6.tar.xz > >> I will let you know. > > > > This one also trigger page fault. > > > Is your environment could test mainline kernel (3.12-rc6) without any > addition patch? > if it's doesn't work on your hardware, then we can report this to LKML, > if it works, perhaps then we can use git-bisect to find root cause. This was the kernel without any additional patches. I also tried to unload all unusable modules, it didn't help. I want to test it with another config, if it doesn't help, than we'll report this. > > (kprobe of mainline kernel 3.12-rc6 works fine in my box) > > Jovi >