Igor,
Two questions:
- What are these other problems? We want to know!
- It seems to me that the overhead of a side trace entry might be too
high with your approach. Side-trace overhead is already a major
problem, and regressing it is a bad idea.
While implementing a length decoder sounds like a difficult problem, I
suspect that much of it can be generated automatically. Also, a full
disassembler is not needed -- just a length decoder.
Demi
On Tue, 2016-04-05 at 13:06 +0300, Igor Ehrlich wrote:
Peter,
I understand your idea, but I don't think that this will be the best
solution. First of all, introducing C-level x86 decoder just for this
issue seems like an overkill. Also, as far as I can see, Mike wanted
to make trace patching as rapid as possible - that's why he
introduced such simplified algorithm. Who knows, maybe additional
runtime overhead here might affect execution performance in some
applications. Finally, you write:
Then again, perhaps a length decoder for the small subset of
instructions emitted by the JIT compiler isn't too horrific.
Well, let's assume that LuaJIT evolves further and starts to emit
more and more instructions - will the maintenance overhead still not
be horrific from your POV?
In fact, I have an idea regarding this case. It requires quite some
changes in how side exits are handled in the traces. However, the
changes are quite local, it is even possible to contain them within
x86 target. Briefly, it introduces additional code section right
underneath the end of each trace. Such section only consists of
jumps, one jump per exitno. Conditional branches inside the code
should always point at this section. Jump targets in this section are
either proper exit stubs (once the trace is generated), or entry
points of side traces (if any). In this case, trace patching will be
simplified greatly, since variable target for each side exit will
only be present once in a trace at the known place. Consider the
following trace with two exitno (as of now):
=== TRACE START ===
...
cmp <...>; jne _exitstub_0
...
cmp <...>; jne _exitstub_0
...
cmp <...>; jae _exitstub_0
...
cmp <...>; jne _exitstub_1
...
cmp <...>; jae _exitstub_1
=== TRACE END ===
In this example, "_exitstub_*" labels are the actual addresses of
exit stubs (these are accessed through exitstub_addr() @
lj_target.h). As of now, once we compile a side trace for the current
trace, we have to scan the whole mcode in order to find corresponding
side exits and patch them and get the following:
=== TRACE START ===
...
cmp <...>; jne _sidetrace_entry
...
cmp <...>; jne _sidetrace_entry
...
cmp <...>; jae _sidetrace_entry
...
cmp <...>; jne _exitstub_1
...
cmp <...>; jae _exitstub_1
=== TRACE END ===
Here's where the initial issue occurs. My proposal here is to change
the code layout in the following way:
=== TRACE START ===
...
cmp <...>; jne _trampoline_0
...
cmp <...>; jne _trampoline_0
...
cmp <...>; jae _trampoline_0
...
cmp <...>; jne _trampoline_1
...
cmp <...>; jae _trampoline_1
=== TRACE END ===
=== TRAMPOLINES ===
_trampoline_0:
jmp _exitstub_0
_trampoline_1:
jmp _exitstub_1
=== REAL TRACE END ===
At this point, address of "_exitstub_0" is only present once in trace
mcode. At this point, if we decide to patch exitno 0, all we need to
do is to go past the end of "regular" mcode (which is, "T->mcode + T-
szmcode"), treat the following memory as "jmp rel32" and change thetarget. In general, if we want to patch exit group N, we go past the
end of "regular" mcode, skip N*sizeof("jmp rel32") bytes (sorry for
that, but once again, you get the idea) and apply the patch to jump
target:
=== TRACE START ===
...
cmp <...>; jne _trampoline_0
...
cmp <...>; jne _trampoline_0
...
cmp <...>; jae _trampoline_0
...
cmp <...>; jne _trampoline_1
...
cmp <...>; jae _trampoline_1
=== TRACE END ===
=== TRAMPOLINES ===
_trampoline_0:
jmp _sidetrace_entry
_trampoline_1:
jmp _exitstub_1
=== REAL TRACE END ===
As you may notice, the "regular" mcode (and the conditional jumps
that are patched as of now) do not change in this schema. The
patching algorithm is short, simple and expressive. The initial issue
is solved. Drawbacks? Well, there are some:
1. Memory overhead of "5*exitno" bytes per trace - that's where
trampolines are placed.
2. Additional jump per __any__ side exit. I don't know, but my guess
is that this will work good enough.
I hope any of this makes sense. Feel free to correct me, as I'm very
new to luajit development. Also, any feedback is highly appreciated.
P.S.: Also, I think it's the right step in order to fix some other
issues on x86, which are much less believable than this one, but
still present as well.
Best regards,
Igor A. Ehrlich
On Tue, Apr 5, 2016 at 12:39 AM, Peter Cawley <corsix@xxxxxxxxxx>
wrote:
Nice write-up.
Unfortunately, the least bad solution which comes to mind involves
writing an x86 instruction length decoder and plugging it in to
lj_asm_patchexit (this would then result in the LuaJIT repo
containing
both two x86 encoders - in dasm_x86.lua and lj_emit_x86.h - and two
x86 decoders - in dis_x86.lua and lj_asm_x86.h). Then again,
perhaps a
length decoder for the small subset of instructions emitted by the
JIT
compiler isn't too horrific.
On Mon, Apr 4, 2016 at 10:35 AM, Igor Ehrlich <igor.a.ehrlich@gmail
.com> wrote:
Dear all,found an
during the investigation of random core dumps in LuaJIT 2.0.4, I
issue that's present in both master and 2.1 branches, which maylead to
random failures with arbitrary symptoms on x86 and x86_64. Theissue affects
x86 and x86_64 targets, GNU/Linux confirmed, other OSes are mostprobably
affected too. Such failures are extremely hard to reproduce, soI'll just
dump the results of the investigation below.the
First of all, during the compilation of a trace emitter produces
following mcode:instruction of
0xfa98b57: jne 0x1efbca7b
# 0f 85 1e 3f 52 0f
0xfa98b5d: cmp DWORD PTR [rdx+0x274],0xfffffffb
# 83 ba 74 02 00 00 fb
FTR, these are the last instruction of HREFK and the first
HLOAD with "str" output type. Everything works fine until thecompiler
generates a side trace for this trace and starts patching theparent. The
function lj_asm_patchexit scans parent trace byte-by-byte andeventually
comes to the address 0xfa98b5c in attempt to find a six-byte sideexit jcc
pattern. As you may see, this is not an address of any particularvalid
instruction. The six-byte pattern here overlaps the last byte ofjne and
five bytes of cmp, looking like "0f 83 ba 74 02 00". Patchertries to
interpret this as a jcc instruction, and it __succeeds__, as "0f83" works
for it as an opcode and 0x274ba is the __exact relative offset__from
0xfa98b5c to the stub of the side exit being patched (plus-minus6, hope you
get the idea).0xfa98b5f are
As a result of this misinterpretation, bytes 0xfa98b5c through
patched with the offset to the newly created side trace,resulting in...
the point
0xfa98b57: jne 0x1efbca7b
# 0f 85 1e 3f 52 0f
0xfa98b5d: and DWORD PTR [rsi],0x1b
# 83 26 1b
0xfa98b60: rex.WX (bad)
# 00
0xfa98b63: sti
# fb
...and the application segfaults on 0xfa98b5d. Actually, that's
where the investigation was started.error here.
As it was said earlier, technically, you can get virtually any
This might be a SIGSEGV, or SIGILL. If you are lucky enough toencounter
such an unlikely issue, you might even get no error with anunpredictable
outcome.
Best regards,
Igor A. Ehrlich