Again, thanks a lot for the responses.
Let's put aside the thread affinity approach, as it's more complex. You can
diagnose the issue quite rapidly with the unprotected mcode approach: in
'src/lj_mcode.h', find the "LUAJIT_UNPROTECT_MCODE" symbol. It's a define that
is not set by default. Set it unconditionally above the first usage, recompile
and redeploy. If it stops crashing, that's our bug.
Please read and understand carefully what this define means (right after the
first usage) and use it at your own risk. As a side note, it does not affect
stability, but it opens a very, very theoretical vulnerability in your
service. I think it's best to use this approach for diagnostic purposes only
and, if it helps, return here for a more solid solution (for example, the one
proposed by Peter).
Noted, thanks for the warning.
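For anyone following along, here is a rough sketch of the idea behind the unprotected mcode approach as I understand it. This is not LuaJIT's code: SKETCH_UNPROTECT, SK_PROT_GEN, SK_PROT_RUN and the area_* helpers are names made up for illustration. By default the region flips between read+write (while code is emitted) and read+exec (while it runs); with the switch defined, the region stays read+write+exec and the flip becomes a no-op, which removes the rapid mprotect() calls from the picture.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical switch for this sketch only (not the real LuaJIT define). */
/* #define SKETCH_UNPROTECT 1 */

#ifdef SKETCH_UNPROTECT
/* Pages are writable and executable at the same time: no flipping needed. */
#define SK_PROT_GEN (PROT_READ | PROT_WRITE | PROT_EXEC)
#define SK_PROT_RUN (PROT_READ | PROT_WRITE | PROT_EXEC)
#else
/* Pages are either writable or executable, never both. */
#define SK_PROT_GEN (PROT_READ | PROT_WRITE)
#define SK_PROT_RUN (PROT_READ | PROT_EXEC)
#endif

/* Map an anonymous region in "generate" mode. Returns NULL on failure. */
static void *area_alloc(size_t len)
{
  void *p = mmap(NULL, len, SK_PROT_GEN, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Switch the region to the given protection. Returns 0 on success. */
static int area_protect(void *p, size_t len, int prot)
{
#ifdef SKETCH_UNPROTECT
  (void)p; (void)len; (void)prot;  /* Both modes are identical: skip the syscall. */
  return 0;
#else
  return mprotect(p, len, prot);
#endif
}

/* One emit cycle: open for writing, copy the bytes, close for execution. */
static int area_emit(void *p, size_t len, const unsigned char *code, size_t n)
{
  if (n > len) return -1;
  if (area_protect(p, len, SK_PROT_GEN) != 0) return -1;
  memcpy(p, code, n);
  return area_protect(p, len, SK_PROT_RUN);
}
```

In the default build every area_emit() call costs two mprotect() calls on a region that other threads may be executing from; defining the switch trades that away for permanently writable+executable pages, which is the security concern mentioned above. Note that some hardened systems refuse such mappings outright.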
An alternative hypothesis is that the memory is seen as executable,
but the contents of the memory are still seen as 00 00 00 00 by the
executing CPU, which decodes as `add [rax], al`, and would segfault
given that rax contained 0xC. That said, rapidly flipping page
protection causing confusion seems a more likely hypothesis.
Interesting idea indeed - this got me checking some other core dumps available