On Tue, Sep 09, 2014 at 01:46:25AM +0200, Mike Pall wrote:
> Initialization of nested aggregates is not compiled. This is far
> from trivial in the general case. You can send a patch, if you
> really want to dive into this (function crec_alloc).

Please see the attached patch against LuaJIT 2.1. It is a first
attempt, but the test case already compiles:

[TRACE 1 union.lua:19 loop]
[TRACE --- union.lua:12 -- NYI: return to lower frame at union.lua:24]
[TRACE 2 union.lua:23 loop]

Could you give me some hints on how to improve the patch?

> IMHO it makes sense to enforce a common notation. Otherwise people
> will have a hard time to understand each other's modules. A simple
> struct is much easier on the compiler, too.

The OpenCL C specification permits both notations (and more) for
accessing vector components in device code, so for consistency the
host code should support both notations, too. The choice of notation
depends on how a vector type is used: x, y, z suits physical vectors,
while s0, s1, s2, …, sA, …, sF suits any aggregate of up to 16
components.

To my surprise, a plain struct and a union with nested structs perform
equally well with the attached patch. The code transforms and averages
the velocities of 10⁵ solvent particles to obtain a flow field. The
allocation sinking in LuaJIT 2.1 is impressive.

Thanks,
Peter
diff --git a/src/lj_crecord.c b/src/lj_crecord.c
index acd786f..85144c2 100644
--- a/src/lj_crecord.c
+++ b/src/lj_crecord.c
@@ -969,6 +969,10 @@ static void crec_alloc(jit_State *J, RecordFFData *rd, CTypeID id)
       MSize i = 1;
       while (fid) {
         CType *df = ctype_get(cts, fid);
+        if (ctype_isxattrib(df->info, CTA_SUBTYPE)) {
+          fid = ctype_rawchild(cts, df)->sib;
+          continue;
+        }
         fid = df->sib;
         if (ctype_isfield(df->info)) {
           CType *dc;
local ffi = require("ffi")

ffi.cdef[[
typedef union {
  struct { double x, y, z, w; };
  struct { double s0, s1, s2, s3; };
} cl_double4;
]]

local double4 = ffi.typeof("cl_double4")
ffi.metatype(double4, {
  __add = function(a, b)
    return double4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w)
  end,
})

local N = 100000
local v = ffi.new("cl_double4[?]", N)
for i = 0, N - 1 do
  v[i] = double4(4, 3, 2, 1)
end
local x = double4(1, 2, 3, 4)
for i = 0, N - 1 do
  x = x + (v[i] + v[i])
end

assert(x.x == 1 + N * 8)
assert(x.y == 2 + N * 6)
assert(x.z == 3 + N * 4)
assert(x.w == 4 + N * 2)