Tomas Lundell wrote:
> I don't recall we tested the dual-number mode, so I don't know if it works
> on the consoles or what the performance would be like.

You tested dual-number mode.

> Double arithmetic isn't *that* slow on consoles, so I would
> hazard to guess the winner is whichever mode incurs the least
> load-hit-stores.

PPC has strictly segregated integer and FP register banks. You can
only transfer from one to the other via memory. This means all
double-to-integer conversions have to go through memory and incur
the dreaded 40 cycle load-hit-store (l-h-s) penalty. Yes, forty
cycles!

Let's take a look at a trivial loop that goes through an array and
how it's run with an interpreter:

  -- [... code for filling the array omitted ...]
  local x = 0
  for i=1,100 do
    x = x + a[i]
  end

This will incur an extra 40 cycle delay if 'i' is a double that
needs to be converted to an integer for array indexing. That's
obviously the case for single-number mode. In dual-number mode 'i'
is kept as an integer, so there's no extra penalty.

You still pay for the other l-h-s penalties, but they can overlap
a bit, so this probably costs only 2*40 cycles:

  ::loop::
   _________ +40
  |         V
  | tmp = a[i]   -- +40 for single-number mode, +0 for dual-number mode
  |  \_______
  |  _____+40 \ +40
  | |     V    V
  | | x = x + tmp
  | |_|
  |
  |_____ +40
  |     V
  | i = i + 1
  | if i <= 100 then goto ::loop:: end
  |_|

Then there's also the indirect branch prediction capability for
bytecode dispatch (different between Cell and Xenon). That's a
penalty of up to 3*20 cycles for this loop.

I guess this means the loop runs at 2*40+3*20 = 140 cycles per
iteration in the worst case. The few cycles for the actual
computations and the overhead of the bytecode dispatch don't matter
that much in comparison.

Yes, this is all due to the @$%&)" design of these chips, which
penalizes all interpreters. On top of this, the console
manufacturers had the brilliant idea to ban JIT compilers, which
wouldn't suffer from all of this.
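
The worst-case estimate can be written out as a tiny cost model. This
is only a sketch: the name `worst_case_cycles` and the constants are
made up for illustration, and adding the +40 conversion penalty on top
of the 140-cycle figure for single-number mode is an assumption about
how the numbers combine, not a measurement.

```lua
-- Hypothetical cost model for one iteration of the loop discussed above.
-- All names are illustrative; the numbers are the rough figures from the text.
local LHS_PENALTY = 40       -- cycles per load-hit-store
local DISPATCH_PENALTY = 20  -- cycles per badly predicted bytecode dispatch

local function worst_case_cycles(mode)
  -- The overlapping l-h-s chains are estimated at 2*40 cycles.
  local lhs = 2 * LHS_PENALTY
  -- Up to 3*20 cycles for indirect branches in bytecode dispatch.
  local dispatch = 3 * DISPATCH_PENALTY
  -- ASSUMPTION: in single-number mode the double-to-integer conversion
  -- of 'i' adds one more l-h-s on top; dual-number mode adds nothing.
  local conversion = (mode == "single") and LHS_PENALTY or 0
  return lhs + dispatch + conversion
end

print(worst_case_cycles("dual"))
print(worst_case_cycles("single"))
```

Under these assumptions the dual-number case reproduces the
2*40+3*20 = 140 figure, and the single-number case comes out 40
cycles higher.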
:-/

Now you know why a Lua interpreter runs dog-slow on consoles, even
though they run at around 3GHz, which is similar to your desktop PC.

--Mike