I see no fundamental errors in what is written, but I would be cautious
about the jump table assumptions (as regards both generated instructions
and the actual performance) without thorough benchmarking.
Personally I worry more about maintainability of the Interchange file,
which right now is over 8k lines and not short on magic numbers (albeit
separately documented ones) which are mirrored in the interface code (for
the GUI, and I assume for the CLI). If you don't already know where things
are, and how the numbers fit together (so to speak) it is quite difficult
to navigate and hunt potential bugs.
2018-06-13 18:00 GMT+02:00 Will Godfrey <willgodfrey@xxxxxxxxxxxxxxx>:
I would like people to check through the following please before going
really public with it.
Does it make sense?
Is there some fundamental error in what I wrote?
The tortoise and the hare
Something came up at this year's LAC that highlights some of the dangers of
making timing comparisons. When doing this it is important to know what
priorities and trade-offs are being made, otherwise such comparisons become
meaningless.
It occurred to me that with Yoshimi these issues haven't really been
discussed, and I admit to being guilty of being a bit lazy by not having written them
up in the 'dev-notes' directory (which of course is the first place any
developer should look for such information). My defense is that Yoshimi is
still very much in transition :@)
Anyway, here goes (and apologies to those who already know this):
Yoshimi very heavily uses switch statements, some of which are truly huge.
This might raise some alarm when viewed but is quite deliberate. Modern
compilers are very smart when they see these. The traditional impression is
that they perform a lot of if-then tests but this is only true of small case
groups; after that they make lookup tables which are much faster.
But that isn't the end of it. If the structure is big enough and dense
enough the compiler will instead fill the spaces with NOPs and turn the
thing into a jump table. This is magic, because any case can then be reached in
exactly three machine instructions (two in the case of an ARM processor):
A left shift to turn incoming switch values into pointer sized steps.
An offset fetch from the jump table.
A branch to the required code.
ARM combines the first two.
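
As a rough illustration of the kind of switch being described, here is a minimal sketch. The struct, its fields and the control numbers are invented for the example and are not Yoshimi's real command set. The case labels are contiguous, which is what makes the switch "dense"; whether a given compiler then emits a jump table, a lookup table or a compare chain depends on the compiler, the optimisation flags and the target.

```cpp
#include <cstdint>

// Hypothetical part parameters, for illustration only.
struct PartParams
{
    float volume = 96.0f;
    float pan = 64.0f;
    float velocitySense = 64.0f;
    float velocityOffset = 64.0f;
    float keyShift = 0.0f;
    float portamento = 0.0f;
};

// Case labels run 0..5 with no gaps, so the optimiser is free to use a
// table-driven dispatch instead of testing each value in turn.
float readControl(const PartParams &part, std::uint8_t control)
{
    switch (control)
    {
        case 0: return part.volume;
        case 1: return part.pan;
        case 2: return part.velocitySense;
        case 3: return part.velocityOffset;
        case 4: return part.keyShift;
        case 5: return part.portamento;
        default: return -1.0f; // unknown control number
    }
}
```

Compiling something like this with g++ -O2 -S and reading the assembly is the only reliable way to see what was actually generated.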
The result for Yoshimi is that *any* data element can be read in less than
on a processor running at 3.1GHz - and not all the switches are yet dense enough
for the jump table type.
In order to make all of this accessible to developers there is a file in
'dev-notes' with details and status of every command currently implemented
(and a few being considered).
However, that's not all.
Yoshimi has different priorities for access to the structure based on the
Don't handle data that's not actually wanted.
Limits and defaults are static so can be read directly at any time with a
cut-down version of the overall structure. e.g. all part top level volume
controls have the same max, min, and default, so no need to test part and
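
A cut-down lookup of this kind might look like the sketch below. The Limits struct, the function name and the numbers are all assumptions made for the example. The point it tries to show is that the lookup takes only the control number: every part shares the same limits, so which part is being asked about never enters into it.

```cpp
#include <cstdint>

// Hypothetical container for the three static values of a control.
struct Limits
{
    float min;
    float max;
    float def;
};

// Only the control number is examined. The values are fixed at compile
// time, so this can be called from any thread without synchronisation.
Limits partLimits(std::uint8_t control)
{
    switch (control)
    {
        case 0: return {0.0f, 127.0f, 96.0f}; // volume (illustrative values)
        case 1: return {0.0f, 127.0f, 64.0f}; // pan (illustrative values)
        default: return {0.0f, 0.0f, 0.0f};   // unknown control number
    }
}
```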
Other reads may be wanted in bulk from time to time (such as updating a GUI
when a new patch set is loaded) so should be as fast as possible, but at
the same time must wait briefly if a parameter is being changed. The downside
is that the checks for this nearly double the access time - but hey, < 40nS is
still pretty slick. I think the oscillator window has the most controls -
under 200, so that means the whole lot can be fetched in less than 8uS.
Writes are much more complicated. In the first place, we have made them all
serial and synchronous with the audio thread (if they are not already) so
how do you realistically time something going through a ring buffer, along
with other 'somethings'?
On the bright side, writes are the only thing that can make reads wait, and as
a result these reads will *always* be seeing valid data.
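
For readers who haven't met one, here is a minimal sketch of a single-producer, single-consumer ring buffer of the general kind that can serialise writes on their way to an audio thread. It is not Yoshimi's actual buffer; the Command struct and the class name are made up for the example. One thread may call push() and one other thread may call pop().

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical write command: which control to change and its new value.
struct Command
{
    unsigned char control;
    float value;
};

template <std::size_t N>
class CommandRing
{
    static_assert(N > 1 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side (e.g. GUI or CLI thread). Returns false if the ring is full.
    bool push(const Command &cmd)
    {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N)
            return false;
        slots_[head & (N - 1)] = cmd;
        head_.store(head + 1, std::memory_order_release); // publish the slot
        return true;
    }

    // Consumer side (e.g. audio thread). Returns false if nothing is waiting.
    bool pop(Command &out)
    {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail)
            return false;
        out = slots_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release); // free the slot
        return true;
    }

private:
    std::array<Command, N> slots_{};
    std::atomic<std::size_t> head_{0}; // next slot to write
    std::atomic<std::size_t> tail_{0}; // next slot to read
};
```

Commands come out in the order they went in, which is also why timing any single write in isolation is awkward: how long it waits depends on whatever else is queued ahead of it.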
The final wrinkle with writes is that they don't always do what they seem
to. Loading an instrument patch for example, just sets a flag to mute the part,
then passes the data to a low priority thread that can take its time
(as nothing can read it) then clear the flag again.
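
The mute-then-load sequence can be sketched roughly as follows. The names are hypothetical and the thread hand-off itself is left out; the comments say which thread each function is assumed to run on. A string stands in for the real instrument data.

```cpp
#include <atomic>
#include <string>

// Hypothetical part: a mute flag plus a stand-in for the patch data.
struct PartSlot
{
    std::atomic<bool> muted{false};
    std::string patchName;
};

// Command path: cheap. Just mute the part; the real work happens elsewhere.
void beginLoad(PartSlot &part)
{
    part.muted.store(true);
}

// Low priority thread: may take as long as it likes, because nothing
// reads the part while it is muted. Un-muting is the very last step.
void finishLoad(PartSlot &part, const std::string &name)
{
    part.patchName = name;
    part.muted.store(false);
}

// Audio thread: skip the part entirely while a load is in progress.
bool shouldRender(const PartSlot &part)
{
    return !part.muted.load();
}
```

Seen from outside, the 'write' returns almost at once, so timing it says very little about how long the load really took.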
Bringing this to a close, there is one final potential gotcha when using
multiple calls to get a more accurate average, and minimise the set-up
overhead. If your data structure is small enough to fit entirely within the
processor cache, you'll end up just popping out the same stored value, not
recalculating it. Sometime I should probably re-test with 5 or 6 quite
different calls in the loop.
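
A timing loop along those lines might be sketched like this. sampleRead() is a trivial stand-in for a real parameter read and the six control numbers are arbitrary; both are assumptions for the example. The loop rotates through different requests rather than repeating one, and the results are summed and handed back so the compiler cannot simply discard the reads.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a real parameter read.
float sampleRead(std::uint8_t control)
{
    return control * 0.5f;
}

// Returns the average time per read in nanoseconds. 'iterations' must be
// greater than zero. 'checksum' receives the sum of everything read.
double averageNanosPerRead(int iterations, float &checksum)
{
    static const std::uint8_t controls[] = {0, 3, 7, 12, 40, 101};
    const int controlCount = sizeof(controls) / sizeof(controls[0]);

    float sum = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sum += sampleRead(controls[i % controlCount]); // a different call each pass
    const auto stop = std::chrono::steady_clock::now();

    checksum = sum;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

With a stand-in this simple the optimiser may still fold much of the work away, so any figures it produces mean nothing until the real read function is substituted.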
Will J Godfrey
Say you have a poem and I have a tune.
Exchange them and we can both have a poem, a tune, and a song.
Yoshimi source code is available from either: https://sourceforge.net/
Our list archive is at: https://www.freelists.org/archive/yoshimi
To post, email to yoshimi@xxxxxxxxxxxxx