#### Re: run flags vs active intervals

Chris Foster <chri...@...>

Hi guys,

Nice job with the analysis Larry. It's interesting to see the kinds of
runstates which come out of your renderer, especially the apparent
predominance of really sparse states (at a quick eyeball). Based on the
spareseness, my analysis from a week ago would suggest that the indices are
going to win handily, and your data definitely confirms that!

To find out exactly what the average sparseness is, here's a script.

--------------------------------------------------
#!/usr/bin/python

from __future__ import division

f = open('rf.txt')

fractionOn = 0
totStates = 0

line = line.split()
count = int(line[0])
rf = [int(f) for f in line[2:]]
nFlags = len(rf)
nOn = len(filter(lambda x: x == 1, rf))
fractionOn += count * nOn/nFlags
totStates += count

fractionOn /= totStates

print fractionOn
--------------------------------------------------

The output for your data - excluding the 4285 non-batched points for which the
indices win anyway - is an average fraction of 0.31 states turned on which
matches with the impression you get from looking at the states by eye.

Almost the only cases where the spans method wins definitively are the all-on
case, so I agree that a special case optimization is in order. I found the
timings for the following mostly-on state particularly interesting:

runflags 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 0 0 0 1 1 1 1 1 0 1 1 1
span 0 3 4 11 12 14 15 37 38 42 43 52 53 68 72 77 78 81
indices 0 1 2 4 5 6 7 8 9 10 12 13 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 32 33 34 35 36 38 39 40 41 43 44 45 46 47 48 49 50
51 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 72 73 74 75 76 78 79
80

runflags: 16.87
spans: 14.48
indices: 14.78

My intuition _was_ that the spans method would win quite comfortably here (given
that it appears to win handily in the all-on case), but no, it doesn't!

Regarding implementation, it would be nice to find a way to specify the
iteration strategy in a single place in the codebase. In the absence of
non-clumsy closure support in the C++, maybe a macro is in order?

#define RUNSTATE_LOOP(runstate, currIdx, loopBody) \
if(runstate.nactive == runstate.npoints) {
/* optimization for all active points */
for(int currIdx = 0; currIdx < runstate.nactive; ++currIdx) {
loopBody
}
}
else {
for(int j = 0; j < runstate.nactive; ++j) {
int currIdx = runstate.active[j];
loopBody
}
}

This would replace current stuff like

for (int i = beginpoint; i < endpoint; ++i)
if (runflags[i])
function (result[i], a[i], b[i], c[i], d[i]);

with

RUNSTATE_LOOP(runstate, i,
function (result[i], a[i], b[i], c[i], d[i]);
)

Ok, I don't like having to use a macro, but IMHO this is probably more
maintainable than the alternatives. Is it general enough?

Note that I've assumed that your runstates are passed around as a
self-contained but lightweight struct rather than the (runflags, beginactive,
endactive) triple currently used.

~Chris.

Join osl-dev@lists.aswf.io to automatically receive all group messages.