Re: run flags vs active intervals


Larry Gritz <l...@...>
 

I want to point out that Chris F tested out the "all on", "random on", and "alternating on/off" cases. There's one more case that may be important, which is a few isolated "on" points with big "off" gaps in between -- and in that case, I expect Chris F's solution to perform even better (compared to what we have now) than the other cases.

I'm not looking forward to doing this overhaul (only because it's tedious and extensive, there's nothing hard about it), but I think it's potentially a big win. Thanks, Chris!

-- lg


On Jan 21, 2010, at 10:49 AM, Christopher wrote:

I like this idea too. What we were discussing yesterday was something
like:

index: [ 0 1 2 3 4 5 6 7 8 9 ]
flags: [ 0 0 1 1 1 1 0 0 1 0 ]
active_points: [ 2 3 4 5 8 ]

for (int i = 0; i < num_active; i++)
do_something(active_points[i]);

This would be slightly more efficient if you had a single point active
(or isolated single points), but it requires a lot more indirections
in the common case.

So I'm all in favor of trying this out - though it is a pretty big
overhaul ...

I would maintain these ranges as little int arrays on the stack just
like we maintain the runflags now. The upper bound for the
active_range array size is just npoints (run flags: on off on off on
off ...) - flip any bit and you get fewer "runs".

-Chris


On Jan 21, 9:16 am, Larry Gritz <l...@...> wrote:
Awesome, Chris. Would you believe we were just talking internally about this topic yesterday? We were considering the amount of waste if there were big gaps of "off" points in the middle. But I think your solution is quite a bit more elegant than what we were discussing (a list of "on" points). I like how it devolves into (begin,end] for the common case of all points on.

It's a pretty big overhaul, touches every shadeop and a lot of templates, and the call signatures have to be changed (to what, exactly? passing a std::vector<int>& ? or a combo of int *begend and int segments?). Ugh, and we may wish to change/add OIIO texture routines that take runflags, too. But your experiment is quite convincing.

What does everybody else think? Chris/Cliff/Alex? (I'm happy to do the coding, but I want consensus because it touches so much.)

-- lg

On Jan 21, 2010, at 8:21 AM, Chris Foster wrote:



Hi all,
I've been looking through the OSL source a little, and I'm interested to see
that you're using runflags for the SIMD state. I know that's a really
conventional solution, but there's an alternative representation which I reckon
is significantly faster, and I wondered if you considered it.
Imagine a SIMD runstate as follows
index: [ 0 1 2 3 4 5 6 7 8 9 ]
flags: [ 0 0 1 1 1 1 0 0 1 0 ]
An alternative to the flags is to represent this state as a list of active
intervals. As a set of active start/stop pairs, the state looks like
active = [ 2 6 8 9 ]
ie,
state: [ 0 0 1 1 1 1 0 0 1 0 ]
^ v ^ v
2 6 8 9
The advantage of doing this is that the inner SIMD loops become tighter since
there's one less test. Instead of
for (int i = 0; i < len; ++i)
{
if (state[i])
do_something(i);
}
we'd have
for (int j = 0; j < nActiveTimes2; j+=2)
{
for (int i = active[j]; i < active[j+1]; ++i)
do_something(i);
}
and the inner loop is now completely coherent.
I can see you have the beginnings of this idea in the current OSL code, since
you pass beginpoint and endpoint along with the runflags everywhere. However,
why not take it to its logical conclusion?
Given that you already have beginpoint and endpoint, the particularly big win
here is when most of the flags are turned on. If do_something() is a simple
arithmetic operation (eg, float addition) the difference between the two
formulations can be a factor of two in speed.
I'm attaching some basic test code which I whipped up. Timings on my core2 duo
with gcc -O3 look like:
All flags on (completely coherent):
run flags: 1.310s
active intervals: 0.690s
Random flags on (completely incoherent):
run flags: 5.440s
active intervals: 3.310s
Alternate flags on (maximum number of active intervals -> worst case):
run flags: 1.500s
active intervals: 2.150s
Thoughts?
~Chris.
<ATT00001..txt><runstate.cpp>
--
Larry Gritz
l...@...
<ATT00001..txt>
--
Larry Gritz
l...@...

Join osl-dev@lists.aswf.io to automatically receive all group messages.