Pre-review: run state overhaul (issue206045)


larry...@...
 

Reviewers: ,

Description:
We've done extensive analysis showing that for typical shading batches
we see in the wild, our method of having an array of true/false runflags
is very wasteful, and despite the extra indirection involved, having a
flat list of the "on" points is much more efficient (5x or more speed
gain on tight inner loops). We're not sure yet exactly what this will
translate to in overall runtime gains, but it's got to help.

This is not a final review, just showing progress along the way.

The code in this pre-review adds the index list to the Runstate, and
maintains it properly. I validate this by checking at every instruction
that the runflags and indices, and all tests pass! So I know for sure
that the indices are being set up properly.

The whole point is that now I can convert the shadeops one by one, at
all times keeping the renderer working, and then only when everything is
converted rip out the old runflags entirely.


Please review this at http://codereview.appspot.com/206045/show

Affected files:
src/liboslexec/exec.cpp
src/liboslexec/opcontrol.cpp
src/liboslexec/oslexec_pvt.h


larry...@...
 

Incidentally, how much slower do we render by keeping both the runflags
and indices in place? Only 2%.

So this makes me think that quite soon into changing the individual
shadeops, we'll break even and can start deploying a half-completed
solution that's a performance improvement, even long before we've
finished the whole thing and are ready to rip out the runflags once and
for all.



http://codereview.appspot.com/206045/show


aco...@...
 


cku...@...
 

LGTM2


http://codereview.appspot.com/206045/diff/1/4
File src/liboslexec/exec.cpp (right):

http://codereview.appspot.com/206045/diff/1/4#newcode583
src/liboslexec/exec.cpp:583: #if 1
Should this be #if 0'd ? Or is it really only 2% slower with this in
place?

http://codereview.appspot.com/206045/show


Larry Gritz <l...@...>
 

No, it's 2% slower with it '#if 0'.

I'm not going to commit in this state, I just wanted to show everybody how I'd verified that I'm maintaining the index list correctly and the general scheme of what we're doing.

-- lg

On Feb 9, 2010, at 10:16 AM, <cku...@...> wrote:

LGTM2


http://codereview.appspot.com/206045/diff/1/4
File src/liboslexec/exec.cpp (right):

http://codereview.appspot.com/206045/diff/1/4#newcode583
src/liboslexec/exec.cpp:583: #if 1
Should this be #if 0'd ? Or is it really only 2% slower with this in
place?

http://codereview.appspot.com/206045/show
--
Larry Gritz
l...@...


larry...@...
 

OK, I just updated the patch set again to keep up with the trunk, in
particular Chris Kulla's recent changes to reorder points to make
packets with contiguous "on" points.

I'm afraid it's still the case that I get essentially identical timings
with runflags, indices, and spans when testing on a production frame.
(Actually, runflags seems ever so slightly slower than the other two,
but not by enough to be confident that it isn't measurement error.)

I'm not sure what else to do in this area. Suggestions appreciated.

I think we should still commit this, it's not much uglier than before
(and the ugliness is confined to two files), and it may be handy to
switch between rf/ind/span in the future or do more experiments.


http://codereview.appspot.com/206045/show


cku...@...
 

LGTM


I agree that it would be good to commit this as it cleans things up and
lets us play with this topic some more in the future.

Have you been able to duplicate the effects we saw in the benchmark
inside of testshade?


http://codereview.appspot.com/206045/diff/2001/2018
File src/liboslexec/exec.cpp (right):

http://codereview.appspot.com/206045/diff/2001/2018#newcode542
src/liboslexec/exec.cpp:542: // Passed indices -- need to convert to
runflags
Why would we ever pass indices if we are using runflags? Is this
left-over code from when they were all tracked together?

http://codereview.appspot.com/206045/show


Larry Gritz <l...@...>
 

On Feb 22, 2010, at 6:39 PM, <cku...@...> <cku...@...> wrote:

http://codereview.appspot.com/206045/diff/2001/2018#newcode542
src/liboslexec/exec.cpp:542: // Passed indices -- need to convert to
runflags
Why would we ever pass indices if we are using runflags? Is this
left-over code from when they were all tracked together?
The idea is that the top level "run" command could be passed either runflags or indices, and would convert if necessary. That way, we can throw the switch on the osl internals without changing the calling app.

-- lg

--
Larry Gritz
l...@...


chri...@...
 

LGTM. Too bad you didn't see a performance improvement with this. Very
perplexing.


http://codereview.appspot.com/206045/diff/2001/2040
File src/liboslexec/opinteger.cpp (right):

http://codereview.appspot.com/206045/diff/2001/2040#newcode242
src/liboslexec/opinteger.cpp:242: //
runflags, beginpoint_, endpoint_);
Is the commented out stuff in OP_compl dead code?

http://codereview.appspot.com/206045/show


cku...@...
 

I replicated the results wherein spans worked slightly fastest when
all points
were on (which is what testshade does).
I didn't rig testshade to do sparse sets. Then again, with your
recent change,

I was thinking you could do something like:
if (cellnoise(100*u,100*v) > 0.9) {
// lots of arithmetic ops
}

to emulate the sparse points (you can vary the amount of "sparseness" by
changing the constants).

But it is true that we don't really have the sparse input batches
anymore, so perhaps the whole exercise is now only relevant for varying
conditionals (which may be noticeable someday, or for other types of
applications with very large batches).

http://codereview.appspot.com/206045/show