Review: runtime optimizations, part 2 (issue639042)


larry...@...
 

Reviewers: ,

Description:
Significant additional work on "runtime code optimization" introduced
recently. Quick run-down:

* New constant folding for additional ops: neg, aref (was previously
commented out due to bugs), compref, strlen, endswith, concat, clamp,
sqrt, matrix, transform[nv].

* All previous optimizations had been ones where no *new* constant
values were needed (we could do new constant symbols, but only using
VALUES that were stored statically or were the defaults of parameters
stored in the master). Now we can make up brand new constants with new
values, which are stored in the ShadingSystem via the helper class
ConstantPool. This allows add, sub, mul, and div where both operands
are constant will just turn into a single constant, as well as several
of the other new constant folding routines to work.

* Refactor to eliminate the if-else-else-else of constant folding
routine selection, in favor of a hash map acting as a dispatch table.

* Allow "local aliasing" within basic blocks -- previously, "A=const"
would remember that A had the constant value (so subsequent C=A*B, say,
would treat A as a constant) but only if the statement was in the global
scope and A was never subsequently reassigned. Now we can remember that
A has that value, but just for the basic block it's in and/or until it's
reassigned. This allows lots of additional constant folding. When each
basic block is done, its local aliases are forgotten.

* Some preliminary groundwork for tracking which params will be used by
downstream layers, so I can take advantage of that layer. (Apologies
for the incomplete nature of that.)

* Add -O0, -O1, -O2 optional flags to "testshade" to specify level of
runtime optimization.

* New statistics track how many actual instructions we execute over a
render, and how much total time we spend doing the runtime optimization.

* A couple minor renames and cleanups I promised Chris I'd do in the
last review, but somehow botched the checkin and didn't get them all
committed.

* IMPORTANT bug fix to the last checkin, in constfold_add -- lines 340
and 347 got the argument numbers mixed up, resulting in turning "A = B +
0" into "A=0" rather than "A=B". Oops!

So, the net result of all this is summarized by the following stats from
a production asset we like to use as a benchmark:

optimize=0 5:35 2046 MB peak 1,806,370,431 instructions run
optimize=1 4:32 1403 MB peak 1,046,991,997 instructions run

As for how we can reduce the number of instructions executed by almost
45% but only reduce runtime by 20%, let's just say there are other
bottlenecks in our renderer we're working on and interpreter overhead is
obviously not the major (or at least not the sole) bottleneck.


Please review this at http://codereview.appspot.com/639042/show

Affected files:
src/include/osl_pvt.h
src/liboslcomp/oslcomp.cpp
src/liboslexec/constantpool.h
src/liboslexec/context.cpp
src/liboslexec/exec.cpp
src/liboslexec/instance.cpp
src/liboslexec/oslexec_pvt.h
src/liboslexec/oslops.h
src/liboslexec/runtimeoptimize.cpp
src/liboslexec/shadingsys.cpp
src/testshade/testshade.cpp


larry...@...
 

So sorry, those stats I mentioned at the end are for optimize = 1 and 2,
respectively. (Not 0 and 1.)

http://codereview.appspot.com/639042/show


cku...@...
 

http://codereview.appspot.com/639042/diff/1/11
File src/liboslexec/runtimeoptimize.cpp (right):

http://codereview.appspot.com/639042/diff/1/11#newcode36
src/liboslexec/runtimeoptimize.cpp:36: #include "OpenImageIO/timer.h"
Shouldn't this be included with angle brackets?

http://codereview.appspot.com/639042/diff/1/11#newcode339
src/liboslexec/runtimeoptimize.cpp:339: // R = 0 + B => R = 0
comment is still wrong isn't it ?

http://codereview.appspot.com/639042/diff/1/11#newcode346
src/liboslexec/runtimeoptimize.cpp:346: // R = A + 0 => R = 0
This one too.

http://codereview.appspot.com/639042/diff/1/11#newcode351
src/liboslexec/runtimeoptimize.cpp:351: if (A.is_constant() &&
B.is_constant()) {
Should this be checked first?

If you have 0 + 0, the current code looks like it will need to make two
passes.

http://codereview.appspot.com/639042/diff/1/11#newcode383
src/liboslexec/runtimeoptimize.cpp:383: if (A.is_constant() &&
B.is_constant()) {
Same comment here (and for mul/div as well).

http://codereview.appspot.com/639042/diff/1/11#newcode1250
src/liboslexec/runtimeoptimize.cpp:1250: // Odd but common case: two
instructions in a row assign
Just curious - what leads to this?

http://codereview.appspot.com/639042/show


Larry Gritz <l...@...>
 

On Mar 23, 2010, at 7:54 AM, <cku...@...> <cku...@...> wrote:

src/liboslexec/runtimeoptimize.cpp:36: #include "OpenImageIO/timer.h"
Shouldn't this be included with angle brackets?
Yep, done.

http://codereview.appspot.com/639042/diff/1/11#newcode339
src/liboslexec/runtimeoptimize.cpp:339: // R = 0 + B => R = 0
comment is still wrong isn't it ?
src/liboslexec/runtimeoptimize.cpp:346: // R = A + 0 => R = 0
This one too.
Right you are. Bad cut and paste from multiply. Probably the same source of bug as the wrong result I fixed.


http://codereview.appspot.com/639042/diff/1/11#newcode351
src/liboslexec/runtimeoptimize.cpp:351: if (A.is_constant() &&
B.is_constant()) {
Should this be checked first?

If you have 0 + 0, the current code looks like it will need to make two
passes.

No. It first checks R=A+0 and R=0+A and in either case replaces it with R=A. (Even if A happens to be a constant and is zero.) Then it checks R=A+B if both A and B are constants, and if so, changes it to R=C where C=A+B.

You get the same result no matter which order you do the check. It would be correct the way you suggest, too, but it doesn't change the results or take any less work. If you prefer it stylistically the other way, I can do it. The code is simply in the order that I wrote it.


http://codereview.appspot.com/639042/diff/1/11#newcode1250
src/liboslexec/runtimeoptimize.cpp:1250: // Odd but common case: two
instructions in a row assign
Just curious - what leads to this?
Here's an example situation that would lead to it:

A = B;
if (cond) { // turns out to be false
C = A;
}
A = D;

It doesn't have to be a conditional -- all sorts of situations can lead to dead code being removed between two assignments to the same variable, so the first one is no longer needed. In theory, I could do more sophisticated live variable analysis, but when examining the post-optimized output instructions for our production shaders, this case popped up very frequently.

Here's another case:

A = A * B;
A = A + C;

If B==0, this would first reduce to

A = 0
A = C // because it knows A = 0

and then it would notice that the first assignment is no longer needed.

--
Larry Gritz
l...@...


cku...@...
 

Ah I see - that all makes sense.

LGTM!

http://codereview.appspot.com/639042/show