Review: more runtime optimizations (issue833046)


larry...@...
 

Reviewers: ,

Description:
This round:

* Constant assignments where the types didn't match (like R_float =
A_int_const) were previously a dead end, since we couldn't directly
alias R to A. But now we coerce constants at runtime, so, for example,
that would turn into R_float = A_float_const, which can then lead to
further optimizations.

* Output params assigned constants unconditionally will alias just like
regular params, and furthermore, if they are only written that one time
and not previously read, the assignment is removed entirely and the
default value of the output param changed to its eventual final value.

* ANY ops, not just assignment, and even if not involving constants, are
removed if none of their written arguments are ever used (because other
optimizations have removed their uses).

* I found that sometimes it's helpful to do a few more passes even
BEYOND a pass that appears not to have changed any ops. This is because
even without changing ops, a pass may propagate some aliasing that can
lead to more optimization in later passes.

* Remove 'if' statements (even if not constant) if they are empty (which
can happen if their entire contents, but not the 'if' itself, has been
removed by other optimizations).

* Assignments to locals or temps that are assigned but never read
(effectively dead) are turned into nops.
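
To make the interplay of these optimizations concrete, here is a minimal
sketch on a toy IR. It is not the code in runtimeoptimize.cpp: the
Symbol, Op and Shader structs and every function name below are
hypothetical, and it assumes straight-line code with no control flow. It
shows constant coercion enabling aliasing, which in turn leaves dead ops
that a later step turns into nops, repeated until nothing changes.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR, NOT OSL's real data structures.
enum class Type { Int, Float };

struct Symbol {
    std::string name;
    Type type;
    bool is_const = false;
    bool is_output = false;  // outputs always count as "used"
    double value = 0.0;      // meaningful only when is_const
};

struct Op {
    std::string opname;      // "assign", "add", "nop", ...
    int result;              // index of the symbol written (-1 for nop)
    std::vector<int> args;   // indices of the symbols read
};

struct Shader {
    std::vector<Symbol> syms;
    std::vector<Op> ops;
};

// Return a constant with the value of syms[c] but of type t, adding one if needed.
int coerce_constant(Shader& s, int c, Type t) {
    assert(s.syms[c].is_const);
    if (s.syms[c].type == t) return c;
    const double v = s.syms[c].value;
    for (int i = 0; i < (int)s.syms.size(); ++i)
        if (s.syms[i].is_const && s.syms[i].type == t && s.syms[i].value == v)
            return i;
    Symbol k;
    k.name = "$const" + std::to_string(s.syms.size());
    k.type = t;
    k.is_const = true;
    k.value = v;
    s.syms.push_back(k);
    return (int)s.syms.size() - 1;
}

int count_writes(const Shader& s, int sym) {
    int n = 0;
    for (const Op& op : s.ops)
        if (op.opname != "nop" && op.result == sym) ++n;
    return n;
}

// For "R = const": coerce the constant to R's type, then (if R is written
// only once) make later reads of R read the constant directly.
int propagate_constants(Shader& s) {
    int changes = 0;
    for (size_t i = 0; i < s.ops.size(); ++i) {
        Op& op = s.ops[i];
        if (op.opname != "assign" || !s.syms[op.args[0]].is_const) continue;
        const int r = op.result;
        const int c = coerce_constant(s, op.args[0], s.syms[r].type);
        if (c != op.args[0]) { op.args[0] = c; ++changes; }
        if (count_writes(s, r) != 1) continue;
        for (size_t j = i + 1; j < s.ops.size(); ++j)
            for (int& a : s.ops[j].args)
                if (a == r) { a = c; ++changes; }
    }
    return changes;
}

// Turn ANY op into a nop if the symbol it writes is never read and is not an output.
int remove_dead_ops(Shader& s) {
    std::set<int> read;
    for (const Op& op : s.ops)
        if (op.opname != "nop") read.insert(op.args.begin(), op.args.end());
    int changes = 0;
    for (Op& op : s.ops) {
        if (op.opname == "nop") continue;
        if (s.syms[op.result].is_output || read.count(op.result)) continue;
        op = Op{"nop", -1, {}};
        ++changes;
    }
    return changes;
}

// Repeat both steps until a full pass changes nothing; returns the number of
// passes that made changes.
int optimize(Shader& s) {
    int passes = 0;
    for (;;) {
        int changes = propagate_constants(s);
        changes += remove_dead_ops(s);
        if (changes == 0) break;
        ++passes;
    }
    return passes;
}
```

In this sketch, a shader consisting of R_float = A_int_const, T = R + B
and Cout = R (with Cout an output) ends up with only Cout = <float
constant> live; the other two ops become nops. The sketch stops at the
first pass that changes nothing, so it does not model the extra passes
described above.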

Also some miscellaneous refactoring:

* The status messages that say how the optimization is going now happen
per group (network), rather than per instance. Same basic information,
less clutter.

* Split collapse into separate ops and syms collapse routines.

* Split the collapsing out into a separate loop over the shader group
(rather than for each instance right after optimizing it) as a precursor
to a future round of "inter-group optimization".
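
The restructured loop can be pictured roughly as below. This is only a
sketch under assumptions: Instance, optimize_instance, collapse_ops and
optimize_group are hypothetical stand-ins, not the names used in the
patch, and the "optimization" here just marks placeholder ops as nops.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a shader instance: just a list of opcode names.
struct Instance {
    std::vector<std::string> ops;
};

// Placeholder for per-instance optimization: marks "useless" ops as nops
// but leaves them in place.
void optimize_instance(Instance& inst) {
    for (std::string& op : inst.ops)
        if (op == "useless") op = "nop";
}

// Collapse step: physically drop the nops left behind by optimization.
void collapse_ops(Instance& inst) {
    inst.ops.erase(std::remove(inst.ops.begin(), inst.ops.end(),
                               std::string("nop")),
                   inst.ops.end());
    assert(std::count(inst.ops.begin(), inst.ops.end(), "nop") == 0);
}

// Optimize every instance first, then collapse in a separate loop over the
// group, instead of collapsing each instance right after optimizing it.
void optimize_group(std::vector<Instance>& group) {
    for (Instance& inst : group) optimize_instance(inst);
    // A later cross-instance pass could be slotted in here, before collapsing.
    for (Instance& inst : group) collapse_ops(inst);
}
```

Keeping the two loops separate means no instance is compacted until every
instance in the group has been optimized.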


These things all together reduce the number of instructions executed at
runtime by almost an additional 20% (compared to the runtime
optimizations we were already doing). This turns out to reduce runtime
in our renderer by another 10% for production frames.



Please review this at http://codereview.appspot.com/833046/show

Affected files:
src/include/osl_pvt.h
src/liboslcomp/oslcomp.cpp
src/liboslcomp/oslcomp_pvt.h
src/liboslexec/context.cpp
src/liboslexec/oslexec_pvt.h
src/liboslexec/runtimeoptimize.cpp


cku...@...