Rutime code optimization (issue448041)


larry...@...
 

Reviewers: ,

Description:
First stab at runtime optimization of OSL bytecode, based on the actual
instance parameters. This is a pretty big refactor, in that it moves
quite a bit of compiler infrastructure (like lifetime analysis and
temporary coalescing) to runtime rather than in oslc.

The basic idea is that we allow a hint in the shader parameter metadata,
[[ int lockgeom=1|0 ]] that says whether or not that parameter is
"locked" with respect to the geometry (if locked, its value cannot be
overridden by interpolated geometric primitive variables). This means
that by the time we shade, many of the parameters (that are not
connected or overridden on the geom) have known values, and we can
constant-fold the crap out of the code for each instance. For example,
multiplication by 1 is eliminated, "if (foo)" can end up removing large
swaths of code if foo's value is known, etc.

I've implemented only a very few obvious optimizations, but it's already
showing a 25% speedup overall in our renderer (even counting the extra
runtime we spend optimizing shader code). I expect this to improve
speed even more dramatically as I continue to add other optimizations as
well as make the analysis more sophisticated (for example, it currently
doesn't consider connections at all, even though it could often know the
connected value and treat that, too, as a constant).

I'm aware that many parts of this are rough around the edges. It's a
work in progress. But the basic infrastructure is working, and there's
a definite speed gain, so I'd like to commit this even though it's an
ongoing project. Then subsequent additional optimizations and
refactorings will be smaller changes that we can do incrementally.

There are new ShadingSystem options "lockgeom" that gives the default
value for "lockgeom" (so you can "opt in" or "opt out", depending on
your taste), and for the optimization level to perform at runtime.


Please review this at http://codereview.appspot.com/448041/show

Affected files:
src/include/osl_pvt.h
src/liboslcomp/codegen.cpp
src/liboslcomp/oslcomp.cpp
src/liboslcomp/oslcomp_pvt.h
src/liboslcomp/symtab.cpp
src/liboslcomp/symtab.h
src/liboslexec/CMakeLists.txt
src/liboslexec/context.cpp
src/liboslexec/exec.cpp
src/liboslexec/instance.cpp
src/liboslexec/loadshader.cpp
src/liboslexec/master.cpp
src/liboslexec/opstring.cpp
src/liboslexec/oslexec_pvt.h
src/liboslexec/osolex.l
src/liboslexec/runtimeoptimize.cpp
src/liboslexec/shadingsys.cpp
src/testshade/testshade.cpp


cku...@...
 

http://codereview.appspot.com/448041/diff/3001/4001
File src/include/osl_pvt.h (right):

http://codereview.appspot.com/448041/diff/3001/4001#newcode632
src/include/osl_pvt.h:632: bool is_const () const { return symtype() ==
SymTypeConst; }
I think is_constant() and SymTypeConstant would be more clear

It would avoid confusion with the C++ meaning of const.

http://codereview.appspot.com/448041/diff/3001/4001#newcode687
src/include/osl_pvt.h:687: void reset (ustring opname, OpImpl impl,
size_t nargs) {
Why is this called reset? The function looks like a "set" to me.

http://codereview.appspot.com/448041/diff/3001/4001#newcode817
src/include/osl_pvt.h:817: unsigned int argread_bits () const { return
m_argread; }
Why do the raw bits need to get exposed? Its leaking an implementation
detail ...

http://codereview.appspot.com/448041/diff/3001/4006
File src/liboslcomp/oslcomp.cpp (right):

http://codereview.appspot.com/448041/diff/3001/4006#newcode779
src/liboslcomp/oslcomp.cpp:779: // For each op, mark its arguments as
being used at that op
This comment doesn't seem to match what it looks like this function is
doing.

http://codereview.appspot.com/448041/diff/3001/4016
File src/liboslexec/CMakeLists.txt (right):

http://codereview.appspot.com/448041/diff/3001/4016#newcode18
src/liboslexec/CMakeLists.txt:18: ../liboslcomp/ast.cpp
../liboslcomp/codegen.cpp
This is ugly. Can we just merge the libs? It doesn't seem like we are
gaining anything by keeping them separate.

http://codereview.appspot.com/448041/diff/3001/4014
File src/liboslexec/exec.cpp (right):

http://codereview.appspot.com/448041/diff/3001/4014#newcode374
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?

http://codereview.appspot.com/448041/diff/3001/4009
File src/liboslexec/instance.cpp (right):

http://codereview.appspot.com/448041/diff/3001/4009#newcode158
src/liboslexec/instance.cpp:158: spin_lock lock (heap_size_mutex);
Do we need this lock anymore?

This only happens once, and so does shader-specialization. Can't we fold
the two steps into the same "pre-process" stage?

http://codereview.appspot.com/448041/diff/3001/4009#newcode316
src/liboslexec/instance.cpp:316: m_instsymbols.reserve
(m_instsymbols.size() + moresyms + 10);
Why +10 ?

http://codereview.appspot.com/448041/diff/3001/4009#newcode322
src/liboslexec/instance.cpp:322: ShaderInstance::print ()
Doesn't this code already exist somewhere else for .oso file output?

Could that be re-used or do we really need this new method?

http://codereview.appspot.com/448041/show


cku...@...
 

http://codereview.appspot.com/448041/diff/3001/4001
File src/include/osl_pvt.h (right):

http://codereview.appspot.com/448041/diff/3001/4001#newcode817
src/include/osl_pvt.h:817: unsigned int argread_bits () const { return
m_argread; }
On 2010/03/12 23:58:30, ckulla wrote:
Why do the raw bits need to get exposed? Its leaking an implementation
detail
...
Nevermind this comment - seeing how its used in practice, I can't really
suggest a better name ...

http://codereview.appspot.com/448041/show


Larry Gritz <l...@...>
 

On Mar 12, 2010, at 3:58 PM, <cku...@...> wrote:

http://codereview.appspot.com/448041/diff/3001/4001
File src/include/osl_pvt.h (right):

http://codereview.appspot.com/448041/diff/3001/4001#newcode632
src/include/osl_pvt.h:632: bool is_const () const { return symtype() ==
SymTypeConst; }
I think is_constant() and SymTypeConstant would be more clear

It would avoid confusion with the C++ meaning of const.
It's been SymTypeConst for a long time. If you feel strongly about it, I'll change it, but let me address that in a different checkin, it's all over the place. I'll change is_constant now, though.


http://codereview.appspot.com/448041/diff/3001/4001#newcode687
src/include/osl_pvt.h:687: void reset (ustring opname, OpImpl impl,
size_t nargs) {
Why is this called reset? The function looks like a "set" to me.
It doesn't set everything. "Reset" implies that it has been set, and we are now adjusting what was there. It would not be wise to call this routine without its being correctly set up once.


http://codereview.appspot.com/448041/diff/3001/4006#newcode779
src/liboslcomp/oslcomp.cpp:779: // For each op, mark its arguments as
being used at that op
This comment doesn't seem to match what it looks like this function is
doing.
Bad cut and paste. Thanks, will fix.


http://codereview.appspot.com/448041/diff/3001/4016#newcode18
src/liboslexec/CMakeLists.txt:18: ../liboslcomp/ast.cpp
../liboslcomp/codegen.cpp
This is ugly. Can we just merge the libs? It doesn't seem like we are
gaining anything by keeping them separate.
That's probably the right thing to do. Can we save for a separate review/checkin?


http://codereview.appspot.com/448041/diff/3001/4014#newcode374
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?
Sorry, the comment was a note to myself, no longer relevant. Will remove.


http://codereview.appspot.com/448041/diff/3001/4009#newcode158
src/liboslexec/instance.cpp:158: spin_lock lock (heap_size_mutex);
Do we need this lock anymore?

This only happens once, and so does shader-specialization. Can't we fold
the two steps into the same "pre-process" stage?
Yeah. Let me do that in the next refactor.


http://codereview.appspot.com/448041/diff/3001/4009#newcode316
src/liboslexec/instance.cpp:316: m_instsymbols.reserve
(m_instsymbols.size() + moresyms + 10);
Why +10 ?
It'll need to add a symbol from time to time, probably a few for each shader. I don't want to just push_back, because then it'll DOUBLE the allocation, ick. I wanted to grow by a measured amount so it usually only has to realloc once-ish per shader, but not make too much room.


http://codereview.appspot.com/448041/diff/3001/4009#newcode322
src/liboslexec/instance.cpp:322: ShaderInstance::print ()
Doesn't this code already exist somewhere else for .oso file output?

Could that be re-used or do we really need this new method?
Something similar lives in the compiler, though it does some different things. Yeah, maybe they could be merged at some point. This is just to help me debug for now.

--
Larry Gritz
l...@...


cku...@...
 

It's been SymTypeConst for a long time. If you feel strongly about
it, I'll
change it, but let me address that in a different checkin, it's all
over the
place. I'll change is_constant now, though.


I prefer SymTypeConstant - but that's fine for a future checkin.





It doesn't set everything. "Reset" implies that it has been set, and
we are now
adjusting what was there. It would not be wise to call this routine
without its
being correctly set up once.

Yep - reset() is fine.




http://codereview.appspot.com/448041/diff/3001/4016#newcode18
src/liboslexec/CMakeLists.txt:18: ../liboslcomp/ast.cpp
../liboslcomp/codegen.cpp
This is ugly. Can we just merge the libs? It doesn't seem like we
are
gaining anything by keeping them separate.
That's probably the right thing to do. Can we save for a separate
review/checkin?


Sure - this might mean adjustments (simplifications) on the renderer
build / release side too.



http://codereview.appspot.com/448041/diff/3001/4014#newcode374
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?
Sorry, the comment was a note to myself, no longer relevant. Will
remove.


Which part is irrelevant? The comment or moving the setting of
"initialized" to the top of the function? The symbol gets flagged again
at the end of the function. Only once should be needed.




http://codereview.appspot.com/448041/diff/3001/4009#newcode316
src/liboslexec/instance.cpp:316: m_instsymbols.reserve
(m_instsymbols.size() + moresyms + 10);
Why +10 ?
It'll need to add a symbol from time to time, probably a few for each
shader. I
don't want to just push_back, because then it'll DOUBLE the
allocation, ick. I
wanted to grow by a measured amount so it usually only has to realloc
once-ish
per shader, but not make too much room.

Which operation add symbols?

Do you "shrink-to-fit" the vector after the optimization passes are
done?

http://codereview.appspot.com/448041/show


Larry Gritz <l...@...>
 

On Mar 12, 2010, at 4:55 PM, <cku...@...> wrote:
http://codereview.appspot.com/448041/diff/3001/4014#newcode374
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?
Sorry, the comment was a note to myself, no longer relevant. Will
remove.


Which part is irrelevant? The comment or moving the setting of
"initialized" to the top of the function? The symbol gets flagged again
at the end of the function. Only once should be needed.
I want to keep it at the top, that prevents any unintended recursion if init ops somehow think they need to modify the value. Will remove the other one at the bottom.


It'll need to add a symbol from time to time, probably a few for each
shader. I
don't want to just push_back, because then it'll DOUBLE the
allocation, ick. I
wanted to grow by a measured amount so it usually only has to realloc
once-ish
per shader, but not make too much room.

Which operation add symbols?
Any of the constant folding ones may need to add syms (new constant syms).


Do you "shrink-to-fit" the vector after the optimization passes are
done?
Yes, after.


--
Larry Gritz
l...@...


Larry Gritz <l...@...>
 

On Mar 12, 2010, at 5:09 PM, Larry Gritz wrote:

On Mar 12, 2010, at 4:55 PM, <cku...@...> wrote:
http://codereview.appspot.com/448041/diff/3001/4014#newcode374
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?
Sorry, the comment was a note to myself, no longer relevant. Will
remove.


Which part is irrelevant? The comment or moving the setting of
"initialized" to the top of the function? The symbol gets flagged again
at the end of the function. Only once should be needed.
I want to keep it at the top, that prevents any unintended recursion if init ops somehow think they need to modify the value. Will remove the other one at the bottom.
To elaborate further: correct code generation shouldn't yield this case. Every once in a while, I tweak something wrong and ended up with a recursive case. It's technically wrong but takes FOREVER to debug because of the weird failure mode (stack overflow and resulting chaos nowhere near the actual problem). Removing that failure mode helps me keep sane. I don't see how setting 'initialized=true' at the top can harm anything.

--
Larry Gritz
l...@...


cku...@...
 

To elaborate further: correct code generation shouldn't yield this
case. Every
once in a while, I tweak something wrong and ended up with a recursive
case.
It's technically wrong but takes FOREVER to debug because of the weird
failure
mode (stack overflow and resulting chaos nowhere near the actual
problem).
Removing that failure mode helps me keep sane. I don't see how
setting
'initialized=true' at the top can harm anything.


Makes sense - now it should hit the assert right away if you hit that
kind of a bug.



http://codereview.appspot.com/448041/show


larry...@...
 

So... where are we? LGTY, or are there additional concerns for this
first checkin?


http://codereview.appspot.com/448041/show


cku...@...
 

On 2010/03/13 07:44:31, larrygritz wrote:
So... where are we? LGTY, or are there additional concerns for this
first
checkin?
LGTM!

http://codereview.appspot.com/448041/show