Date
1 - 10 of 10
Rutime code optimization (issue448041)
larry...@...
Reviewers: ,
Description:
First stab at runtime optimization of OSL bytecode, based on the actual
instance parameters. This is a pretty big refactor, in that it moves
quite a bit of compiler infrastructure (like lifetime analysis and
temporary coalescing) to runtime rather than in oslc.
The basic idea is that we allow a hint in the shader parameter metadata,
[[ int lockgeom=1|0 ]] that says whether or not that parameter is
"locked" with respect to the geometry (if locked, its value cannot be
overridden by interpolated geometric primitive variables). This means
that by the time we shade, many of the parameters (that are not
connected or overridden on the geom) have known values, and we can
constant-fold the crap out of the code for each instance. For example,
multiplication by 1 is eliminated, "if (foo)" can end up removing large
swaths of code if foo's value is known, etc.
I've implemented only a very few obvious optimizations, but it's already
showing a 25% speedup overall in our renderer (even counting the extra
runtime we spend optimizing shader code). I expect this to improve
speed even more dramatically as I continue to add other optimizations as
well as make the analysis more sophisticated (for example, it currently
doesn't consider connections at all, even though it could often know the
connected value and treat that, too, as a constant).
I'm aware that many parts of this are rough around the edges. It's a
work in progress. But the basic infrastructure is working, and there's
a definite speed gain, so I'd like to commit this even though it's an
ongoing project. Then subsequent additional optimizations and
refactorings will be smaller changes that we can do incrementally.
There are new ShadingSystem options "lockgeom" that gives the default
value for "lockgeom" (so you can "opt in" or "opt out", depending on
your taste), and for the optimization level to perform at runtime.
Please review this at http://codereview.appspot.com/448041/show
Affected files:
src/include/osl_pvt.h
src/liboslcomp/codegen.cpp
src/liboslcomp/oslcomp.cpp
src/liboslcomp/oslcomp_pvt.h
src/liboslcomp/symtab.cpp
src/liboslcomp/symtab.h
src/liboslexec/CMakeLists.txt
src/liboslexec/context.cpp
src/liboslexec/exec.cpp
src/liboslexec/instance.cpp
src/liboslexec/loadshader.cpp
src/liboslexec/master.cpp
src/liboslexec/opstring.cpp
src/liboslexec/oslexec_pvt.h
src/liboslexec/osolex.l
src/liboslexec/runtimeoptimize.cpp
src/liboslexec/shadingsys.cpp
src/testshade/testshade.cpp
Description:
First stab at runtime optimization of OSL bytecode, based on the actual
instance parameters. This is a pretty big refactor, in that it moves
quite a bit of compiler infrastructure (like lifetime analysis and
temporary coalescing) to runtime rather than in oslc.
The basic idea is that we allow a hint in the shader parameter metadata,
[[ int lockgeom=1|0 ]] that says whether or not that parameter is
"locked" with respect to the geometry (if locked, its value cannot be
overridden by interpolated geometric primitive variables). This means
that by the time we shade, many of the parameters (that are not
connected or overridden on the geom) have known values, and we can
constant-fold the crap out of the code for each instance. For example,
multiplication by 1 is eliminated, "if (foo)" can end up removing large
swaths of code if foo's value is known, etc.
I've implemented only a very few obvious optimizations, but it's already
showing a 25% speedup overall in our renderer (even counting the extra
runtime we spend optimizing shader code). I expect this to improve
speed even more dramatically as I continue to add other optimizations as
well as make the analysis more sophisticated (for example, it currently
doesn't consider connections at all, even though it could often know the
connected value and treat that, too, as a constant).
I'm aware that many parts of this are rough around the edges. It's a
work in progress. But the basic infrastructure is working, and there's
a definite speed gain, so I'd like to commit this even though it's an
ongoing project. Then subsequent additional optimizations and
refactorings will be smaller changes that we can do incrementally.
There are new ShadingSystem options "lockgeom" that gives the default
value for "lockgeom" (so you can "opt in" or "opt out", depending on
your taste), and for the optimization level to perform at runtime.
Please review this at http://codereview.appspot.com/448041/show
Affected files:
src/include/osl_pvt.h
src/liboslcomp/codegen.cpp
src/liboslcomp/oslcomp.cpp
src/liboslcomp/oslcomp_pvt.h
src/liboslcomp/symtab.cpp
src/liboslcomp/symtab.h
src/liboslexec/CMakeLists.txt
src/liboslexec/context.cpp
src/liboslexec/exec.cpp
src/liboslexec/instance.cpp
src/liboslexec/loadshader.cpp
src/liboslexec/master.cpp
src/liboslexec/opstring.cpp
src/liboslexec/oslexec_pvt.h
src/liboslexec/osolex.l
src/liboslexec/runtimeoptimize.cpp
src/liboslexec/shadingsys.cpp
src/testshade/testshade.cpp
cku...@...
http://codereview.appspot.com/448041/diff/3001/4001
File src/include/osl_pvt.h (right):
http://codereview.appspot.com/448041/diff/3001/4001#newcode632
src/include/osl_pvt.h:632: bool is_const () const { return symtype() ==
SymTypeConst; }
I think is_constant() and SymTypeConstant would be more clear
It would avoid confusion with the C++ meaning of const.
http://codereview.appspot.com/448041/diff/3001/4001#newcode687
src/include/osl_pvt.h:687: void reset (ustring opname, OpImpl impl,
size_t nargs) {
Why is this called reset? The function looks like a "set" to me.
http://codereview.appspot.com/448041/diff/3001/4001#newcode817
src/include/osl_pvt.h:817: unsigned int argread_bits () const { return
m_argread; }
Why do the raw bits need to get exposed? Its leaking an implementation
detail ...
http://codereview.appspot.com/448041/diff/3001/4006
File src/liboslcomp/oslcomp.cpp (right):
http://codereview.appspot.com/448041/diff/3001/4006#newcode779
src/liboslcomp/oslcomp.cpp:779: // For each op, mark its arguments as
being used at that op
This comment doesn't seem to match what it looks like this function is
doing.
http://codereview.appspot.com/448041/diff/3001/4016
File src/liboslexec/CMakeLists.txt (right):
http://codereview.appspot.com/448041/diff/3001/4016#newcode18
src/liboslexec/CMakeLists.txt:18: ../liboslcomp/ast.cpp
../liboslcomp/codegen.cpp
This is ugly. Can we just merge the libs? It doesn't seem like we are
gaining anything by keeping them separate.
http://codereview.appspot.com/448041/diff/3001/4014
File src/liboslexec/exec.cpp (right):
http://codereview.appspot.com/448041/diff/3001/4014#newcode374
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?
http://codereview.appspot.com/448041/diff/3001/4009
File src/liboslexec/instance.cpp (right):
http://codereview.appspot.com/448041/diff/3001/4009#newcode158
src/liboslexec/instance.cpp:158: spin_lock lock (heap_size_mutex);
Do we need this lock anymore?
This only happens once, and so does shader-specialization. Can't we fold
the two steps into the same "pre-process" stage?
http://codereview.appspot.com/448041/diff/3001/4009#newcode316
src/liboslexec/instance.cpp:316: m_instsymbols.reserve
(m_instsymbols.size() + moresyms + 10);
Why +10 ?
http://codereview.appspot.com/448041/diff/3001/4009#newcode322
src/liboslexec/instance.cpp:322: ShaderInstance::print ()
Doesn't this code already exist somewhere else for .oso file output?
Could that be re-used or do we really need this new method?
http://codereview.appspot.com/448041/show
File src/include/osl_pvt.h (right):
http://codereview.appspot.com/448041/diff/3001/4001#newcode632
src/include/osl_pvt.h:632: bool is_const () const { return symtype() ==
SymTypeConst; }
I think is_constant() and SymTypeConstant would be more clear
It would avoid confusion with the C++ meaning of const.
http://codereview.appspot.com/448041/diff/3001/4001#newcode687
src/include/osl_pvt.h:687: void reset (ustring opname, OpImpl impl,
size_t nargs) {
Why is this called reset? The function looks like a "set" to me.
http://codereview.appspot.com/448041/diff/3001/4001#newcode817
src/include/osl_pvt.h:817: unsigned int argread_bits () const { return
m_argread; }
Why do the raw bits need to get exposed? Its leaking an implementation
detail ...
http://codereview.appspot.com/448041/diff/3001/4006
File src/liboslcomp/oslcomp.cpp (right):
http://codereview.appspot.com/448041/diff/3001/4006#newcode779
src/liboslcomp/oslcomp.cpp:779: // For each op, mark its arguments as
being used at that op
This comment doesn't seem to match what it looks like this function is
doing.
http://codereview.appspot.com/448041/diff/3001/4016
File src/liboslexec/CMakeLists.txt (right):
http://codereview.appspot.com/448041/diff/3001/4016#newcode18
src/liboslexec/CMakeLists.txt:18: ../liboslcomp/ast.cpp
../liboslcomp/codegen.cpp
This is ugly. Can we just merge the libs? It doesn't seem like we are
gaining anything by keeping them separate.
http://codereview.appspot.com/448041/diff/3001/4014
File src/liboslexec/exec.cpp (right):
http://codereview.appspot.com/448041/diff/3001/4014#newcode374
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?
http://codereview.appspot.com/448041/diff/3001/4009
File src/liboslexec/instance.cpp (right):
http://codereview.appspot.com/448041/diff/3001/4009#newcode158
src/liboslexec/instance.cpp:158: spin_lock lock (heap_size_mutex);
Do we need this lock anymore?
This only happens once, and so does shader-specialization. Can't we fold
the two steps into the same "pre-process" stage?
http://codereview.appspot.com/448041/diff/3001/4009#newcode316
src/liboslexec/instance.cpp:316: m_instsymbols.reserve
(m_instsymbols.size() + moresyms + 10);
Why +10 ?
http://codereview.appspot.com/448041/diff/3001/4009#newcode322
src/liboslexec/instance.cpp:322: ShaderInstance::print ()
Doesn't this code already exist somewhere else for .oso file output?
Could that be re-used or do we really need this new method?
http://codereview.appspot.com/448041/show
cku...@...
http://codereview.appspot.com/448041/diff/3001/4001
File src/include/osl_pvt.h (right):
http://codereview.appspot.com/448041/diff/3001/4001#newcode817
src/include/osl_pvt.h:817: unsigned int argread_bits () const { return
m_argread; }
On 2010/03/12 23:58:30, ckulla wrote:
suggest a better name ...
http://codereview.appspot.com/448041/show
File src/include/osl_pvt.h (right):
http://codereview.appspot.com/448041/diff/3001/4001#newcode817
src/include/osl_pvt.h:817: unsigned int argread_bits () const { return
m_argread; }
On 2010/03/12 23:58:30, ckulla wrote:
Why do the raw bits need to get exposed? Its leaking an implementationdetail
...Nevermind this comment - seeing how its used in practice, I can't really
suggest a better name ...
http://codereview.appspot.com/448041/show
Larry Gritz <l...@...>
On Mar 12, 2010, at 3:58 PM, <cku...@...> wrote:
--
Larry Gritz
l...@...
It's been SymTypeConst for a long time. If you feel strongly about it, I'll change it, but let me address that in a different checkin, it's all over the place. I'll change is_constant now, though.
http://codereview.appspot.com/448041/diff/3001/4001
File src/include/osl_pvt.h (right):
http://codereview.appspot.com/448041/diff/3001/4001#newcode632
src/include/osl_pvt.h:632: bool is_const () const { return symtype() ==
SymTypeConst; }
I think is_constant() and SymTypeConstant would be more clear
It would avoid confusion with the C++ meaning of const.
http://codereview.appspot.com/448041/diff/3001/4001#newcode687It doesn't set everything. "Reset" implies that it has been set, and we are now adjusting what was there. It would not be wise to call this routine without its being correctly set up once.
src/include/osl_pvt.h:687: void reset (ustring opname, OpImpl impl,
size_t nargs) {
Why is this called reset? The function looks like a "set" to me.
http://codereview.appspot.com/448041/diff/3001/4006#newcode779Bad cut and paste. Thanks, will fix.
src/liboslcomp/oslcomp.cpp:779: // For each op, mark its arguments as
being used at that op
This comment doesn't seem to match what it looks like this function is
doing.
http://codereview.appspot.com/448041/diff/3001/4016#newcode18That's probably the right thing to do. Can we save for a separate review/checkin?
src/liboslexec/CMakeLists.txt:18: ../liboslcomp/ast.cpp
../liboslcomp/codegen.cpp
This is ugly. Can we just merge the libs? It doesn't seem like we are
gaining anything by keeping them separate.
http://codereview.appspot.com/448041/diff/3001/4014#newcode374Sorry, the comment was a note to myself, no longer relevant. Will remove.
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?
http://codereview.appspot.com/448041/diff/3001/4009#newcode158Yeah. Let me do that in the next refactor.
src/liboslexec/instance.cpp:158: spin_lock lock (heap_size_mutex);
Do we need this lock anymore?
This only happens once, and so does shader-specialization. Can't we fold
the two steps into the same "pre-process" stage?
http://codereview.appspot.com/448041/diff/3001/4009#newcode316It'll need to add a symbol from time to time, probably a few for each shader. I don't want to just push_back, because then it'll DOUBLE the allocation, ick. I wanted to grow by a measured amount so it usually only has to realloc once-ish per shader, but not make too much room.
src/liboslexec/instance.cpp:316: m_instsymbols.reserve
(m_instsymbols.size() + moresyms + 10);
Why +10 ?
http://codereview.appspot.com/448041/diff/3001/4009#newcode322Something similar lives in the compiler, though it does some different things. Yeah, maybe they could be merged at some point. This is just to help me debug for now.
src/liboslexec/instance.cpp:322: ShaderInstance::print ()
Doesn't this code already exist somewhere else for .oso file output?
Could that be re-used or do we really need this new method?
--
Larry Gritz
l...@...
cku...@...
It's been SymTypeConst for a long time. If you feel strongly aboutit, I'll
change it, but let me address that in a different checkin, it's allover the
place. I'll change is_constant now, though.
I prefer SymTypeConstant - but that's fine for a future checkin.
It doesn't set everything. "Reset" implies that it has been set, andwe are now
adjusting what was there. It would not be wise to call this routinewithout its
being correctly set up once.
Yep - reset() is fine.
arehttp://codereview.appspot.com/448041/diff/3001/4016#newcode18
src/liboslexec/CMakeLists.txt:18: ../liboslcomp/ast.cpp
../liboslcomp/codegen.cpp
This is ugly. Can we just merge the libs? It doesn't seem like we
gaining anything by keeping them separate.
That's probably the right thing to do. Can we save for a separate
review/checkin?
Sure - this might mean adjustments (simplifications) on the renderer
build / release side too.
http://codereview.appspot.com/448041/diff/3001/4014#newcode374
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?
Sorry, the comment was a note to myself, no longer relevant. Willremove.
Which part is irrelevant? The comment or moving the setting of
"initialized" to the top of the function? The symbol gets flagged again
at the end of the function. Only once should be needed.
http://codereview.appspot.com/448041/diff/3001/4009#newcode316
src/liboslexec/instance.cpp:316: m_instsymbols.reserve
(m_instsymbols.size() + moresyms + 10);
Why +10 ?
It'll need to add a symbol from time to time, probably a few for eachshader. I
don't want to just push_back, because then it'll DOUBLE theallocation, ick. I
wanted to grow by a measured amount so it usually only has to realloconce-ish
per shader, but not make too much room.
Which operation add symbols?
Do you "shrink-to-fit" the vector after the optimization passes are
done?
http://codereview.appspot.com/448041/show
Larry Gritz <l...@...>
On Mar 12, 2010, at 4:55 PM, <cku...@...> wrote:
--
Larry Gritz
l...@...
I want to keep it at the top, that prevents any unintended recursion if init ops somehow think they need to modify the value. Will remove the other one at the bottom.http://codereview.appspot.com/448041/diff/3001/4014#newcode374
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?Sorry, the comment was a note to myself, no longer relevant. Willremove.
Which part is irrelevant? The comment or moving the setting of
"initialized" to the top of the function? The symbol gets flagged again
at the end of the function. Only once should be needed.
Any of the constant folding ones may need to add syms (new constant syms).It'll need to add a symbol from time to time, probably a few for eachshader. Idon't want to just push_back, because then it'll DOUBLE theallocation, ick. Iwanted to grow by a measured amount so it usually only has to realloconce-ishper shader, but not make too much room.
Which operation add symbols?
Do you "shrink-to-fit" the vector after the optimization passes areYes, after.
done?
--
Larry Gritz
l...@...
Larry Gritz <l...@...>
On Mar 12, 2010, at 5:09 PM, Larry Gritz wrote:
--
Larry Gritz
l...@...
On Mar 12, 2010, at 4:55 PM, <cku...@...> wrote:To elaborate further: correct code generation shouldn't yield this case. Every once in a while, I tweak something wrong and ended up with a recursive case. It's technically wrong but takes FOREVER to debug because of the weird failure mode (stack overflow and resulting chaos nowhere near the actual problem). Removing that failure mode helps me keep sane. I don't see how setting 'initialized=true' at the top can harm anything.I want to keep it at the top, that prevents any unintended recursion if init ops somehow think they need to modify the value. Will remove the other one at the bottom.http://codereview.appspot.com/448041/diff/3001/4014#newcode374
src/liboslexec/exec.cpp:374: sym.initialized (true); //FIXME --
shouldn't be necessary utl_spi_float_v1
Can you elaborate on this a little?Sorry, the comment was a note to myself, no longer relevant. Willremove.
Which part is irrelevant? The comment or moving the setting of
"initialized" to the top of the function? The symbol gets flagged again
at the end of the function. Only once should be needed.
--
Larry Gritz
l...@...
cku...@...
To elaborate further: correct code generation shouldn't yield thiscase. Every
once in a while, I tweak something wrong and ended up with a recursivecase.
It's technically wrong but takes FOREVER to debug because of the weirdfailure
mode (stack overflow and resulting chaos nowhere near the actualproblem).
Removing that failure mode helps me keep sane. I don't see howsetting
'initialized=true' at the top can harm anything.
Makes sense - now it should hit the assert right away if you hit that
kind of a bug.
http://codereview.appspot.com/448041/show
larry...@...
So... where are we? LGTY, or are there additional concerns for this
first checkin?
http://codereview.appspot.com/448041/show
first checkin?
http://codereview.appspot.com/448041/show
cku...@...
On 2010/03/13 07:44:31, larrygritz wrote:
http://codereview.appspot.com/448041/show
So... where are we? LGTY, or are there additional concerns for thisfirst
checkin?LGTM!
http://codereview.appspot.com/448041/show