Review: LLVM optimization pass diddling (issue1696060)


cku...@...
 

LGTM, just a few small comments.


The level of compatibility between interpreter and LLVM will have to be
really high to pull off the delayed optimization well without obscure
bugs. Did you gather stats about shaders in typical scenes to see if we
have any shaders that are executed only a handful of times? What is your
intuition about the percentage of shaders that only get evaluated a few
times? If this percentage is low, there may not be much mileage to be
had from delaying optimization.


http://codereview.appspot.com/1696060/diff/1/6
File src/liboslexec/instance.cpp (right):

http://codereview.appspot.com/1696060/diff/1/6#newcode436
src/liboslexec/instance.cpp:436: m_executions = 0;

Why is this field initialized here instead of in the initialization list
with the other member variables?

http://codereview.appspot.com/1696060/diff/1/2
File src/liboslexec/llvm_instance.cpp (right):

http://codereview.appspot.com/1696060/diff/1/2#newcode3350
src/liboslexec/llvm_instance.cpp:3350: if (layer == (nlayers-1)) {

How about using a variable here to clarify the code:

bool do_interproc = layer == (nlayers-1);

http://codereview.appspot.com/1696060/show


Larry Gritz <l...@...>
 

On Aug 2, 2010, at 6:11 PM, <cku...@...> <cku...@...> wrote:

The level of compatibility between interpreter and LLVM will have to be
really high to pull off the delayed optimization well without obscure
bugs. Did you gather stats about shaders in typical scenes to see if we
have any shaders that are executed only a handful of times? What is your
intuition about the percentage of shaders that only get evaluated a few
times? If this percentage is low, there may not be much mileage to be
had from delaying optimization.
For exactly these reasons, I'm not having this reviewed yet. I don't know if the strategy is very helpful, not least because it requires the interpreter to run flawlessly and interchangeably with the JITed code. Another strategy is to never rely on the interpreter at all: always JIT, but do a quick-and-easy JIT first, and then after enough runs JIT again, long and hard.

I haven't tested a wide range of scenes, but on 1/4 res Tweedle, 10-25% of shader groups run only dozens to hundreds of times, versus others that run tens of thousands of times. Maybe it's not worth extra mechanism to cut out perhaps 1/4 of the optimization time. But I'm worried about short frames (low-res test frames, etc.) for which perhaps most or all shader groups aren't run enough for the optimization to pay off. Currently, we spend about a minute on Tweedle. That's nothing for a long render, but what if there were 10x as many shader groups and it was just a quick lighting check? So I am very concerned about finding ways to cut the optimization time.

I really have only three ideas for how to do it:

1. Keep randomly walking through optimization pass combinatorics and hope to find a sequence that is low overhead and still does a good job speeding up the code.

2. Only spend time optimizing the groups that will run enough for it to pay off.

3. Cache the post-optimized IR to disk for subsequent runs (and, coincidentally, for identical shader groups within the same run). The tricky part will be determining absolutely positively that the cached IR on disk corresponds EXACTLY to the unoptimized one in memory. Exactly what do we hash to ensure that?

I have no other ideas at the moment. Anybody else?


src/liboslexec/instance.cpp:436: m_executions = 0;

Why is this field initialized here instead of in the initialization list
with the other member variables?
Because it's an atomic int. I'm not sure that all our atomic implementations on all platforms support initialization, but they all do allow assignment.


src/liboslexec/llvm_instance.cpp:3350: if (layer == (nlayers-1)) {

How about using a variable here to clarify the code:

bool do_interproc = layer == (nlayers-1);
Sure, will do.


--
Larry Gritz
l...@...