Re: run flags vs active intervals

Larry Gritz <l...@...>

The most important data point is that runflags are a big loser for the vast majority of run states that we see. So moving to one of the others (almost certainly indices) is at the top of my list.

By the way, when we first opened OSL a few weeks back, we noted that we were working hard on optimization, still being 5-10x slower than our old C shaders. We've been working with a target production frame that was a bit of a worst-case scenario, which was slightly more than 15x slower than the equivalent frame with our old C shaders as of 3 weeks ago. At the beginning of this week, we had closed the gap on this scene to 3.5x (not counting changes from this week). We think that means less extreme scenarios are probably getting very close to parity, and we are still working on more optimization, as you can see from this thread.

Some of that speedup was due to changes on the OSL side (all of which you have seen played out in reviews and checkins), but the lion's share was due to changing the renderer to batch the shading requests, including secondary rays. There are two big takeaways for you, dear readers:

1. Tales of OSL's inherent slowness were highly exaggerated. All along, most of the performance problem was due to our renderer's habit of shading one point at a time (despite liboslexec being designed around batches). When we finally batched the points (especially secondary rays, even when we could only batch a few), and rewrote our integrator to be batch-oriented, we squeezed out a factor of 5 quite easily.

2. For those of you integrating OSL into other renderers, please learn from our mistake and use batches from the start!

-- lg

On Feb 4, 2010, at 11:42 AM, Larry Gritz wrote:

On Feb 4, 2010, at 11:32 AM, Christopher Kulla wrote:

> My interpretation of these results is that indices always win - unless
> we have a single span.

Spans only beat indices if there is a single large span of 1's, or a large batch that is almost all 1's. (That's most primary batches, but nothing else.)

> It might be worth using indices but having an
> "all_points_sequential" optimization for simple ops where the
> indirection overhead would be noticeable (all the templated ops can
> get this for free)

That may be a good compromise. I think indices are a clear winner in all other cases, so perhaps this will eliminate the overhead in the few cases that spans seem to win, and then we're golden.
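
For readers following along, here is a minimal sketch of what such a fast path might look like. It is not the liboslexec code: add_op, the helper function, and the use of std::vector<int> for the index list are all hypothetical, chosen only to illustrate the idea of skipping the indirection when the active points form one contiguous run. It assumes the indices are sorted and unique.

```cpp
#include <vector>

// Hypothetical helper: true when the index list is one contiguous run,
// i.e. indices[k] == indices[0] + k for every k.
// Assumes the indices are sorted and unique, so comparing the two ends
// against the count is sufficient.
bool all_points_sequential(const std::vector<int>& indices) {
    if (indices.empty()) return true;
    return indices.back() - indices.front() + 1 ==
           static_cast<int>(indices.size());
}

// Illustrative "simple op": result[i] = a[i] + b[i] for each active point.
void add_op(float* result, const float* a, const float* b,
            const std::vector<int>& indices) {
    if (indices.empty()) return;
    if (all_points_sequential(indices)) {
        // Fast path: a plain loop over [begin, end), no per-point indirection.
        const int begin = indices.front();
        const int end = indices.back() + 1;
        for (int i = begin; i < end; ++i)
            result[i] = a[i] + b[i];
    } else {
        // General path: visit only the active points through the index list.
        for (int i : indices)
            result[i] = a[i] + b[i];
    }
}
```

The check costs a single comparison per op, so it should be cheap next to the loop itself; whether it actually recovers the cases where spans win would need to be measured.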

> By the way - I don't really understand how you got any runflag
> sequences that started with 0. Did you always dump out "npoints" or
> only "begin/end" ?

I dumped out "npoints" but for the index test, only looped over [begin,end), as we do in the real shadeops. So there is no penalty for strings of 0's on either end. So primary batches (or conditional modifications to primary batches) can start with 0. Non-conditional secondary batches always start with 1 (because of the way our renderer sends them to OSL).
Larry Gritz
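
As a rough illustration of the three representations compared in this thread, the sketch below converts a run-flag array into an index list and into a span list, scanning only [begin,end) as described above. The type alias and function names are assumptions made for this example, not identifiers from the OSL source.

```cpp
#include <utility>
#include <vector>

// Run flags: one byte per point, nonzero means the point is "on".
using Runflags = std::vector<unsigned char>;

// Index list: the positions of the "on" points within [begin, end).
std::vector<int> runflags_to_indices(const Runflags& flags,
                                     int begin, int end) {
    std::vector<int> indices;
    for (int i = begin; i < end; ++i)
        if (flags[i]) indices.push_back(i);
    return indices;
}

// Span list: half-open [first, last) ranges of consecutive "on" points
// within [begin, end).
std::vector<std::pair<int, int>> runflags_to_spans(const Runflags& flags,
                                                   int begin, int end) {
    std::vector<std::pair<int, int>> spans;
    int i = begin;
    while (i < end) {
        while (i < end && !flags[i]) ++i;   // skip a run of 0's
        const int first = i;
        while (i < end && flags[i]) ++i;    // consume a run of 1's
        if (first < i) spans.emplace_back(first, i);
    }
    return spans;
}
```

For the flags 0 1 1 0 1 0 with begin=1 and end=5, the index list is 1, 2, 4 and the span list is [1,3), [4,5). The leading and trailing 0's cost nothing in either form because they fall outside [begin,end).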
