Re: run flags vs active intervals


Chris Foster <chri...@...>
 

On Sat, Jan 23, 2010 at 2:19 PM, Wormszer <worm...@...> wrote:


Not unrolling, vectorizing - the way I wrote the iterator appears to prevent
the compiler vectorizing the loop using SSE.

I guess I was thinking of vectorization as a type of unrolling, which isn't
really correct in the strict sense.

Fair enough. Vectorization does imply unrolling the loop (by a factor of
4 for SSE), though loop unrolling can sometimes be a useful optimization
without hardware vectorization.

The active index idea does permit loop unrolling, but not necessarily
vectorization.
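
As a rough sketch of that distinction, here is the indexed add unrolled by
hand by a factor of 4. The name addActiveUnrolled4 and its signature are made
up for this illustration; they are not taken from OSL or from the test code.
The unrolling itself is perfectly legal, but every load and store still goes
through activeIndex, so the four additions touch scattered addresses rather
than one contiguous 16-byte block, which is what packed SSE loads and stores
need.

```cpp
#include <cstddef>

// Hypothetical helper (not OSL code): c[i] = a[i] + b[i] for active i only.
void addActiveUnrolled4(const float* a, const float* b, float* c,
                        const int* activeIndex, std::size_t nActive)
{
    std::size_t j = 0;
    // Main body: four active points per iteration. The indices are
    // arbitrary, so these are four independent scalar adds, not one
    // packed add.
    for (; j + 4 <= nActive; j += 4) {
        const int i0 = activeIndex[j];
        const int i1 = activeIndex[j + 1];
        const int i2 = activeIndex[j + 2];
        const int i3 = activeIndex[j + 3];
        c[i0] = a[i0] + b[i0];
        c[i1] = a[i1] + b[i1];
        c[i2] = a[i2] + b[i2];
        c[i3] = a[i3] + b[i3];
    }
    // Remainder: the last 0 to 3 active points.
    for (; j < nActive; ++j) {
        const int i = activeIndex[j];
        c[i] = a[i] + b[i];
    }
}
```

Whether this beats the plain indexed loop depends on the compiler and the
data; it is shown only to separate "unrolled" from "vectorized".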

If hardware (SSE) vectorization isn't going to be on the cards for most
operations, I think the active index method is looking like a winner.
Generally speaking it seems to have more reliable performance characteristics,
especially in the face of incoherence.

As for this and the rest, I don't know enough about the system yet and how
it actually works. I was basing it more on your test code and some of the
earlier discussion on SIMD shaders.
After looking at your numbers more: if the only case that performed better
was the all-on case, because of the vectorization, then monitoring and
predicting wouldn't help, because it would just be a simple check of whether
the number of active points equals N. From your example I was thinking you
might have two code paths.
So for your add it would be like:

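if (nActive == nTotal) {
    // Vectorized path: every point is active, so index the arrays directly.
    for (int j = 0; j < nTotal; ++j) {
        c[j] = a[j] + b[j];
    }
} else {
    // Non-vectorized path: visit only the active points via the index list.
    for (int j = 0; j < nActive; ++j) {
        int i = activeIndex[j];
        c[i] = a[i] + b[i];
    }
}
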
Yeah. TBH I haven't studied the code enough to know whether the
vectorized path can be realized using the OSL data structures. If the
arrays are actually given as VaryingRefs then any vectorization attempt
is likely to be dead in the water since VaryingRef has a stride which is
determined at runtime.
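
To make the stride problem concrete, here is a minimal sketch. StridedRef is
a hypothetical stand-in written for this illustration, not the actual OSL
VaryingRef; the only assumption is that the real class behaves similarly,
i.e. it holds a base pointer plus a byte step known only at runtime. With
that layout the compiler can't prove that consecutive elements are adjacent
in memory, so it generally can't emit packed loads. One possible workaround
(again an assumption, not something confirmed against the OSL code) is to
test the stride and the active count at runtime and fall back to the indexed
loop otherwise.

```cpp
#include <cstddef>

// Hypothetical stand-in for a varying reference: base pointer plus a byte
// stride that is only known at runtime. Not the real OSL class.
struct StridedRef {
    char* base;
    std::ptrdiff_t stepBytes;

    float& operator[](std::size_t i) const {
        return *reinterpret_cast<float*>(
            base + static_cast<std::ptrdiff_t>(i) * stepBytes);
    }
    // True when the elements are packed like a plain float array.
    bool contiguous() const {
        return stepBytes == static_cast<std::ptrdiff_t>(sizeof(float));
    }
};

// Returns true if the dense (vectorizable) path was taken.
bool addVarying(StridedRef a, StridedRef b, StridedRef c,
                const int* activeIndex, int nActive, int nTotal)
{
    if (nActive == nTotal &&
        a.contiguous() && b.contiguous() && c.contiguous()) {
        // Dense path: plain unit-stride arrays, which the compiler can at
        // least attempt to vectorize.
        const float* pa = reinterpret_cast<const float*>(a.base);
        const float* pb = reinterpret_cast<const float*>(b.base);
        float* pc = reinterpret_cast<float*>(c.base);
        for (int j = 0; j < nTotal; ++j)
            pc[j] = pa[j] + pb[j];
        return true;
    }
    // General path: runtime stride and/or partial activity.
    for (int j = 0; j < nActive; ++j) {
        const int i = activeIndex[j];
        c[i] = a[i] + b[i];
    }
    return false;
}
```

The extra checks cost a few comparisons per call, so this only pays off if
the all-active, unit-stride case is common enough in practice.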

~Chris
