Chris Foster <chri...@...>
On Fri, Jan 22, 2010 at 5:22 PM, Wormszer <worm...@...> wrote:
> That's interesting, it kind of relates to my original question if the ...

No, the active index case isn't vectorized by the compiler anyway.

> Are those numbers taking into account the setup time to create either the ...

No. In a SIMD shader machine I generally expect that creating the runstate
representation (whatever it may be) from the results of a conditional is going
to be a relatively small proportion of the total runtime.

> The iterator idea crossed my mind too but I too wouldn't of expected it to ...

Not unrolling, vectorizing: the way I wrote the iterator appears to prevent
the compiler from vectorizing the loop using SSE.

> I wonder if the way you use the iterator is having an effect, where a ...

I don't know. I know nothing about how gcc's tree vectorizer works. If it's
enabled by a heuristic whenever it sees special "simple" uses of the for loop,
then any iterator abstraction is unlikely to work.
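
For anyone following along, here is a minimal sketch of the two loop shapes in question. It is not the benchmark code from this thread: `addPlain`, `MaskIterator` and `addMasked` are hypothetical names, and the runstate is assumed to be a simple boolean mask. The first loop is the plain counted form that an auto-vectorizer has the best chance of recognising; the second hides the increment and the termination test behind an iterator, so the trip count is data dependent.

```cpp
#include <cstddef>
#include <vector>

// Plain counted loop over contiguous arrays: the "simple" form.
void addPlain(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Hypothetical iterator over a boolean-mask runstate.  It yields the
// indices of the active points and skips the inactive ones.
class MaskIterator
{
public:
    explicit MaskIterator(const std::vector<bool>& mask)
        : m_mask(mask), m_i(0)
    {
        skipInactive();
    }
    bool valid() const { return m_i < m_mask.size(); }
    std::size_t operator*() const { return m_i; }
    MaskIterator& operator++()
    {
        ++m_i;
        skipInactive();
        return *this;
    }
private:
    // Advance to the next active index, or to the end of the mask.
    void skipInactive()
    {
        while (m_i < m_mask.size() && !m_mask[m_i])
            ++m_i;
    }
    const std::vector<bool>& m_mask;
    std::size_t m_i;
};

// The same operation written against the iterator abstraction.
void addMasked(const float* a, const float* b, float* out,
               const std::vector<bool>& mask)
{
    for (MaskIterator it(mask); it.valid(); ++it)
        out[*it] = a[*it] + b[*it];
}
```

Whether a given compiler vectorizes either loop depends on the compiler version and flags, so treat this only as an illustration of the difference in loop structure.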

> It looks like active index is the way to go, ...

If hardware (SSE) vectorization isn't going to be on the cards for most
operations, I think the active index method is looking like a winner.
Generally speaking it seems to have more reliable performance characteristics,
especially in the face of incoherence.
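
To make "active index" concrete, here is a rough sketch under my own assumptions: the runstate is stored as a list of the indices of the active points, built once from the result of a conditional, and every subsequent operation loops over that list. The names `buildActiveIndices` and `addActive` are hypothetical and are not taken from any existing code.

```cpp
#include <cstddef>
#include <vector>

// Build the active index list from the per-point result of a conditional.
std::vector<std::size_t> buildActiveIndices(const std::vector<bool>& cond)
{
    std::vector<std::size_t> active;
    for (std::size_t i = 0; i < cond.size(); ++i)
    {
        if (cond[i])
            active.push_back(i);
    }
    return active;
}

// One SIMD operation: only the active points are read and written.
// The work done is proportional to the number of active points, however
// scattered they are.
void addActive(const std::vector<float>& a, const std::vector<float>& b,
               std::vector<float>& out,
               const std::vector<std::size_t>& active)
{
    for (std::size_t k = 0; k < active.size(); ++k)
    {
        const std::size_t i = active[k];
        out[i] = a[i] + b[i];
    }
}
```

The inner loop uses indirect addressing, so it is not a candidate for simple SSE vectorization, which matches the observation above that this case isn't vectorized anyway.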

> i wonder why it doesn't perform ...

Yes, I think that's the reason.

> If the two methods perform well under different conditions is there enough ...

I considered something like this too. IMHO, making things this complex
requires that the SIMD state iteration should be abstracted, but an iterator
abstraction isn't appropriate in that case.

> Is there enough coherence between frame to frame, execution to execution, ...

That doesn't sound like it would help to me ;-) You need to modify the
runstate at every conditional branch anyway, so it's possible to analyse it for
coherence during modification, if necessary.
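
As a sketch of what analysing coherence during modification could look like, assume an index-list runstate and take "number of contiguous index runs" as the coherence measure. Both are my assumptions for illustration, and `BranchSplit` and `splitAtBranch` are hypothetical names. The same pass that splits the runstate at a conditional also counts the runs, so the analysis costs no extra traversal.

```cpp
#include <cstddef>
#include <vector>

// Result of splitting a runstate at a conditional branch.
struct BranchSplit
{
    std::vector<std::size_t> thenActive; // points taking the "if" branch
    std::vector<std::size_t> elseActive; // points taking the "else" branch
    std::size_t thenRuns;                // contiguous index runs in thenActive
};

// Split the currently active points according to cond (indexed by point),
// counting contiguous runs in the "then" set as a crude coherence measure.
// Fewer runs for the same number of points means a more coherent runstate.
BranchSplit splitAtBranch(const std::vector<std::size_t>& active,
                          const std::vector<bool>& cond)
{
    BranchSplit s;
    s.thenRuns = 0;
    for (std::size_t k = 0; k < active.size(); ++k)
    {
        const std::size_t i = active[k];
        if (cond[i])
        {
            // A new run starts whenever i does not directly follow the
            // previous index in the "then" set.
            if (s.thenActive.empty() || s.thenActive.back() + 1 != i)
                ++s.thenRuns;
            s.thenActive.push_back(i);
        }
        else
        {
            s.elseActive.push_back(i);
        }
    }
    return s;
}
```

A caller could compare `thenRuns` against `thenActive.size()` to decide how coherent the branch is, if that information turns out to be useful at all.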