
Re: run flags vs active intervals

Larry Gritz <l...@...>
 

I want to point out that Chris F tested out the "all on", "random on", and "alternating on/off" cases. There's one more case that may be important, which is a few isolated "on" points with big "off" gaps in between -- and in that case, I expect Chris F's solution to perform even better (compared to what we have now) than the other cases.
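A rough way to see why the sparse case should favor intervals is to count loop overhead rather than time it. This is only a sketch, not Chris F's test code (which isn't reproduced in this thread). The function names and the cost model (one flag test per point versus two bound loads per run) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cost model: the run-flag loop tests one flag per point,
// no matter how many points are actually "on".
std::size_t flag_tests(const std::vector<int>& flags) {
    return flags.size();
}

// The interval loop only loads two bounds (begin, end) per run of "on" points.
std::size_t interval_bound_loads(const std::vector<int>& flags) {
    std::size_t runs = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        // A run starts at an "on" point whose predecessor is "off" (or missing).
        bool starts_run = flags[i] && (i == 0 || !flags[i - 1]);
        if (starts_run) ++runs;
    }
    return 2 * runs;
}

// Builds the sparse case: n points with every stride-th point "on".
std::vector<int> sparse_flags(std::size_t n, std::size_t stride) {
    assert(stride > 0);
    std::vector<int> flags(n, 0);
    for (std::size_t i = 0; i < n; i += stride) flags[i] = 1;
    return flags;
}
```

With 1000 points and a stride of 250, the flag loop still performs 1000 tests, while the interval form touches 8 bounds for 4 one-point runs.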

I'm not looking forward to doing this overhaul (only because it's tedious and extensive, there's nothing hard about it), but I think it's potentially a big win. Thanks, Chris!

-- lg




Re: run flags vs active intervals

Christopher <cku...@...>
 

I like this idea too. What we were discussing yesterday was something
like:

index: [ 0 1 2 3 4 5 6 7 8 9 ]
flags: [ 0 0 1 1 1 1 0 0 1 0 ]
active_points: [ 2 3 4 5 8 ]

for (int i = 0; i < num_active; i++)
    do_something(active_points[i]);

This would be slightly more efficient if you had a single point active
(or isolated single points), but it requires a lot more indirections
in the common case.
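For comparison, a minimal sketch of the active_points variant might look like the following. The helper names and the use of "add 1" as a stand-in for do_something() are assumptions, not OSL code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gathers the indices of all "on" points,
// e.g. flags 0 0 1 1 1 1 0 0 1 0 -> 2 3 4 5 8.
std::vector<int> build_active_points(const std::vector<int>& flags) {
    std::vector<int> active_points;
    for (std::size_t i = 0; i < flags.size(); ++i)
        if (flags[i]) active_points.push_back(static_cast<int>(i));
    return active_points;
}

// Hypothetical stand-in for do_something(): add 1 to each active point.
// Every access goes through active_points[i], which is the extra
// indirection mentioned above.
void add_one_indexed(std::vector<float>& data,
                     const std::vector<int>& active_points) {
    for (std::size_t i = 0; i < active_points.size(); ++i) {
        const int p = active_points[i];
        assert(p >= 0 && static_cast<std::size_t>(p) < data.size());
        data[p] += 1.0f;
    }
}
```

When every point is on, this list is as long as the data itself, whereas the interval form needs only one begin/end pair.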

So I'm all in favor of trying this out - though it is a pretty big
overhaul ...

I would maintain these ranges as little int arrays on the stack just
like we maintain the runflags now. The upper bound for the
active_range array size is about npoints: the worst case is alternating
run flags (on off on off on off ...), which gives ceil(npoints/2) runs of
two ints each. Flip any bit and you never get more "runs".
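A sketch of building the ranges into a caller-provided array (which could live on the stack) is below. The function name, the unsigned char flag type, and the capacity parameter are assumptions. Sizing the array at npoints + 1 is the safe choice, since alternating flags with an odd npoints produce npoints + 1 ints.

```cpp
#include <cassert>

// Converts run flags into [begin, end) pairs stored in a caller-provided
// array. Returns the number of ints written (two per run).
// `capacity` is the number of ints available in `active_range`.
int build_active_ranges(const unsigned char* flags, int npoints,
                        int* active_range, int capacity) {
    int n = 0;
    int i = 0;
    while (i < npoints) {
        while (i < npoints && !flags[i]) ++i;   // skip "off" points
        if (i == npoints) break;
        assert(n + 2 <= capacity);
        active_range[n++] = i;                  // first "on" point of the run
        while (i < npoints && flags[i]) ++i;    // walk to the end of the run
        active_range[n++] = i;                  // one past the last "on" point
    }
    return n;
}
```

For the example flags above this writes 2 6 8 9, and for five alternating flags starting with "on" it writes six ints.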

-Chris



Re: run flags vs active intervals

Xavier Ho <con...@...>
 

On Fri, Jan 22, 2010 at 4:15 AM, Wormszer <worm...@...> wrote:
Is it ok to ask these questions in this thread like this or should I be starting a new one? I don't want to fill this thread with off topic information from the original questions? I am somewhat new to this process, and unclear on what is looked down upon.

As a uni student who is keen on the OSL open source release, I'm glad to be reading these discussions - particularly questions and answers. They're the source of knowledge, freely shared insights. It excites the community in its common understanding of this project, and benefits those who, like myself, digest the information that goes through mailing lists. Not to mention this is probably archived, and later on searchable if you ever need it again. My opinion is, if it's related, it's probably okay. Unless you intend to start a newer/bigger conversation, there isn't really a need to start a new thread. Having multiple threads on similar topics only makes digging up information harder, anyhow.

My 2 cp,
Xavier


Re: run flags vs active intervals

Wormszer <worm...@...>
 

Thanks, that makes sense. I know compilers are getting smarter all the time but wasn't sure if they were there yet. The cache and memory coherence makes sense, and then sorting things like that would allow for easier parallel implementations on GPU, etc.

Another quick question: what is n? Is n the total number of pixels in the image, or possibly rays? And is that why you have disabled ops, for pixels that the shader is not applied to?
Is there a good place to look or read about that? Or do I need to dig into the source?

Is it ok to ask these questions in this thread like this or should I be starting a new one? I don't want to fill this thread with off topic information from the original questions? I am somewhat new to this process, and unclear on what is looked down upon.

Thanks
Jeremy


--
You received this message because you are subscribed to the Google Groups "OSL Developers" group.
To post to this group, send email to osl...@....
To unsubscribe from this group, send email to osl...@....
For more options, visit this group at http://groups.google.com/group/osl-dev?hl=en.







Re: run flags vs active intervals

Larry Gritz <l...@...>
 

It's not really referring to hardware SIMD, but just that we are shading many points at once, "in lock-step."  In other words, if you had many points to shade, you could do it in this order:

    point 0:
        add
        texture
        assign
    point 1:
        add
        texture
        assign
    ...
    point n:
        add
        texture
        assign

or you could do it in this order:

    add all points on [0,n]
    texture all points on [0,n]
    assign all points on [0,n]

We call the latter "SIMD" (single instruction, multiple data), because each instruction operates on big arrays of data (one value for each point being shaded).

SIMD helps by:

   * increasing interpreter speed, by moving much of the interpreter overhead to "per batch" rather than "per point".
   * improving memory coherence, cache behavior, and allowing hardware SIMD, since it's naturally using "structure-of-array" layout, i.e. it's faster to add contiguous arrays than separate floats in more sparse memory locations.
   * improving texture coherence, because you're doing lots of lookups from the same texture on all points at the same time (which are in turn likely to be on the same tiles).
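The two orderings above can be sketched in C++ roughly as follows. This is an illustration only, not the OSL interpreter: the three-op "shader", the function names, and modelling the texture op as a multiply by a per-point value are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Order 1: run every op for point 0, then every op for point 1, and so on.
void shade_per_point(const std::vector<float>& a, const std::vector<float>& b,
                     const std::vector<float>& tex, std::vector<float>& out) {
    assert(a.size() == out.size() && b.size() == out.size() &&
           tex.size() == out.size());
    for (std::size_t p = 0; p < out.size(); ++p) {
        float tmp = a[p] + b[p];   // add
        tmp *= tex[p];             // "texture" (assumed: scale by a lookup value)
        out[p] = tmp;              // assign
    }
}

// Order 2 ("SIMD" in the sense used here): run each op over all points
// before moving on to the next op. Each loop walks contiguous arrays.
void shade_batched(const std::vector<float>& a, const std::vector<float>& b,
                   const std::vector<float>& tex, std::vector<float>& out) {
    assert(a.size() == out.size() && b.size() == out.size() &&
           tex.size() == out.size());
    const std::size_t n = out.size();
    std::vector<float> tmp(n);
    for (std::size_t p = 0; p < n; ++p) tmp[p] = a[p] + b[p];   // add all points
    for (std::size_t p = 0; p < n; ++p) tmp[p] *= tex[p];       // texture all points
    for (std::size_t p = 0; p < n; ++p) out[p] = tmp[p];        // assign all points
}
```

Both orderings compute the same values. In the batched form, whatever per-op dispatch cost an interpreter has is paid once per op rather than once per op per point.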







Re: run flags vs active intervals

Wormszer <worm...@...>
 

Is there a good resource on this topic? I did some googling and didn't see what I was looking for.

Where is the actual SIMD taking place?
Is the compiler figuring it out from the loop and setting up the correct instructions? Or is it relying on the CPU to recognize consecutive ops with different data, modifying instructions in its pipe by combining instructions into a parallel one?

Or maybe I'm way off.

Thanks,

Jeremy







Re: run flags vs active intervals

Larry Gritz <l...@...>
 

Awesome, Chris. Would you believe we were just talking internally about this topic yesterday? We were considering the amount of waste if there were big gaps of "off" points in the middle. But I think your solution is quite a bit more elegant than what we were discussing (a list of "on" points). I like how it devolves into [begin,end) for the common case of all points on.

It's a pretty big overhaul, touches every shadeop and a lot of templates, and the call signatures have to be changed (to what, exactly? passing a std::vector<int>& ? or a combo of int *begend and int segments?). Ugh, and we may wish to change/add OIIO texture routines that take runflags, too. But your experiment is quite convincing.
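To make the two signature options concrete, here is a hypothetical sketch. The name add_op and the float-add body are assumptions; only begend and segments come from the wording above, and nothing here is an actual OSL shadeop signature.

```cpp
#include <cassert>
#include <vector>

// Option 2: a raw pointer to begin/end pairs plus the number of segments.
// begend[2*s] is the first active point of segment s and begend[2*s + 1]
// is one past its last active point.
void add_op(float* result, const float* a, const float* b,
            const int* begend, int segments) {
    for (int s = 0; s < segments; ++s)
        for (int i = begend[2 * s]; i < begend[2 * s + 1]; ++i)
            result[i] = a[i] + b[i];
}

// Option 1: the same pairs passed as a std::vector<int>&.
// Here it simply forwards to the pointer version.
void add_op(float* result, const float* a, const float* b,
            const std::vector<int>& active) {
    assert(active.size() % 2 == 0);
    add_op(result, a, b, active.data(), static_cast<int>(active.size() / 2));
}
```

The pointer form works with small stack arrays and needs no allocation. The vector form carries its own length but ties every caller to std::vector.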

What does everybody else think? Chris/Cliff/Alex? (I'm happy to do the coding, but I want consensus because it touches so much.)

-- lg




run flags vs active intervals

Chris Foster <chri...@...>
 

Hi all,

I've been looking through the OSL source a little, and I'm interested to see
that you're using runflags for the SIMD state. I know that's a really
conventional solution, but there's an alternative representation which I reckon
is significantly faster, and I wondered if you considered it.

Imagine a SIMD runstate as follows

index: [ 0 1 2 3 4 5 6 7 8 9 ]

flags: [ 0 0 1 1 1 1 0 0 1 0 ]


An alternative to the flags is to represent this state as a list of active
intervals. As a set of active start/stop pairs, the state looks like

active = [ 2 6 8 9 ]

ie,

state: [ 0 0 1 1 1 1 0 0 1 0 ]
             ^       v   ^ v
             2       6   8 9

The advantage of doing this is that the inner SIMD loops become tighter since
there's one less test. Instead of

for (int i = 0; i < len; ++i)
{
    if (state[i])
        do_something(i);
}

we'd have

for (int j = 0; j < nActiveTimes2; j+=2)
{
    for (int i = active[j]; i < active[j+1]; ++i)
        do_something(i);
}

and the inner loop is now completely coherent.
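As a self-contained sketch of the two loops (not the attached runstate.cpp, which isn't reproduced here), with float addition standing in for do_something(). The function names and the std::vector parameters are assumptions.

```cpp
#include <cassert>
#include <vector>

// Run-flag form: one flag test per point inside the loop.
void add_with_flags(const std::vector<int>& state,
                    const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& out) {
    const int len = static_cast<int>(state.size());
    for (int i = 0; i < len; ++i) {
        if (state[i])
            out[i] = a[i] + b[i];
    }
}

// Active-interval form: `active` holds start/stop pairs, with each stop
// being one past the last active point, so the inner loop covers
// contiguous points with no per-point test.
void add_with_intervals(const std::vector<int>& active,
                        const std::vector<float>& a, const std::vector<float>& b,
                        std::vector<float>& out) {
    assert(active.size() % 2 == 0);
    const int nActiveTimes2 = static_cast<int>(active.size());
    for (int j = 0; j < nActiveTimes2; j += 2) {
        for (int i = active[j]; i < active[j + 1]; ++i)
            out[i] = a[i] + b[i];
    }
}
```

With state = 0 0 1 1 1 1 0 0 1 0 and active = 2 6 8 9, both functions write the same points (2 to 5, and 8) and leave the rest untouched.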

I can see you have the beginnings of this idea in the current OSL code, since
you pass beginpoint and endpoint along with the runflags everywhere. However,
why not take it to its logical conclusion?

Given that you already have beginpoint and endpoint, the particularly big win
here is when most of the flags are turned on. If do_something() is a simple
arithmetic operation (eg, float addition) the difference between the two
formulations can be a factor of two in speed.


I'm attaching some basic test code which I whipped up. Timings on my core2 duo
with gcc -O3 look like:


All flags on (completely coherent):

run flags: 1.310s
active intervals: 0.690s


Random flags on (completely incoherent):

run flags: 5.440s
active intervals: 3.310s


Alternate flags on (maximum number of active intervals -> worst case):

run flags: 1.500s
active intervals: 2.150s


Thoughts?

~Chris.


Re: Volume Shaders

Daniel <night-...@...>
 

On 21 Jan., 16:13, Larry Gritz <l...@...> wrote:
> Stay tuned, this will all be done very soon, and probably discussed in detail here as it's happening.

Thanks so far for the answers, I'm looking forward to seeing OSL
progress. Good stuff!

Regards,
Daniel


Re: Volume Shaders

Larry Gritz <l...@...>
 

Er, I'm not sure. I think this is exactly the kind of thing we will be working out over the next few weeks. When we actually implemented surface integrators, we discovered all sorts of issues we hadn't realized when we spec'ed it. The gist is what we imagined, but the little details are different, and I expect the same will happen with volumes.

The "density" is already part of the volume's closure, in much the same way that opacity is already part of a surface closure (by virtue of its weighting of closure elements that know they are "transparency-like"). It's possible that we'll want the shader to also return some kind of hint about the frequency that it needs to be sampled (as part of, or in addition to, the closure). Or maybe that's strictly the job of the integrator. The integrator can itself have parameters set by the renderer, and there can be multiple integrators for very different volume situations that need different sampling strategies. Also, it should be obvious that when we add volumes, we will also add several volume scattering closures, as the set of surface closures will not be adequate.

Stay tuned, this will all be done very soon, and probably discussed in detail here as it's happening.

-- lg




Re: Volume Shaders

Daniel <night-...@...>
 

On 21 Jan., 10:12, Daniel <night...@...> wrote:
> Or will volume density information simply be part of the scene description so integrators can choose to do this in a way that fits their scene format?

Meh, I just realized my ambiguous use of "integrator". In this case I
meant "party integrating OSL into their system"...

Regards,
Daniel


Re: Volume Shaders

Daniel <night-...@...>
 

On 21 Jan., 08:33, Larry Gritz <l...@...> wrote:
> A "volume integrator" in the renderer is responsible for doing the actual ray marching, evaluating the closure for specific light directions, and accumulating the contributions along the viewing ray.

OK, that is how I interpreted the spec.
The volume integrator will, presumably, need some information about
the density of the volume, right? Do you already have ideas for
specifying that in OSL? Or will volume density information simply be
part of the scene description so integrators can choose to do this in
a way that fits their scene format?

Regards,
Daniel


Re: Volume Shaders

Larry Gritz <l...@...>
 

We're tackling the volume shaders fairly soon. The spec isn't very clear about them, but that will be beefed up as we implement it. Basically the idea is that it will be very analogous to surface shaders -- returning a closure that describes in a view-independent way what the scattering of the volume is at a particular point. A "volume integrator" in the renderer is responsible for doing the actual ray marching, evaluating the closure for specific light directions, and accumulating the contributions along the viewing ray.
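
To make that division of labor concrete, here is a minimal C++ sketch of the ray-marching part. It is not OSL or renderer code: `VolumeSample`, `march`, and the idea of collapsing the closure evaluation down to two scalars per point are all assumptions made for illustration. A real integrator would evaluate the returned closure against each light direction instead.

```cpp
#include <cmath>
#include <functional>

// Hypothetical stand-in for what evaluating a volume closure yields at one
// point along the ray. Not an OSL type.
struct VolumeSample {
    float extinction;  // how strongly the medium attenuates light here
    float scattered;   // radiance scattered toward the viewer, per unit length
};

// March along the viewing ray from t0 to t1 in fixed steps, accumulating
// scattered radiance weighted by the transmittance back to the ray origin.
float march(const std::function<VolumeSample(float)>& shade,
            float t0, float t1, int steps)
{
    const float dt = (t1 - t0) / steps;
    float transmittance = 1.0f;  // fraction of light surviving so far
    float radiance = 0.0f;       // accumulated contribution
    for (int i = 0; i < steps; ++i) {
        const float t = t0 + (i + 0.5f) * dt;  // sample at the step midpoint
        const VolumeSample s = shade(t);       // "run the shader" at this point
        radiance += transmittance * s.scattered * dt;
        transmittance *= std::exp(-s.extinction * dt);
    }
    return radiance;
}
```

The shader side stays view-independent in this picture; step count and sampling strategy are choices the integrator makes.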

This will all get fleshed out over the next few weeks; there's a pretty strong near-term deadline on our own shows to get this working.

-- lg


On Jan 20, 2010, at 12:29 AM, Daniel wrote:

Guys,

I just started looking into OSL. It certainly looks very interesting!
Did I miss something or is it true that volume shaders are currently
neither implemented, nor truly specified?
I see that you filed issue #3 about this. Can you already comment on
your plans for volume shaders? I'm curious... Which part of the
illumination will the shaders actually compute?

Regards,
Daniel
--
Larry Gritz
l...@...


Re: Compiling OpenShadingLanguage under Windows

Wormszer <worm...@...>
 


I have mcpp integrated into the compiler now. It works like CPP, writing everything to stdout.

A few things: the binary available for download, at least on Windows, doesn't support forcing an include file.
So I had to build it from source; luckily it wasn't too difficult to create a new project for it.

I built it as VS for VS. Now that I think about it I bet I could build it as VS for GCC so that it would support the same command line options.

Another weird issue: I guess the lexer doesn't support C-style /* */ comments? These might not be defined in the OSL language, and if so they should probably be removed from stdosl.h.
It's the only difference I could see that could have been causing the error.

CL would remove the /* */ comments at the end of the math defines in stdosl.h even though I told it to preserve comments.

mcpp, when I told it to leave comments, would actually leave them in there (which seems like correct behavior), and then the lexer would crash.

I am not sure why comments are really needed at this point. I think I must have included them for my own debugging; I don't think CPP was set to output them.

Thanks for the suggestions on an easier solution to the preprocessor issue.

I looked real quick at the boost option, but mcpp seemed easier since I could just get a binary and put it in my path (but then I had to rebuild it).

Well, now to see if I can get anything to happen with my compiled shaders.

Jeremy


On Wed, Jan 20, 2010 at 9:03 PM, Wormszer <worm...@...> wrote:
I am fine with either one. I think having something embedded or buildable would be useful.

Otherwise there may be issues with different compilers, and we would probably need some kind of config that cmake would generate, at least on Windows with several versions of VS, etc.

We will just have to see how Larry or the other devs feel about using one of those two for Linux builds as well. I would assume it would be wise to have all the preprocessing done with the same tool when possible.

I will look at both real quick, but I might lean towards mcpp.


On Jan 20, 2010, at 8:14 PM, Chris Foster <chri...@...> wrote:

On Thu, Jan 21, 2010 at 11:02 AM, Blair Zajac <bl...@...> wrote:
The main annoyance with wave is that it causes the compiler to issue truly
horrendous error messages if you get things wrong (the wave internals make
heavy use of template metaprogramming).

(Obviously this is only a problem when integrating wave into the project
source, and that's not really difficult at all.)

There's mcpp which is designed to be an embeddable C-preprocessor.

Ice, which we use at Sony Imageworks for all our middle-tier systems, uses
mcpp for its own internal IDL-type language, so I would recommend that.

mcpp looks nice.  In some sense, using wave would mean one less dependency
since OSL already relies on boost, but it does mean linking to libboost-wave,
so if you have a modular boost install the point may be moot...

~Chris

PS: Sorry to Blair for the duplicate message.  I intended to send it
to the list :-(
--
You received this message because you are subscribed to the Google Groups "OSL Developers" group.
To post to this group, send email to osl...@....
To unsubscribe from this group, send email to osl...@....
For more options, visit this group at http://groups.google.com/group/osl-dev?hl=en.




Re: Compiling OpenShadingLanguage under Windows

Wormszer <worm...@...>
 

I am fine with either one. I think having something embedded or buildable would be useful.

Otherwise there may be issues with different compilers, and we would probably need some kind of config that cmake would generate, at least on Windows with several versions of VS, etc.

We will just have to see how Larry or the other devs feel about using one of those two for Linux builds as well. I would assume it would be wise to have all the preprocessing done with the same tool when possible.

I will look at both real quick, but I might lean towards mcpp.

On Jan 20, 2010, at 8:14 PM, Chris Foster <chri...@...> wrote:

On Thu, Jan 21, 2010 at 11:02 AM, Blair Zajac <bl...@...> wrote:
The main annoyance with wave is that it causes the compiler to issue truly
horrendous error messages if you get things wrong (the wave internals make
heavy use of template metaprogramming).
(Obviously this is only a problem when integrating wave into the project
source, and that's not really difficult at all.)

There's mcpp which is designed to be an embeddable C-preprocessor.

Ice, which we use at Sony Imageworks for all our middle-tier systems, uses
mcpp for its own internal IDL-type language, so I would recommend that.
mcpp looks nice. In some sense, using wave would mean one less dependency
since OSL already relies on boost, but it does mean linking to libboost-wave,
so if you have a modular boost install the point may be moot...

~Chris

PS: Sorry to Blair for the duplicate message. I intended to send it
to the list :-(


Re: Compiling OpenShadingLanguage under Windows

Chris Foster <chri...@...>
 

On Thu, Jan 21, 2010 at 11:02 AM, Blair Zajac <bl...@...> wrote:
The main annoyance with wave is that it causes the compiler to issue truly
horrendous error messages if you get things wrong (the wave internals make
heavy use of template metaprogramming).
(Obviously this is only a problem when integrating wave into the project
source, and that's not really difficult at all.)

There's mcpp which is designed to be an embeddable C-preprocessor.

Ice, which we use at Sony Imageworks for all our middle-tier systems, uses
mcpp for its own internal IDL-type language, so I would recommend that.
mcpp looks nice. In some sense, using wave would mean one less dependency
since OSL already relies on boost, but it does mean linking to libboost-wave,
so if you have a modular boost install the point may be moot...

~Chris

PS: Sorry to Blair for the duplicate message. I intended to send it
to the list :-(


Re: Compiling OpenShadingLanguage under Windows

Blair Zajac <bl...@...>
 

On 01/20/2010 03:11 PM, Chris Foster wrote:
On Thu, Jan 21, 2010 at 7:37 AM, Wormszer<worm...@...> wrote:
I have a solution, but it's not very elegant. The compiler is hard-coded to
use the Linux CPP preprocessor, /usr/bin/CPP.
Over at aqsis, we use the C preprocessor from boost called boost.wave. It's
been a good choice for us since it runs anywhere with a capable C++ compiler.

The main annoyance with wave is that it causes the compiler to issue truly
horrendous error messages if you get things wrong (the wave internals make
heavy use of template metaprogramming).
There's mcpp which is designed to be an embeddable C-preprocessor.

Ice, which we use at Sony Imageworks for all our middle-tier systems, uses mcpp for its own internal IDL-type language, so I would recommend that.

Regards,
Blair


Re: Compiling OpenShadingLanguage under Windows

Chris Foster <chri...@...>
 

On Thu, Jan 21, 2010 at 7:37 AM, Wormszer <worm...@...> wrote:
I have a solution, but it's not very elegant. The compiler is hard-coded to
use the Linux CPP preprocessor, /usr/bin/CPP.
Over at aqsis, we use the C preprocessor from boost called boost.wave. It's
been a good choice for us since it runs anywhere with a capable C++ compiler.

The main annoyance with wave is that it causes the compiler to issue truly
horrendous error messages if you get things wrong (the wave internals make
heavy use of template metaprogramming).

~Chris.


Re: Compiling OpenShadingLanguage under Windows

Oleg <ode...@...>
 

Nice debugging session... Thanks for the information :-)

Did you try to use the "ShellExecute()" function? Does the
"ShellExecute()" function set up the environment in the right way (or
at least leave the existing one unmodified)?

I will try to look into it in the next few days.

Oleg

On 20 Jan., 22:37, Wormszer <wo...@...> wrote:
Hello Oleg,

That's probably a better solution. I will need to go back and make sure the
libraries export their classes rather than duplicating the source in the
projects.
I was trying to avoid adding too much Windows-specific code. My plan was to
get everything building first, then do a clean checkout, repeat the process,
and make the changes to submit as a patch.

I played around some and finally got a shader to compile.

I have a solution, but it's not very elegant. The compiler is hard-coded to
use the Linux CPP preprocessor, /usr/bin/CPP.
I was able to get it working using CL to do the preprocessing, but I had to
write the output to a file and read it back in rather than using stdout.
It's only a few extra lines of code, and it passes the file handle from the
opened file to the parser instead of the _popen handle.

CL will output to stdout. When I was testing it in the debugger, the issue
was that by default the build environment is not set up.
You typically have to run vcvars32.bat to set it up, and that was causing
some issues.

And then I am not positive that the stdout of CL parses correctly; it seems
to insert the filename at the beginning, whereas the file output parses fine.

So maybe we can figure out a better solution to get it working with CL using
stdout. It may work if you run oslc from a command prompt with the VS
variables already set up.
I will try that next, except I'm expecting the process spawned by _popen
won't have the environment set up correctly. And since the parser reads from
stdout, nothing else you run can print output or it gets parsed too.

If you want to experiment with it, the parameters to CL I used are /P /C /EP
/X /nologo, and then /FI to force the include file. Remove /P if you want
stdout instead of a file.

Jeremy

On Wed, Jan 20, 2010 at 4:07 PM, Oleg <od...@...> wrote:
Hi Jeremy,
Great,
Did you encounter the same linking issues i mentioned?
Yes, but I've solved them a bit differently: I just added export macros to
the corresponding classes and "implemented" the "Symbol::mangled ()"
method inline.
Let me know how it works out for you.
I didn't perform any tests so far.
Regards,
Oleg
I had tried compiling the shaders after making some changes to the project
file for a path issue, but I didn't get any warnings or outputs from inside
VS.
If I tried to run it from the command line, after finding all the
dependencies, it gave me a "could not find file" error. It didn't seem to be a
DLL error but something else.
I plan to look into it more now.
Jeremy


Re: Compiling OpenShadingLanguage under Windows

Wormszer <worm...@...>
 

Hello Oleg,

That's probably a better solution. I will need to go back and make sure the libraries export their classes rather than duplicating the source in the projects.
I was trying to avoid adding too much Windows-specific code. My plan was to get everything building first, then do a clean checkout, repeat the process, and make the changes to submit as a patch.

I played around some and finally got a shader to compile.

I have a solution, but it's not very elegant. The compiler is hard-coded to use the Linux CPP preprocessor, /usr/bin/CPP.
I was able to get it working using CL to do the preprocessing, but I had to write the output to a file and read it back in rather than using stdout.
It's only a few extra lines of code, and it passes the file handle from the opened file to the parser instead of the _popen handle.

CL will output to stdout. When I was testing it in the debugger, the issue was that by default the build environment is not set up.
You typically have to run vcvars32.bat to set it up, and that was causing some issues.

And then I am not positive that the stdout of CL parses correctly; it seems to insert the filename at the beginning, whereas the file output parses fine.

So maybe we can figure out a better solution to get it working with CL using stdout. It may work if you run oslc from a command prompt with the VS variables already set up.
I will try that next, except I'm expecting the process spawned by _popen won't have the environment set up correctly. And since the parser reads from stdout, nothing else you run can print output or it gets parsed too.

If you want to experiment with it, the parameters to CL I used are /P /C /EP /X /nologo, and then /FI to force the include file. Remove /P if you want stdout instead of a file.
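
For anyone who wants to try the same thing, here is a rough C++ sketch of how that command line could be assembled. The helper names `cl_preprocess_command` and `preprocessed_path` are made up for illustration and are not part of oslc; only the flags come from the list above. It also assumes that with /P, CL writes its output to a .i file named after the source in the current directory, which is my understanding of CL's behavior and worth double-checking.

```cpp
#include <string>

// Hypothetical helper: assemble a preprocess-only CL command line from the
// flags listed above. Quoting is simplistic and assumes no embedded quotes.
std::string cl_preprocess_command(const std::string& source,
                                  const std::string& forced_include,
                                  bool to_file)
{
    std::string cmd = "cl /nologo /C /EP /X";
    if (to_file)
        cmd += " /P";  // write to a file instead of printing to stdout
    cmd += " /FI\"" + forced_include + "\"";  // force-include, e.g. stdosl.h
    cmd += " \"" + source + "\"";
    return cmd;
}

// Hypothetical helper: name of the file assumed to be produced by /P,
// i.e. the source's base name with a .i extension, without any directory.
std::string preprocessed_path(const std::string& source)
{
    const std::string::size_type slash = source.find_last_of("/\\");
    std::string base =
        (slash == std::string::npos) ? source : source.substr(slash + 1);
    const std::string::size_type dot = base.find_last_of('.');
    if (dot != std::string::npos)
        base.erase(dot);
    return base + ".i";
}
```

The resulting string could be run with something like std::system, after which the .i file is opened and its handle passed to the parser as described above. CL may also need to be told how to treat a .osl file, and the environment issue with vcvars32.bat still applies.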

Jeremy


On Wed, Jan 20, 2010 at 4:07 PM, Oleg <ode...@...> wrote:
Hi Jeremy,

> Great,
>
> Did you encounter the same linking issues i mentioned?
>

Yes, but I've solved them a bit differently: I just added export macros to
the corresponding classes and "implemented" the "Symbol::mangled ()"
method inline.

> Let me know how it works out for you.

I didn't perform any tests so far.

Regards,
Oleg

>
> I had tried compiling the shaders after making some changes to the project
> file for a path issue, but I didn't get any warnings or outputs from inside
> VS.
> If I tried to run it from the command line, after finding all the
> dependencies, it gave me a "could not find file" error. It didn't seem to be a
> DLL error but something else.
>
> I plan to look into it more now.
>
> Jeremy
>




