Re: run flags vs active intervals

Larry Gritz <l...@...>
 

It's not really referring to hardware SIMD, but just that we are shading many points at once, "in lock-step."  In other words, if you had many points to shade, you could do it in this order:

    point 0:
        add
        texture
        assign
    point 1:
        add
        texture
        assign
    ...
    point n:
        add
        texture
        assign

or you could do it in this order:

    add all points on [0,n]
    texture all points on [0,n]
    assign all points on [0,n]

We call the latter "SIMD" (single instruction, multiple data), because each instruction operates on big arrays of data (one value for each point being shaded).

SIMD helps by:

   * increasing interpreter speed, by moving much of the interpreter overhead from "per point" to "per batch".
   * improving memory coherence and cache behavior, and enabling hardware SIMD, since it naturally uses a "structure-of-arrays" layout, i.e. it's faster to add contiguous arrays than separate floats scattered across memory.
   * improving texture coherence, because you're doing lots of lookups from the same texture on all points at the same time (and those lookups are in turn likely to hit the same tiles).
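To make the batched, structure-of-arrays idea concrete, here is roughly what a batched "add" looks like; a minimal sketch, not OSL's actual internals:

```cpp
// One interpreter dispatch executes the op for every point in the batch.
// The arrays are contiguous ("structure of arrays"), so the loop is
// cache-friendly and easy for the compiler to auto-vectorize.
// Hypothetical sketch; names and signature are not OSL's.
void add_op(const float* a, const float* b, float* result, int npoints)
{
    for (int i = 0; i < npoints; ++i)
        result[i] = a[i] + b[i];
}
```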




On Jan 21, 2010, at 9:30 AM, Wormszer wrote:

Is there a good resource on this topic? I did some googling and didn't see what I was looking for.

Where is the actual SIMD taking place?
Is the compiler figuring it out from the loop and setting up the correct instructions? Or is it relying on the CPU to recognize consecutive ops with different data, modifying instructions in its pipeline by combining them into a parallel one?

Or maybe I'm way off.

Thanks,

Jeremy

On Thu, Jan 21, 2010 at 12:16 PM, Larry Gritz <l...@...> wrote:
Awesome, Chris.  Would you believe we were just talking internally about this topic yesterday?  We were considering the amount of waste if there were big gaps of "off" points in the middle.  But I think your solution is quite a bit more elegant than what we were discussing (a list of "on" points).  I like how it devolves into (begin,end] for the common case of all points on.

It's a pretty big overhaul, touches every shadeop and a lot of templates, and the call signatures have to be changed (to what, exactly? passing a std::vector<int>& ? or a combo of int *begend and int segments?).  Ugh, and we may wish to change/add OIIO texture routines that take runflags, too.  But your experiment is quite convincing.
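For concreteness, the two signature options mentioned above might look something like this (purely hypothetical sketches, not actual OSL API):

```cpp
#include <vector>

// Option A: pass the flattened (begin, end) interval pairs as a raw
// pointer plus a segment count.
void add_op(float* result, const float* a, const float* b,
            const int* begend, int segments)
{
    for (int j = 0; j < 2*segments; j += 2)
        for (int i = begend[j]; i < begend[j+1]; ++i)
            result[i] = a[i] + b[i];
}

// Option B: the same thing, taking a std::vector<int> of flattened pairs.
void add_op(float* result, const float* a, const float* b,
            const std::vector<int>& active)
{
    add_op(result, a, b, active.data(), (int)active.size() / 2);
}
```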

What does everybody else think?  Chris/Cliff/Alex?  (I'm happy to do the coding, but I want consensus because it touches so much.)

       -- lg


On Jan 21, 2010, at 8:21 AM, Chris Foster wrote:

> Hi all,
>
> I've been looking through the OSL source a little, and I'm interested to see
> that you're using runflags for the SIMD state.  I know that's a really
> conventional solution, but there's an alternative representation which I reckon
> is significantly faster, and I wondered if you considered it.
>
> Imagine a SIMD runstate as follows
>
> index: [ 0  1  2  3  4  5  6  7  8  9 ]
>
> flags: [ 0  0  1  1  1  1  0  0  1  0 ]
>
>
> An alternative to the flags is to represent this state as a list of active
> intervals.  As a set of active start/stop pairs, the state looks like
>
> active = [ 2 6  8 9 ]
>
> ie,
>
> state: [ 0  0  1  1  1  1  0  0  1  0 ]
>               ^           v     ^  v
>               2           6     8  9
>
> The advantage of doing this is that the inner SIMD loops become tighter since
> there's one less test.  Instead of
>
> for (int i = 0; i < len; ++i)
> {
>    if (state[i])
>        do_something(i);
> }
>
> we'd have
>
> for (int j = 0; j < nActiveTimes2; j+=2)
> {
>    for (int i = active[j]; i < active[j+1]; ++i)
>        do_something(i);
> }
>
> and the inner loop is now completely coherent.
>
> I can see you have the beginnings of this idea in the current OSL code, since
> you pass beginpoint and endpoint along with the runflags everywhere.  However,
> why not take it to its logical conclusion?
>
> Given that you already have beginpoint and endpoint, the particularly big win
> here is when most of the flags are turned on.  If do_something() is a simple
> arithmetic operation (eg, float addition) the difference between the two
> formulations can be a factor of two in speed.
>
>
> I'm attaching some basic test code which I whipped up.  Timings on my core2 duo
> with gcc -O3 look like:
>
>
> All flags on (completely coherent):
>
> run flags:        1.310s
> active intervals: 0.690s
>
>
> Random flags on (completely incoherent):
>
> run flags:        5.440s
> active intervals: 3.310s
>
>
> Alternate flags on (maximum number of active intervals -> worst case):
>
> run flags:        1.500s
> active intervals: 2.150s
>
>
> Thoughts?
>
> ~Chris.
> <ATT00001..txt><runstate.cpp>

--
Larry Gritz
l...@...





--
You received this message because you are subscribed to the Google Groups "OSL Developers" group.
To post to this group, send email to osl...@....
To unsubscribe from this group, send email to osl...@....
For more options, visit this group at http://groups.google.com/group/osl-dev?hl=en.





--
Larry Gritz




Re: run flags vs active intervals

Wormszer <worm...@...>
 

Thanks, that makes sense. I know compilers are getting smarter all the time, but I wasn't sure if they were there yet. The cache and memory coherence makes sense, and sorting things like that would allow for easier parallel implementations on GPUs, etc.

Another quick question: what is n? Is it the total number of pixels in the image, or possibly rays? And is that why you have disabled ops, i.e. pixels the shader is not applied to?
Is there a good place to look or read about that, or do I need to dig into the source?

Is it OK to ask these questions in this thread, or should I be starting a new one? I don't want to fill this thread with information off topic from the original questions. I am somewhat new to this process and unclear on what is looked down upon.

Thanks
Jeremy


On Thu, Jan 21, 2010 at 12:44 PM, Larry Gritz <l...@...> wrote:



Re: run flags vs active intervals

Xavier Ho <con...@...>
 

On Fri, Jan 22, 2010 at 4:15 AM, Wormszer <worm...@...> wrote:
Is it OK to ask these questions in this thread, or should I be starting a new one? I don't want to fill this thread with information off topic from the original questions. I am somewhat new to this process and unclear on what is looked down upon.

As a uni student who is keen on the OSL open source release, I'm glad to be reading these discussions, particularly the questions and answers. They're a source of freely shared knowledge and insight. They keep the community excited about and building a common understanding of this project, and they benefit those who, like myself, digest the information that goes through mailing lists. Not to mention this is probably archived, and later on searchable if you ever need it again. My opinion is, if it's related, it's probably okay. Unless you intend to start a newer/bigger conversation, there isn't really a need to start a new thread. Having multiple threads on similar topics only makes digging up information harder, anyhow.

My 2 cp,
Xavier


Re: run flags vs active intervals

Christopher <cku...@...>
 

I like this idea too. What we were discussing yesterday was something
like:

index: [ 0 1 2 3 4 5 6 7 8 9 ]
flags: [ 0 0 1 1 1 1 0 0 1 0 ]
active_points: [ 2 3 4 5 8 ]

for (int i = 0; i < num_active; i++)
    do_something(active_points[i]);

This would be slightly more efficient if you had a single point active
(or isolated single points), but it requires a lot more indirections
in the common case.

So I'm all in favor of trying this out - though it is a pretty big
overhaul ...

I would maintain these ranges as little int arrays on the stack just
like we maintain the runflags now. The upper bound for the
active_range array size is just npoints (run flags: on off on off on
off ...) - flip any bit and you get fewer "runs".
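Building either representation from the runflags is a single linear scan; a rough sketch (hypothetical helper, not OSL code):

```cpp
#include <vector>

// Scan the run flags once, emitting (start, one-past-end) pairs for each
// contiguous "on" run. Hypothetical helper, not actual OSL code.
std::vector<int> intervals_from_runflags(const int* flags, int npoints)
{
    std::vector<int> active;
    int i = 0;
    while (i < npoints) {
        while (i < npoints && !flags[i]) ++i;   // skip an "off" run
        if (i >= npoints) break;
        active.push_back(i);                    // interval start
        while (i < npoints && flags[i]) ++i;    // consume the "on" run
        active.push_back(i);                    // one past interval end
    }
    return active;
}
```

For the example above, flags [ 0 0 1 1 1 1 0 0 1 0 ] yield the interval list [ 2 6 8 9 ].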

-Chris

On Jan 21, 9:16 am, Larry Gritz <l...@...> wrote:


Re: run flags vs active intervals

Larry Gritz <l...@...>
 

I want to point out that Chris F tested out the "all on", "random on", and "alternating on/off" cases. There's one more case that may be important, which is a few isolated "on" points with big "off" gaps in between -- and in that case, I expect Chris F's solution to perform even better (compared to what we have now) than the other cases.

I'm not looking forward to doing this overhaul (only because it's tedious and extensive, there's nothing hard about it), but I think it's potentially a big win. Thanks, Chris!

-- lg


On Jan 21, 2010, at 10:49 AM, Christopher wrote:

I like this idea too. What we were discussing yesterday was something
like:

index: [ 0 1 2 3 4 5 6 7 8 9 ]
flags: [ 0 0 1 1 1 1 0 0 1 0 ]
active_points: [ 2 3 4 5 8 ]

for (int i = 0; i < num_active; i++)
do_something(active_points[i]);

This would be slightly more efficient if you had a single point active
(or isolated single points), but it requires a lot more indirections
in the common case.

So I'm all in favor of trying this out - though it is a pretty big
overhaul ...

I would maintain these ranges as little int arrays on the stack just
like we maintain the runflags now. The upper bound for the
active_range array size is just npoints (run flags: on off on off on
off ...) - flip any bit and you get fewer "runs".

-Chris


On Jan 21, 9:16 am, Larry Gritz <l...@...> wrote:
Awesome, Chris. Would you believe we were just talking internally about this topic yesterday? We were considering the amount of waste if there were big gaps of "off" points in the middle. But I think your solution is quite a bit more elegant than what we were discussing (a list of "on" points). I like how it devolves into (begin,end] for the common case of all points on.

It's a pretty big overhaul, touches every shadeop and a lot of templates, and the call signatures have to be changed (to what, exactly? passing a std::vector<int>& ? or a combo of int *begend and int segments?). Ugh, and we may wish to change/add OIIO texture routines that take runflags, too. But your experiment is quite convincing.

What does everybody else think? Chris/Cliff/Alex? (I'm happy to do the coding, but I want consensus because it touches so much.)

-- lg

On Jan 21, 2010, at 8:21 AM, Chris Foster wrote:



Hi all,
I've been looking through the OSL source a little, and I'm interested to see
that you're using runflags for the SIMD state. I know that's a really
conventional solution, but there's an alternative representation which I reckon
is significantly faster, and I wondered if you considered it.
Imagine a SIMD runstate as follows
index: [ 0 1 2 3 4 5 6 7 8 9 ]
flags: [ 0 0 1 1 1 1 0 0 1 0 ]
An alternative to the flags is to represent this state as a list of active
intervals. As a set of active start/stop pairs, the state looks like
active = [ 2 6 8 9 ]
ie,
state: [ 0 0 1 1 1 1 0 0 1 0 ]
^ v ^ v
2 6 8 9
The advantage of doing this is that the inner SIMD loops become tighter since
there's one less test. Instead of
for (int i = 0; i < len; ++i)
{
if (state[i])
do_something(i);
}
we'd have
for (int j = 0; j < nActiveTimes2; j+=2)
{
for (int i = active[j]; i < active[j+1]; ++i)
do_something(i);
}
and the inner loop is now completely coherent.
I can see you have the beginnings of this idea in the current OSL code, since
you pass beginpoint and endpoint along with the runflags everywhere. However,
why not take it to its logical conclusion?
Given that you already have beginpoint and endpoint, the particularly big win
here is when most of the flags are turned on. If do_something() is a simple
arithmetic operation (eg, float addition) the difference between the two
formulations can be a factor of two in speed.
I'm attaching some basic test code which I whipped up. Timings on my core2 duo
with gcc -O3 look like:
All flags on (completely coherent):
run flags: 1.310s
active intervals: 0.690s
Random flags on (completely incoherent):
run flags: 5.440s
active intervals: 3.310s
Alternate flags on (maximum number of active intervals -> worst case):
run flags: 1.500s
active intervals: 2.150s
Thoughts?
~Chris.
<ATT00001..txt><runstate.cpp>
--
Larry Gritz
l...@...
<ATT00001..txt>
--
Larry Gritz
l...@...


Add derivatives to I in shader globals (issue186262)

aco...@...
 

Reviewers: osl-dev_googlegroups.com,

Description:
We were missing the derivatives in the I field, which is important for
the background shader. This little patch fixes the problem.

Please review this at http://codereview.appspot.com/186262/show

Affected files:
src/include/oslexec.h
src/liboslexec/exec.cpp


Index: src/include/oslexec.h
===================================================================
--- src/include/oslexec.h	(revision 538)
+++ src/include/oslexec.h	(working copy)
@@ -195,6 +195,7 @@ public:
     VaryingRef<Vec3> P;            ///< Position
     VaryingRef<Vec3> dPdx, dPdy;   ///< Partials
     VaryingRef<Vec3> I;            ///< Incident ray
+    VaryingRef<Vec3> dIdx, dIdy;   ///< Partial derivatives for I
     VaryingRef<Vec3> N;            ///< Shading normal
     VaryingRef<Vec3> Ng;           ///< True geometric normal
     VaryingRef<float> u, v;        ///< Surface parameters
Index: src/liboslexec/exec.cpp
===================================================================
--- src/liboslexec/exec.cpp	(revision 538)
+++ src/liboslexec/exec.cpp	(working copy)
@@ -195,8 +195,16 @@ ShadingExecution::bind (ShadingContext *context, ShaderUse use,
             sym.data (globals->P.ptr());  sym.step (globals->P.step());
         }
     } else if (sym.name() == Strings::I) {
-        sym.has_derivs (false);
-        sym.data (globals->I.ptr());  sym.step (globals->I.step());
+        if (globals->dIdx.ptr() && globals->dIdy.ptr()) {
+            sym.has_derivs (true);
+            void *addr = m_context->heap_allot (sym, true);
+            VaryingRef<Dual2<Vec3> > I ((Dual2<Vec3> *)addr, sym.step());
+            for (int i = 0;  i < npoints();  ++i)
+                I[i].set (globals->I[i], globals->dIdx[i], globals->dIdy[i]);
+        } else {
+            sym.has_derivs (false);
+            sym.data (globals->I.ptr());  sym.step (globals->I.step());
+        }
     } else if (sym.name() == Strings::N) {
         sym.has_derivs (false);
         sym.data (globals->N.ptr());  sym.step (globals->N.step());


Re: Add derivatives to I in shader globals (issue186262)

cku...@...
 


Re: run flags vs active intervals

Chris Foster <chri...@...>
 

On Fri, Jan 22, 2010 at 4:56 AM, Larry Gritz <l...@...> wrote:
> I want to point out that Chris F tested out the "all on", "random on", and
> "alternating on/off" cases.  There's one more case that may be important,
> which is a few isolated "on" points with big "off" gaps in between -- and in
> that case, I expect Chris F's solution to perform even better (compared to
> what we have now) than the other cases.
Right, here's the initialization code for some sparse on-states:

    // Four isolated flags turned on.
    std::fill((Runflag*)r, r+len, (Runflag)Runflag_Off);
    r[0] = r[50] = r[100] = r[150] = Runflag_On;

results for this sparse case:

run flags:        1.710s
active intervals: 0.100s

Of course in this case, the run flags completely fail to capitalize on the
fact that most of the shading elements are turned off, so the active intervals
formulation thrashes it. Out of curiosity, I've also implemented the direct
indexing array Chris K suggested (code attached, see the function
addIndexed() ):

active index:     0.050s

as expected, blazingly fast here!
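For reference, the direct-indexing loop presumably looks something like this (a sketch only; the real addIndexed() is in the attached runstate.cpp, which isn't reproduced here):

```cpp
// Iterate a compact list of active point indices directly. The extra
// indirection through active_points is what may defeat auto-vectorization
// in the all-on case. Illustrative sketch, not the actual attached code.
void add_indexed(float* result, const float* a, const float* b,
                 const int* active_points, int num_active)
{
    for (int j = 0; j < num_active; ++j) {
        int i = active_points[j];
        result[i] = a[i] + b[i];
    }
}
```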


Here's the rest of the benchmarks redone with the active indexing method added:


All flags on (completely coherent):

run flags:        1.310s
active intervals: 0.690s
active index:     1.330s


Random flags on (completely incoherent):

run flags:        5.440s
active intervals: 3.310s
active index:     0.760s


Alternate flags on (maximum number of active intervals):

run flags:        1.500s
active intervals: 2.150s
active index:     0.710s


The results are quite interesting. They suggest that active indexing is likely
to be faster for highly incoherent data, but that active intervals is the clear
winner when all your flags are on.

The random flags benchmark is particularly interesting to me because the active
intervals formulation does a lot worse than the active index in this case.


As for the implementation, you could potentially swap between any of these
run state representations by introducing an iterator abstraction. The tricky
thing seems to be making sure the abstraction gets optimized as efficiently
as the plain loops.

Here's my crack at an active intervals iterator class, but alas, the benchmarks
show that this gives a time of 1.170s for the all-on case compared to 0.68s for
the loop-based implementation.


class RunStateIter
{
    private:
        const int* m_intervals;    ///< active intervals
        const int* m_intervalsEnd; ///< end of active intervals
        int m_currEnd;             ///< end of current interval
        int m_idx;                 ///< current index
    public:
        RunStateIter(const int* intervals, int nIntervals)
            : m_intervals(intervals),
            m_intervalsEnd(intervals + 2*nIntervals),
            m_currEnd(nIntervals ? intervals[1] : 0),
            m_idx(nIntervals ? intervals[0] : 0)
        { }

        RunStateIter& operator++()
        {
            ++m_idx;
            if (m_idx >= m_currEnd) {
                m_intervals += 2;
                if (m_intervals < m_intervalsEnd) {
                    m_idx = m_intervals[0];
                    m_currEnd = m_intervals[1];
                }
            }
            return *this;
        }

        bool valid() { return m_idx < m_currEnd; }

        int operator*() { return m_idx; }
};


Now why is this? The above is essentially the loop-based code but unravelled
into an iterator. I had to look at the assembly code to find out, and it turns
out that the compiler is optimizing addIntervals() using hardware SIMD! (I spy
lots of movlps and an addps in there.)

Ack! I hadn't expected that. I was imagining that the efficiency gains in the
all-on case came from improving branch prediction or some such. Oops :-)
Using the flag -fno-tree-vectorize causes performance of the active intervals
code in the all-on case to devolve to 1.33s, just the same as the runflags
code.

So, this changes a few things, because (1) I guess there aren't that many
operations which the compiler can produce hardware SIMD code for, so the
efficiency gains I've just shown may evaporate if user-defined types come into
play and (2) the compiler (g++) seems to have trouble producing hardware SIMD
code when an iterator abstraction is involved. That's a pity because using an
iterator would let you reimplement this stuff once and decide what the iterator
implementation should be later, based on real benchmarks.

Hum, so suddenly the way forward doesn't seem quite so clear anymore. The
active index method starts to look very attractive if we discount hardware
vectorization.

~Chris.


Re: run flags vs active intervals

Wormszer <worm...@...>
 

That's interesting; it kind of relates to my original question of whether the compiler was able to apply SIMD operations to the loop.
When you disabled vectorization, did it affect the active index case?

Are those numbers taking into account the setup time to create either the active index or the intervals? Or is it basically the same for each method?

The iterator idea crossed my mind too, but I wouldn't have expected it to take such a performance hit. I guess it prevents the compiler from unrolling the loop?
I wonder if the way you use the iterator is having an effect, and whether a for(begin, end, ++) style implementation would make the compiler do something different.

It looks like active index is the way to go. I wonder why it doesn't perform as well on the full range. Is it because of the indirection that the compiler won't vectorize it, since the memory addresses may not be consecutive?

If the two methods perform well under different conditions, is there enough of a benefit to implement both active intervals and active indexes? Or a hybrid: active index, and if the number of indexes == N, then it's all on and you could just do a loop without indirection and use a vectorized code path.
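That hybrid could be as simple as a dispatch on the active count; an illustrative sketch (not a tested proposal):

```cpp
// If every point is active, take the dense loop (which the compiler can
// auto-vectorize); otherwise fall back to indexed iteration.
// Illustrative only; names and signature are hypothetical.
void add_hybrid(float* result, const float* a, const float* b,
                const int* active_points, int num_active, int npoints)
{
    if (num_active == npoints) {
        for (int i = 0; i < npoints; ++i)       // all-on fast path
            result[i] = a[i] + b[i];
    } else {
        for (int j = 0; j < num_active; ++j) {  // sparse path
            int i = active_points[j];
            result[i] = a[i] + b[i];
        }
    }
}
```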

Is there enough coherence from frame to frame, or execution to execution, that you could score the run and use that method the next time?
Sort of like branch prediction: have some method to measure the coherence or incoherence of the current run to predict the next, even just occasionally.


Jeremy


On Thu, Jan 21, 2010 at 8:36 PM, Chris Foster <chri...@...> wrote:
On Fri, Jan 22, 2010 at 4:56 AM, Larry Gritz <l...@...> wrote:
> I want to point out that Chris F tested out the "all on", "random on", and
> "alternating on/off" cases.  There's one more case that may be important,
> which is a few isolated "on" points with big "off" gaps in between -- and in
> that case, I expect Chris F's solution to perform even better (compared to
> what we have now) than the other cases.

Right, here's the initialization code for some sparse on-states:

   // Four isolated flags turned on.
   std::fill((Runflag*)r, r+len, (Runflag)Runflag_Off);
   r[0] = r[50] = r[100] = r[150] = Runflag_On;

results for this sparse case:

run flags:        1.710s
active intervals: 0.100s

Of course in this case, the run flags completely fail to captialize on the
fact that most of the shading elements are turned off, so the active intervals
formulation thrashes it.  Out of curiosity, I've also implemented the direct
indexing array Chris K suggested (code attached, see the function
addIndexed() ):

active index:     0.050s

as expected, blazingly fast here!


Here's the rest of the benchmarks redone with the active indexing method added:


All flags on (completely coherent):

run flags:        1.310s
active intervals: 0.690s
active index:     1.330s


Random flags on (completely incoherent):

run flags:        5.440s
active intervals: 3.310s
active index:     0.760s


Alternate flags on (maximum number of active intervals):

run flags:        1.500s
active intervals: 2.150s
active index:     0.710s


The results are quite interesting.  They suggest that active indexing is likely
to be faster for highly incoherent data, but that active intervals is the clear
winnier when all your flags are on.

The random flags benchmark is particularly interesting to me because the active
intervals formulation does a lot worse than the active index in this case.


As for the implementation, you can potentially have the ability to change
between any of these run state implementations if you introduce an iterator
abstraction.  The tricky thing seems to be making sure the abstraction is
getting optimized as efficiently as the plain loops.

Here's my crack at an active intervals iterator class, but alas, the benchmarks
show that this gives a time of 1.170s for the all-on case compared to 0.68s for
the loop-based implementation.


class RunStateIter
{
   private:
       const int* m_intervals; ///< active intervals
       const int* m_intervalsEnd; ///< end of active intervals
       int m_currEnd;          ///< end of current interval
       int m_idx;              ///< current index
   public:
       RunStateIter(const int* intervals, int nIntervals)
           : m_intervals(intervals),
           m_intervalsEnd(intervals + 2*nIntervals),
           m_currEnd(nIntervals ? intervals[1] : 0),
           m_idx(nIntervals ? intervals[0] : 0)
       { }

       RunStateIter& operator++()
       {
           ++m_idx;
           if (m_idx >= m_currEnd) {
               m_intervals += 2;
               if (m_intervals < m_intervalsEnd) {
                   m_idx = m_intervals[0];
                   m_currEnd = m_intervals[1];
               }
           }
           return *this;
       }

       bool valid() { return m_idx < m_currEnd; }

       int operator*() { return m_idx; }
};


Now why is this?  The above is essentially the loop-based code but unravelled
into an iterator.  I had to look at the assembly code to find out, and it turns
out that the compiler is optimizing addIntervals() using hardware SIMD!  (I spy
lots of movlps and an addps in there.)

Ack!  I hadn't expected that.  I was imagining that the efficiency gains in the
all-on case came from improved branch prediction or some such.  Oops :-)
Using the flag -fno-tree-vectorize causes performance of the active intervals
code in the all-on case to devolve to 1.33s, the same as the runflags
code.

So, this changes a few things, because (1) I guess there aren't that many
operations for which the compiler can produce hardware SIMD code, so the
efficiency gains I've just shown may evaporate if user-defined types come into
play, and (2) the compiler (g++) seems to have trouble producing hardware SIMD
code when an iterator abstraction is involved.  That's a pity, because using an
iterator would let you implement this stuff once and decide what the iterator
implementation should be later, based on real benchmarks.

Hmm, so suddenly the way forward doesn't seem quite so clear anymore.  The
active index method starts to look very attractive if we discount hardware
vectorization.

~Chris.

--
You received this message because you are subscribed to the Google Groups "OSL Developers" group.
To post to this group, send email to osl...@....
To unsubscribe from this group, send email to osl...@....
For more options, visit this group at http://groups.google.com/group/osl-dev?hl=en.



Re: Compiling OpenShadingLanguage under Windows

Wormszer <worm...@...>
 

I have been trying to get the shaders test to build, but I have run into lots of issues with linking dependencies, exports/imports, etc.

The projects cmake generates (oslquery, for example) use source files from oslexec. This causes issues with the DLL_PUBLIC-style declarations, because other projects import oslexec as a library.
I guess this may work fine with GCC, but Windows pitches a fit: in one project the defines are the correct import/export, while in the other they are import when they should be nothing or export.

Are the source files being duplicated in the projects on purpose? Is this just a cmake issue?
Is this just an issue with VS, because GCC doesn't care and just links everything together?

I think I should be able to rearrange the import/export types for Windows, make oslquery depend on oslexec instead of rebuilding the lexer etc., and do the same for any of the other projects (I think the others are in better shape).
And make sure the projects use the library imports rather than building the code themselves?

Or should I get them to build duplicating the code in places, and just fix the special cases with more defines?

I prefer the first option, and at first I could probably isolate it to Windows in cmake and the source. But maybe it's an issue for GCC as well in some areas.
Or maybe oslquery needs to run standalone and you don't want to include a dll/so.

Any thoughts?

Jeremy


On Thu, Jan 21, 2010 at 12:39 AM, Wormszer <worm...@...> wrote:

I have mcpp integrated into the compiler now, working like CPP and writing everything to stdout.

A few things: the binary available for download, at least on Windows, doesn't support forcing an include file.
So I had to build it from source; luckily it wasn't too difficult to create a new project for it.

I built it with VS for VS. Now that I think about it, I bet I could build it with VS for GCC so that it would support the same command-line options.

Another weird issue: I guess the lexer doesn't support C-style /* comments? These might not be defined in the OSL language, and if so they should probably be removed from stdosl.h.
It's the only difference I could see that could have been causing the error.

CL would remove the /* */ comments at the end of the math defines in stdosl.h even though I told it to preserve comments.

mcpp, when I told it to leave comments, would actually leave them in there (which seems like the correct behavior), and then the lexer would crash.

I am not sure why comments are needed at this point; I think I must have included them for my own debugging, and I don't think CPP was set to output them.

Thanks for the suggestions on an easier solution to the preprocessor issue.

I looked real quick at the boost option, but mcpp seemed easier since I could just get a binary and set it in my path (though then I had to rebuild it anyway).

Well, now to see if I can get anything to happen with my compiled shaders.

Jeremy



On Wed, Jan 20, 2010 at 9:03 PM, Wormszer <worm...@...> wrote:
I am fine with either one. I think having something embedded or buildable would be useful.

Otherwise there may be issues with different compilers, and we would probably need some kind of config or something that cmake would generate, at least on Windows with several versions of VS etc.

We'll just have to see how Larry or the other devs feel about using one of those two for Linux builds as well. I would assume it would be wise to have all the preprocessing done with the same tool when possible.

I will look at both real quick but I might lean towards mcpp.


On Jan 20, 2010, at 8:14 PM, Chris Foster <chri...@...> wrote:

On Thu, Jan 21, 2010 at 11:02 AM, Blair Zajac <bl...@...> wrote:
>> The main annoyance with wave is that it causes the compiler to issue truly
>> horrendous error messages if you get things wrong (the wave internals make
>> heavy use of template metaprogramming).
>>
>> (Obviously this is only a problem when integrating wave into the project
>> source, and that's not really difficult at all.)

> There's mcpp which is designed to be an embeddable C-preprocessor.
>
> Ice, which we use at Sony Imageworks for all our middle-tier systems, uses
> mcpp for its own internal IDL type language, so I would recommend that.

mcpp looks nice.  In some sense, using wave would mean one less dependency
since OSL already relies on boost, but it does mean linking to libboost-wave,
so if you have a modular boost install the point may be moot...

~Chris

PS: Sorry to Blair for the duplicate message.  I intended to send it
to the list :-(





Error for bad connection type rather than assertion. (issue193063)

cku...@...
 

LGTM

Having dev-osl@imagework usually works for me, but I don't see this
mail.

http://codereview.appspot.com/193063/show


Re: run flags vs active intervals

Chris Foster <chri...@...>
 

On Fri, Jan 22, 2010 at 5:22 PM, Wormszer <worm...@...> wrote:
> That's interesting, it kind of relates to my original question if the
> compiler was able to apply SIMD operations to the loop.
> When you disabled vectorization did it effect the active index case?

No, the active index case isn't vectorized by the compiler anyway.

> Are those numbers taking into account the setup time to create either the
> active index or the intervals?

No. In a SIMD shader machine I generally expect that creating the runstate
representation (whatever it may be) from the results of a conditional is going
to be a relatively small proportion of the total runtime.

> The iterator idea crossed my mind too but I too wouldn't of expected it to
> have such a performance hit either. I guess it prevents the compiler from
> unrolling the loop?

Not unrolling, vectorizing - the way I wrote the iterator appears to prevent
the compiler vectorizing the loop using SSE.

> I wonder if the way you use the iterator is having an effect, where a
> for(begin, end, ++) implementation etc, if the compiler would do something
> different.

I don't know. I know nothing about how gcc's tree vectorizer works. If it's
enabled by a heuristic whenever it sees special "simple" uses of the for loop,
then any iterator abstraction is unlikely to work.

> It looks like active index is the way to go,

If hardware (SSE) vectorization isn't going to be on the cards for most
operations, I think the active index method is looking like a winner.
Generally speaking it seems to have more reliable performance characteristics,
especially in the face of incoherence.

> i wonder why it doesn't perform
> as well on the full range, is it because of the indirection that the
> compiler won't vectorize it? That the memory addresses may not be
> consecutive?

Yes, I think that's the reason.

> If the two methods perform well under different conditions is there enough
> of a benefit to say implement both, active intervals/indexs? Or a hybrid,
> Active index, and if the # of index's == N, then its all on and could just
> do a loop without indirection and use a vectorized code path.

I considered something like this too. IMHO, making things this complex
requires that the SIMD state iteration should be abstracted, but an iterator
abstraction isn't appropriate in that case.

> Is there enough coherence between frame to frame, execution to execution,
> that you could possibly score the run and use that method the next time?
> Sort of like branch prediction, have some method to measure the coherence or
> incoherence of the current run to predict the next, even occasionally.

That doesn't sound like it would help to me ;-) You need to modify the
runstate at every conditional branch anyway, so it's possible to analyse it for
coherence during modification, if necessary.

~Chris.


Re: run flags vs active intervals

Wormszer <worm...@...>
 

 
> Not unrolling, vectorizing - the way I wrote the iterator appears to prevent
> the compiler vectorizing the loop using SSE.

I guess I was thinking of vectorization as a type of unrolling; not really
the correct sense, I guess. I imagined it expanding the loop by four, or
however wide the vector unit is, reducing the total number of iterations.
In the add case I was assuming it would vectorize something like i, i+1,
i+2, i+3.

> If hardware (SSE) vectorization isn't going to be on the cards for most
> operations, I think the active index method is looking like a winner.
> Generally speaking it seems to have more reliable performance characteristics,
> especially in the face of incoherence.

As for this and the rest, I don't know enough about the system yet and how it actually works; I was basing it more on your test code and some of the earlier discussion of SIMD shaders.
Looking at your numbers again: if the only case that performed better was the all-on one, and only because of vectorization, then monitoring and predicting wouldn't help, because it would just be a simple check that # == N. From your example I was thinking you might have two code paths.
So for your add it would look like:

if(nActive==nTotal)  //vectorized path
    for (int j = 0; j < nActive; ++j) {
        c[j] = a[j] + b[j];
    }
else //non-vectorized path
    for (int j = 0; j < nActive; ++j) {
        int i = activeIndex[j];
        c[i] = a[i] + b[i];
    }
In a case where the operation couldn't be vectorized anyway, you would only need the one option.

But things probably are not that simple, and I'm sure there is a lot more going on that I am missing.

Jeremy




Re: run flags vs active intervals

Chris Foster <chri...@...>
 

On Sat, Jan 23, 2010 at 2:19 PM, Wormszer <worm...@...> wrote:


>> Not unrolling, vectorizing - the way I wrote the iterator appears to prevent
>> the compiler vectorizing the loop using SSE.

> I guess i was thinking of the vectorization being a type of unrolling, not
> really in the correct sense i guess.
Fair enough. Vectorization does imply unrolling the loop (by a factor of
4 for SSE), though loop unrolling can sometimes be a useful optimization
without hardware vectorization.

The active index idea does permit loop unrolling, but not necessarily
vectorization.
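To make that distinction concrete, here's a hypothetical 4x manual unroll of the active-index add kernel (names are mine): the per-iteration loop overhead shrinks, but the gathers through activeIndex still keep the compiler from emitting packed SSE adds.

```cpp
// Hypothetical 4x unroll of the active-index "add" loop.  Unrolling works
// fine here, but the indirect loads/stores through activeIndex prevent
// straightforward hardware vectorization.
void addIndexedUnrolled(const int* activeIndex, int nActive,
                        const float* a, const float* b, float* c)
{
    int j = 0;
    for (; j + 4 <= nActive; j += 4) {
        int i0 = activeIndex[j],     i1 = activeIndex[j + 1];
        int i2 = activeIndex[j + 2], i3 = activeIndex[j + 3];
        c[i0] = a[i0] + b[i0];
        c[i1] = a[i1] + b[i1];
        c[i2] = a[i2] + b[i2];
        c[i3] = a[i3] + b[i3];
    }
    for (; j < nActive; ++j) {   // remainder loop
        int i = activeIndex[j];
        c[i] = a[i] + b[i];
    }
}
```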

>> If hardware (SSE) vectorization isn't going to be on the cards for most
>> operations, I think the active index method is looking like a winner.
>> Generally speaking it seems to have more reliable performance characteristics,
>> especially in the face of incoherence.

> As for this and the rest, I don't know enough about the system yet and how
> it actually works. I was basing it more on your test code and some of the
> earlier discussion on SIMD shaders.
> After looking at your numbers more, if the only case that performed better
> was the all on, because of the vectorization, then monitoring and predicting
> wouldn't help.
> Because it would just be a simple check # = N. And from your example i was
> thinking might have two code paths.
> So for your add it would be like
>
> if(nActive==nTotal)  //vectorized path
>     for (int j = 0; j < nActive; ++j) {
>         c[j] = a[j] + b[j];
>     }
> else //non-vectorized path
>     for (int j = 0; j < nActive; ++j) {
>         int i = activeIndex[j];
>         c[i] = a[i] + b[i];
>     }
Yeah. TBH I haven't studied the code enough to know whether the
vectorized path can be realized using the OSL data structures. If the
arrays are actually given as VaryingRefs then any vectorization attempt
is likely to be dead in the water since VaryingRef has a stride which is
determined at runtime.
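To illustrate why a runtime stride is a problem, here's a minimal sketch in the spirit of the VaryingRef mentioned above (the real class lives in OSL/OIIO; this shape is an assumption for illustration).  Because the stride is only known at runtime, the compiler cannot prove the accesses are contiguous, so it cannot emit packed SSE loads and stores for the loop:

```cpp
// Minimal strided-reference sketch (hypothetical, modeled loosely on the
// VaryingRef idea).  A byte stride of 0 broadcasts a uniform value.
template<class T>
class StridedRef
{
    char* m_ptr;
    int   m_step;   // byte stride between elements, known only at runtime
public:
    StridedRef(T* ptr, int step) : m_ptr((char*)ptr), m_step(step) {}
    T& operator[](int i) const { return *(T*)(m_ptr + i * m_step); }
};

// The runtime stride defeats the vectorizer: it cannot assume the
// elements are adjacent in memory the way plain float arrays are.
void addStrided(StridedRef<float> c, StridedRef<float> a,
                StridedRef<float> b, int n)
{
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```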

~Chris


Re: Compiling OpenShadingLanguage under Windows

Wormszer <worm...@...>
 

I finally have it all building on windows and can run the testshade program.

I had to create some new OSL_DLLPUBLIC-style defines and place them around: OSLCOMP_DLLPUBLIC and OSLEXEC_DLLPUBLIC.

liboslcomp exports some of the base types and common functions. Basically, if oslcomp uses it, then it exports it, and anything that was duplicated in oslexec now imports it from oslcomp.
liboslexec uses the objects exported from oslcomp: typespec and a few others.
liboslquery now has dependencies on oslexec and oslcomp.

I don't believe this will affect the GCC build, since the macros are defined to nothing there; the source just has more OSLCOMP_DLLPUBLIC/OSLEXEC_DLLPUBLIC in it.
I did move typespec.cpp to liboslcomp since it's used there first.

Because of the way testshade is written, it makes use of oslexec private includes, and I had to export a bunch of classes in oslexec's private and public headers.
I guess for a test program this is OK, but to me it seems like it's breaking the rules by using classes from oslexec_pvt.h; if they are indeed public, they should be moved to the public header.
PVT does mean private, right? :)

I am probably going to redo everything I did and then submit a patch. If anyone else has found any issues, or a better solution for some of the issues in this thread, please let me know.


Jeremy








Re: problem building on macos

Nick Porcino <nick....@...>
 

Thanks, that was just the info I needed! A few libraries in MacPorts
were leftovers from a Leopard universal binary build. I uninstalled a
pile of libraries, working backwards from the link errors, reinstalled
the ones OIIO needed, and now all is well!

On Jan 17, 7:17 am, Larry Gritz <l...@...> wrote:
> I do most of my day-to-day development on Snow Leopard, so I know this can be made to work.
>
> Are you building OIIO by checking out the "external" project and trying to build the whole thing?  If so, that may be more trouble than it's worth.  The "external" project is not needed; it serves mainly to guarantee the same dependency versions among all developers, independent of anything else installed on the system, or for people who lack permissions to install the packages in the right places on their system.
>
> I would get rid of the "external" project and just compile OIIO straight -- it will find the dependencies it needs in your system.  The only ones it really needs in order to be useful for OSL are boost, libtiff, ilmbase, and openexr, and it's pretty easy to install those with MacPorts if you don't already have them on your system.  (OIIO's build system will handle other missing dependencies by just not building the parts that need them -- typically plugins for other image formats, none of which are especially useful for OSL's texturing.)
>
> Another alternative is to post to the OIIO list and attach the actual output.  ("libtiff doesn't build" is a tad vague; maybe it's an easy fix we can suggest.)
>
>         -- lg
>
> On Jan 16, 2010, at 11:49 PM, Nick Porcino wrote:
>
>> I'm having some difficulty building OpenImageIO on macos (Snow
>> Leopard), as a required pre-requisite for building OSL. The build of
>> the external packages fails because libtiff doesn't build. Everything
>> else does.
>>
>> I simply did a clean pull from the repository and followed the
>> instructions (and I've built dozens of open source things before, so
>> the issue is not unfamiliarity with make systems or anything like
>> that).
>>
>> I've done all the usual sorts of troubleshooting, but nothing strikes
>> me as obviously wrong with the set up. Before I dig in seriously to
>> work out why the build is failing, I thought I would potentially save
>> some time by asking here first if anyone has successfully built on
>> macos after a fresh pull of OIIO and OSL, and if not, what issues and
>> resolutions might have got things working.
>>
>> Thanks
--
Larry Gritz
l...@...


Re: problem building on macos

Blair Zajac <bl...@...>
 

If someone wants to write a MacPorts Portfile for OIIO, I can commit it. That should make building OSL that much easier.

Regards,
Blair

Nick Porcino wrote:

> Thanks, that was just the info I needed! A few libraries in macports
> were left overs from a Leopard universal binary build. I uninstalled a
> pile of libraries, working backwards from the link errors; reinstalled
> the ones OIIO needed, and now all is well!


Fix errors and warnings from g++-4.4.1 (issue193074)

chri...@...
 

Reviewers: osl-dev_googlegroups.com,

Description:
Several additional warnings and some new errors occur when trying to
compile OSL using g++-4.4.1.

Errors were due to:
- header rearrangements which cause strcmp() and exit() to be
unavailable without additional includes
- strchr(const char*, char) returning const char*

The warnings are various:
- hash_map is deprecated.  The obvious alternative is to use
unordered_map from boost, but it's only available after boost-1.36, so
I'm not sure if my fix will work at SPI :-/
- failing to check the return value of fgets()
- a probable operator precedence bug: && has higher precedence than ||,
and gcc warns about a suspicious usage (hopefully I guessed right about
the intention here!)
- a dangling-else ambiguity (not a bug, just suspicious)
- warnings about uninitialized values
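The precedence warning is worth spelling out, since it's easy to misread (this is a hypothetical condition, not the actual line from the patch):

```cpp
// && binds tighter than ||, so `a || b && c` parses as `a || (b && c)`.
// If `(a || b) && c` was intended, the two readings differ observably:
bool asParsed(bool a, bool b, bool c)   { return a || (b && c); } // how gcc reads it
bool asIntended(bool a, bool b, bool c) { return (a || b) && c; } // the other reading
```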


Please review this at http://codereview.appspot.com/193074/show

Affected files:
liboslcomp/oslcomp.cpp
liboslcomp/osllex.l
liboslcomp/symtab.h
liboslexec/bsdf_cloth.cpp
liboslexec/dual.h
oslc/oslcmain.cpp
oslinfo/oslinfo.cpp


Re: Fix errors and warnings from g++-4.4.1 (issue193074)

Blair Zajac <bl...@...>
 

On Jan 23, 2010, at 9:01 PM, chri...@... wrote:

> Reviewers: osl-dev_googlegroups.com,
>
> Description:
> Several additional warnings and some new errors occur when trying to
> compile OSL using g++-4.4.1.
>
> Errors were due to:
> - header rearrangements which cause strcmp() and exit() to be
> unavailable without additional includes
> - strchr(const char*, char) returning const char*
>
> The warnings are various:
> - hash_map is deprecated.  The obvious alternative is to use
> unordered_map from boost, but it's only available after boost-1.36, so
> not sure if my fix will work at SPI :-/
You could do what Google Protocol Buffers does and determine which hash map to use at configure time. It works with g++ 3.4.x all the way up to 4.4.x. See the m4 file at:

http://code.google.com/p/protobuf/source/browse/trunk/m4/stl_hash.m4

Protocol Buffers has a new BSD license so this could be copied straight from them.

Running configure on my Ubuntu Karmic system with g++ 4.4.1 shows

checking the location of hash_map... <tr1/unordered_map>
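Translated out of m4, the result of that configure check typically feeds a small shim header along these lines (all macro and alias names here are hypothetical; only one branch would be enabled by the build system):

```cpp
// Hypothetical configure-driven shim: pick whichever hash map the build
// system detected.  This is a sketch of the approach, not the actual
// protobuf or OSL header.
#if defined(OSL_HAVE_BOOST_UNORDERED)        // boost >= 1.36
#  include <boost/unordered_map.hpp>
   template<class K, class V>
   struct HashMap { typedef boost::unordered_map<K, V> type; };
#elif defined(OSL_HAVE_TR1_UNORDERED)        // newer g++ (tr1)
#  include <tr1/unordered_map>
   template<class K, class V>
   struct HashMap { typedef std::tr1::unordered_map<K, V> type; };
#else                                        // legacy SGI extension
#  include <ext/hash_map>
   template<class K, class V>
   struct HashMap { typedef __gnu_cxx::hash_map<K, V> type; };
#endif
```

Client code then uses `HashMap<Key, Value>::type` and never names the backing container directly.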

Regards,
Blair


Re: Fix errors and warnings from g++-4.4.1 (issue193074)

chri...@...
 

On 2010/01/24 08:26:12, blair wrote:
> You could do what Google Protocol Buffers does and determine which
> hash map to use at configure time.  It works with g++ 3.4.x all the way
> up to 4.4.x.  See the m4 file at:
>
> http://code.google.com/p/protobuf/source/browse/trunk/m4/stl_hash.m4
IMHO this seems to be a bit of overkill, and my preferred option would
be just to specify that >=boost-1.36 is necessary.  However, I know that
may not be an option for everyone, so I'll let the OSL core developers
chime in.  What's the story, guys?

I'll note that using hash_map *does* compile with g++-4.4.1, but not
without warnings (and hence doesn't compile when using -Werror which is
turned on by default in the build scripts).

> Protocol Buffers has a new BSD license so this could be copied straight
> from them.

The M4 would have to be converted to cmake, but it's good to see all the
potential places where hash_map may reside.  Gosh, there's a lot!


http://codereview.appspot.com/193074/show
