Cool. Gets closer, but another build error.

I've created this issue:

so we can take the remainder of the CUDA build discussion off-list.
For those who are interested in following, please feel free to add
yourself to the issue.

On Tue, Apr 17, 2012 at 7:19 AM, Nathan Weston
I tried it with 4.0 this morning and got the same error. It does seem to be
a bug in nvcc, but I was able to work around it. I just pushed the fix to my

On 4/16/2012 7:40 PM, Jeremy Selan wrote:

I'll try to get 4.1 installed. Let's wait and see if that fixes things.
On Mon, Apr 16, 2012 at 4:39 PM, Nathan Weston

Hmmm... I didn't run into anything like that.
It looks like the error is coming from an intermediate file that nvcc
generates and passes off to gcc. I've seen that kind of thing in the past
when there's a bug in nvcc or it runs into some C++ construct it can't
handle.

I notice that you're building with CUDA 4.0. I've actually been using 4.1.
I didn't think there was anything in my code that 4.0 couldn't handle, but
you never know. If you have 4.1 around, it might be worth a try.
I'll test with 4.0 myself tomorrow.

On 04/16/2012 06:54 PM, Jeremy Selan wrote:


Haven't looked very far into the code yet, but tried to build it and
having some issues.

Does this error ring any bells?  I have to admit my CUDA experience is

On Mon, Apr 16, 2012 at 12:25 PM, Nathan Weston

On 4/9/2012 7:12 PM, Jeremy Selan wrote:

So what are the next steps?

I think my preference would be for you to
- mockup the public API
- write CUDA support for only the simplest possible Op, such as
- copy src/apps/ocioconvert -> src/apps/ociocudaconvert, and modify
this example to load to a CUDA buffer, process using OCIO, copy back
to host memory, and then save to a file.

Once these are done, we can iterate on this trivial case until we get
an API / file layout we all like.

This is done now. My code is on GitHub:

It worked out a little differently than I had planned. I ended up with a
parallel class hierarchy of CudaOps. This doesn't result in too much
duplicated code, since the Ops typically call a function to do most of
the work of apply().
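
To illustrate the idea of a parallel hierarchy that shares the real work, here is a minimal sketch. It is not the code from the branch: ScaleOp, CudaScaleOp, scalePixel and the SKETCH_HOSTDEVICE macro are all hypothetical names, and RGBA float pixels are assumed. The shared function is marked so that it compiles for both host and device under nvcc, and as ordinary C++ under a host compiler.

```cpp
#include <cstddef>

// Expands to CUDA execution-space qualifiers under nvcc and to nothing
// under a plain host compiler. (Hypothetical macro name.)
#ifdef __CUDACC__
#define SKETCH_HOSTDEVICE __host__ __device__
#else
#define SKETCH_HOSTDEVICE
#endif

// Shared per-pixel function that does the actual work of apply().
// Assumes 4 floats per pixel (RGBA); alpha is left untouched.
SKETCH_HOSTDEVICE inline void scalePixel(float* rgba, float scale) {
    rgba[0] *= scale;
    rgba[1] *= scale;
    rgba[2] *= scale;
}

// CPU-side op (hypothetical): loops over the pixels on the host.
class ScaleOp {
public:
    explicit ScaleOp(float scale) : scale_(scale) {}
    void apply(float* rgba, std::size_t numPixels) const {
        for (std::size_t i = 0; i < numPixels; ++i) {
            scalePixel(rgba + 4 * i, scale_);
        }
    }
private:
    float scale_;
};

// CUDA-side counterpart (hypothetical): processes a single pixel, the way
// one thread of a kernel would, by calling the same shared function.
class CudaScaleOp {
public:
    explicit CudaScaleOp(float scale) : scale_(scale) {}
    SKETCH_HOSTDEVICE void applyOne(float* rgba, std::size_t pixelIndex) const {
        scalePixel(rgba + 4 * pixelIndex, scale_);
    }
private:
    float scale_;
};
```

In a real CUDA build, a kernel would call applyOne() with the pixel index computed from the thread and block indices; only the thin wrappers are duplicated between the two classes.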

I had to move some code into different files, but on the whole the changes
to existing code weren't as bad as I expected.

The public API just consists of two new ImageDesc classes, for
CUDA images.

There are two limitations at the moment:
1. nvcc doesn't support C++0x yet, so the CUDA path only builds if
OCIO_USE_BOOST_PTR is enabled. I don't think we really need smart pointers
anywhere in the CUDA code, so we ought to be able to work around this,
but I haven't tried it yet.

2. The current implementation requires CUDA 4.0 and a Fermi card, because it
makes virtual calls in device code. Eventually I'd like to support older
cards, but I can worry about that later.
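
On the first limitation, one possible way to keep smart pointers out of the CUDA code is sketched below. This is only an assumption about how the workaround might look, not something that has been tried in the branch: GainParams, applyGainRaw and applyGain are hypothetical names, and the loop stands in for what would be a kernel launch. The part that nvcc would compile sees only raw pointers and plain data, while ownership stays in host-only code.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// ---- Part a CUDA translation unit would see: plain data, raw pointers ----

// Hypothetical plain-data description of an op; no smart pointers inside.
struct GainParams {
    float gain;
};

// Hypothetical entry point. In a real CUDA build this would launch a kernel;
// here a host loop stands in so the sketch runs without a GPU.
void applyGainRaw(const GainParams* params, float* pixels, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        pixels[i] *= params->gain;
    }
}

// ---- Host-only part: ownership handled by a smart pointer ----

// The smart pointer never crosses into the CUDA-facing function; only the
// raw pointer obtained from get() does. The caller keeps the object alive
// for the duration of the call.
void applyGain(const std::shared_ptr<const GainParams>& params,
               std::vector<float>& pixels) {
    applyGainRaw(params.get(), pixels.data(), pixels.size());
}
```

With this split, the choice of smart pointer implementation only affects the host-only wrapper, so the file compiled by nvcc would not need C++0x support for it.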

Let me know what you think so far.

