OCIO CUDA


Jeremy Selan <jeremy...@...>
 

Awesome!

Haven't looked very far into the code yet, but tried to build it and
having some issues.

Does this error ring any bells? I have to admit my CUDA experience is limited.
https://gist.github.com/2402166

-- Jeremy

On Mon, Apr 16, 2012 at 12:25 PM, Nathan Weston <elb...@...> wrote:
On 4/9/2012 7:12 PM, Jeremy Selan wrote:

So what are the next steps?

I think my preference would be for you to
- mock up the public API
- write CUDA support for only the simplest possible Op, such as
'ExponentOp'
- copy src/apps/ocioconvert ->  src/apps/ociocudaconvert, and update
this example to load to a cuda buffer, process using OCIO, copy back
to host memory, and then save to a file.

Once these are done, we can iterate on this trivial case until we get
an API / file layout we all like.

This is done now. My code is on GitHub:
https://github.com/nweston/OpenColorIO/tree/cuda

It worked out a little differently than I had planned. I ended up with a
parallel class hierarchy of CudaOps. This doesn't result in too much
duplicated code since the Ops typically call a function to do most of the
work of apply().
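
A rough sketch of how that kind of sharing could look, not the code in the linked branch. All names here (SKETCH_HOSTDEVICE, ApplyExponentToPixel, SketchExponentOp, SketchExponentKernel) are hypothetical, and clamping negative inputs to zero is an assumption. The sketch also launches a plain kernel rather than making virtual calls in device code, so it is simpler than the implementation described in this thread.

```cpp
#include <math.h>

// Under nvcc, mark the shared function as callable from host and device.
// In a plain C++ build the macro expands to nothing.
#ifdef __CUDACC__
#define SKETCH_HOSTDEVICE __host__ __device__
#else
#define SKETCH_HOSTDEVICE
#endif

// Shared per-pixel math (hypothetical name). Both the CPU op and the
// CUDA kernel call this, so the arithmetic is written only once.
SKETCH_HOSTDEVICE inline void ApplyExponentToPixel(float* rgba, const float* exp4)
{
    for (int c = 0; c < 4; ++c)
    {
        // Assumption: negative inputs are clamped to zero before powf().
        const float v = rgba[c] < 0.0f ? 0.0f : rgba[c];
        rgba[c] = powf(v, exp4[c]);
    }
}

// CPU side: apply() loops over packed RGBA pixels on the host.
class SketchExponentOp
{
public:
    explicit SketchExponentOp(const float* exp4)
    {
        for (int c = 0; c < 4; ++c) m_exp4[c] = exp4[c];
    }

    void apply(float* rgbaBuffer, long numPixels) const
    {
        for (long i = 0; i < numPixels; ++i)
            ApplyExponentToPixel(rgbaBuffer + 4 * i, m_exp4);
    }

private:
    float m_exp4[4];
};

#ifdef __CUDACC__
// CUDA side: one thread per pixel, calling the same shared function.
// Only compiled when building with nvcc.
__global__ void SketchExponentKernel(float* rgba, long numPixels,
                                     float e0, float e1, float e2, float e3)
{
    const long i = blockIdx.x * (long)blockDim.x + threadIdx.x;
    if (i < numPixels)
    {
        const float exp4[4] = { e0, e1, e2, e3 };
        ApplyExponentToPixel(rgba + 4 * i, exp4);
    }
}
#endif
```

On the host, constructing SketchExponentOp with exponents {0.5, 0.5, 0.5, 1.0} and calling apply() on a packed RGBA buffer takes the square root of the colour channels and leaves alpha unchanged.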

I had to move some code into different files, but on the whole the changes
to existing code weren't as bad as I expected.

The public API just consists of two new ImageDesc classes, for packed/planar
CUDA images.
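
To make the packed/planar distinction concrete, here is a minimal sketch of the two memory layouts. The struct and function names (SketchPackedDesc, SketchPlanarDesc, PackedPixel, PlanarSample) are hypothetical and are not the classes added in the branch. For a CUDA image the pointers would presumably refer to device memory; the indexing is shown on the host for simplicity.

```cpp
#include <cstddef>

// Hypothetical packed descriptor: channels are interleaved per pixel,
// e.g. R0 G0 B0 A0 R1 G1 B1 A1 ...
struct SketchPackedDesc
{
    float* data;
    long width;
    long height;
    long numChannels;
};

// Hypothetical planar descriptor: one separate plane per channel.
// The alpha plane may be null when the image has no alpha.
struct SketchPlanarDesc
{
    float* r;
    float* g;
    float* b;
    float* a;
    long width;
    long height;
};

// Pointer to the first channel of pixel (x, y) in a packed image.
inline float* PackedPixel(const SketchPackedDesc& d, long x, long y)
{
    return d.data + (y * d.width + x) * d.numChannels;
}

// Reference to the sample at (x, y) within one plane of a planar image.
inline float& PlanarSample(float* plane, const SketchPlanarDesc& d, long x, long y)
{
    return plane[y * d.width + x];
}
```

In the packed case, neighbouring threads that each handle one pixel read addresses numChannels floats apart. In the planar case they read adjacent floats within each plane. That difference is one reason to describe the two layouts separately.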

There are two limitations at the moment:
1. nvcc doesn't support C++0x yet, so the CUDA path only builds if
OCIO_USE_BOOST_PTR is enabled. I don't think we really need smart pointers
anywhere in the CUDA code, so we ought to be able to work around this, but I
haven't tried it yet.

2. The current implementation requires CUDA 4.0 and a Fermi card, because it
makes virtual calls in device code. Eventually I'd like to support older
cards, but I can worry about that later.

Let me know what you think so far.

-- Nathan


Nathan Weston <elb...@...>
 

Hmmm... I didn't run into anything like that.
It looks like the error is coming from an intermediate file that nvcc generates and passes off to gcc. I've seen that kind of thing in the past when there's a bug in nvcc or it runs into some C++ construct it can't handle.

I notice that you're building with CUDA 4.0. I've actually been using 4.1 -- I didn't think there was anything in my code that 4.0 couldn't handle, but you never know. If you have 4.1 around, it might be worth a try. Otherwise I'll test with 4.0 myself tomorrow.

- Nathan

On 04/16/2012 06:54 PM, Jeremy Selan wrote:
Awesome!

Haven't looked very far into the code yet, but tried to build it and
having some issues.

Does this error ring any bells? I have to admit my CUDA experience is limited.
https://gist.github.com/2402166

-- Jeremy


Jeremy Selan <jeremy...@...>
 

I'll try to get 4.1 installed. Let's wait and see if that fixes things.
-- Jeremy

On Mon, Apr 16, 2012 at 4:39 PM, Nathan Weston <elb...@...> wrote:
Hmmm... I didn't run into anything like that.
It looks like the error is coming from an intermediate file that nvcc
generates and passes off to gcc. I've seen that kind of thing in the past
when there's a bug in nvcc or it runs into some C++ construct it can't
handle.

I notice that you're building with CUDA 4.0. I've actually been using 4.1 --
I didn't think there was anything in my code that 4.0 couldn't handle, but
you never know. If you have 4.1 around, it might be worth a try. Otherwise
I'll test with 4.0 myself tomorrow.

- Nathan


Nathan Weston <elb...@...>
 

I tried it with 4.0 this morning and got the same error. It does seem to be a bug in nvcc, but I was able to work around it. I just pushed the fix to my repo.

On 4/16/2012 7:40 PM, Jeremy Selan wrote:
I'll try to get 4.1 installed. Let's wait and see if that fixes things.
-- Jeremy


Jeremy Selan <jeremy...@...>
 

Cool. Gets closer, but another build error.

I've created this issue:
https://github.com/imageworks/OpenColorIO/issues/261

so we can take the remainder of the CUDA build discussion off-list.
For those who are interested in following, please feel free to add
yourself to the issue.

-- Jeremy

On Tue, Apr 17, 2012 at 7:19 AM, Nathan Weston <elb...@...> wrote:
I tried it with 4.0 this morning and got the same error. It does seem to be
a bug in nvcc, but I was able to work around it. I just pushed the fix to my
repo.

