OCIO CUDA


Larry Gritz <l...@...>
 

IIRC, although it's not quite as easy as OpenCL, these days CUDA can dynamically compile kernels.  (OpenCL was able to do that all along.)

The other advantage of CUDA is that it's really C++ with a couple of minor additions, which may make porting our existing code a lot easier, and it also lets you use all your favorite C++ features such as classes and templates.  OpenCL is its own thing (though very C-like).

The disadvantage is, of course, less HW and vendor independence.


On Mar 30, 2012, at 12:02 PM, Jeremy Selan wrote:

Excellent, looking forward to seeing what you do.

You mention CUDA (historically) doesn't support dynamic compilation of kernels?  That would imply an implementation that looks more like a fixed function processing path, unfortunately.  The downside being that the results would be even less accurate (potentially) than either our current CPU or GPU pathways.

Recall that in OCIO, all of the color transforms are dynamically loaded at runtime, so at library compile-time there's no way to know what processing will be required for a given color transform.  (You roughly know the building blocks, but not how they will be chained together).

Even our current GLSL codepath, which leverages a single 3dlut, tries to do as much as possible in the fragment shader at runtime.  (Its pipeline looks like [GLSL CODE + 3DLUT + GLSL CODE], with as much done in code as possible).

What type of CUDA application are you writing? Are you looking for OCIO in CUDA for performance reasons?  Are you looking for OCIO to match the quality of the GPU?  Perhaps we can come up with an alternate implementation approach, or decide that it's better to just target recent CUDA versions.

At first glance, it appears that OpenCL may support dynamic compilation, which would make it easier to match the CPU 1:1.  Can anyone with OpenCL experience chime in?

clCreateProgramWithSource(...), etc.
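
A minimal sketch of what runtime generation could look like, since the chain of ops is only known once a config is loaded. The kernel source is assembled as a string, which a client would then hand to clCreateProgramWithSource and clBuildProgram. The Op struct, the op types, buildKernelSource, and the kernel name apply_chain are all hypothetical and are not part of the OCIO API. Parameter values are assumed to be finite.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical building blocks; not OCIO types.
enum OpType { OP_SCALE, OP_OFFSET, OP_POWER };

struct Op {
    OpType type;
    float value;
};

// Assemble OpenCL C source for a chain of ops that is only known at runtime.
std::string buildKernelSource(const std::vector<Op>& ops)
{
    std::ostringstream src;
    // showpoint keeps a decimal point in every literal, so 2 is emitted as
    // "2.00000000" and stays a valid float literal once 'f' is appended.
    src << std::showpoint << std::setprecision(9);
    src << "__kernel void apply_chain(__global float4* px)\n"
           "{\n"
           "    size_t i = get_global_id(0);\n"
           "    float3 c = px[i].xyz;\n";
    for (size_t n = 0; n < ops.size(); ++n) {
        const float v = ops[n].value;
        switch (ops[n].type) {
        case OP_SCALE:
            src << "    c = c * " << v << "f;\n";
            break;
        case OP_OFFSET:
            src << "    c = c + " << v << "f;\n";
            break;
        case OP_POWER:
            // Clamp negatives before the power, as an illustrative choice.
            src << "    c = pow(fmax(c, 0.0f), (float3)(" << v << "f));\n";
            break;
        }
    }
    src << "    px[i].xyz = c;\n"
           "}\n";
    return src.str();
}
```

The returned text is plain data, so the code that builds it needs no OpenCL headers. Only the client that compiles and launches the kernel would link against OpenCL.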

-- Jeremy

On Thu, Mar 29, 2012 at 11:11 AM, Nathan Weston <elb...@...> wrote:
Cool. If all goes well, I can hopefully find time to work on this over the next couple of months.

CUDA historically hasn't supported dynamic generation/compilation of kernels. I believe it's possible with newer versions of the compiler, but only with the lower-level driver API. A statically-compiled kernel is probably a better bet, which would tend to point toward a more analytical approach along the lines of your CPU codepath.

I've had pretty good luck in the past sharing code between C++ and CUDA in order to implement parallel code paths that produce the same results. The only snags are:
 1) The shared code has to go into header files
 2) Virtual functions require CUDA 4.0 and Fermi hardware

If possible I'd like to support older cards and CUDA toolkits, which means no virtual functions.
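
A rough sketch of the shared-header approach under those two constraints: the functions live in a header, and the op type is dispatched with a switch on a tag instead of a virtual call. The macro SHARED_HOSTDEVICE, the Op struct, applyOp, and applyChain are hypothetical names for illustration and are not OCIO code.

```cpp
#include <math.h>
#include <stddef.h>

// nvcc defines __CUDACC__, so the functions below are built for both host
// and device there; a plain C++ compiler sees an empty macro.
#ifdef __CUDACC__
#define SHARED_HOSTDEVICE __host__ __device__
#else
#define SHARED_HOSTDEVICE
#endif

// Hypothetical op description: a tag plus one parameter, so no virtual
// functions are needed.
enum OpType { OP_SCALE, OP_OFFSET, OP_POWER };

struct Op {
    OpType type;
    float value;
};

// Apply a single op to one channel value.
SHARED_HOSTDEVICE inline float applyOp(const Op& op, float x)
{
    switch (op.type) {
    case OP_SCALE:  return x * op.value;
    case OP_OFFSET: return x + op.value;
    case OP_POWER:  return powf(x > 0.0f ? x : 0.0f, op.value);
    }
    return x;
}

// Apply the whole chain, in order, to one RGB pixel.
SHARED_HOSTDEVICE inline void applyChain(const Op* ops, size_t numOps, float* rgb)
{
    for (size_t i = 0; i < numOps; ++i)
        for (int c = 0; c < 3; ++c)
            rgb[c] = applyOp(ops[i], rgb[c]);
}
```

A CPU loop and a statically compiled CUDA kernel could both call applyChain once per pixel, with the op array uploaded as plain data. The op list is interpreted at runtime, so no kernel needs to be compiled after the config is loaded.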

Is there any documentation of the OCIO internals to help me get my bearings?


On Wednesday, March 28, 2012 6:47:58 PM UTC-4, Jeremy Selan wrote:
I don't think anyone has looked into implementing a CUDA pathway, but I'm very open to such ideas. Someone did ask about an OpenCL implementation recently, but I believe it's still in the concept stage.

A few thoughts on the concept...

Our current GPU implementation does not attempt to match the CPU implementation, by design.  The CPU codepath does the full analytical color operations per pixel, while the GPU GLSL/Cg implementation relies on a combination of analytical shader text code generation, along with a 3d lut sampling.   For color operations which can be done in simple shader text (such as math ops), these all happen in the GLSL shader. But if the user references multiple 3d luts for example, it's all baked into a single 3d lut.
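
To illustrate the baking idea, here is a minimal sketch that samples an arbitrary chain of color functions on a regular lattice and stores the results as one 3d lut. The names RGB, ColorFn, and bakeLut3D, the [0,1] input domain, and the red-fastest memory layout are assumptions made for this example, not a description of how OCIO does it internally.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct RGB {
    float r, g, b;
};

// Any color operation: a 3d lut lookup, a matrix, a curve, and so on.
using ColorFn = std::function<RGB(RGB)>;

// Evaluate the whole chain at every lattice point of an edgeLen^3 grid over
// [0,1]^3. Requires edgeLen >= 2. Red varies fastest in the output array.
std::vector<float> bakeLut3D(const std::vector<ColorFn>& chain, int edgeLen)
{
    std::vector<float> lut;
    lut.reserve(static_cast<std::size_t>(edgeLen) * edgeLen * edgeLen * 3);
    const float scale = 1.0f / static_cast<float>(edgeLen - 1);
    for (int b = 0; b < edgeLen; ++b) {
        for (int g = 0; g < edgeLen; ++g) {
            for (int r = 0; r < edgeLen; ++r) {
                RGB c = { r * scale, g * scale, b * scale };
                for (const ColorFn& fn : chain)
                    c = fn(c);
                lut.push_back(c.r);
                lut.push_back(c.g);
                lut.push_back(c.b);
            }
        }
    }
    return lut;
}
```

Whatever the chain contains, the GPU side then only needs a single interpolated lookup, at the cost of the lattice resolution limiting accuracy.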

I was always hoping that, if we ever implemented a CUDA or OpenCL pathway, it would be more akin to the GPU code path and do more analytically.  I'm not sure if this is possible, but I think it's a nice ideal for a 'compute' context.

Another nicety of the current implementation is that even though we support GPU(s), OpenColorIO doesn't actually link to libGL, etc.   The 'GPU API' conceptually only deals with POD types, returning the float * 3dlut, and the const char * shader text.

My hope would be that, if possible, a CUDA / OpenCL pathway wouldn't impose any new linking requirements on the core library, but would instead support new code paths using simple data types.

-- Jeremy


On Wed, Mar 28, 2012 at 12:56 PM, Nathan Weston <elb...@...> wrote:
I'm currently integrating OpenColorIO into an application that uses CUDA for GPU processing. In order to use OCIO's shader path, we'd need to copy our images over to OpenGL textures and back again. If OCIO had a CUDA path, it would be cleaner and faster.

Has anyone looked into implementing such a thing? If I were to implement it myself, is there any interest in including it in OCIO?



--
Larry Gritz



