where scale and offset are based on the LUT edge length, as shown in the original Sony Imageworks chapter in GPU Gems 2 (chapter 24).
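
For reference, here is a minimal sketch of that scale and offset, assuming the form I remember from GPU Gems 2, chapter 24: scale = (N - 1) / N and offset = 1 / (2N) for a LUT with edge length N. The function names are made up for illustration and are not part of OCIO or any shader API.

```python
def lut_scale_offset(edge_len):
    """Return (scale, offset) mapping [0, 1] onto the texel centers of a LUT.

    Assumed form (GPU Gems 2, ch. 24): scale = (N - 1) / N, offset = 1 / (2N).
    """
    n = float(edge_len)
    scale = (n - 1.0) / n
    offset = 1.0 / (2.0 * n)
    return scale, offset


def lut_coord(value, edge_len):
    """Remap a normalized channel value to a texture coordinate for the lookup."""
    scale, offset = lut_scale_offset(edge_len)
    return value * scale + offset


if __name__ == "__main__":
    # For a 32-entry edge, 0.0 and 1.0 land on the centers of the
    # first and last texels rather than on the texture borders.
    print(lut_coord(0.0, 32), lut_coord(1.0, 32))
```

The point of the remap is that the first and last lattice entries sit at texel centers, so hardware filtering interpolates between real LUT samples instead of clamping at the texture edge.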
Now I need to understand why OCIO's GpuShaderDesc emits some additional vector multiplications before the LUT lookup in the shader. They seem to improve the lookup enough that a lattice of only 32x32x32 is needed.