Re: S-Log2/S-gamut 10-bit 422 XAVC to Linear ACES RGB EXR


Jeremy Selan <jeremy...@...>
 

Vincent,

Interesting code, thanks for sharing!  My apologies for not replying sooner, my bandwidth has been way too limited of late. :(

There are quite a few different questions here.  Let me take a stab at some.

First off, can you share a bit more about what you're trying to accomplish in transcoding the F55 stream to OpenEXR? What are you hoping to do with the EXR frames? Are you aiming for real-time playback?  Is encoding performance critical? Hearing more about the desired usage would be very helpful.  Next, how are you going to view these linearized EXR frames? Note that when you use the referenced color math to go to scene-linear, for viewing you'll probably prefer some sort of 's-shaped' tone mapping operator, rather than a simple gamma curve or such.

The reason I ask is that in your code example, you appear to have a single chunk of code that is responsible for both decoding the frames and applying a particular set of hard-coded color transformations.  In my personal experience, I tend to gravitate towards separable chunks of re-usable processing rather than 'all in 1' binaries. For example, it's common to have multiple productions at Imageworks running concurrently which, while sharing input cameras, may choose to use slightly different input color transforms.  For this reason, OpenColorIO loads its color configurations at runtime, rather than building them in.

Here's an example of a stand-alone binary, which uses OpenImageIO to do the image reading/writing, and OpenColorIO to do the color processing.  Note that the color math is not built-in, but is abstracted away in the library:
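A minimal sketch of what that separation looks like, assuming the OpenColorIO v1 and OpenImageIO C++ APIs (error handling omitted, and the command-line shape is just illustrative):

    // ocio_transcode_sketch.cpp
    // usage: ocio_transcode <in.dpx> <inColorSpace> <outColorSpace> <out.exr>
    #include <OpenImageIO/imageio.h>
    #include <OpenColorIO/OpenColorIO.h>
    #include <vector>
    namespace OCIO = OCIO_NAMESPACE;
    using namespace OIIO;

    int main(int argc, char** argv)
    {
        if (argc != 5) return 1;

        // OpenImageIO handles the file format: read the source as float.
        auto in = ImageInput::open(argv[1]);
        const ImageSpec& spec = in->spec();
        std::vector<float> pixels((size_t)spec.width * spec.height * spec.nchannels);
        in->read_image(TypeDesc::FLOAT, &pixels[0]);
        in->close();

        // OpenColorIO handles the color math: the transform between the two
        // named colorspaces comes from the $OCIO config at runtime.
        OCIO::ConstConfigRcPtr config = OCIO::GetCurrentConfig();
        OCIO::ConstProcessorRcPtr processor = config->getProcessor(argv[2], argv[3]);
        OCIO::PackedImageDesc img(&pixels[0], spec.width, spec.height, spec.nchannels);
        processor->apply(img);

        // OpenImageIO writes the result.
        auto out = ImageOutput::create(argv[4]);
        out->open(argv[4], spec);
        out->write_image(TypeDesc::FLOAT, &pixels[0]);
        out->close();
        return 0;
    }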

> Then, I'm using FFMPEG's Lanczos 422 to 444 upscaling algorithm, which is slow, but produces the best results, IMHO.

Agreed!  Lanczos is a great compromise: it maintains sharpness without introducing too much overshoot/undershoot. :)

> My only question here is: what does Poynton mean when, comparing 8-bit to 10-bit, he says "The extra two bits are appended as least-significant bits to provide increased precision."?

I believe what Poynton is saying is that when converting between bit depths, the extra precision typically shows up as additional LSB info.  Put another way, say you have a floating-point image representation where pixel values are between [0.0-1.0]. (This is NOT scene-linear imagery, of course.)  If we wanted to encode that in 8 bits, we would use integer values 0-255.  And for 10 bits, we would use 0-1023.  Note that scaling from 8 to 10 bits (or back) is NOT a simple bit shift. Recall that a bit shift by 2 places is a simple multiply or divide by 4, so if we took 255 to 10 bits by shifting, 255 would map to 1020! Ugh! (Remember that we want to use the full 1023-sized coding range.)  So I tend to think about bit-depth changes as a multiply/divide by the max.  I.e., to go from 8 bits -> 10 in a manner that uses the full range, you must multiply by 1023/255.0. There are more bit-efficient ways to do this - an old OpenImageIO thread discusses the subtleties - I can search for it if you're interested. For non-performance-critical versions, going to float as an intermediate representation is usually the best option. Getting integer math right is hard, in some crappy non-obvious ways.
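To make the two scalings concrete, here is a tiny sketch (the function names are mine):

    #include <cstdint>
    #include <cmath>

    // A plain 2-bit shift multiplies by 4: 255 -> 1020, so the top of
    // the 10-bit coding range (1023) is never reached.
    uint16_t to10bitShift(uint8_t v) { return (uint16_t)v << 2; }

    // Full-range rescale: multiply by the ratio of the maxima, going
    // through float. 0 -> 0 and 255 -> 1023 exactly.
    uint16_t to10bitRescale(uint8_t v)
    {
        return (uint16_t)std::lround(v * (1023.0 / 255.0));
    }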

Your code has a few lines similar to:
>  pow(256, i)
This 'double' arithmetic is probably not what you're looking for. Perhaps a simple bit shift instead?
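For integer byte weights the shift is exact and avoids the round trip through double entirely. A sketch, with a name of my own choosing:

    #include <cstdint>

    // pow(256, i) computes the weight in double precision and returns a
    // double; for small i the same value is an exact integer shift:
    // 256^i == 1 << (8 * i).
    uint64_t byteWeight(unsigned i) { return 1ULL << (8 * i); }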

> S-log2/S-gamut YCbCr to S-log2/S-gamut R'G'B'
> 1) The camera uses the Rec 709 R'G'B' to YCbCr transform. And I use the reverse Rec 709 to get R'G'B' from YCbCr. Or is there something in SMPTE-ST-2048-1:2011 I should know about and take into consideration here?

Your matrixing back to RGB, with appropriate range consideration, is probably fine.  What I'd recommend for validating your code is to see if you can break this processing into separate steps and compare against known reference solutions.  For example, does this Sony camera have any way to write out an RGB image directly? Or, does Sony provide any reference software to transcode the stream to an uncompressed full-range RGB image?  Step one for testing your code is to disable the linearization, and to only compare the YCbCr transform bits.
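For reference, the inverse Rec 709 matrixing on normalized values looks roughly like this - a sketch derived from the standard Rec 709 luma weights, not code lifted from your implementation:

    // Inverse Rec 709 YCbCr -> R'G'B' for normalized inputs: Y' in [0,1],
    // Cb/Cr in [-0.5, 0.5], i.e. after the integer offsets and range
    // scaling have already been removed. Derived from Kr=0.2126, Kb=0.0722.
    void ycbcrToRgb709(float y, float cb, float cr,
                       float& r, float& g, float& b)
    {
        r = y + 1.5748f * cr;
        g = y - 0.1873f * cb - 0.4681f * cr;
        b = y + 1.8556f * cb;
    }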

If memory serves, I also believe that OpenEXR has native support for subsampled-chroma images; you may want to investigate that.

As you note, YCbCr most often utilizes the range of 16-235 (for 8 bits) and 64-940 (for 10 bits) when storing *rec709* imagery. However, the S-Log2 imagery takes advantage of this extra range, so you have to be careful not to build in the wrong scale factors. Once again, off the top of my head I'm not sure if your code is correct or not.  But if I were in your shoes I would carefully compare the reconstructed full-range RGB image versus a known-correct result. (This may require capturing with a different setting in the camera.)
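To make the ambiguity concrete, here are the two normalizations a decoder might apply to a 10-bit luma code value (a sketch; which one is right for S-Log2 data is exactly the thing to verify against a reference image):

    #include <cstdint>

    // 'Legal range' interpretation: 10-bit luma spans 64..940.
    float lumaNarrow(uint16_t code) { return (code - 64.0f) / (940.0f - 64.0f); }

    // 'Full range' interpretation: the codes span 0..1023.
    float lumaFull(uint16_t code) { return code / 1023.0f; }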

> There are two distinct CTL transforms from S-gamut to ACES for two white points: 3200K and 5500K. Why? Would one get the 5500K transform matrix by applying a white-balance transform on the 3200K version (and vice-versa)?

The color transform you're looking at is tailored to the F65, FYI, so I'm not sure how closely these matrices would match for your camera.  The reason there are two transforms is that I believe Sony has optimized the conversion to S-Log2, in the F65, to be specific to the color balance on the camera.  This is pretty non-standard, and so should be taken with a grain of salt until we get an official IDT from the ACES community. But in my understanding the different IDTs are required for strict accuracy.  Perhaps if you have a camera at hand, you can do an example test with both approaches, and see how large the residual differences are?

In practice, people may prefer to standardize on one of the IDTs for sanity's sake, even if it's not perfect in all situations.  (An example of a similar common practice would be the Arri Alexa's Log-C conversion, where a different LUT is required depending on the exposure index used. But in practice, people often drop this and just assume EI800 linearization.)

> I feel like I'm reverse engineering the whole thing and I'm not confident enough (yet) of the rigorous "scene-referredness" of the output. It looks good, but there has been too much guesswork involved to fully trust it. I would really appreciate some pointers.

Agreed! You definitely need to validate this stuff when so much of the code is untested / bleeding edge.  If you have the time, interest, and access to the camera, nothing beats a ground-truth linearization test.  The rough outline for the test is to set up a test chart with a stable light source, and then to shoot an exposure sweep across the full range of exposures. Then, post-linearization, you should be able to align the different exposures and see how closely they match! If they all match (other than clipping at the ends of the dynamic range), then your conversion is dead on (at least for the grayscale axis).
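The alignment step is just dividing out the exposure in linear light - a sketch of the idea (the names are mine, not from any particular tool):

    #include <cmath>

    // In a correct scene-linear conversion, a patch shot N stops over the
    // reference lands at exactly 2^N times the reference value. Dividing
    // that back out should collapse the whole sweep onto one curve.
    float alignExposure(float linearValue, float stopsOverReference)
    {
        return linearValue / std::pow(2.0f, stopsOverReference);
    }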

> Finally, on a side note, I would eventually accelerate the linear parts of the color transform through CUBLAS. I think I can achieve realtime speed both for offline conversion and field monitoring. Has anyone tried to port some of the OCIO code to CUDA?

Yes, there have been some attempts to do CUDA integration for GPU-accelerated 'final quality' transforms, but these were never taken past the prototype stage.  (Once again, my fault!)  I can point you to the branch if you're interested.  There has also been some OpenCL interest.

For simple monitoring though, you don't necessarily need to go to scene-linear; you can instead go straight from S-Log2 to the display transform using a 3D LUT.  OCIO already supports this on the GPU - see the ociodisplay example for the code.
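The core of that path, in the OCIO v1 C++ API, is just building a DisplayTransform processor (a sketch; "slog2" stands in for whatever your config names that colorspace):

    #include <OpenColorIO/OpenColorIO.h>
    namespace OCIO = OCIO_NAMESPACE;

    // Build a processor going straight from camera log to the default
    // display/view - this is what ociodisplay samples into a GPU 3D LUT.
    OCIO::ConstProcessorRcPtr makeMonitoringProcessor()
    {
        OCIO::ConstConfigRcPtr config = OCIO::GetCurrentConfig();
        const char* display = config->getDefaultDisplay();
        OCIO::DisplayTransformRcPtr t = OCIO::DisplayTransform::Create();
        t->setInputColorSpaceName("slog2");
        t->setDisplay(display);
        t->setView(config->getDefaultView(display));
        return config->getProcessor(t);
    }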

Cheers,
Jeremy


On Sun, May 26, 2013 at 6:38 PM, Vincent Olivier <vin...@...> wrote:
Hi,

I'm trying to convert the S-Log2/S-gamut 10-bit 422 XAVC footage coming out of my Sony F55 camera to a sequence of Linear ACES RGB OpenEXR files. I would like to validate the assumptions I make throughout the conversion process with you guys, if you would be so kind.




Little-endian S-log2/S-gamut 10-bit 422 YCbCr to little-endian S-log2/S-gamut 10-bit 444 YCbCr

First, I'm using FFMPEG to open the XAVC file (which it recognizes simply as an MXF-muxed XAVC Intra stream). The H.264 decoding seems to work superbly as far as I can see. Then, I'm using FFMPEG's Lanczos 422 to 444 upscaling algorithm, which is slow, but produces the best results, IMHO.

My only question here is: what does Poynton mean when, comparing 8-bit to 10-bit, he says "The extra two bits are appended as least-significant bits to provide increased precision."?

Because FFMPEG indicates that the stream is 10-bit little-endian, which calls for an 8-bit shift of the second byte (lines 168-184 in my code). Anyways, this seems to work just fine. I'm just checking if there is something I don't understand in Poynton's qualification of the 10-bit YCbCr bitstream endianness, or maybe it's reformatted under the hood by FFMPEG from the raw XAVC output. Mystery…



S-log2/S-gamut YCbCr to S-log2/S-gamut R'G'B'

My assumptions here are that:

1) The camera uses the Rec 709 R'G'B' to YCbCr transform. And I use the reverse Rec 709 to get R'G'B' from YCbCr. Or is there something in SMPTE-ST-2048-1:2011 I should know about and take into consideration here?

2) The footroom provision is 0…64 for luma samples and 0…512 for chroma samples. See lines 187-191 in my code. I have adapted that from the 8-bit footroom values (16/128) because it seems to make sense according to basic signal statistics I've run on the samples from one frame. But I'm REALLY not sure about that…

3) S-Log2 code uses "full-range" RGB (0…255 and not 0…219). See the matrix at lines 131-136 in my code for the "full-range" YCbCr to RGB conversion. The headroom-preserving transform matrix is at lines 140-145 (I'm not using that one).



S-log2/S-gamut R'G'B' to Linear S-gamut RGB

This is where it gets interesting. I have adapted the "slog2.py" code, part of the OpenColorIO-Configs project on GitHub provided by Jeremy Selan (I sent him an email weeks ago and didn't hear from him).

Assumptions made in my code:

1) The rescale at slog2.py:17-23 is redundant if you provide "full-range" RGB to the S-Log2 linearization algorithm. I left it in my code (see lines 65-66), but commenting it out seems to give more dynamic range in the result. Again, I might be dead wrong on this.

2) The differences between S-Log1 and S-Log2 are only: A) for the same input, S-Log1 doesn't have a rescaling step while S-Log2 has one (see my previous point); B) there is a linear portion in the shadows; and C) the highlight-portion power function is scaled by 219/155.



Linear S-gamut RGB to Linear ACES RGB

There are two distinct CTL transforms from S-gamut to ACES for two white points: 3200K and 5500K. Why? Would one get the 5500K transform matrix by applying a white-balance transform on the 3200K version (and vice-versa)?



I feel like I'm reverse engineering the whole thing and I'm not confident enough (yet) of the rigorous "scene-referredness" of the output. It looks good, but there has been too much guesswork involved to fully trust it. I would really appreciate some pointers.


Finally, on a side note, I would eventually like to accelerate the linear parts of the color transform through CUBLAS. I think I can achieve realtime speed both for offline conversion and field monitoring. Has anyone tried to port some of the OCIO code to CUDA?


Thanks for everything!

Vincent
