Re: S-Log2/S-gamut 10-bit 422 XAVC to Linear ACES RGB EXR

Vincent Olivier <vin...@...>

Hi Jeremy,

Thanks so much for your reply.

On 2013-05-29, at 12:09 AM, Jeremy Selan <jeremy...@...> wrote:

First off, can you share a bit more about what you're trying to accomplish in transcoding the F55 stream to OpenEXR? What are you hoping to do with the EXR frames? Are you aiming for real-time playback?  Is the encoding performance critical? Hearing more about the desired usage would be very helpful.

For now, I "only" want to find a way to get the most accurate, richest, "objectively" (standalone) scene-referred images from my camera in ACES-linear. This is the first step: to what degree can the images from the F55/F65 be considered a physically accurate photographic reference measurement of the scene, both for processing and for archival? (I don't think keeping only the original camera-referred data is sufficient in the long term.) I'm also looking for a way to store other exposure-related data, such as t-stop, sensitivity and other in-camera processing logs, in the EXR headers, so that a scene can be physically qualified from the image.

I'm a programmer and a photographer. So the applications I'm looking forward to tackling are, first and foremost, related to the mathematical and computational aspects of color and lighting æsthetics. I mean, I am very bored with the state of cinematography right now: there are really only 2 looks applied to every single movie, the Transformers anamorphic-crushed lensflare blue and the HDSLR peaches-and-cream indie porridge. Since The Archers (especially Red Shoes and Black Narcissus, and notice that these two-word titles start with the name of a color), I haven't seen cinematography that can get a braingasm out of my visual cortex. Just look at what the contrast between the herbal arrangement and the dresses creates in the fashion show at the end of Queen Cotton (starting at 10:00). That's what I call art. I think the job of a cinematographer (a good one, anyway) has to extend all the way to the computer imagery, the compositing and the image finishing of the movie in the digital realm. That's my personal goal.

Image-based lighting for unbiased rendering is another area of innovation I will be interested in looking into in the near future, once I get comfortable with ACES. I cannot afford access to Arnold nodes, but some Arnold core developers have contributed significantly to Blender's Cycles; I really like this renderer, and it's open source. The color pipeline is also a really hot topic in the Blender community right now.

I'm not sure this answers your question. But I hope it clarifies my intentions a bit.

Next off, how are you going to view these linearized EXR frames? Note that when you use the referenced color math to go to scene-linear, you'll probably prefer some sort of 's-shaped' tone mapping operator, rather than a simple gamma curve or such.
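To make the 's-shaped' point concrete, here is a minimal sketch contrasting a plain gamma encode with a simple logistic curve laid out in stops around mid grey. The curve, its `contrast` parameter and the 0.18 pivot are illustrative assumptions, not the ACES RRT/ODT or any OCIO transform.

```python
import math

MID_GREY = 0.18  # scene-linear pivot (assumption for this sketch)


def gamma_encode(x, gamma=2.2):
    """Plain power-law display encode: anything above 1.0 simply clips."""
    x = min(max(x, 0.0), 1.0)
    return x ** (1.0 / gamma)


def s_curve(x, contrast=0.8):
    """Logistic curve over stops relative to mid grey (hypothetical).

    Maps scene-linear (0, inf) onto display (0, 1), rolling off highlights
    and shadows instead of clipping them. Written with tanh, which equals
    the logistic function 1 / (1 + exp(-z)) but cannot overflow.
    """
    if x <= 0.0:
        return 0.0
    stops = math.log2(x / MID_GREY)
    return 0.5 * (1.0 + math.tanh(0.5 * contrast * stops))


for value in (0.0045, 0.18, 1.0, 4.0, 16.0):
    print(f"{value:>8}: gamma={gamma_encode(value):.3f}  s-curve={s_curve(value):.3f}")
```

Note how 4.0 and 16.0 both clip to 1.0 under the gamma curve, while the S-curve still keeps them apart.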

I only have access to Rec 709 monitors, some OLED ones. So I look at the images through an ACES to Rec 709 LUT, but the computational space is ACES and as far as measurements are concerned, I rely on the various analytical tools, mostly histograms and vectorscopes (I'm building a custom CIE [x,y] diagram vectorscope to track what happens to image data in and out of color transforms as a way to visually represent those transforms: VERY useful when you are trying to communicate where you want to go with your color pipeline).
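For the CIE [x, y] vectorscope, the per-pixel math is small: ACES2065-1 (AP0) linear RGB to CIE XYZ through the 3x3 matrix published in SMPTE ST 2065-1, then x = X/(X+Y+Z) and y = Y/(X+Y+Z). A minimal sketch, assuming the input is already AP0 linear; the helper name `aces_to_xy` is hypothetical.

```python
# ACES2065-1 (AP0) linear RGB -> CIE XYZ, as given in SMPTE ST 2065-1.
AP0_TO_XYZ = (
    (0.9525523959, 0.0000000000, 0.0000936786),
    (0.3439664498, 0.7281660966, -0.0721325464),
    (0.0000000000, 0.0000000000, 1.0088251844),
)


def aces_to_xy(rgb):
    """Return the CIE (x, y) chromaticity of one AP0 linear pixel.

    Returns None when X + Y + Z is zero (e.g. pure black), since
    chromaticity is undefined there.
    """
    r, g, b = rgb
    X, Y, Z = (row[0] * r + row[1] * g + row[2] * b for row in AP0_TO_XYZ)
    total = X + Y + Z
    if total == 0.0:
        return None
    return (X / total, Y / total)


print(aces_to_xy((1.0, 1.0, 1.0)))  # ACES white point
print(aces_to_xy((1.0, 0.0, 0.0)))  # AP0 red primary
```

White (1, 1, 1) should land on the ACES white point, approximately (0.32168, 0.33767), and pure red on the AP0 red primary, approximately (0.7347, 0.2653), which makes a handy sanity check for the plot.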

The reason I ask is that in your code example, you appear to have a single chunk of code that is responsible for both decoding the frames and applying a particular set of hard-coded color transformations. In my personal experience, I tend to gravitate towards separable chunks of re-usable processing rather than 'all in 1' binaries.

Well, to me, before it's ACES, it's garbage™. ;-) My code reflects this philosophy. I don't think I will need independent Sgamut and Slog2 transforms. The YCbCr to RGB step may, yes, become separable from the Sgamut/Slog2 handling, depending on the input footage. But I am also looking for the most computationally efficient code for a specific application (I will probably write one big CUDA kernel for it too), and fusing the code gets me there (at the price of elegance, perhaps, but performance matters in a production context). And this code is merely a proof of concept for now.
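For what it's worth, the fusing trade-off is smaller than it looks for the linear stages: consecutive 3x3 matrices can be multiplied together once, up front, so the per-pixel cost stays at one matrix multiply while the individual stages remain separate in the source. The sketch below uses hypothetical helper names and made-up example matrices. It only applies to stages that are linear in the same encoding: assuming the usual order of YCbCr matrix, then the S-Log2 curve, then the S-Gamut to ACES matrix, the YCbCr matrix cannot be folded into the gamut matrix, because the nonlinear log curve sits between them.

```python
def mat_mul(a, b):
    """Return the 3x3 product a @ b, i.e. 'apply b first, then a'."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, rgb):
    """Apply a 3x3 matrix to one RGB triplet."""
    return [sum(m[i][k] * rgb[k] for k in range(3)) for i in range(3)]


# Made-up stages standing in for real gamut / white-balance matrices.
stage_1 = [[1.2, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.8]]
stage_2 = [[0.9, 0.1, 0.0], [0.05, 0.9, 0.05], [0.0, 0.1, 0.9]]

# Computed once, outside the per-pixel loop.
fused = mat_mul(stage_2, stage_1)

pixel = [0.18, 0.25, 0.4]
step_by_step = apply(stage_2, apply(stage_1, pixel))
in_one_go = apply(fused, pixel)
print(step_by_step, in_one_go)
```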

For example, it's common to have multiple productions at Imageworks concurrently which, while sharing input cameras, may choose to use slightly different input color transforms.  For this reason, in OpenColorIO the color configurations are loaded at runtime, rather than being built in.

And I would love to contribute the code I'm writing back to OCIO, in the elegant dynamic linear 3D color space transform combination you created. It's just that I need to get my YCbCr to ACES transform right first. Why doesn't Imageworks have the color computing equivalent of Google's Summer of Code? I'll bring my sleeping bag and my toothbrush at your signal! ;-)

> My only question here is: what does Poynton mean when, comparing 8-bit to 10-bit, he says "The extra two bits are appended as least-significant bits to provide increased precision."?

I believe what Poynton is saying is that when converting between bit depths, the extra precision typically shows up as extra LSB information. Put another way, say you have a floating-point image representation where pixel values are between [0.0-1.0]. (This is NOT scene-linear imagery, of course.) If we wanted to encode that in 8 bits, we would use integer values 0-255. And for 10 bits, we would use 0-1023.

Agreed on [0, 255] to [0, 1023], but saying that the extra 2 bits are the least significant ones doesn't really make sense to me… But anyway, if we both think that's what he meant and the result looks right visually, then that's that.

 Note that scaling from 8 to 10 bits (or back) is NOT a simple bit shift. Recall that a bit shift by 2 places is a simple mult or divide by 4, so if we took 255 to 10 bits using shifting 255 would map to 1020! Ugh! (Remember that we want to use the full 1023 sized coding range).  So I tend to think about bit changes as a mult/divide by the max.  I.e., to go from 8 bits -> 10 in a manner that uses the full range, you must mult by 1023/255.0.
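A quick sketch of the difference, with hypothetical helper names: shifting maps 8-bit 255 to 1020, while scaling by 1023/255 with rounding uses the full range and round-trips every 8-bit code. This may also explain the Poynton quote: in Rec. 601/709 studio-range video, the 10-bit levels are defined as exactly four times the 8-bit levels (16 maps to 64, 235 to 940, 240 to 960), so for video-range code values appending two LSBs is exact, whereas full-range data needs the scale factor.

```python
def shift_8_to_10(code8):
    """Append two zero LSBs, i.e. multiply by 4."""
    return code8 << 2


def scale_8_to_10(code8):
    """Full-range rescale: 0..255 onto 0..1023, rounded to nearest."""
    return round(code8 * 1023 / 255)


def scale_10_to_8(code10):
    """Inverse full-range rescale: 0..1023 onto 0..255."""
    return round(code10 * 255 / 1023)


print(shift_8_to_10(255), scale_8_to_10(255))  # 1020 1023
```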

Agreed. That's how I do it in lines 187 to 191. But as far as headroom/footroom are concerned, I don't go from 8 to 10 bits; I simply subtract the assumed 10-bit footroom value from the sample and then divide by the 10-bit max.

There are more bit-efficient ways to do this (an old OpenImageIO thread discusses the subtleties; I can search for it if you're interested). For non-performance-critical versions, going to float as an intermediate representation is usually the best option. Getting integer math right is hard, in some crappy, non-obvious ways.

Your code has a few lines similar to,
>  pow(256, i)
This 'double' arithmetic is probably not what you're looking for. Perhaps a simple bitshift instead?
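Byte order only changes the order in which the bytes are visited, not the arithmetic, so shifts and ORs work for either endianness. A minimal sketch with a hypothetical `assemble` helper (in Python, `int.from_bytes(data, "little")` or `"big"` does the same thing):

```python
def assemble(sample_bytes, little_endian):
    """Combine raw bytes into one unsigned integer using shifts, not pow()."""
    ordered = reversed(sample_bytes) if little_endian else sample_bytes
    value = 0
    for byte in ordered:  # most significant byte first
        value = (value << 8) | byte
    return value


raw = bytes([0x01, 0x02])
print(hex(assemble(raw, little_endian=True)))   # 0x201
print(hex(assemble(raw, little_endian=False)))  # 0x102
```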

I chose a power function because I do not know the endianness at this point. It is not implemented yet, but I want to be able 

> S-log2/S-gamut YCbCr to S-log2/S-gamut R'G'B'
1) The camera uses the Rec 709 R'G'B' to YCbCr transform, and I use the inverse Rec 709 transform to get R'G'B' from YCbCr. Or is there something in SMPTE-ST-2048-1:2011 I should know about and take into consideration here?

Your matrixing back to RGB, with appropriate range consideration, is probably appropriate. What I'd recommend for validating your code is to break this processing into separate steps and compare each against known reference solutions. For example, does this Sony camera have any way to write out an RGB image directly? Or does Sony provide any reference software to transcode the stream to an uncompressed full-range RGB image? Step one for testing your code is to disable the linearization and compare only the YCbCr transform bits.
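Along those lines, here is a sketch of the matrixing step in isolation, assuming BT.709 luma coefficients (Kr = 0.2126, Kb = 0.0722) as stated in 1) above, with Y' in [0, 1] and Cb/Cr in [-0.5, 0.5], i.e. after any range normalization. Including the forward transform lets this step be tested on its own by round-tripping, independently of any linearization. Function names are hypothetical.

```python
KR, KB = 0.2126, 0.0722  # BT.709 luma coefficients
KG = 1.0 - KR - KB


def ycbcr_to_rgb(y, cb, cr):
    """Normalized Y'CbCr -> R'G'B' (still S-Log2 encoded, not linear)."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG  # solve Y' = Kr R' + Kg G' + Kb B'
    return r, g, b


def rgb_to_ycbcr(r, g, b):
    """Forward BT.709 transform, used here only to round-trip test."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr


print(ycbcr_to_rgb(*rgb_to_ycbcr(0.2, 0.7, 0.4)))
```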

Yes, I'm partnering with a local FX company to take reference shots of a Macbeth + f-stop chart, and we'll see. But I would REALLY appreciate someone from Sony Electronics validating the code at some point. I know they probably don't start from something linear in-camera to get to Slog2/Sgamut, but they surely have done something similar to what I'm doing… I just find it odd that they didn't publish it (if they have it, which might not be the case), like they did for the original Slog/Sgamut transform in the 2009 whitepaper.

If memory serves, I also believe that OpenEXR has native support for subsampled chroma images; you may want to investigate that.

I can output, and did output, separate grayscale files for each of the Y, Cb and Cr channels. But I didn't find a way to put all three in one file.

As you note, YCbCr most often utilizes the range of 16-235 (for 8 bits) and 64-940 (for 10 bits) when storing *rec709* imagery. However, the Slog2 imagery takes advantage of this extra range so you have to be careful not to build in the wrong scale factors. Once again, off the top of my head I'm not sure if your code is correct or not.  But if I were in your shoes I would carefully compare the reconstructed RGB full range image versus a known correct result. (this may require capturing with a different setting in the camera).
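For reference, the standard 10-bit legal-range normalization would look like the sketch below (hypothetical function name): luma 64..940 onto 0..1, chroma 64..960 onto -0.5..+0.5 around 512. It deliberately does not clamp, so codes outside the nominal range (which, as noted above, S-Log2 footage can use) come out below 0 or above 1 instead of being lost. Whether the F55's S-Log2 XAVC should be normalized exactly this way is precisely what the comparison against a known correct result needs to confirm.

```python
def normalize_legal_10bit(y_code, cb_code, cr_code):
    """Map 10-bit legal-range Y'CbCr codes to floats, without clamping.

    Luma:   64 -> 0.0, 940 -> 1.0                  (876 steps)
    Chroma: 512 -> 0.0, 64 -> -0.5, 960 -> +0.5    (896 steps)
    Codes outside these ranges map outside [0, 1] / [-0.5, 0.5].
    """
    y = (y_code - 64) / 876.0
    cb = (cb_code - 512) / 896.0
    cr = (cr_code - 512) / 896.0
    return y, cb, cr


print(normalize_legal_10bit(940, 960, 64))  # (1.0, 0.5, -0.5)
```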

Yes, I will be looking for a raw recorder to get their 16-bit Slog2/Sgamut RAW to OpenEXR ACES transform. However, my understanding is that they are at the same point as I am: their Slog2/Sgamut transforms are still a work in progress, even for the F65 (based on what I can see in Vegas and in their RAW Viewer).

> There are two distinct CTL transforms from S-gamut to ACES for two white points: 3200K and 5500K. Why? Would one get the 5500K transform matrix by applying a white-balance transform on the 3200K version (and vice-versa)?

The color transform you're looking at is tailored to the F65, FYI, so I'm not sure how closely these matrices would match for your camera. The reason there are two transforms is that I believe Sony has optimized the conversion to SLog2 in the F65 to be specific to the camera's color balance. This is pretty non-standard, and so should be taken with a grain of salt until we get an official IDT from the ACES community. But in my understanding the different IDTs are required for strict accuracy. Perhaps, if you have a camera at hand, you can do an example test with both approaches and see how large the residual differences are?
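The comparison suggested above can be as simple as running the same samples through both IDT matrices and reporting the largest difference. A sketch with hypothetical names; the actual 3200K and 5500K matrices would come from the CTL files and are not reproduced here, so identity-based placeholders stand in for them.

```python
def apply(m, rgb):
    """Apply a 3x3 matrix to one RGB triplet."""
    return [sum(m[i][k] * rgb[k] for k in range(3)) for i in range(3)]


def max_abs_residual(idt_a, idt_b, samples):
    """Largest per-channel difference between two IDTs over the samples."""
    worst = 0.0
    for rgb in samples:
        for a, b in zip(apply(idt_a, rgb), apply(idt_b, rgb)):
            worst = max(worst, abs(a - b))
    return worst


# Placeholders: substitute the 3200K and 5500K matrices from the CTL files.
IDT_3200K = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
IDT_5500K = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

chart_samples = [[0.18, 0.18, 0.18], [0.4, 0.2, 0.1], [0.05, 0.1, 0.3]]
print(max_abs_residual(IDT_3200K, IDT_5500K, chart_samples))
```

Reporting the residual relative to each sample value, or in stops, may be more meaningful photographically; the absolute difference just keeps the sketch short.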

In practice, people may prefer to standardize on one of the IDTs for sanity's sake, even if it's not perfect in all situations. (A similar common practice is the Arri Alexa's LogC conversion, where a different LUT is required depending on the exposure index used; in practice, people often drop this and just assume EI800 linearization.)

The ACES community, that's us, right? ;-)

One reason I sent this message is to see if there is interest in reaching a consensus around official community LUTs. (And there is: as you must know, VFX supervisors are in a permanent state of panic regarding the increasing influx of Slog2/Sgamut material coming their way, and right now they settle on the "least visually horrible" Frankenstein transform they can find.)

And since there is interest (bearing in mind that Sony is pushing hard, with all its political weight, to have one and only one camera-referred space and gamma standardized, such as SMPTE-ST-2048 or xvYCC, and preferably its own), I am wondering if we can all join our efforts to at least corroborate the findings.

> I feel like I'm reverse engineering the whole thing and I'm not confident enough (yet) in the rigorous "scene-referredness" of the output. It looks good, but there has been too much guesswork involved to fully trust it. I would really appreciate some pointers.

Agreed! You definitely need to validate this stuff when so much of the code is untested / bleeding edge. If you have the time, interest, and access to the camera, nothing beats a ground-truth linearization test. The rough outline for the test is to set up a test chart with a stable light source and then shoot an exposure sweep across the full range of exposures. Then, post-linearization, you should be able to align the different exposures and see how closely they match! If they all match (other than clipping at the ends of the dynamic range), then your conversion is dead on (at least for the grayscale axis).
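In code, the alignment step amounts to dividing each linearized value by 2^stops (its exposure offset relative to the reference frame) and checking how far the results spread. A minimal sketch for one grey patch, with hypothetical names and the noise-floor/clip thresholds left as parameters; the data below is synthetic, not a measurement.

```python
import math


def normalized(sweep):
    """Undo each frame's exposure offset: value / 2**stops.

    `sweep` is a list of (stops, linear_value) pairs for one patch, where
    `stops` is the exposure offset relative to the reference frame
    (+1 = one stop more light, so the linear value should double).
    """
    return [value / 2.0 ** stops for stops, value in sweep]


def spread_in_stops(sweep, floor=0.0, ceiling=math.inf):
    """Largest disagreement between frames, in stops.

    Frames whose raw linear value is not strictly between `floor` and
    `ceiling` (hypothetical noise-floor / clip thresholds) are ignored.
    """
    kept = [(s, v) for s, v in sweep if floor < v < ceiling]
    if len(kept) < 2:
        raise ValueError("need at least two unclipped frames")
    values = normalized(kept)
    return math.log2(max(values) / min(values))


# Synthetic, perfectly linear sweep of an 18% grey patch, -4 to +4 stops.
ideal = [(s, 0.18 * 2.0 ** s) for s in range(-4, 5)]
print(spread_in_stops(ideal))  # 0.0
```

A correct linearization should keep the spread near zero across the unclipped frames; a systematic drift with exposure points at the log curve rather than the matrices.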

Yes! We are definitely on the same page. I would just like some help from Sony, ideally.

That being said, I am in touch with Sony Electronics to see if they have information in-house that they could let me look at. We are at the NDA stage right now. My feeling is that they "don't": Sgamut/Slog2 is mostly a marketing initiative for the moment, and they have yet to produce rigorous statements about the technical nature of the ideal transforms.

> Finally, on a side note, I would eventually accelerate the linear parts of the color transform through CUBLAS. I think I can achieve realtime speed both for offline conversion and field monitoring. Has anyone tried to port some of the OCIO code to CUDA?

Yes, there have been some attempts at CUDA integration for GPU-accelerated 'final quality' transforms, but these have not yet been taken past the prototype stage. (Once again, my fault!) I can point you to the branch if you're interested.

Yes, please!

 There has also been some OpenCL interest.

Hasn't OpenCL gone the way of Cg, already? ;-)


PS: please excuse my poor written English. It is not my mother tongue, nor my working language.

PPS: I'm a total sucker for your ideas that got implemented in Katana, BTW. Really, this is sexy stuff to me. Have you ever thought about extending/abstracting the Katana ontology/workflow/project management system beyond postproduction? Because, and my comment is nothing compared to the recognition you have already received for this, I think that looking at a whole movie this way (including previz, physical in-camera capture, audio, etc.) could be one heck of a game-changer for filmmaking. That probably deserves another thread or another list entirely, but I thought I'd pitch it here while I have your attention!
