S-Log2/S-gamut 10-bit 422 XAVC to Linear ACES RGB EXR
Vincent Olivier <vin...@...>
I'm trying to convert the S-Log2/S-gamut 10-bit 422 XAVC footage coming out of my Sony F55 camera to a sequence of Linear ACES RGB OpenEXR files. I would like to validate the assumptions I make throughout the conversion process with you guys, if you would be so kind.
Little-endian S-log2/S-gamut 10-bit 422 YCbCr to little-endian S-log2/S-gamut 10-bit 444 YCbCr
First, I'm using FFmpeg to open the XAVC file (which it recognizes simply as an MXF-muxed XAVC Intra stream). The H.264 decoding seems to work superbly as far as I can see. Then, I'm using FFmpeg's Lanczos filter to upsample the chroma from 4:2:2 to 4:4:4, which is slow but produces the best results, IMHO.
My only question here is: what does Poynton mean when, comparing 8-bit to 10-bit, he says "The extra two bits are appended as least-significant bits to provide increased precision."?
I ask because FFmpeg reports the stream as 10-bit little-endian, which calls for an 8-bit left shift of the second byte (lines 168-184 in my code). Anyway, this seems to work just fine. I'm just checking whether there is something I don't understand in Poynton's description of the 10-bit YCbCr bitstream endianness, or whether the raw XAVC output is reformatted under the hood by FFmpeg. Mystery…
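To make the byte-order question concrete: FFmpeg's 10-bit planar pixel formats such as `yuv422p10le` describe the decoded samples in memory, not the XAVC bitstream. Each 10-bit sample sits in the low bits of its own 16-bit little-endian word, so the second byte is shifted left by 8. Below is a minimal sketch, assuming one plane has already been copied out of the decoded frame as raw bytes. `unpack_p10le` is a hypothetical helper name, and masking off the upper six bits is only a defensive assumption.

```python
import struct


def unpack_p10le(plane: bytes) -> list:
    """Unpack one plane of an FFmpeg-style 10-bit little-endian format.

    Each sample occupies a 16-bit little-endian word; the 10 significant
    bits are the low bits, so value = low_byte | (high_byte << 8).
    """
    if len(plane) % 2:
        raise ValueError("a 16-bit-per-sample plane must have an even byte count")
    count = len(plane) // 2
    words = struct.unpack("<%dH" % count, plane)  # '<' = little-endian, 'H' = uint16
    return [w & 0x3FF for w in words]  # keep only the 10 significant bits


# Luma black (64) and nominal white (940) as they would appear in memory.
print(unpack_p10le(bytes([0x40, 0x00, 0xAC, 0x03])))  # -> [64, 940]
```

Read this way, the "little-endian" label is only a storage convention for the decoded frame. Poynton's remark is about how 10-bit values relate to 8-bit ones, not about byte order.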
S-log2/S-gamut YCbCr to S-log2/S-gamut R'G'B'
My assumptions here are that:
1) The camera uses the Rec 709 R'G'B' to YCbCr transform, and I use the inverse Rec 709 transform to get R'G'B' back from YCbCr. Or is there something in SMPTE-ST-2048-1:2011 I should know about and take into consideration here?
2) The footroom provision is 0…64 for luma samples and 0…512 for chroma samples. See lines 187-191 in my code. I have adapted that from the 8-bit footroom values (16/128) because it seems to make sense according to some basic signal statistics I've computed on the samples from one frame. But I'm REALLY not sure about that…
3) Slog2 code uses "full-range" RGB (0…255, not 0…219). See the matrix at lines 131-136 in my code for the "full-range" YCbCr to RGB transform. The headroom-preserving transform matrix is at lines 140-145 (I'm not using this one).
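For reference, here is roughly what the standard Rec. 709 narrow-range decode behind items 1 and 2 looks like for 10-bit samples. Luma black and white are at 64 and 940, and chroma is centred on 512 with a 64-960 excursion. This is only a sketch of the textbook transform. Whether the F55's S-Log2 XAVC signal should be decoded with these offsets or as "full range" (item 3) is exactly the open question. The function deliberately does not clip, so S-Log2 code values above 940 come out above 1.0 instead of being lost.

```python
# Rec. 709 luma coefficients (ITU-R BT.709)
KR = 0.2126
KB = 0.0722
KG = 1.0 - KR - KB


def ycbcr10_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """10-bit narrow-range Y'CbCr -> non-linear R'G'B' (nominal 0..1, unclipped)."""
    yn = (y - 64) / 876.0     # luma: 64..940 -> 0..1
    cbn = (cb - 512) / 896.0  # chroma: 64..960 -> -0.5..+0.5
    crn = (cr - 512) / 896.0
    r = yn + 2.0 * (1.0 - KR) * crn
    b = yn + 2.0 * (1.0 - KB) * cbn
    g = (yn - KR * r - KB * b) / KG
    return r, g, b


print(ycbcr10_to_rgb(940, 512, 512))   # nominal white
print(ycbcr10_to_rgb(1023, 512, 512))  # super-white survives as > 1.0
```

A "full-range" variant would only change the offsets and divisors at the top of the function. The 3x3 reconstruction itself stays the same.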
S-log2/S-gamut R'G'B' to Linear S-gamut RGB
This is where it gets interesting. I have adapted the "slog2.py" code, part of the OpenColorIO-Configs project on GitHub provided by Jeremy Selan (I sent him an email weeks ago and didn't hear back).
Assumptions made in my code:
1) The rescale at slog2.py:17-23 is redundant if you provide "full-range" RGB to the S-Log2 linearization algorithm. I left it in my code (see lines 65-66). But commenting it out seems to give more dynamic range to the result. Again, I might be dead wrong on this.
2) The differences between S-log1 and S-log2 are only: A: for the same input, S-log1 doesn't have a rescaling step and S-log2 has one (see my previous point), B: there is a linear portion in the shadows, and C: the highlight-portion power-function is scaled by 219/155.
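The three differences listed in point 2 can be written down as a single function. This is a minimal sketch built on the following assumptions:

- The S-Log1 constants are taken from Sony's 2009 whitepaper: 0.432699, 0.616596, 0.037584, the +0.03 offset and the 90% white factor.
- The rescale uses the 64-940 range from point 1.
- The shadow breakpoint sits at 10-bit code 90.

The breakpoint and the linear-segment constants are assumptions based on the F65 S-Log2 IDT material. Check every constant against slog2.py before trusting the output.

```python
# Normalized 10-bit code values (assumed constants; verify against slog2.py).
BLACK = 64.0 / 1023.0               # rescale origin (point 1)
BREAK = 90.0 / 1023.0               # start of the log segment (assumed)
WHITE = 940.0 / 1023.0              # rescale end (point 1)
SLOG1_ZERO = 0.030001222851889303   # S-Log1 encoding of linear 0 (assumed)


def slog2_to_linear(code: float) -> float:
    """Normalized S-Log2 code value (code10 / 1023) -> linear scene reflectance."""
    t = (code - BLACK) / (WHITE - BLACK)  # point 1: 64..940 -> 0..1
    if code >= BREAK:
        # S-Log1 highlight curve with the extra 219/155 scale (point 2C)
        s = 10.0 ** ((t - 0.616596 - 0.03) / 0.432699) - 0.037584
        return (219.0 / 155.0) * s * 0.9
    # linear segment in the shadows (point 2B)
    return ((t - SLOG1_ZERO) / 5.0) * 0.9


print(slog2_to_linear(347 / 1023.0))  # roughly 0.18, i.e. 18% grey
```

One sanity check: with these constants, 10-bit code 347 decodes to about 0.18. That fits the usual description of S-Log2 middle grey sitting near 32% of the 64-940 range.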
Linear S-gamut RGB to Linear ACES RGB
There are two distinct CTL transforms from S-gamut to ACES for two white points: 3200K and 5500K. Why? Would one get the 5500K transform matrix by applying a white-balance transform on the 3200K version (and vice-versa)?
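One way to answer the second question empirically is to solve for the matrix that maps one IDT onto the other. Suppose the 5500K matrix were just the 3200K matrix preceded by a per-channel white balance in camera RGB. Then D = inverse(M3200) · M5500 would be diagonal. Below is a pure-Python sketch, assuming both CTL matrices map linear S-Gamut RGB column vectors to ACES. The matrices in the example are made-up placeholders, not Sony's numbers.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def implied_camera_gains(m_3200, m_5500):
    """Return D with m_5500 == m_3200 @ D, plus its largest off-diagonal term.

    A near-zero off-diagonal term means the two IDTs differ only by a
    per-channel white balance applied to camera RGB before m_3200.
    """
    d = matmul3(inv3(m_3200), m_5500)
    off_diag = max(abs(d[r][c]) for r in range(3) for c in range(3) if r != c)
    return d, off_diag


# Placeholder matrices, NOT the real Sony/ACES values.
M_3200 = [[0.70, 0.20, 0.10], [0.10, 0.80, 0.10], [0.05, 0.15, 0.80]]
GAINS = [[1.2, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.7]]
M_5500 = matmul3(M_3200, GAINS)
d, off = implied_camera_gains(M_3200, M_5500)
print([round(d[k][k], 6) for k in range(3)], off)
```

It may also be worth trying the other order, M5500 · inverse(M3200), in case the adaptation is applied on the ACES side instead.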
I feel like I'm reverse engineering the whole thing and I'm not confident enough (yet) of the rigorous "scene-referredness" of the output. It looks good, but there has been too much guesswork involved to fully trust it. I would really appreciate some pointers.
Finally, on a side note, I would like to eventually accelerate the linear parts of the color transform with cuBLAS. I think I can achieve real-time speed both for offline conversion and for field monitoring. Has anyone tried to port some of the OCIO code to CUDA?
Thanks for everything!
Jeremy Selan <jeremy...@...>
Interesting code, thanks for sharing! My apologies for not replying sooner, my bandwidth has been way too limited of late. :(
There are quite a few different questions here. Let me take a stab at some.
First off, can you share a bit more about what you're trying to accomplish in transcoding the F55 stream to OpenEXR? What are you hoping to do with the EXR frames? Are you aiming for real-time playback? Is the encoding performance critical? Hearing more about the desired usage would be very helpful. Next off, how are you going to view these linearized EXR frames? Note that when you use the referenced color math to go to scene-linear, you'll probably prefer some sort of 's-shaped' tone mapping operator, rather than a simple gamma curve or such.
The reason I ask is that in your code example, you appear to have a single chunk of code that is responsible for both decoding the frames and applying a particular set of hard-coded color transformations. In my personal experience, I tend to gravitate towards separable chunks of reusable processing rather than 'all in 1' binaries. For example, it's common to have multiple productions at Imageworks concurrently which, while sharing input cameras, may choose to use slightly different input color transforms. For this reason, in OpenColorIO the color configurations are loaded at runtime rather than being built in.
Here's an example of a stand-alone binary, which uses OpenImageIO to do the image reading/writing, and OpenColorIO to do the color processing. Note that the color math is not built-in, but is abstracted away in the library:
> Then, I'm using FFMEG's Lanczos 422 to 444 upscaling algorithm, which is slow, but produces the best results, IMHO.
Agreed! Lanczos is a great compromise between maintaining sharpness, without introducing too much overshoot/undershoot. :)
> My only question here is: what does Poyton mean when, comparing 8-bit to 10-bit, he says "The extra two bits are appended as least-significant bits to provide increased precision."?
I believe what Poynton is saying is that when converting between different bit depths, the extra precision typically shows up as extra LSB info. Put another way, say you have a floating-point image representation where pixel values are between [0.0-1.0]. (This is NOT scene-linear imagery, of course.) If we wanted to encode that in 8 bits, we would use integer values 0-255. And for 10 bits, we would use 0-1023. Note that scaling from 8 to 10 bits (or back) is NOT a simple bit shift. Recall that a bit shift by 2 places is a simple multiply or divide by 4, so if we took 255 to 10 bits using shifting, 255 would map to 1020! Ugh! (Remember that we want to use the full 1023-sized coding range.) So I tend to think about bit changes as a multiply/divide by the max. I.e., to go from 8 bits to 10 in a manner that uses the full range, you must multiply by 1023/255.0. There are more bit-efficient ways to do this - an old OpenImageIO thread discusses the subtleties - I can search for it if you're interested. For non-performance-critical versions, going to float as an intermediate representation is usually the best option. Getting integer math right is hard, in some crappy non-obvious ways.
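The difference between these mappings is easy to see in code. The sketch compares three approaches:

- `scale_8_to_10` is the full-range multiply-by-1023/255 approach described above.
- `shift_8_to_10` is the plain bit shift.
- `replicate_8_to_10` is one common integer-only trick that copies the top two bits into the new LSBs.

The bit-replication version is shown only as an example of a bit-efficient variant. It is not necessarily what the OpenImageIO thread recommends, and it can differ from the rounded float result by one code value.

```python
def scale_8_to_10(v: int) -> int:
    """Full-range mapping: 0 -> 0 and 255 -> 1023 (multiply by 1023/255, round)."""
    return round(v * 1023 / 255)


def shift_8_to_10(v: int) -> int:
    """Plain shift: multiply by 4, so 255 -> 1020."""
    return v << 2


def replicate_8_to_10(v: int) -> int:
    """Integer-only approximation of the full-range mapping via bit replication."""
    return (v << 2) | (v >> 6)


print(scale_8_to_10(255), shift_8_to_10(255), replicate_8_to_10(255))  # -> 1023 1020 1023
worst = max(abs(replicate_8_to_10(v) - scale_8_to_10(v)) for v in range(256))
print("max difference between replication and scaling:", worst)  # -> 1
```

Which mapping is "right" depends on whether the signal really uses the full 0-1023 range.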
Your code has a few lines similar to:
> pow(256, i)
This 'double' arithmetic is probably not what you're looking for. Perhaps a simple bitshift instead?
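Here is a concrete version of that suggestion. Shifting by 8·i does the same job as multiplying by pow(256, i) without a round trip through floating point. It also covers the big-endian case by reversing the shift order. This is a sketch with a hypothetical helper name; Python's built-in `int.from_bytes` is only mentioned as a cross-check.

```python
def assemble(data: bytes, little_endian: bool = True) -> int:
    """Combine bytes into an unsigned integer using shifts instead of pow(256, i)."""
    value = 0
    last = len(data) - 1
    for i, byte in enumerate(data):
        shift = 8 * i if little_endian else 8 * (last - i)
        value |= byte << shift
    return value


sample = bytes([0xAC, 0x03])  # 940 stored little-endian
print(assemble(sample), int(sample[0] + sample[1] * pow(256, 1)))  # -> 940 940
print(assemble(bytes([0x03, 0xAC]), little_endian=False))  # -> 940
```

In production code, `int.from_bytes(data, "little")` or a `struct` format string does the same thing in one call.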
> S-log2/S-gamut YCbCr to S-log2/S-gamut R'G'B'
> 1) The camera uses the Rec 709 R'G'B' to YCbCr transform. And I use the reverse Rec 708 to get R'G'B' from YCbCr. Or is there something in SMPTE-ST-2048-1:2011 I should know about and take into consideration here?
Your matrixing back to RGB, with appropriate range consideration, is probably appropriate. What I'd recommend for validating your code is to see if you can break this processing into separate steps and compare against known reference solutions. For example, does this Sony camera have any way to write out an RGB image directly? Or does Sony provide any reference software to transcode the stream to an uncompressed full-range RGB image? Step one for testing your code is to disable the linearization and to only compare the YCbCr transform bits.
If memory serves, I also believe that OpenEXR has native support for subsampled chroma images; you may want to investigate that.
As you note, YCbCr most often uses the range of 16-235 (for 8 bits) and 64-940 (for 10 bits) when storing *rec709* imagery. However, the Slog2 imagery takes advantage of this extra range, so you have to be careful not to build in the wrong scale factors. Once again, off the top of my head I'm not sure whether your code is correct or not. But if I were in your shoes, I would carefully compare the reconstructed full-range RGB image against a known correct result. (This may require capturing with a different setting in the camera.)
> There are two distinct CTL transforms from S-gamut to ACES for two white points: 3200K and 5500K. Why? Would one get the 5500K transform matrix by applying a white-balance transform on the 3200K version (and vice-versa)?
The color transform you're looking at is tailored to the F65, FYI, so I'm not sure how closely these matrices would match for your camera. The reason there are two transforms is that I believe Sony has optimized the conversion to SLog2 in the F65 to be specific to the color balance on the camera. This is pretty non-standard, so it should be taken with a grain of salt until we get an official IDT from the ACES community. But in my understanding, the different IDTs are required for strict accuracy. Perhaps if you have a camera at hand, you can do an example test with both approaches and see how large the residual differences are?
In practice, people may prefer to standardize on one of the IDTs for sanity's sake, even if it's not perfect in all situations. (An example of a similar common practice would be the Arri Alexa's LogC conversion, where a different LUT is required depending on the exposure index used. But in practice, people often drop this and just assume EI800 linearization.)
> I feel like I'm reverse engineering the whole thing and I'm not confident enough (yet) of the rigorous "scene-referredness" of the output. It looks good, but there has been too much guesswork involved to fully trust it. I would really appreciate some pointers.
Agreed! You definitely need to validate this stuff when so much of the code is untested / bleeding edge. If you have the time, interest, and access to the camera, nothing beats a ground-truth linearization test. The rough outline for the test is to set up a test chart with a stable light source, and then to shoot an exposure sweep across the full range of exposures. Then, post-linearization, you should be able to align the different exposures and see how closely they match! If they all match (other than clipping at the ends of the dynamic range), then your conversion is dead on (at least for the grayscale axis).
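The alignment step can be reduced to a single number. This is a minimal sketch, assuming you have already read one linearized value per exposure from the same grey patch; the sample data below is synthetic. Each value is divided by 2 raised to its exposure offset in stops. If the linearization is right, the spread of the results, in stops, should be close to zero. The `floor`/`ceiling` thresholds for dropping clipped shots are arbitrary placeholders.

```python
import math


def sweep_spread_stops(samples, floor=1e-4, ceiling=None):
    """Spread, in stops, of an exposure sweep referred back to the base exposure.

    samples: (exposure_offset_in_stops, linearized_patch_value) pairs for one
    grey patch. Values at or below `floor`, or at or above `ceiling`, are
    treated as clipped and ignored.
    """
    referred = [
        value / 2.0 ** stops
        for stops, value in samples
        if value > floor and (ceiling is None or value < ceiling)
    ]
    if len(referred) < 2:
        raise ValueError("need at least two unclipped exposures")
    logs = [math.log2(v) for v in referred]
    return max(logs) - min(logs)


# Synthetic data: a perfect linearization vs. one with a residual 0.9 "gamma".
perfect = [(s, 0.18 * 2.0 ** s) for s in range(-4, 5)]
bent = [(s, (0.18 * 2.0 ** s) ** 0.9) for s in range(-4, 5)]
print(sweep_spread_stops(perfect), sweep_spread_stops(bent))
```

A residual power-law error shows up as a spread that grows with the width of the sweep. A constant exposure-scale error does not show up at all, which is why this checks the shape of the curve rather than its absolute calibration.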
> Finally, on a side note, I would eventually accelerate the linear parts of the color transform through CUBLAS. I think I can achieve realtime speed both for offline conversion and field monitoring. Has anyone tried to port some of the OCIO code to CUDA?
Yes, there have been some attempts at CUDA integration for GPU-accelerated 'final quality' transforms, but these have not been taken past the prototype stage as of yet. (Once again, my fault!) I can point you to the branch if you're interested. There has also been some OpenCL interest.
For simple monitoring, though, you don't necessarily need to go to scene-linear; instead you can go straight from SLog2 to the display transform using a 3D LUT. OCIO does support this on the GPU already; see the ociodisplay example for the code.
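For the monitoring path, the whole SLog2-to-display chain collapses into a sampled cube that is looked up with interpolation. The sketch below only illustrates trilinear 3D LUT lookup in pure Python. It is not OCIO's implementation; on the GPU, the texture hardware does the interpolation. `build_lut3d` and its `transform` argument are hypothetical stand-ins for whatever display transform gets baked.

```python
def build_lut3d(transform, size):
    """Sample transform((r, g, b)) on a size^3 grid over [0, 1]; index as lut[r][g][b]."""
    if size < 2:
        raise ValueError("a 3D LUT needs at least 2 points per axis")
    step = 1.0 / (size - 1)
    return [[[transform((r * step, g * step, b * step))
              for b in range(size)] for g in range(size)] for r in range(size)]


def apply_lut3d(lut, rgb):
    """Trilinear interpolation into a cube built by build_lut3d (inputs clamped to [0, 1])."""
    size = len(lut)
    pos = [min(max(c, 0.0), 1.0) * (size - 1) for c in rgb]
    base = [min(int(p), size - 2) for p in pos]  # lower corner of the enclosing cell
    frac = [p - i for p, i in zip(pos, base)]
    out = [0.0, 0.0, 0.0]
    for dr in (0, 1):
        wr = frac[0] if dr else 1.0 - frac[0]
        for dg in (0, 1):
            wg = frac[1] if dg else 1.0 - frac[1]
            for db in (0, 1):
                wb = frac[2] if db else 1.0 - frac[2]
                node = lut[base[0] + dr][base[1] + dg][base[2] + db]
                for k in range(3):
                    out[k] += wr * wg * wb * node[k]
    return tuple(out)


# Example: a 17-point cube of a simple placeholder per-channel curve.
lut = build_lut3d(lambda c: tuple(x ** (1 / 2.4) for x in c), 17)
print(apply_lut3d(lut, (0.18, 0.18, 0.18)))
```

Because the cube only covers [0, 1], log-encoded SLog2 input is a good fit. Scene-linear input would need a shaper curve in front of the cube.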
On Sun, May 26, 2013 at 6:38 PM, Vincent Olivier <vin...@...> wrote:
Vincent Olivier <vin...@...>
Thanks so much for your reply.
On 2013-05-29, at 12:09 AM, Jeremy Selan <jeremy...@...> wrote:
For now, I "only" want to find a way to get the most accurate, richest, "objectively" (standalone) scene-referred images from my camera in ACES-linear. This is the first step: to what degree can the images from the F55/F65 be considered a physically accurate photographic reference measurement of the scene, for processing and also for archival (I don't think keeping only the original camera-referred data is sufficient in the long term)? I'm also looking for a way to store other exposure-related data, such as t-stop, sensitivity and other in-camera processing logs, in the EXR headers, to be able to physically characterize the scene from the image.
I'm a programmer and a photographer. So, the applications I'm looking forward to tackling are, first and foremost, related to the mathematical and computational aspects of color and lighting æsthetics. I mean, I am very bored with the state of cinematography right now; there are really only 2 looks applied to every single movie: the Transformers anamorphic-crushed lens-flare blue and the HDSLR peaches-and-cream indie porridge. Since The Archers (especially Red Shoes and Black Narcissus, and notice that these two-word titles start with the name of a color), I haven't seen cinematography that can get a braingasm out of my visual cortex. Just look at what is created by the contrast between the herbal arrangement and the dresses in the fashion show at the end of Queen Cotton (starting at 10:00). That's what I call art. I think the job of a cinematographer (a good one, anyway) has to extend all the way to the computer imagery, the compositing and the image finishing of the movie in the digital realm. That's my personal goal.
Image-based lighting for unbiased rendering is another area of innovation I will be interested to look into in the near future once I get comfortable with ACES. I cannot afford access to Arnold nodes, but some Arnold core developers have contributed to Blender's Cycles in a significant way, and I really like this renderer, and it's open source. And the color pipeline is a really hot topic right now in the Blender community.
I'm not sure this answers your question, but I hope it clarifies my intentions a bit.
I only have access to Rec 709 monitors, some OLED ones. So I look at the images through an ACES to Rec 709 LUT, but the computational space is ACES and as far as measurements are concerned, I rely on the various analytical tools, mostly histograms and vectorscopes (I'm building a custom CIE [x,y] diagram vectorscope to track what happens to image data in and out of color transforms as a way to visually represent those transforms: VERY useful when you are trying to communicate where you want to go with your color pipeline).
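For the CIE xy vectorscope, each ACES pixel only needs one 3x3 matrix to XYZ, followed by the usual projection x = X/(X+Y+Z), y = Y/(X+Y+Z). This sketch assumes linear ACES (AP0) input. The matrix below is the AP0 RGB-to-XYZ matrix as published for ACES (SMPTE ST 2065-1); double-check the digits against the spec before relying on it.

```python
# ACES AP0 RGB -> CIE XYZ (check against SMPTE ST 2065-1)
AP0_TO_XYZ = (
    (0.9525523959, 0.0000000000, 0.0000936786),
    (0.3439664498, 0.7281660966, -0.0721325464),
    (0.0000000000, 0.0000000000, 1.0088251844),
)


def aces_to_xy(rgb):
    """Linear ACES RGB -> CIE 1931 xy chromaticity, or None when undefined."""
    X, Y, Z = (sum(m * c for m, c in zip(row, rgb)) for row in AP0_TO_XYZ)
    total = X + Y + Z
    if total <= 0.0:
        return None  # black (or negative energy) has no meaningful chromaticity
    return X / total, Y / total


print(aces_to_xy((1.0, 1.0, 1.0)))  # ACES white point, about (0.32168, 0.33767)
print(aces_to_xy((1.0, 0.0, 0.0)))  # AP0 red primary, about (0.7347, 0.2653)
```

Any neutral, for example (0.18, 0.18, 0.18), lands on the same white point. This makes a handy check that a transform under test keeps greys neutral.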
Well, to me, before it's ACES, it's garbage™. ;-) My code reflects this philosophy. I don't think I will find a need for independent Sgamut and Slog2 transforms. Maybe the YCbCr to RGB transform will become optional from the Sgamut/Slog2 handling depending on the original footage input, yes. But I am also looking for the most computationally efficient code for a specific application (and I will probably write one big CUDA kernel for that too) and code-gathering gets me there (at the price of elegance, perhaps, but, well, performance is important in a production context). And this code is merely a proof of concept for now.
And I would love to contribute the code I'm writing back to OCIO, within the elegant, dynamically combined color space transform system you created. It's just that I need to get my YCbCr to ACES transform right first. Why doesn't Imageworks have the color-computing equivalent of Google's Summer of Code? I'll bring my sleeping bag and my toothbrush at your signal! ;-)
Agreed on the [0, 255] to [0, 1023] mapping, but then saying that the extra 2 bits are the least significant doesn't really make sense… But anyway, if we both think that's what he meant and the code makes sense visually, then that's that.
Agreed. That's how I do it in lines 187 to 191. But as far as headroom/footroom are concerned I don't go from 8 to 10, I simply take out the assumed 10-bit footroom values from the sample and then divide it by the 10-bit max.
I chose a power function because I do not know the endianness at this point. It is not implemented yet, but I want to be able
Yes, I'm partnering with a local FX company to take reference shots of a Macbeth + f-stop chart, and we'll see. But I would REALLY appreciate someone from Sony Electronics validating the code at some point. I know they probably don't start with something linear in-camera to get to Slog2/Sgamut, but they sure have done something similar to what I'm doing… I just find it odd that they didn't publish it (if they have it, which might not be the case), like they did for the original Slog/Sgamut transform in the 2009 whitepaper.
I can and did output separate files for each of the Y, Cb and Cr channels as grayscale. But I didn't find a way to put all three in one file.
Yes, I will be looking for a raw recorder to get their 16-bit Slog2/Sgamut RAW to OpenEXR ACES transform. However, my understanding is that they are at the same point as I am: their Slog2/Sgamut transforms are still a work in progress, even for the F65 (based on what I can see in Vegas and in their RAW Viewer).
The ACES community, that's us, right? ;-)
One reason I sent this message is to see if there is interest in reaching a consensus around community-official LUTs. (And there is, as you must know: VFX supervisors are in a permanent state of panic regarding the increasing influx of Slog2/Sgamut material coming their way, and right now they settle on the "least visually horrible" Frankenstein transform they can find.)
And since there is interest, I am wondering if we can all join our efforts to at least corroborate the findings. We have to take into account that Sony is pushing hard, with all its political weight, to have one and only one camera-referred space and gamma standardized (SMPTE-ST-2048, xvYCC), preferably their own.
Yes! We are definitely on the same page. I would just like some help from Sony, ideally.
That being said, I am in touch with them (Sony Electronics) to see if they have information in-house that they could let me take a look at. We are at the NDA stage right now. My feeling is that they "don't". That Sgamut/Slog2 is mostly a marketing initiative for the moment and they have yet to produce rigorous statements about the technical nature of the ideal transforms.
Hasn't OpenCL gone the way of Cg, already? ;-)
PS: excuse my bad written English. It is really not my mother tongue, nor my working language either.
PPS: I'm a total sucker for your ideas that got implemented in Katana, BTW. Really, this is sexy stuff to me. Have you ever thought about extending/abstracting the Katana ontology/workflow/project management system to more than just postproduction? Because, and my comment is nothing compared to the recognition you already had for this, I think that looking at a whole movie in this way (including previz, physical in-camera capture, audio, etc.) could be one heck of a game-changer for filmmaking. That probably deserves another thread or another list entirely, but I thought I'd just pitch it here while I have your attention!
Vincent Olivier <vin...@...>
Oops, I didn't finish this sentence…
On 2013-05-29, at 12:51 PM, Vincent Olivier <vin...@...> wrote:
…to work with big-endian streams. ;-)
> " If we wanted to encode that in 8 bits, we would use integer values 0-255. And for 10 bits, we would using 0-1023. Note that scaling from 8 to 10 bits (or back) is NOT a simple bit shift. Recall that a bit shift by 2 places is a simple mult or divide by 4, so if we took 255 to 10 bits using shifting 255 would map to 1020! Ugh!"
It is a small point, but this is incorrect; in Rec. 709 it really is a factor of 4 between them (not 1023/255). According to the specification, 8-bit value 16 is equivalent to 10-bit value 64 and 8-bit 235 is equivalent to 10-bit 940, therefore 8-bit 255 is equivalent to 10-bit 1020 (not 1023). This aligns with what Poynton says, "The extra two bits are appended as least-significant bits to provide increased precision". See also http://en.wikipedia.org/wiki/Rec._709#Digital_representation
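The relationship described here can be checked mechanically. This short sketch assumes narrow-range luma with black at 16 and white at 235 in 8 bits, each scaled by 4 in 10 bits; the helper names are hypothetical. Once both are normalized to nominal 0..1, an 8-bit code and its shifted 10-bit counterpart land on the same value. That is why appending two LSBs is the correct conversion for this kind of signal.

```python
def rec709_8_to_10(code8: int) -> int:
    """Append two zero LSBs: 16 -> 64, 235 -> 940, 255 -> 1020."""
    return code8 << 2


def normalized_luma(code: int, bits: int) -> float:
    """Narrow-range luma code -> nominal 0..1 (black = 16, white = 235 at 8 bits)."""
    scale = 1 << (bits - 8)
    black, white = 16 * scale, 235 * scale
    return (code - black) / (white - black)


for v in (16, 128, 235, 255):
    print(v, rec709_8_to_10(v), normalized_luma(v, 8), normalized_luma(rec709_8_to_10(v), 10))
```

Under this convention, 10-bit white normalizes to 940/1023 of the code range, not 235/255.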
> "Getting integer math right is hard, in some crappy non-obvious ways."
On Tue, May 28, 2013 at 11:09 PM, Jeremy Selan <jeremy...@...> wrote:
Brendan Bolles <bre...@...>
On Jun 4, 2013, at 9:38 AM, Dithermaster wrote:
> It is a small point, but this is incorrect; in Rec. 709 it really is a factor of 4 between them (not 1023/255). According to the specification, 8-bit value 16 is equivalent to 10-bit value 64 and 8-bit 235 is equivalent to 10-bit 940, therefore 8-bit 255 is equivalent to 10-bit 1020 (not 1023).
Interesting! Of course, this only applies if your source is arriving with headroom and footroom, which it often does not. After all, Rec. 709 is really YCrCb, right? So you rarely would access the image without at least an RGB conversion.
If your 10-bit RGB image does have headroom and footroom, you probably want to run tests to make sure you really are getting a 64-940 signal. Of the people who have imported files with headroom in Nuke, I'm sure many have expanded the range by setting white to 0.922 (235/255) where maybe they should have been using 0.919 (940/1023).
The OCIO sample configurations wisely stay out of this and do all their Rec. 709 conversions using the full 0.0-1.0 range.
Troy Sobotka <troy.s...@...>
On Tue, Jun 4, 2013 at 10:33 AM, Brendan Bolles <bre...@...> wrote:
> Interesting! Of course, this only applies if your source is arriving with headroom and footroom, which it often does not. After all, Rec. 709 is really YCrCb, right? So you rarely would access the image without at least an RGB conversion.
Should it not be considered ITU-BT-709 if and only if it complies with the specification range? Further, that headroom and footroom should be baked into the YCbCr.
It would seem there are several combinations that could make for complexity:
1) Baked YCbCr in a "full range" mode (1-254, with 0 and 255 reserved for timing as per ITU-BT-709 section 6.11).
2) Baked YCbCr with proper broadcast range (16-235/240 and the 709 transfer curve).
And the non-standard vendor-specific adjustments, such as:
1) Baked YCbCr in a non-standard full-range mode (0-255 JFIF / JPEG, as per many DSLRs, which often use 601 primaries and the 709 transfer curve).