HDR Imaging and Tonemapping

13 Apr 2022


§ The Problem

The real world is in high dynamic range. A camera sensor cannot capture this full range: bright spots may oversaturate a photosite, clipping it to white and losing information in that area. We can modify the exposure time to adjust the amount of light that reaches the sensor -- but no single exposure time gives a good picture everywhere, so we must take many.

§ Many-to-one

Camera Pipeline:

  1. 3D Radiance ($L$, in $\frac{W}{sr \cdot m^2}$)
  2. Lens
  3. 2D Sensor (Irradiance $E$, in $\frac{W}{m^2}$)
  4. Shutter (Exposure time, $\Delta t$)
  5. Sensor Exposure ($H = E\,\Delta t$)
  6. ADC (Analogue to Digital Converter)
  7. Raw Image
  8. Remapping
  9. Pixel Values ($Z$)

Our goal is to go from $Z$ back to $E$; however, to fully recover $E$ we would have to reverse the ADC, which we have no interest or intention of doing. Instead, we can note that $E$ is invariant to exposure time, so if we know $\Delta t$ we can find $E$ -- typically $\Delta t$ is given as metadata on an image.
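As a practical aside, the exposure time can usually be read straight from EXIF metadata. A minimal sketch using Pillow (tag ID 33434 is the standard EXIF ExposureTime; the filename is hypothetical):

```python
from PIL import Image

img = Image.open("exposure_01.jpg")   # hypothetical filename
exif = img.getexif()
dt = exif.get(33434)                  # EXIF tag 33434 = ExposureTime, in seconds
print(f"exposure time: {dt} s")
```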

We need multiple images to recover the $Z$ vs. $H$ curve -- typically with varying $\Delta t$.

§ Camera Response Curve

Our goal, then, is to find $E_i$, the irradiance at the $i$th pixel.

Objective: solve for $E_i$ in $Z_{i,j} = f(E_i\,\Delta t_j)$, where $Z_{i,j}$ is the value of pixel $i$ in image $j$, and $\Delta t_j$ is the exposure time of the $j$th image.

Using algebra, we get

$$\begin{aligned} f^{-1}(Z_{i,j}) &= E_i\,\Delta t_j\\ \log f^{-1}(Z_{i,j}) &= \log E_i + \log \Delta t_j\\ g &= \log \circ f^{-1}\\ g(Z_{i,j}) &= \log E_i + \log \Delta t_j. \end{aligned}$$

In the above expression, the values $Z_{i,j}$ and $\Delta t_j$ are known, while $g$ and $E_i$ are unknown.

Through optimization, we may solve for the unknown values via the following:

$$[g, E_i] = \argmin_{g,\ E_i} \sum_{i=1}^N\sum_{j=1}^P \left[g(Z_{i,j}) - \log E_i - \log \Delta t_j\right]^{2},$$

where $N$ is the number of pixels and $P$ the number of images.

Number of Samples

For an image with 8-bit colour depth, $Z_{i,j} \in [0, 255]$, i.e. 256 possible values, so there are 256 unknowns for the function $g$: $g(0), g(1), \dots, g(255)$. We also have $N$ unknowns for the $E_i$.

  • There are $N + 256$ unknowns
  • There are $NP$ total equations

Then, $NP > N + 256 \implies NP - N > 256 \implies N(P - 1) > 256 \implies N > \frac{256}{P-1}$. Usually $N = 5\left(\frac{256}{P-1}\right)$ is sufficient, e.g. if $P = 11$ then $N = 128$ random pixel samples are enough to recover $E$.

This assumes that all pixel values are equally reliable -- however, this is not the case, so we modify our objective to

$$[g, E_i] = \argmin_{g,\ E_i} \sum_{i=1}^N\sum_{j=1}^P \Big[w(Z_{i,j})\left(g(Z_{i,j}) - \log E_i - \log \Delta t_j\right)\Big]^{2} + \lambda \sum_{z=0}^{2^b} \left[w(z)\frac{d^2}{dz^2}g(z)\right]^2,$$

  1. $w(z)$ is a function that assigns a weight to each pixel value; we want values near the extremes of the range (under- or over-exposed) to receive small weights, as these are generally unreliable. One possible choice is the triangle function given by

$$w(z) = \begin{cases} z, &\text{if } z \leq 2^{b-1}\\ 2^b - (z + 1), &\text{otherwise,} \end{cases}$$

where $b$ is the bit-depth of the image.

  2. There is no reason for a manufacturer to use a noisy response curve; therefore, we assume that the camera's response curve is smooth,

$$\frac{d^2}{dz^2}g(z) \approxeq g(z - 1) - 2g(z) + g(z + 1) = 0.$$

  3. $\lambda$ is a factor used to balance the two terms -- a larger $\lambda$ gives a smoother response curve.

We only need to perform this optimization once per camera; the recovered curve can then be reused for other images. This method is sometimes called "irradiance calibration".
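To make the optimization concrete, here is a minimal numpy sketch that sets it up as a linear least-squares system: one row per data term, one per smoothness term. The extra constraint $g(128) = 0$ is an added assumption needed to pin down the solution, since $g$ and the $\log E_i$ are only determined up to a shared constant; all names are illustrative.

```python
import numpy as np

def solve_response_curve(Z, log_dt, lam, w):
    """Solve for g(0..255) and log E_i by linear least squares.

    Z      -- (N, P) int array: N sampled pixels across P exposures
    log_dt -- (P,) array of log exposure times
    lam    -- smoothness weight lambda
    w      -- (256,) array of weights w(z)
    """
    n = 256
    N, P = Z.shape
    A = np.zeros((N * P + 1 + (n - 2), n + N))
    b = np.zeros(A.shape[0])

    # Data terms: w(Z_ij) * (g(Z_ij) - log E_i - log dt_j) = 0
    k = 0
    for i in range(N):
        for j in range(P):
            wij = w[Z[i, j]]
            A[k, Z[i, j]] = wij    # coefficient of g(Z_ij)
            A[k, n + i] = -wij     # coefficient of log E_i
            b[k] = wij * log_dt[j]
            k += 1

    # Fix the arbitrary offset: g(128) = 0
    A[k, n // 2] = 1.0
    k += 1

    # Smoothness terms: lam * w(z) * (g(z-1) - 2 g(z) + g(z+1)) = 0
    for z in range(1, n - 1):
        A[k, z - 1:z + 2] = lam * w[z] * np.array([1.0, -2.0, 1.0])
        k += 1

    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:n], x[n:]  # g(z) for each z, and log E_i for each sample

# Triangle weights for b = 8 bits, as defined above.
b_depth = 8
w = np.array([z if z <= 2 ** (b_depth - 1) else 2 ** b_depth - (z + 1)
              for z in range(2 ** b_depth)], dtype=float)
```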

§ Tonemapping

Previously we solved for $g$, where $g(Z_{i,j}) = \log E_i + \log \Delta t_j$ -- that is, for every pixel we get $P$ estimates of $\log E_i$. We cannot simply take the arithmetic mean, as the highly irradiant areas in longer exposures will bias the mean.

We can then reuse the weight function $w(Z)$ to get a weighted mean,

$$\log E_i \approxeq \frac{\sum_{j=1}^P w(Z_{i,j})\left[g(Z_{i,j}) - \log \Delta t_j\right]}{\sum_{j=1}^P w(Z_{i,j})}.$$
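A minimal sketch of this weighted mean over an exposure stack, assuming $g$ and $w$ are precomputed 256-entry arrays (names illustrative):

```python
import numpy as np

def recover_log_irradiance(Z, g, log_dt, w):
    """Weighted-mean estimate of log E at every pixel.

    Z      -- (H, W, P) stack of pixel values from P exposures
    g      -- (256,) recovered response curve
    log_dt -- (P,) log exposure times
    w      -- (256,) weight function
    """
    num = np.zeros(Z.shape[:2])
    den = np.zeros(Z.shape[:2])
    for j in range(Z.shape[2]):
        wz = w[Z[..., j]]                       # w(Z_ij), per pixel
        num += wz * (g[Z[..., j]] - log_dt[j])
        den += wz
    return num / np.maximum(den, 1e-8)          # guard against zero weight
```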

Having recovered the irradiance of each pixel, we can now use it in various applications, such as feeding it into a computer rendering or viewing the image directly -- for display, however, we must compress the high dynamic range into the range displayable by a device. Some options are listed below; a sketch of two of them follows the list.

  • Clipping
    • $I = \min(E, 1)$
    • Causes the loss of detail in highly irradiant areas.
  • Scaling
    • $I = \frac{E}{\max E}$
    • Causes the loss of detail in areas of low irradiance.
  • Reinhard Global Operator
    • $I = \frac{E}{1 + E}$
    • Better than scaling and clipping, as it increases sharply at small values (boosting the shadows) and compresses the highly irradiant areas.
  • Gamma Compression
    • $I'' = (I')^\gamma$, where $I' = \frac{E}{\max E}$
    • We first perform scaling, then a gamma correction to boost the smaller values, much like Reinhard.
    • Colour gets washed out.
  • Gamma Compression on Intensity
    • Intensity is given by $L = \frac{1}{3}(R + G + B)$
    • Then, gamma compression is applied to the intensity only: $L'' = (L')^\gamma$, where $L' = \frac{L}{\max L}$
    • Colour is given by $C = \frac{1}{L}\left\{R, G, B\right\}$
    • Then, the final image is computed via $I'' = L'' C$
    • This preserves colour, but instead washes out intensity -- i.e. it lowers the contrast of the image.
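A minimal sketch of the Reinhard global operator and of gamma compression on intensity; the value $\gamma = 0.5$ is an arbitrary illustrative choice:

```python
import numpy as np

def reinhard_global(E):
    """Reinhard global operator: boosts small values, compresses highlights."""
    return E / (1.0 + E)

def gamma_on_intensity(E, gamma=0.5):
    """Gamma compression on intensity only, preserving colour ratios.

    E -- (H, W, 3) array of linear irradiance values.
    """
    L = E.mean(axis=2, keepdims=True)        # L = (R + G + B) / 3
    L2 = (L / L.max()) ** gamma              # L'' = (L / max L)^gamma
    C = E / np.maximum(L, 1e-8)              # colour ratios {R, G, B} / L
    return np.clip(L2 * C, 0.0, 1.0)         # I'' = L'' * C
```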

§ Chiu et al. (1993)

In Chiu[1], the authors propose the following method:

$$I' = I - (I \star f) = I - J,$$

where $f$ is the Gaussian function,

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{x^2}{2\sigma^2}}.$$
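A minimal sketch of this subtraction using a Gaussian lowpass filter; the value of sigma is an arbitrary illustrative choice:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def chiu_like(I, sigma=10.0):
    """Subtract a lowpass (Gaussian-blurred) copy of the image to compress
    large-scale irradiance variation while keeping local detail.

    I -- 2D float array of irradiance values.
    """
    J = gaussian_filter(I, sigma=sigma)  # lowpass: J = I * f
    return I - J                         # detail layer: I' = I - J
```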

(Figure: visualisation of Chiu et al.'s process.)

Their motivation was to use a lowpass filter (here, the Gaussian serves as an illustration) to isolate the ranges of radiance that are perceptually important.

Unfortunately, this approach leads to a "halo" effect around sharp irradiance transitions, because the filter undesirably eases step-like transitions.

To understand why, we turn towards the definition of the convolution,

$$J(p) = \sum_{q} f(p - q)\, I(q).$$

We may notice that $f(p - q)$ is a weighting on $I(q)$ that depends only on the spatial distance between the two pixels $p$ and $q$.

(Figure: the effect of this purely spatial weighting on the step-like result that we desire.)

§ Tomasi and Manduchi (1998)

In Tomasi[2], they explore a method called "bilateral filtering" to reduce the halo effect.

The method is given by the following adjustment to the convolution calculation:

$$J(p) = \frac{1}{k(p)} \sum_q f_s(p - q)\, f_c(I(p) - I(q))\, I(q), \tag{1a}$$

where

$$k(p) = \sum_q f_s(p - q)\, f_c(I(p) - I(q)). \tag{1b}$$

The $f_s$ and $f_c$ functions are the spatial and colour filters -- now, instead of weighting just the spatial distance between two pixels, we also use their "colour distance".
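A direct (brute-force) numpy sketch of eq. 1, using Gaussians for both $f_s$ and $f_c$; the parameter values are arbitrary illustrative choices:

```python
import numpy as np

def bilateral_filter(I, sigma_s=3.0, sigma_c=0.1, radius=8):
    """Brute-force bilateral filter (eq. 1), truncated to a square window.

    I -- 2D float intensity image.
    """
    H, W = I.shape
    J = np.zeros_like(I)
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            patch = I[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            fs = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            fc = np.exp(-(patch - I[y, x]) ** 2 / (2 * sigma_c ** 2))
            k = fs * fc
            J[y, x] = (k * patch).sum() / k.sum()  # normalize by k(p)
    return J
```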

§ Durand and Dorsey (2002); Paris and Durand (2006)

In Durand[3], they explore a method of computing an approximation of the bilateral filter that is $\mathcal{O}(1)$ per pixel.

A further improvement upon Durand[3] was made in Paris[4]. The authors note that the product of the spatial and range Gaussians defines a higher-dimensional Gaussian in the 3D product space between the domain and the range of the image.

(Figure: visualization of the weights in the 3D product space, from Durand[3].)

However, the bilateral filter as defined in eq. 1 is not a convolution, since the summation runs over the 2D spatial domain only. The authors introduce a new dimension $\zeta$ for the intensity of each point in the product space, in order to define a summation over the 3D space.

We now define for each point $(x, y, \zeta)$ an intensity $I$, and we let $\mathcal{R}$ be the interval on which intensities are defined. Also, we define the Kronecker symbol $\delta(\zeta)$ by $\delta(0) = 1$ and $\delta(\zeta) = 0$ for $\zeta \neq 0$.

$$J(p) = \frac{1}{k(p)} \sum_{q \in \mathcal{S}}\sum_{\zeta \in \mathcal{R}} f_s(\|p - q\|)\, f_c(|I(p) - \zeta|)\, \delta(\zeta - I(q))\, I(q),$$

where

$$k(p) = \sum_{q \in \mathcal{S}}\sum_{\zeta \in \mathcal{R}} f_s(\|p - q\|)\, f_c(|I(p) - \zeta|)\, \delta(\zeta - I(q)).$$

We can use homogeneous coordinates to express the same equation,

$$\begin{pmatrix} W(p)J(p)\\ W(p) \end{pmatrix} = \sum_{q \in \mathcal{S}}\sum_{\zeta \in \mathcal{R}} f_s(\|p - q\|)\, f_c(|I(p) - \zeta|)\, \delta(\zeta - I(q)) \begin{pmatrix} W(q)I(q)\\ W(q) \end{pmatrix},$$

where $W(q) = 1$.

Note that the $\delta(\zeta - I(q))$ factor cancels all terms with $\zeta \neq I(q)$.

The summation is now over the space $\mathcal{S} \times \mathcal{R}$, and the product $f_s f_c$ defines a separable Gaussian kernel $f_{s,c}$ on $\mathcal{S} \times \mathcal{R}$,

$$f_{s,c}: (x \in \mathcal{S}, \zeta \in \mathcal{R}) \mapsto f_s(\|x\|)\, f_c(|\zeta|).$$

Then, we construct the functions $i$ and $w$,

$$i: (x \in \mathcal{S}, \zeta \in \mathcal{R}) \mapsto I(x),$$

$$w: (x \in \mathcal{S}, \zeta \in \mathcal{R}) \mapsto \delta(\zeta - I(x)).$$

Then, we rewrite the 3D summation

$$\delta(\zeta - I(q)) \begin{pmatrix} W(q)I(q)\\ W(q) \end{pmatrix} = \begin{pmatrix} \delta(\zeta - I(q))\,W(q)I(q)\\ \delta(\zeta - I(q))\,W(q) \end{pmatrix} = \begin{pmatrix} w(q, \zeta)\,i(q, \zeta)\\ w(q, \zeta) \end{pmatrix}.$$

Using separability of the 3D Gaussian,

$$\begin{pmatrix} W(p)J(p)\\ W(p) \end{pmatrix} = \sum_{(q, \zeta) \in \mathcal{S}\times\mathcal{R}} f_{s,c}(p - q, I(p) - \zeta) \begin{pmatrix} w(q, \zeta)\,i(q, \zeta)\\ w(q, \zeta) \end{pmatrix}.$$

This formula is exactly the convolution of $f_{s,c}$ with the two-component function $(wi, w)$, evaluated at $(p, I(p))$:

$$\begin{pmatrix} W(p)J(p)\\ W(p) \end{pmatrix} = \left[ f_{s,c} \otimes \begin{pmatrix} wi\\ w \end{pmatrix} \right] (p, I(p)).$$

The functions $i^b$ and $w^b$ are given by

$$(w^b i^b, w^b) = f_{s,c} \otimes (wi, w).$$

Finally, the bilateral filter is given by

$$J(p) = \frac{w^b(p, I(p))\, i^b(p, I(p))}{w^b(p, I(p))}.$$

First, we perform slicing, i.e. evaluating $w^b i^b$ and $w^b$ at $(p, I(p))$; then we divide in order to normalize the result.
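Putting the pipeline together (scatter into a downsampled 3D grid, blur with the separable Gaussian $f_{s,c}$, then slice and normalize), here is a simplified numpy sketch; it uses nearest-neighbour slicing rather than the trilinear interpolation of Paris[4], and all parameter values are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fast_bilateral(I, sigma_s=16.0, sigma_c=0.1, ds=8, dc=0.05):
    """Bilateral-grid style approximation of the bilateral filter.

    I  -- 2D float image with values in [0, 1]
    ds -- spatial downsampling step; dc -- intensity bin width
    """
    H, W = I.shape
    gh, gw = H // ds + 2, W // ds + 2
    gd = int(1.0 / dc) + 2

    # Scatter (w*i, w) with W(q) = 1 into the downsampled 3D grid.
    wi = np.zeros((gh, gw, gd))
    w = np.zeros((gh, gw, gd))
    ys, xs = np.mgrid[0:H, 0:W]
    gy, gx = ys // ds, xs // ds
    gz = np.round(I / dc).astype(int)
    np.add.at(wi, (gy, gx, gz), I)
    np.add.at(w, (gy, gx, gz), 1.0)

    # Convolve the grid with the separable 3D Gaussian f_{s,c}.
    sig = (sigma_s / ds, sigma_s / ds, sigma_c / dc)
    wi = gaussian_filter(wi, sig)
    w = gaussian_filter(w, sig)

    # Slice at (p, I(p)) and normalize: J(p) = (w^b i^b) / w^b.
    return wi[gy, gx, gz] / np.maximum(w[gy, gx, gz], 1e-8)
```

Because the heavy work happens on the small downsampled grid, the cost per output pixel is constant, which is the source of the $\mathcal{O}(1)$-per-pixel behaviour noted above.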


  1. K. Chiu, M. Herf, P. Shirley, S. Swamy, C. Wang, K. Zimmerman. "Spatially Nonuniform Scaling Functions for High Contrast Images" (1993).

  2. C. Tomasi, R. Manduchi. "Bilateral Filtering for Gray and Color Images" (1998).

  3. F. Durand, J. Dorsey. "Fast Bilateral Filtering for the Display of High-Dynamic-Range Images" (2002).

  4. S. Paris, F. Durand. "A Fast Approximation of the Bilateral Filter using a Signal Processing Approach" (2006).