# HDR Imaging and Tonemapping

## The Problem§

The real world is in high dynamic range. A camera sensor is unable to capture in HDR; as bright spots may oversaturate a photosensor and force it to white – losing information in that area. We can modify our exposure time to adjust the amount of light that enters the sensor – but there is no one exposure time that will give us a good picture, so we must take many.

## Many-to-one§

Camera Pipeline:

- 3D Radiance ($L = \frac{W}{srm^2}$)
- Lens
- 2D Sensor (Irradiance, $E = \frac{W}{m^2}$)
- Shutter (Exposure time, $\Delta t$)
- Sensor Exposure $H = E\Delta t$
- ADC (Analogue to Digital Converter)
- Raw Image
- Remapping
- Pixel Values ($Z$)

Our goal is to go from $Z$ to $E$, however, to fully go to $E$ we must reverse the ADC – we have no interest or intention of doing so. We can note that $E$ is invariant to exposure time, so if we know $\Delta t$ we can find $E$ – typically $\Delta t$ is given as metadata on an image.

We need multiple images to recover the $Z$ vs $H$ curve – typically with varying $\Delta t$.

## Camera Response Curve§

Then, our goal is to find $E_i$ or the exposure sensitivity for the $i$^{th} pixel.

Objective: Solve for $E_i$ in $Z_{i,j} = f(E_i\Delta t_j)$,
where $Z_{i,j}$ represents the value of pixel $i$ on image $j$, and $\Delta t_j$ is the exposure time on the $j$^{th} image.

Using algebra, we get $$ \begin{align*} f^{-1}(Z_{i,j}) &= E_i\Delta t_j\\ \log f^{-1}(Z_{i, j}) &= \log E_i + \log \Delta t_j\\ g(Z) &= \log \circ f^{-1} (z)\\ g(Z_{i, j}) &= \log E_i + \log \Delta t_j. \end{align*} $$

In the above expression, the values $Z_{i,j}$ and $\Delta t_j$ are known, and $g(Z)$ and $E_i$ are unknown.

Through optimization, we may solve for the unknown values by via the following $$ [g, E_i] = \argmin_{g,\ E_i} \sum_{i=1}^N\sum_{j=1}^P \left[g(z_{i,j}) - \log E_i - \log \Delta t_j\right]^{2}, $$ where $N$ is the number of pixels, $P$ the number of images.

Number of Samples

For an 8-bit color depth image, $Z_{i, j} \in [0, 255] \to 256$ values, then there are 256 unknowns for the function $g(Z)$, i.e. $g(0), g(1), \dots, g(255).$ We also have $N$ unknowns for $E_i$.

- There are $N + 256$ unknowns
- There are $NP$ total equations

Then, $NP > N + 256 \implies NP - N > 256 \implies N(P - 1) > 256 \implies N > \frac{256}{P-1}$. Usually $N = 5\left(\frac{256}{P-1}\right)$ is sufficient, e.g. if $P = 11$ then $N = 128$ random pixel samples is sufficient to recover $E$.

This makes the assumption that the images are reliable – however, this is not the case, we modify our equation to

$$ [g, E_i] = \argmin_{g,\ E_i} \sum_{i=1}^N\sum_{j=1}^P \Big[w(z_{i,j})\left(g(z_{i,j}) - \log E_i - \log \Delta t_j\right)\Big]^{2} + \lambda \sum_{z=0}^{2^b} \left[w(z)\frac{d^2}{dz^2}g(z)\right]^2, $$

- $w(z)$ is a function that assigns a weight to a pixel value; we want large $z$ values to map to smaller values, as these are generally unreliable. One possible function is the triangle function given by $$ w(z) = \begin{cases} z, &\text{ if } z \leq 2^b - 1\\ 2^b - (z + 1), &\text{otherwise} \end{cases} $$ where $b$ is the bit-depth of the image.
- There is no reason for a manufacturer to use a noisy curve; therefore, by making the assumption that the camera’s response curve is smooth, $$ \frac{d^2}{dz^2}g(z) \approxeq g(z - 1) - 2g(z) + g(z + 1) = 0. $$
- $\lambda$ is a factor used to balance the two terms – a larger $\lambda$ gives a smoother response curve.

We only need to perform this optimization once per camera, then we can reuse it for other images; sometimes this method is called “irradiance calibration”.

## Tonemapping§

Previously we solved for $g: Z \to E, $ $g(Z_{i,j}) = \log E_i + \log \Delta t_j,$ or for every pixel we get $P$ estimates for $\log E_i,$ we cannot just use the arithmetic mean as the highly irradiant areas in longer exposures will bias the mean.

We can then reuse the weight function $w(Z)$ to get a weighted mean, $$ \log E_i \approxeq \frac{\sum_{j=1}^P w(Z_{i, j})\left[g(Z_{i,j}) - \log \Delta t_j\right]}{\sum_{j=1}^P w(Z_{i, j})}. $$

Having recovered the irradiance of each pixel in an image, we can now use it in various applications such as reusing the irradiance in a computer rendering or in viewing an image – however, we must compress the high dynamic range into the range displayable by a device.

- Clipping
- $I = \min (E, I)$
- Causes the loss of detail in highly irradiant areas.

- Scaling
- $I = \frac{E}{\max E}$
- Causes the loss of detail in areas of low irradiance.

- Reinhard Global Operator
- $I = \frac{E}{1 + E}$
- Better than scaling and clipping as it increasly sharply at small values (boosting), and compresses the high irradiance areas.

- Gamma Compression
- $I’’ = (I’)^\gamma, I’ = \frac{E}{\max E}$
- We first perform scaling, then a gamma correction to boost the smaller values like Reinhard.
- Colour gets washed out.

- Gamma Compression on Intensity
- Intensity is given by $L = \frac{1}{3}(R + G + B)$
- Then, gamma compression on intensity $L’’ = (L’)^\gamma, L’ = \frac{L}{\max L}$
- Colour is given by $C = \frac{1}{L}\left\{R, G, B\right\}$
- Then, the final image is computed via $I’’ = L’’ C$
- This preserves colour, but instead washes out intensity – or lowers the constrast of the image.

### Chiu et. al (1993)§

In Chiu^{1}, the authors propose the following method:

$$ I’ = I - (I \star f(x)) = I - J, $$

where $f(x)$ is the Gaussian function,

$$ f(x) = \frac{1}{\sqrt{2\pi}\sigma^2}e^{-\frac{x^2}{2\sigma^2}}. $$

Their process can be visualised by the following image,

Their motivation was to use a lowpass filter (here, we used the Gaussian as an illustration) to isolate ranges of radiance that are perceptually important.

Unfortunately, this approach leads to a “halo” effect around sharp irradiance transitions due to undesirable easing of a transition in a step-like function.

To understand why, we turn towards the definition of the convolution,

$$ J(p) = \sum_{q} f(p - q) I(q). $$

We may notice that $f(p - q)$ is a weighting on $I(q)$ that is dependent on the spatial distance between the two pixels $p$ and $q.$

Visually, we can see the effect that this spatial weighting has on the step-like result that we desire

### Tomasi and Manduchi (1998)§

In Tomasi^{2}, they explore a method called “bilateral filtering” to reduce the halo effect.

The method is given by the following adjustment the convolution calculation

$$ J(p) = \frac{1}{k(p)} \sum_q f_s(p - q) f_c (I(p) - I(q)) I(q), \tag{1a} $$

where

$$ k(p) = \sum_q f_s(p - q) f_c(I(p) - I(q)). \tag{1b} $$

The $f_s$ and $f_c$ functions represent the spatial and colour filters – now instead of weighting just the spatial distance of two pixels, we also use the “colour distance”.

### Durand and Dorsey (2002); Paris and Durand (2006)§

In Durand^{3}, they explore a method of computing an approximation of the bilateral filter that is $\mathcal{O}(1)$ per pixel.

A further improvement upon Durand^{3} was made in Paris^{4}.
The authors note that the product of the spatial and range Gaussian defines a higher dimensional Gaussian in the 3D product space between the domain and range of the image.

Visualization of the weights in the 3D product space in Durand^{3}

However, the definition of the bilateral filter in **eq. 1** is not a convolution as it is a summation over the 2D spatial domain.
The authors introduce a new dimension $\zeta$ for the intensity for each point in the product space in order to define a summation of the 3D space.

We now define for each point $(x, y, \zeta)$ an intensity $I$, and we let $\mathcal{R}$ be the interval for which intensites are defined. Also, we define the Kronecker symbol $\delta(\zeta)$ by $\delta(0) = 1, \delta(\zeta) = 0.$

$$ J(p) = \frac{1}{k(p)} \sum_{q \in \mathcal{S}}\sum_{\zeta \in \mathcal{R}} f_s(||p - q||) f_c(|I(p) - \zeta|) \delta(\zeta - I(q)) I(q), $$

where

$$ k(p) = \sum_{q \in \mathcal{S}}\sum_{\zeta \in \mathcal{R}} f_s(||p - q||) f_c(|I(p) - \zeta|) \delta(\zeta - I(q)). $$

We can use homogeneous coordinates to express the same equation,

$$ \begin{pmatrix} W(p)J(p)\\ W(p) \end{pmatrix} = \sum_{q \in \mathcal{S}}\sum_{\zeta \in \mathcal{R}} f_s(||p - q||) f_c(I(p) - \zeta) \delta(\zeta - I(q)) \begin{pmatrix} W(q)I(q)\\ W(q) \end{pmatrix}, $$

where $W(q) = 1.$

Note that the $\delta(\zeta - I(q))$ terms are cancelled when $\zeta \neq I(q)$.

The summation is now over the space $\mathcal{S} \times \mathcal{R},$ the product $f_s f_c$ defines a separable Gaussian kernel $f_{s, c}$ on $\mathcal{S} \times \mathcal{R},$

$$ f_{s, c}: (x \in \mathcal{S}, \zeta \in \mathcal{R}) \to f_s(||x||) f_c(|\zeta|). $$

Then, we construct the functions $i$ and $w$,

$$ i: (x \in \mathcal{S}, \zeta \in \mathcal{R}) \to I(x), $$

$$ w: (x \in \mathcal{S}, \zeta \in \mathcal{R}) \to \delta(\zeta - I(x)). $$

Then, we rewrite the 3D summation

$$ \delta(\zeta - I(q)) \begin{pmatrix} W(q)I(q)\\ W(q) \end{pmatrix}= \begin{pmatrix} \delta(\zeta - I(q))W(q)I(q)\\ \delta(\zeta - I(q))W(q) \end{pmatrix}= \begin{pmatrix} w(q, \zeta)i(q, \zeta)\\ w(q, \zeta) \end{pmatrix}. $$

Using separability of the 3D Gaussian,

$$ \begin{pmatrix} W(p)J(p)\\ W(p) \end{pmatrix}= \sum_{(q, \zeta) \in \mathcal{S}\times\mathcal{R}} f_{s, c}(p - q, I(p) - \zeta) \begin{pmatrix} w(q, \zeta)i(q, \zeta)\\ w(q, \zeta) \end{pmatrix}. $$

This formula corresponds to the value of the convolution between $f_{s, c}$ and the two-dimensional function $(wi, w)$ at $(p, I(p))$,

$$ \begin{pmatrix} W(p)J(p)\\ W(p) \end{pmatrix}= \left[ g_{s, c} \otimes \begin{pmatrix} wi\\ w \end{pmatrix} \right] (p, I(p)). $$

The functions $i^b$ and $w^b$ are given by

$$ (w^b i^b, w^b) = g_{s, c} \otimes (wi, w). $$

Finally, the bilateral filter can be given by

$$ J(p) = \frac{w^b(p, I(p))i^b(p, I(p))}{w^b(p, I(p))}. $$

First, we perform *slicing* or evaluating $w^bi^b$ and $w^b$ at $(p, I(p)),$ then division in order to normalize the result.

K. Chiu, M. Herf, P Shirley, S. Swamy, C. Wang, K. Zimmerman. “Spatially Nonuniform Scaling Functions for High Contrast Images” (1993). ↩︎

C. Tomasi, R. Manduchi. “Bilateral Filtering for Gray and Color Images” (1998). ↩︎

F. Durand, J. Dorsey. “Fast Bilateral Filtering for the Display of High-Dynamic-Range Images” (2002). ↩︎ ↩︎ ↩︎

S. Paris, F. Durand. “A Fast Approximation of the Bilateral Filter using a Signal Processing Approach” (2006). ↩︎