The Pinhole§

In a scene, there are many rays of light coming from many different directions. If we attempt to capture all of the rays, we will end up with a superposition of rays from many different points in the scene for each point on the sensor. This superposition results in a blurry image.

One way around this is to block off some of the light, i.e. we let some light in through a pinhole and block all other incoming rays.

The opening in this barrier is called the aperture. In the pinhole model,

  • we capture only the rays that pass through a single point; this point is called the centre of projection (COP)
  • an image is formed on the image plane or the sensor
  • the distance $d$ between the image plane and the COP is called the image distance.

Notice in the illustration that the ray from the top of the scene ends up at the bottom of the sensor, and the bottom ray at the top. The pinhole camera captures an inverted view of the world.

Doubling the Image Distance§

What would happen to:

  • the projected object height
  • the amount of light gathered

Using similar triangles, with $s$ the projected height at image distance $d$ and $L$ the projected height at $2d$, we can compute the change in projected height:

$$ \frac{L}{s} = \frac{2d}{d} = 2. $$

Therefore, if we were to double the image distance we would double the projected height.

In the image above, suppose we were to translate the sensor so that it lies $2d$ units away from the COP, keeping the sensor size the same. Assuming that at $d$ units away we captured $1$ unit of light, what happens at $2d$?

Keep in mind that the illustration above is a 1D representation; the sensor in actuality is a 2D square.

$$ \begin{align*} \frac{s^2}{(2s)^2} &= \frac{x}{1}\\ x &= \frac{1}{4} \end{align*} $$

Therefore, the amount of light captured at $2d$ is a quarter of that at $d$.
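Both scalings can be checked numerically; a minimal Python sketch (the function names and example numbers are ours, not from the notes):

```python
def projected_height(scene_height, scene_distance, image_distance):
    # Similar triangles: projected height scales linearly with image distance.
    return scene_height * image_distance / scene_distance

def light_fraction(distance_scale):
    # A fixed-size sensor moved to distance_scale * d gathers
    # 1 / distance_scale**2 of the light, since the cone of light has
    # spread over an area distance_scale**2 times larger.
    return 1 / distance_scale**2

h_at_d = projected_height(1.0, 10.0, 1.0)    # sensor at d
h_at_2d = projected_height(1.0, 10.0, 2.0)   # sensor at 2d
print(h_at_2d / h_at_d)   # 2.0 -- projected height doubles
print(light_fraction(2))  # 0.25 -- light drops to a quarter
```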

A Dimensional Reduction Machine§

The camera takes in a 3D scene and transforms it into a 2D image. In this reduction, we lose:

  • depth,
  • length,
  • angles.

Modeling Projection (Perspective)§

Coordinate System§

  1. We assume that the pinhole model is a good approximation
  2. The optical centre (COP) is put at the origin
  3. The image plane (projection plane) is placed in front of the COP
    • This causes the captured image to have the expected orientation, instead of the inversion that we encountered previously.
  4. The camera looks down the negative z-axis
    • This allows us to use a right-handed coordinate system.

Point Projection§

Using the following 1D representation,

we can derive $(x', y')$ as follows:

$$ \begin{align*} \frac{y'}{y} &= \frac{-d}{z}\\ y' &= -d \frac{y}{z}, \end{align*} $$

$x'$ has the same relationship to $x$ as $y'$ to $y$, i.e.:

$$ x' = -d \frac{x}{z}. $$

Finally, the perspective transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$ is given by

$$ (x, y, z) \to \left(-d \frac{x}{z}, -d \frac{y}{z}\right). $$

Orthographic Projection§

This is also called parallel projection; here we simply drop the $z$ coordinate,

$$ (x, y, z) \to (x, y). $$

This is typically used in technical drawings and computer aided design (CAD). Unlike perspective projection, the lengths of lines can be trusted in an orthographic projection.
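Both projections are one-liners in code. A minimal Python sketch, with $d$ and the $-z$ viewing direction as above:

```python
def perspective_project(x, y, z, d):
    # (x, y, z) -> (-d*x/z, -d*y/z); COP at the origin, camera looking down -z.
    return (-d * x / z, -d * y / z)

def orthographic_project(x, y, z):
    # Parallel projection: simply drop the z coordinate.
    return (x, y)

p = (1.0, 2.0, -4.0)                   # a point in front of the camera (z < 0)
print(perspective_project(*p, d=2.0))  # (0.5, 1.0)
print(orthographic_project(*p))        # (1.0, 2.0)
```

Note that under perspective projection the image coordinates shrink as $|z|$ grows, which is the familiar foreshortening; orthographic projection has no such depth dependence.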


Lenses§

We can reduce blurriness in an image by reducing the aperture size; however, this is not always practical, as it also reduces the amount of light hitting the sensor. To compensate for the reduced light, we can increase the exposure time, but this is not without caveat, as it can introduce motion blur. Finally, a small enough aperture can also produce banding from diffraction effects. For these reasons, we instead use lenses.

With lenses, there is a specific distance at which objects are in focus. Other points project into a circle of confusion on the sensor.

Ideally, we want to have the focal plane on the object, thus reducing the size of the circle of confusion. In the above illustration, we achieve this by moving the sensor closer to the lens.

If we want to focus at infinity, we need to move the sensor to be the focal distance away from the lens.

Focal Length§

Focal Length

The focal length is defined to be the distance between the sensor and the lens when focused at infinity.

At the focal length, all parallel rays converge to the same point.

In the above image,

  • $d'$ is the object distance,
  • $d$ is the image distance,
  • $f$ is the focal distance.

Thin Lens Optics§

The thickness of the lens is negligible in comparison to its radii of curvature. For well-behaved lenses, we can simplify the geometric optics:

  • All parallel rays converge to a single point at the focal length
  • All rays going through the centre of the lens are not deviated


For every image plane there is a corresponding focal plane: all rays originating from the same point on the focal plane converge to a single point on the image plane.

Thin Lens Formula§

Assuming that we have a thin lens, we can find a formula for the relation between the image, object, and focal distances.

Using similar triangles through the centre of the lens (rays through the centre are undeviated), we get the relation

$$ \frac{y'}{y} = \frac{d}{d'}. $$

A second pair of similar triangles, through the focal point, gives $\frac{y'}{y} = \frac{f}{d' - f}$. Equating the two and rearranging: $$ \begin{align*} \frac{d}{d'} &= \frac{f}{d' - f}\\ dd' - df &= d'f\\ df + d'f &= dd'\\ f(d + d') &= dd'\\ \frac{d + d'}{dd'} &= \frac{1}{f}. \end{align*} $$

We get the relation,

$$ \frac{1}{d'} + \frac{1}{d} = \frac{1}{f}. $$
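The formula can be turned into a small solver for the image distance; a sketch (the function name, units, and the 50mm example are ours):

```python
def image_distance(object_distance, focal_length):
    # Solve 1/d' + 1/d = 1/f for the image distance d.
    # As the object approaches the focal distance, d grows without bound.
    if object_distance == focal_length:
        raise ValueError("object at the focal distance: image forms at infinity")
    return 1 / (1 / focal_length - 1 / object_distance)

f = 0.05  # a 50mm lens, working in metres

print(image_distance(float("inf"), f))  # 0.05: focused at infinity, d = f
print(image_distance(1.0, f))           # ~0.0526: a closer object pushes the sensor out
print(image_distance(0.06, f))          # ~0.3: near d' = f, d blows up
```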

At Infinity

If the object is at infinity, $d' = \infty$, i.e. we want to focus at infinity, then $\frac{1}{d'} = 0$ and $d = f$:

$$ f = F_1 \implies d = F_1. $$

The Impossible

Suppose $f = F_1$, $d' = F_1$ with $F_1 = 24\text{mm}$, and we want to have a focal plane close to the camera.

$$ \frac{1}{F_1} + \frac{1}{d} = \frac{1}{F_1} $$

Notice that this implies $d = \infty$ which is absurd; i.e. there exists a minimum focusable distance.

Depth of Field§

The depth of field (DoF) is the region of the scene within which objects appear to be in focus in the image.

We can increase the DoF by reducing the size of the aperture: the DoF is inversely proportional to the aperture diameter, i.e. halving the diameter doubles the DoF.

Again, using similar triangles: $$ \begin{align*} \frac{\ell}{h} &= \frac{d}{c} \implies \ell = \frac{dh}{c}\\ \frac{\ell}{h/2} &= \frac{d'}{c} \implies \ell = \frac{d'h}{2c}\\ \frac{dh}{c} &= \frac{d'h}{2c} \implies d' = 2d. \end{align*} $$

Finally, we can note that the depth of field is directly proportional to the focusing distance: if we decrease the focusing distance by a factor of 3, the DoF decreases by a factor of 3 as well.
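Using the symbols from the derivation above ($\ell$, $h$ the aperture diameter, $c$ the circle-of-confusion diameter), the aperture relation can be sanity-checked numerically; a sketch with arbitrary units and a function name of our own:

```python
def defocus_distance(ell, h, c):
    # From l/h = d/c: the distance d at which a point's blur circle
    # reaches diameter c, for aperture diameter h.
    return ell * c / h

d_wide = defocus_distance(ell=100.0, h=8.0, c=0.03)
d_narrow = defocus_distance(ell=100.0, h=4.0, c=0.03)
print(d_narrow / d_wide)  # 2.0: halving the aperture doubles the tolerated defocus
```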



Aperture§

The aperture is the lens opening, which is controlled by the diaphragm.

Typically, the diameter of the aperature is expressed as a fraction of the focal length.

  • f/2.0 on a 50mm lens means that the aperture diameter is 25mm
  • $D = \frac{f}{N}$, where $D$ is the aperture diameter, $f$ is the focal length, and $N$ is the F-number.
  • The smaller the F-number, the larger the aperture.

Going from f/2.0 to f/4.0 causes the area to be divided by 4.

Typical F-numbers:

  • f/2.0, f/2.8, f/4, f/5.6, f/8, f/11, f/16, f/22, f/32

Each increase in F-number in the above sequence roughly corresponds to a halving of the aperture area.
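The F-number arithmetic can be verified directly; a Python sketch of $D = f/N$ and the area halving across the standard sequence:

```python
import math

def aperture_diameter(focal_length, n):
    # D = f / N
    return focal_length / n

def aperture_area(focal_length, n):
    d = aperture_diameter(focal_length, n)
    return math.pi * (d / 2) ** 2

# f/2.0 on a 50mm lens: a 25mm diameter, matching the example above.
print(aperture_diameter(50, 2.0))  # 25.0

# Consecutive F-numbers in the standard sequence roughly halve the area.
stops = [2.0, 2.8, 4, 5.6, 8, 11, 16, 22, 32]
areas = [aperture_area(50, n) for n in stops]
for wider, narrower in zip(areas, areas[1:]):
    print(round(wider / narrower, 2))  # each ratio is close to 2
```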

Field of View§

Field of View

The field of view (FoV) is defined to be the angle over which the camera is able to see the world.

FoV depends on two factors,

  1. focal length
    • FoV is inversely proportional to focal length
  2. sensor size
    • FoV is directly proportional to sensor size.

We can find the relation between FoV, focal distance and sensor size as follows:

$$ \begin{align*} \tan\left(\frac{\theta}{2}\right) &= \frac{h/2}{f}\\ \theta &= 2\arctan\left(\frac{h}{2f}\right). \end{align*} $$

Modern phone and DSLR cameras have similar FoVs, even though phones have considerably smaller sensors and focal distances.

Notice the fraction $\frac{h}{2f}$ inside the arctangent: in phones, the smaller focal distance cancels out the effect of the smaller sensor, which yields a FoV similar to that of a DSLR.
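Plugging illustrative numbers into the formula shows the cancellation (the phone sensor and focal length below are rough, assumed figures, not from the notes):

```python
import math

def fov_degrees(sensor_size, focal_length):
    # theta = 2 * arctan(h / (2f)); any length unit, as long as it is consistent.
    return math.degrees(2 * math.atan(sensor_size / (2 * focal_length)))

# Full-frame DSLR: 36mm-wide sensor with a 28mm lens.
print(round(fov_degrees(36, 28), 1))  # ~65.5 degrees
# Rough phone numbers: ~6mm-wide sensor with a ~4.5mm lens.
print(round(fov_degrees(6, 4.5), 1))  # ~67.4 degrees -- a similar FoV
```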

Focal Length and Perspective§

(todo: add picture)


In the movie Vertigo, the dolly zoom effect is used extensively: the subject remains the same size while the background appears to be moving away.

This is achieved by increasing focal length and simultaneously moving away from the subject.

Camera Settings§


Exposure is the amount of light that hits the sensor; it is affected by the shutter speed and the aperture size.

Exposure can be approximated as the product of the shutter time and the lens area.

Shutter Speed

The shutter speed controls how long the sensor is exposed; it has a roughly linear effect on exposure.

Typically, we use the following shutter times:

  • 1/30, 1/60, 1/125, 1/250, 1/500, …

Each stop corresponds to a halving or doubling of light that hits the sensor.

To increase exposure by one stop, we can either double the aperture area or double the shutter time.

As a rule of thumb, the limit on the shutter time is inversely proportional to the focal length, e.g. 1/500s for a focal length of 500mm.
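The stop arithmetic can be made concrete with the exposure ≈ shutter time × lens area approximation (the `exposure` helper and its unit-free scale are ours; 1/62.5 is used instead of the rounded marked stop 1/60 so the doubling is exact):

```python
def exposure(shutter_time, f_number, focal_length=50.0):
    # Exposure ~ shutter time x aperture area; area scales as (f/N)^2,
    # so constant factors like pi/4 are dropped.
    return shutter_time * (focal_length / f_number) ** 2

base = exposure(1 / 125, 4.0)
# One stop brighter, two ways:
print(exposure(1 / 62.5, 4.0) / base)        # 2.0: double the shutter time
print(exposure(1 / 125, 4 / 2**0.5) / base)  # ~2.0: divide N by sqrt(2)
```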

Global vs Rolling Shutter§

Global shutters capture the entire scene at a single moment in time; such cameras are very rare.

Rolling shutters progressively scan the scene, either vertically or horizontally.

(todo: add other notes on image formation and quantisation)