For my final project in CSCE-451, I reimplemented Position Based Fluids (PBF) by Macklin and Müller (2013)1.

Essentially, PBF is just an extension of smoothed-particle hydrodynamics (SPH) into the Position Based Dynamics framework (Müller 2007).

In contrast to SPH, the major benefit that PBF gives is stability, allowing us to take larger physics time-steps. Typically SPH would require multiple physics substeps for one animation frame, PBF can take just one physics step per frame.


The PBF algorithm is very simple, and is easily parallelisable.

Euler Integration§


for all particles $i$ do:   apply forces $\mathbf{v}_i \gets \mathbf{v}_i + \Delta t\ \mathbf{F}(\mathbf{x}_i)$   predict position $\mathbf{x}^{\star}_i \gets \mathbf{x}_i + \Delta t \mathbf{v}_i$

First, the algorithm does an unconstrained step by applying external forces and particle velocities; but the particles do not move quite like a liquid.

Particle Sorting§


for all particles $i$ do:   find neighboring particles $N_i(\mathbf{x}^{\star}_i)$

A lot of our computations are dependent on a particle’s neighbours.

  • Naively, for each particle we look at every other particle, this ends up being $\mathcal{O}\left(n^2\right).$
  • A better way is to discretise the simulation space into a uniform grid of bins, insert the particles to the bins, then sort.
    • With counting sort, this step becomes $\mathcal{O}(nk)$2.
    • Count sort can be parallelised with a prefix sum implementation3.

Jacobi Optimizer§


while iter < solverIterations do:   for all particles $i$ do:     compute densities $\rho_i$   for all particles $i$ do:     compute Lagrange multipliers $\lambda_i$   for all particles $i$ do:     compute $\Delta\mathbf{p}_i$   for all particles $i$ do:     update positions $\mathbf{x}^{\star}_i \gets \mathbf{x}^{\star}_i + \Delta\mathbf{p}_i$

This step projects the particles’ positions back into their expected fluid position using a constraint function.


Next, define the constraint function by $$ C_i(\mathbf{p}_i) = \frac{\rho_i}{\rho_0} - 1. $$

Due to the integration step we have that $C(\mathbf{p}_1, \dots, \mathbf{p}_n) \neq 0.$ As a result, we optimize for a positional correction $\Delta\mathbf{p} \approx \nabla C_i(\mathbf{p})\lambda$ such that $C(\mathbf{p} + \Delta\mathbf{p}) = 0,$ where $\lambda$ is a Lagrange multiplier given by

$$ \lambda_i = -\frac{C_i(\mathbf{p})}{\sum_k\|\nabla_{\mathbf{p}_k}C_i\|^2 + \varepsilon}. $$


The $\Delta\mathbf{p}$ computation is straightforward, $$ \Delta\mathbf{p}_i = \frac{1}{\rho_0}\sum_j\left(\lambda_i + \lambda_j + s\right)\nabla W(\mathbf{p}_i - \mathbf{p}_j, h) $$

where $s$ is a tensile instability term, and $\nabla W(\cdot)$ is the gradient of the spiky kernel (Müller 2003)4.

The tensile instability adds an artificial pressure that encourages particles to clump together to mimic surface tension.

Typically, the optimizer is not run until convergence – rather, it is run for a few (4-6) steps which gives results that look good visually.

Viscosity and Vorticity§

Viscosity and Vorticity

for all particles $i$ do:   update velocities $\mathbf{v}_i \gets \frac{1}{\Delta t}\left(\mathbf{x}^{\star}_i - \mathbf{x}_i\right)$   apply XSPH viscosity $\mathbf{v}_i \gets \mathbf{v}_i + \frac{c}{\rho_i} \nu_i$   apply vorticity confinement $\mathbf{v}_i \gets \mathbf{v}_i + \varepsilon\Delta t\,\left(\hat{\eta}_i \times \omega_i\right)$   update position $\mathbf{x}_i \gets \mathbf{x}^{\star}_i$

To make the results look more visually appealing, XSPH viscosity and vorticity confinement is applied.

Vorticity confinement reintroduces energy that was lost in the integration and optimization steps making the simulation look nicer, however, this technique is not physically based.


I used CUDA to implement the entire algorithm on the GPU to exploit its highly parallel nature.

Originally, I had a lot of issues with performance in simulating ~150K particles in real time.

  1. I had a lot of CPU $\longleftrightarrow$ GPU data transfers which introduced some latency.
    • Instead, I used the CUDA/OpenGL interop features to directly map the OpenGL VBOs to the PBF simulation buffer.
  2. GPUs are optimised for single precision.
    • A lot of the library functions I was using were causing type promotion from floats to doubles, preventing this gave me a 10x speed up.

My renderer simply reuses the code from the particle simulation labs, but I replaced the shaders with one that did Blinn-Phong shading on point-sprite spheres based on Green (2010)5.

Initially, I wanted to implement screen space fluid rendering5 with an anisotropic kernel6. Unfortunately, I only managed to implement the kernel and not the shaders.