This note is an introduction to simulating colliding particles in a box. This system is complex enough to have some interesting problems, but simple enough to have great solutions. The emphasis is on how we structure data to model the system efficiently.
Simulation of realistic physical systems is often challenging. In some cases that challenge could come just from deriving the equations that the system is supposed to obey, but when it comes to solving those equations numerically, the challenge can take on new dimensions, especially at scale and when the components of your system start interacting with each other. Here we’ll narrow the scope and just look at hows this complexity emerges in something simple: particles colliding in a box.
We want to delve into where the difficulty in this system comes from, what properties of the system allow us to simplify it, how we structure our data to solve it in practise.
The model
So to recap: this is Newtonian mechanics. Particles mostly move in straight lines. Sometimes they hit the walls or each other. They have diameter D. The box is square with sides of length L. The state of the system consists of the positions and velocities of all the particles, and the simulation proceeds in time steps of length Delta t. Each step is a tick of the clock in which we first update all the velocities, and then update all the positions.
Updating positions should be easy. It just requires looping over all the particles and calculating x + v Delta t. This takes O(N) work. The problem is to figure out what those velocities should actually be, and that requires checking which particles collided. How could we do this? Take particle i and iterate over all N-1 others to check their distance from i. If any of these distances are less than D, then the particles have collided and we should update the velocities. But we have N(N-1)/2 pairs of particles to check! This is a lot of work to do when we scale up the number of particles.
# Naive all-pairs: check every pair, O(N^2)
for i in 0 .. N-1:
for j in i+1 .. N-1:
if distance(particle[i], particle[j]) < D:
resolve_collision(i, j)
Locality
We want to avoid having to check every pair, so what are we doing wrong?
Since particles only interact when they touch each other, we shouldn’t need to consider particles beyond a local neighbourhood. There is no chance that they will be colliding and checking them is a waste of time. But we currently have no way of looking up which particles are close apart from via a brute-force search. We need to be able to look up particles by position, so we should create some kind of structure to organise our particles that has spatial information in the index. For example we could have a grid of cells and each cell contains a list of the particles currently found in that cell.
With such a structure in place we would update velocities by iterating over particles, and check for collisions only with nearby particles. This organises collision detection as a pipeline with a broad phase in which possible collisions are identified in some local neighbourhood, followed by a narrow phase in which collisions are exactly determined and their consequences handled.
This spatial data structure doesn’t have to be complex. We ask two things of it:
- We should be able to find out which particles are in a given neighbourhood.
- We should be able to update it efficiently after the particles have moved. Earlier we mentioned a grid as an example of a spatial data structure. That won’t always be appropriate for collision detection, but in the current case it is particularly well suited because of a second simplifying property: all the particles are the same size, D. In this case the neighbourhood of a particle is just the cell in which it is found, and the 8 cells around that. To check for collisions we only need to check this neighbourhood. Any cells beyond that are too far away for any collision to be possible.
Does the grid help?
This broad phase -> narrow phase collision detection should hopefully reduce the required work by quite a lot. But exactly how much work is still required? And now we have an extra data structure to maintain. How much complexity does that add?
Let’s consider the first question. As before, we iterate over N particles, but now we only check for collisions with particles found in the 3x3 block of cells around it, each of side length D. How many particles is that? The density of particles in the whole box is N / L^2, so a neighbourhood of area 9 D^2 holds about 9 N D^2 / L^2 particles on average. There is still an N hiding in that expression: multiplied over all N particles it gives 9 N^2 D^2 / L^2 pairs, an N^2 scaling again.
What helps us here is a physical fact: the particles are hard and cannot overlap, so two centres can never be closer than D. A cell of side D then has room for only about one particle, however we try to pack the box, and its 3x3 neighbourhood for only about nine. Each particle therefore has O(1) neighbours to check whatever the configuration, so the total work is bounded by about 9 N: linear in N. This is a huge improvement in the scalability of our collision detection algorithm.
# Broad-phase collision detection on a uniform grid
build_grid(particles) # bin each particle into its cell
for i in 0 .. N-1:
c = cell_of(particle[i])
for each cell n in the 3x3 block around c:
for each j in grid[n]:
if j > i and distance(particle[i], particle[j]) < D:
resolve_collision(i, j)
Maintaining the grid
But what about our grid? Did we get rid of our scalability issue, or did we just shift it somewhere else?
There are a couple of complications here. First, since the particles move at each step we are going to need to update the grid as well, or else it will have stale data. Second, because every cell contains a variable number of particles, we don’t know how much space the data for each cell should take.
One blunt way of dealing with this would be to use variable length arrays (such as std::vector) to store the contents of each cell, and store pointers to these cell arrays in another array representing the grid. So we have a grid array storing pointers to cell arrays. The grid array allows spatial lookup of a pointer, which then directs us to the contents of the cell. Updating the grid requires us to iterate over all N particles to check that they are assigned to the right cells. If any of them have moved from one cell to another then we need to remove the reference from one cell and add it to another. Overall this is N work.
That said, the addition of an extra layer of indirection via the cell pointers can degrade performance. It would be a constant factor, but it could be significant. Meanwhile, the variable length of the cell arrays can add overhead via allocation and deallocation: another constant factor which is bad for performance.
We are still fortunate to have pushed the computational complexity down from O(N^2) to O(N), but this grid has some drawbacks: we don’t know in advance how much memory it should take up and consequently it is not stored contiguously.
Contiguous arrays
However, there is a solution which improves on this. First think that the number of particles in the box does not change. It would be possible to take all the variable length cell arrays from the previous solution and stack them together in a single flat array of length N. The trick is to track where one cell ends and the next one begins. Basically we want to store offsets for each cell. But these offsets are just a cumulative sum of all the particles in all the cells in order. We maintain an array of particle counts in each cell in order, and then turn this into an array of offsets using a prefix scan, which calculates the offset of cell j as offset[j] = sum over k < j of count[k]. For a grid of C cells, a serial scan does this in O(C) work, but the scan is also parallelisable: it has O(C) work and a work depth of only O(log C), so it can be calculated in O(log C) time on a parallel machine. With the offsets in hand, one final pass scatters the particles into the flat array: we sweep them once more and drop each particle’s index into its cell’s run, advancing a per-cell cursor so that particles sharing a cell land in consecutive slots.
Although our array is not arranged as a grid anymore, it is a compressed version of that, and any queries we might have made of the grid can be made by looking up an offset in the offset array, and then using this to access the right cell in the cell array. We find an offset for cell (x, y) using index x + n_x y, where n_x is the number of cells along the x-direction. If this layout looks familiar, it is essentially compressed sparse row (CSR) storage, the scheme used to hold sparse matrices: an array of offsets indexing into one flat array of entries, with each row, here each cell, laid out as a contiguous run.
One thing to note here is that we haven’t gotten rid of the pointer indirection. When we look up an offset in the offset array, we are effectively looking up a pointer, which we then use to find data in the cell array. The main advantage is that the cell array is now a single contiguous memory buffer that is allocated once. Since it is contiguous we make better use of spatial locality and keep the cache hot.
What we covered
Particles in a box have simple physics, yet they already confront us with the difficulty of many interacting entities: a naive all-pairs approach scales as O(N^2). We got around this issue using three simplifying properties of the system:
- local interactions: particles only interact when they touch
- a consistent scale: all the particles are the same size
- hard particles: they cannot overlap
Together these let us replace the all-pairs check with a local spatial query: bin the particles into a uniform grid and look only inside each particle’s own neighbourhood. Local interactions make the neighbourhood sufficient, a consistent scale makes a single cell size sufficient, and non-overlap caps how many particles a cell can hold, which is what pins the total work at O(N). The rest of our effort went into engineering the grid itself, so that maintaining it stays O(N) and its contents stay contiguous in memory, leaving only a single layer of indirection behind.
So far this has just been theory, but in another note we might actually write the code. Then we’ll see what kind of numbers we get. When we do this we’ll have the opportunity to experiment with the different representations of the grid to see which one performs best in practice. We’ll also get to experiment with different data layouts for the particles and this will introduce us to the concept of cache lines and tradeoffs between structures of arrays and arrays of structures.