How to simulate the Universe on a computer
Published:
Hello there! In this post, I will write a bit about cosmological simulations. Yes, it’s a promise-I’ve been meaning to do this since I created this blog, and it’s been a while since my last post.
Why do people try to simulate our Universe on computers?
First things first: we have always been passionate about understanding our Universe, and what better way to do so than by mimicking it using the tools we have: computational power and cosmology? But why rely on numerical simulations? Well, if you’ve read a bit about cosmology, you’ll know that understanding how matter behaves on small scales is quite challenging. The “simple theory” (linear cosmology) works remarkably well for describing large-scale structures (Gaussian fields), but it breaks down at smaller scales, where nonlinearities start to dominate.
Figure 1: Linear and nonlinear power spectrum comparison. The spectra are presented for redshift zero. The impact of nonlinearities is evident starting from k≳0.1 h/Mpc.
For example, in the case of the power spectrum - a summary statistic that describes how much matter (or power) exists at different scales (k) - we can use linear theory to model the distribution of matter up to k≲0.1 h/Mpc. Beyond this point, the power spectrum becomes nonlinear. See Figure 1.
To predict clustering on small scales, the main approach is through numerical simulations. These simulations employ high-resolution, multi-scale schemes and are routinely used on massively parallel computers to increase their size and complexity, aiming to better describe the Universe. Given the complexity of the Universe and its various components, these numerical solutions can be classified into two main categories: N-body (or DM-only) simulations and hydrodynamical (or DM plus baryons) simulations. N-body simulations involve solely DM, with gravity being the only force acting on the particles. On the other hand, hydrodynamical simulations incorporate both DM and baryonic matter (e.g., gas), allowing for the inclusion of phenomena such as feedback from supernova explosions and supermassive black holes (BH), magnetic effects, and more. Let’s discuss a bit more about both sets of simulations.
N-body simulations
N-body (or DM-only) simulations are numerical solutions of a very high number (N) of DM particles interacting gravitationally within a finite volume, and evolving over a long period of time (or a large range of redshifts). Essentially, they provide an alternative path to solutions of the collisionless Boltzmann equations coupled with Poisson’s equation. Examples of such simulations include the Millennium, Dark Sky, and Bolshoi simulations, which are widely used by the scientific community to study large-scale structures and the behavior of DM on large volumes. Another notable example is the Quijote project, which consists of a collection of 43,100 full N-body simulations designed to provide an extensive data set of cosmological simulations for machine learning (ML) applications.
In this post I will present one of the techniques employed to solve the problem of simulating cosmological structures: the particle mesh (PM) algorithm. Other methods, such as particle-particle schemes or hybrid schemes, also exist, each with its own advantages and disadvantages.
The PM method is a relatively simple approach that I used myself when first delving into cosmology. I coded my own version of the method, largely following the principles outlined an amazing collection of notes from Dr. Andrey Kravtsov. PM codes utilize a mesh to represent density and potential fields, with the resolution of the simulation limited by the size of this mesh. Despite its simplicity, the PM method offers several advantages. It is fast, requiring fewer operations per particle per time step, compared to other methods. Additionally, PM simulations can handle very large numbers of particles efficiently.
Numerical N-body algorithms allow the study of nonlinear gravitational evolution of complex particle systems. These simulations model the time evolution of a given system by determining and tracking the trajectories of particles, taking into account their mutual gravitational interactions. Thus, a PM code solves both the Poisson equation ∇2Φ=4πGΩm,0ρcrita−1δ, as well as the equations of motion of the particles, dxda=p˙aa2 dpda=−∇Φ˙a.
These equations are written in terms of comoving coordinates, i.e., x=r/a, where r represents the proper particle’s fluid position, and p=av=a2˙x denotes the particle momenta, where v=u−Hr=a˙x is the peculiar velocity, and u is the proper velocity (including the Hubble flow).
It is convenient to define code variables, i.e., dimensionless variables, that we will denote with tildes according to: ˜x≡xr0=rar0,
˜p≡pv0=avv0,
˜Φ≡Φϕ0,
˜ρ≡a3ρρ0.
The quantities with subscript zero correspond to physical variables responsible for removing the units from the code variables and are defined as r0≡LBOXNg,
t0≡r0t0,
ρ0≡ρcritΩm,0,
Φ0≡r20t20=v20,
where LBOX is the box size, measured in Mpc/h, Ng is the number of grid cells in each direction, and NTg=N3g is the total number of grid cells. In dimensionless variables, we can re-write the equations as ˜∇2˜Φ=32Ωm,0a,
d˜xda=f(a)˜pa2,
d˜pda=−f(a)˜∇,
where ˜δ=˜ρ−1 and f(a)≡H0/˙a.
The main idea of the PM code is to solve these equations in three main steps:
- Solve the Poisson Equation using the density field, estimated with current particle positions;
- Advance momenta ˜p, using the potential computed in the first step;
- Update particle positions ˜x, using the advanced momenta.
Implementation stage: solving the equations
The PM method exploits the fact that the Poisson equation for gravitational potential can be found in real space by convolving the density contrast with the Green’s function ˜ϕ(˜x)=∫d3˜x′ G(˜x−˜x′) ˜δ(˜x′).
The choice of the particular Green’s function G is driven by the fact that we use periodic boundary conditions (PBC). In Fourier space, the convolution is then replaced by a simple multiplication: ˜ϕ(˜x)=G(k) ˜δ(k). To obtain the density contrast ˜δ(k) in Fourier space, first it is necessary to obtain ˜δ(˜x) in real space, which arises from the density in real space ˜ρ(˜x).
The density field
In PM algorithms, particles are assumed to have a certain size, mass, shape, and internal density. This determines the interpolation scheme used to assign densities to grid cells. A common choice is the Cloud In Cell (CIC) method, where particles are represented as cubes (in 3D) of uniform density and of one grid cell size.
The algorithm described above is relatively computationally cheap, accurate, and is commonly used in PM codes. In this method, the shape function of a particle in 1 dimension is defined as S(˜x)=1Δ˜x,|˜x|<Δ˜x/2 S(˜x)=0,otherwise, for a cell size of Δ˜x.
Then, the mass fraction of particle at ˜xp, assigned to a cell at ˜xijk, is the shape function averaged over this cell: W(˜xp−˜xijk)=∫˜xijk+Δ˜x/2˜xijk−Δ˜x/2d˜x′ S(˜xp−˜x′). In 3 dimensions this process generalizes to W(˜xp−˜xijk)=W(˜xp−˜xijk) W(˜yp−˜yijk) W(˜zp−˜zijk), such that the density ˜ρijk in the corresponding cell is given by ˜ρijk=Np∑p=1˜mpW(˜xp−˜xijk),
where NTp=N3p is the total number of particles, Np is the number of particles “on each direction”, and mp is the particle mass. In practice, this is achieved by looping over particles and assigning their density to neighboring cells, rather than summing over all particles for each cell individually.
The density contrast field and its Fourier transform
With the grid densities ˜ρi,j,k(˜x) on hand, the next step is to obtain the grid density contrasts ˜δi,j,k(˜x) and convert them to Fourier space. This transformation is typically accomplished using FFT algorithms, which efficiently compute the discrete Fourier Transform and its inverse. By applying the FFT to the grid density contrasts, we obtain the them in Fourier space, denoted as ˜δi,j,k(˜k).
The gravitational potential
Now we only need the Green function to obtain the gravitational field ˜Φ(k). The Green function is given by G(k)=−3Ωm,08a[sin2(kx2)+sin2(ky2)+sin2(kz2)]−1, where kx=2πlLBOX, ky=2πmLBOX, kz=2πnLBOX, for the components (l,m,n). These equations are in code units, hence LBOX=Ng. Then, the gravitational potential is solved by transforming the result back to real space to obtain ˜Φ(˜x) discretized at cell centers. Note that, when using these gravitational potentials, there is an artifact, a singularity at l=m=n=0, which is avoided by setting ˜Φ000=0.
The acceleration
After obtaining the gravitational field in real space ˜Φ(˜x), discretized at cell centers, it is time to obtain the acceleration at each grid point. This is simply given by ˜a(~xi)=−˜∇˜Φ(~xi).
This step precedes the updating of the particles’ positions and momenta, because it requires the accelerations at each particle’s position. Thus, to obtain the accelerations at the particle positions ˜gp, we interpolate the acceleration at grid points ˜a(~xi) onto the particle positions ˜xpj, using the CIC interpolation. During the density assignment, for a given particle, the acceleration at each point is interpolated from the cells to which the particle contributed to the density.
Updating particles positions and momenta
We arrive now at the final stage of the PM method: updating particle positions and momenta. This is achieved using leapfrog integration, which is a numerical method for integrating differential equations in a dynamical system. Leapfrog integration updates positions and velocities (or momenta) at interleaved time points (or scale factor points), staggered in such a way that they “leapfrog” over each other.
Thus, using the leapfrog integration, we have updated momenta and positions as ˜pn+1/2=˜pn−1/2+f(an)˜gnΔa
˜xn+1=˜pn+a−2n+1/2f(an+1/2)˜pn+1/2Δa,
where n represents the “time” step, f(an) is computed at an, Δa is the step in the scalar factor, an=ai+nΔa
is the evolution in the scale factor according to the “time” steps, and an+1/2=an+Δa/2. In the present case, we always update the momentum first, in a half time step before update the positions.
Therefore, the Particle Mesh (PM) method involves repeating a series of steps for each time step of the simulation. The main scheme of the PM method typically consists of the following three blocks, which are repeated iteratively:
- Find density on the mesh using the Cloud-In-Cell (CIC) technique. This step involves assigning the density of particles to grid cells using the CIC interpolation scheme.
- Solve the Poisson equation using two 3-dimensional Fast Fourier Transforms (FFTs). After obtaining the density distribution on the mesh, the Poisson equation is solved in Fourier space using FFTs to calculate the gravitational potential.
- Advance momenta and positions of the particles. Finally, the momenta and positions of the particles are updated using leapfrog integration based on the calculated gravitational potential.
These blocks are repeated for each time step of the simulation to evolve the system over time.
The simulations were conducted using the following cosmological parameters: H0=73.0 km s−1 Mpc−1, Ωm=0.311051, and ΩΛ=0.68887. I ran four simulations, which explains the error bars seen in the final products, that I will present in a few minutes. I simulated a box of side LBOX=128 Mpc/h, with 1283 particles and 2563 cells. The simulations started from a=0.02 or z=49, up to a=1.0 or z=0. The ICs were set using the Multi Scale Initial Conditions MUSIC
(https://www-n.oca.eu/ohahn/MUSIC/) code. To define the initial spectra using MUSIC
we utilized CAMB.
Figure 2: A snapshot of my N-body simulation. The animation can be seen in the youtube link
N-body power spectrum
The primary outcome obtained from the N-body simulation was the {\em power spectrum}, denoted as P(k), at various stages and configurations of the simulation. Specifically, we computed the power spectrum based on FFTs performed on the density contrasts. We then calculated the quadratic modulus of these transformations and averaged the results over the k values: P(k)=⟨|˜δ(k)|2⟩=1NkNk∑i=1|˜δ(ki)|2, where i represents the bin index of k, i.e., ki lies within the interval [k,k+Δk], k=√k2x+k2y+k2z, and Nk represents the number of points where ki falls within the respective bin.
The power spectrum was computed for two distinct epochs: a=0.02 and a=1.0, to capture the evolution of the simulation. The results are illustrated in Figures 3 and 4, where a comparison is made with the linear and nonlinear spectra from CAMB
. Error bars in the plots represent the standard deviation of the obtained spectra for four realizations of the simulation conducted under the fiducial Cosmology. Additionally, ⟨std⟩ denotes the average of these standard deviation values. The vertical lines denote the confidence interval of k values, computed as kmin=2πLBOXandkmax=πNp2LBOX, where LBOX is the size of the box of the simulation, kmin is the minimum value of k (also related to the simulation resolution), and kmax is the maximum value of k, representing the Nyquist scale, which expresses the minimum separation between the particles.
Figure 3: DM power spectrum from N-body simulation. The power spectrum is presented on the top, for a=0.02 (or z=49) and on the bottom for a=1.0 (or z=0). Both spectra were obtained for a box of side LBOX=128 Mpc/h, with NTp=1283 particles, and NTg=2563 cells. The vertical dotted lines indicate the maximum and minimum values for k. They are respectively: kmin≃0.05h/Mpc and kmax≃1.57h/Mpc. In both plots we compare the obtained power spectrum to the theoretical linear and nonlinear spectra from CAMB
.
The overall trend of the power spectra aligns well with the model obtained from CAMB
. Particularly, at a=0.02, the spectrum matches exactly with the linear spectrum (and nonlinear as well, given the absence of structures at this stage). Conversely, at a=1.0, the resulting spectrum closely resembles the nonlinear spectrum across all scales, indicating the emergence of structure and the breakdown of linear theory on small scales.
Approximated methods for DM simulations
N-body simulations are the state-of-art of gravitational dynamics for DM particles. However, their computational demands often limit their utility for extensive runs required for comparisons with real surveys, such as parameter estimation and covariance matrices. To address this challenge, numerous approximate methods have emerged to expedite results. Techniques like EZMocks, PINOCCHIO, ExSHalos, and others aim to generate DM halo catalogs using semi-analytical approximations or by emulating N−body simulations.
Hydrodynamical simulations
Hydrodynamical simulations play a pivotal role in comprehending galaxy formation and evolution in the Universe. Unlike DM-only simulations, hydrodynamical simulations incorporate ordinary matter, encompassing all components. As a result, they serve as the foundation of what we refer to as the halo-galaxy connection, directly bridging the information content between two key components: DM halos and galaxy properties. This integration is essential for deciphering a wide array of observed galaxy characteristics, including spatial clustering, mass distribution, stellar mass, size, color and star formation rate, among other properties.
Numerous recent initiatives are pushing the boundaries of hydrodynamical simulations, introducing new variations such as Astrid, SIMBA, IllustrisTNG, Magneticum, and SWIFT-EAGLE. A remarkable project is the CAMELS (Cosmology and Astrophysics with MachinE Learning Simulations) suite, comprising 12,903 cosmological simulations – 5,164 N-body and 7,712 state-of-the-art (magneto-)hydrodynamic simulations. Primarily designed to serve as a data set for ML analyses, CAMELS encompasses all the aforementioned simulations, focusing on small boxes of 25 Mpc/h.
Indeed, while N-body simulations focus on the gravitational evolution of DM particles, hydrodynamical simulations encompass the evolution of all components, including the gravitational evolution of matter and the hydrodynamical evolution of gas. In some cases, these simulations also account for the interaction of gas with evolving radiation and magnetic fields. Initially, the baryon component, representing the visible Universe, consists mainly of gas, primarily hydrogen and helium. Some of this gas material ends up in stars during the process of structure formation. However, at the core of hydrodynamical simulations lie numerical solutions governing ideal, collisional, and non-conducting gases. Modeling the cosmic gas can be approached through three main branches: the Eulerian formulation, the Lagrangian formulation, or a hybrid of both. In the Lagrangian formulation, the following equations govern the fluid dynamics DρDt=−ρ∇⋅v, DvDt=−1ρ∇P, DeDt=1ρ∇⋅pv, where D/Dt≡∂/∂t+v⋅∇ denotes the Lagrangian derivative, ρ is the density, v denotes de velocity vector, P=(γ−1)ρu (with γ being the heat capacity ratio and u being the internal energy) denotes the thermodynamic pressure, and e=u+v2/2 is the total energy per unit mass.
This formulation assumes an observer that follows an individual fluid part, specified by its properties such as density ρ, as it moves through space and time. It can also be viewed as a mesh-free technique for approximating the continuum dynamics of fluids by samplings particles (an interpolation of points).
Due to limited numerical resolution of hydrodynamical simulations, which are among the most computationally expensive simulations in Cosmology and Astrophysics, certain physical processes must be included by hand. These processes are known as subgrid physical processes or sub-resolution models. They bridge the gap between the scales that can be treated numerically, typically above interstellar medium structure scales (around 3 kpc), and those addressed by these subgrid routines, which extend below the scale of star clusters (around 0.3 kpc). These subgrid models are a critical component of hydrodynamical simulations, introducing different parameters that must be tuned to ensure that their final products align with observational data.
It is not within the scope of the present thesis to delve into all the intricacies of subgrid physical processes. However, we can mention some of them, including:
- Gas cooling. This process dissipates the internal energy of gas through mechanisms such as ionization processes. It is often tabulated as a function of density, temperature, redshift, and composition for phenomena like photoionization. It is used by EAGLE and IllustrisTNG.
- Element abundance evolution. This process tracks the time release of individual elements from various nucleosynthetic channels. It is employed in simulations like SIMBA, Magneticum, EAGLE, and IllustrisTNG.
- Feedback processes. These processes involve the balance of inflows and outflows that regulate phenomena such as supernovae (SN) and active galactic nuclei (AGN) activity. They may be influenced by mechanisms like stellar winds and radiation pressure.
- Magnetic fields, cosmic rays, dust and others. These subgrid processes play crucial roles in shaping the evolution of galaxies and the intergalactic medium in hydrodynamical simulations.
Figure 4: A comparison of four hydrodynamic simulations (IllustrisTNG, SIMBA, Astrid and Magneticum) that have the same initial conditions but have been run with four different codes and subgrid models. Gas density and temperature are shown in blue and red colors, respectively. This showcase the different subgrid physics throught the different simulations. The link to video can be found here.
All these subgrid physical components require calibration, which is typically based on either physical arguments or observations – i.e., calibration aims at reproducing properties of galaxy populations. The most commonly used galaxy property for this calibration is the galaxy stellar mass, which is employed to calibrate feedback associated with stellar evolution. However, in simulations like EAGLE, galaxy size has also been used to reproduce galaxy scaling relations. Additionally, properties such as star formation rate and halo gas fractions are utilized in simulations like IllustrisTNG. It is important to note that the values chosen for these parameters may vary depending on the resolution of the simulation being considered. Consequently, this calibration serves as compelling evidence of the success of hydrodynamical simulations when compared to real observations.
Approximated methods to galaxies and halo-galaxy connection
As we have seen, hydrodynamic simulations represent the best of current simulation capabilities, as they provide a direct reproduction of observable properties of galaxies, which are (for the most part) the objects actually observed in the sky. Unlike DM halos, galaxies are baryonic entities, making hydrodynamic simulations invaluable for understanding their formation and evolution. However, these simulations are also the most computationally expensive, surpassing even N-body simulations in terms of computational resources required. Given their significance, there has been a concerted effort to develop approximate methods for predicting galaxy properties based on DM halo information. This subfield, often referred to as the halo-galaxy connection, aims at extracting galaxy-related information from DM halo simulations (or vice-versa).
The halo-galaxy connection refers to the relationship between the multivariate distribution of galaxy and halo properties derived from observations and simulations. Galaxies form and evolve within DM halos, and many of their properties are intrinsically related to the halo environment and clustering properties. For example, red galaxies tend to populate the centers of halos and are generally older, while blue galaxies are more often found in the outskirts of halos, and are typically younger. Modeling approaches to establish this link generally fall into two categories: physical and empirical models. Physical models include hydrodynamical simulations and Semi-Analytical Models (SAMs), which aim to capture the underlying physical processes governing galaxy formation within halos. On the other hand, empirical models such as Subhalo Abundance Matching (SHAM) and Halo Occupation Distribution (HOD) models are more data-driven and rely on statistical correlations between galaxy and halo properties observed in simulations or surveys.
SAMs approximate various physical processes using analytic prescriptions that can be tracked through the merger history of DM halos, and many codes are examples of this idea, such as Santa Cruz, GAEA, and L-Galaxies. SHAMs, on the other hand, establish a relationship between the mass of a galaxy and the abundance of the DM halos it typically inhabits. Finally, Decorated Halo Occupation Distribution models (decorated HODs) introduce additional halo properties besides mass, such as concentration, to determine the probability density distribution for the number of galaxies within their hosting halos.
Takeaway messages
N-body simulations model the gravitational interactions between a large number of particles, typically used to study the evolution of structures in the Universe, such as galaxy formation. These simulations track the motion of individual particles under the influence of gravity, providing insights into how large and small-scale structures evolve over time.
Hydrodynamic simulations, on the other hand, incorporate both gravity and the physics of fluids (gas dynamics), allowing for a more comprehensive modeling of astrophysical phenomena. These simulations are crucial for understanding the behavior of gas in galaxies, star formation, and other processes where matter is not just moving under gravity, but also undergoing physical interactions like pressure, temperature, and radiation.
Together, N-body and hydrodynamic simulations provide powerful tools for understanding the complex, dynamic Universe!