A high-performance, multi-physics cosmological engine written in pure C (C17), designed to evolve billions of dark matter particles coupled to a baryonic SPH fluid under a cosmologically expanding spacetime, and to render the resulting cosmic web through physically-based volumetric path integration.
Macrocosm is an engine that simulates the gravitational and hydrodynamic evolution of the large-scale structure of the Universe, from near-homogeneous Gaussian initial conditions at redshift z ≈ 50 down to z = 0, reproducing the filamentary network of dark matter halos, baryonic gas, walls and voids that defines the cosmic web.
The core engineering commitments are:
- Pure C17 — no C++ runtime, no hidden allocations, deterministic control over memory layout and instruction scheduling.
- Extreme numerical rigor — every physical law is integrated with a documented, peer-reviewed numerical scheme; no heuristic shortcuts.
- Colossal scale — target budget of 10⁸–10⁹ particles on a single workstation (64-core CPU + discrete GPU), scaling to clusters via optional MPI.
- Photorealism — the output is not a diagnostic plot. It is a cinematic, physically-grounded image of cosmic structure in formation.
The engine decomposes into three physical layers (cosmology, gravity,
hydrodynamics), three engineering layers (memory, parallelism, rendering), and
a public C API defined under include/macrocosm/.
| Regime | Physical process | Numerical treatment |
|---|---|---|
| Cosmological background | FLRW expansion, a(t) from Friedmann equations | Tabulated integration + cubic spline |
| Initial conditions | Gaussian random field, CAMB-style P(k), growth D₊(a) | Zel'dovich approximation on a regular Lagrangian grid |
| Collisionless gravity | Vlasov–Poisson system of the dark matter fluid | TreePM: Barnes–Hut short-range + FFT long-range |
| Baryonic hydrodynamics | Euler equations with adiabatic index γ | Smoothed Particle Hydrodynamics (M₄ cubic-spline kernel) |
| Radiative processes | Optically thin cooling function Λ(T, Z) | Sub-cycled implicit step per particle |
| Halo identification | Non-linear collapse | Friends-of-Friends with linking length b = 0.2 · mean_sep |
| Visualization | Emission / absorption of the combined density–temperature field | Compute-shader ray marching with delta tracking |
Explicitly out of scope:
- Full general-relativistic gravity (the Newtonian limit in comoving coordinates is used instead).
- Ideal MHD of the intergalactic medium (reserved for a future Macrocosm-MHD branch).
- Neutrino transport and non-cold dark matter species.
- Radiative transfer beyond optically thin cooling.
All equations below are the load-bearing equations of the engine. Every one is implemented verbatim in a module listed in Section 4.
Spatial coordinates are comoving, x = r / a(t). The scale factor
satisfies the Friedmann equation:
H²(a) = H₀² [ Ωₘ a⁻³ + Ω_r a⁻⁴ + Ω_Λ + Ω_k a⁻² ]
with Planck 2018 fiducial parameters (Ωₘ=0.315, Ω_Λ=0.685, h=0.674). A
one-dimensional table of a(t) is precomputed at startup by 4th-order
Runge–Kutta and queried via binary search during evolution.
The collisionless Boltzmann equation for the phase-space distribution f(x,p,t), together with the Poisson equation for the perturbed potential φ:
dp/dt = −m ∇φ , dx/dt = p / (m a²)
∇² φ = 4 π G a² (ρ − ρ̄)
Symplectic integration by kick–drift–kick Leapfrog with a(t)-dependent
drift factors (Quinn et al. 1997).
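A minimal sketch of one KDK step, in one dimension, with a hedged simplification: the exact scheme uses the tabulated integrals ∫dt/a (kick) and ∫dt/a² (drift) of Quinn et al., which are approximated here by their midpoint values. For a_mid = 1 this reduces to the standard Leapfrog.

```c
#include <stddef.h>

/* One kick-drift-kick step in comoving coordinates (1-D sketch). */
typedef struct { double x, v, acc; } Particle1D;

static void kdk_step(Particle1D *p, size_t n, double dt, double a_mid,
                     void (*compute_accel)(Particle1D *, size_t)) {
    double kick  = 0.5 * dt / a_mid;          /* midpoint approx of ∫dt/a  */
    double drift = dt / (a_mid * a_mid);      /* midpoint approx of ∫dt/a² */
    for (size_t i = 0; i < n; ++i) p[i].v += kick * p[i].acc;   /* K */
    for (size_t i = 0; i < n; ++i) p[i].x += drift * p[i].v;    /* D */
    compute_accel(p, n);                      /* forces at new positions */
    for (size_t i = 0; i < n; ++i) p[i].v += kick * p[i].acc;   /* K */
}

/* Toy force for verification only: unit harmonic oscillator. */
static void harmonic_accel(Particle1D *p, size_t n) {
    for (size_t i = 0; i < n; ++i) p[i].acc = -p[i].x;
}
```

Being symplectic, the step keeps the oscillator's energy bounded over long runs rather than drifting.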
The gravitational potential is split in Fourier space by a Gaussian filter
of width r_s ≈ 1.25 · Δx_mesh:
φ(k) = φ_long(k) + φ_short(k)
φ_long(k) = φ(k) · exp(−k² r_s²) → solved on the PM grid via FFT
φ_short(r) = Σ_j G m_j / |r − r_j| · erfc(|r − r_j| / (2 r_s)) → tree walk
- Long range: real-to-complex FFT (FFTW3) on a 1024³ mesh; CIC mass assignment and force interpolation with the mandatory deconvolution of the CIC window function.
- Short range: Barnes–Hut octree with multipole acceptance criterion
θ ≤ 0.5, truncated beyond ~4.5r_s.
Density, pressure and force for each gas particle i:
ρ_i = Σ_j m_j W(|r_i − r_j|, h_i)
P_i = (γ − 1) ρ_i u_i
dv_i/dt = −Σ_j m_j [ P_i/ρ_i² + P_j/ρ_j² + Π_ij ] ∇_i W_ij
du_i/dt = ½ Σ_j m_j [ P_i/ρ_i² + P_j/ρ_j² + Π_ij ] (v_i − v_j) · ∇_i W_ij
using the M₄ cubic-spline kernel W(r,h) (Monaghan 1992) and the
Monaghan–Gingold artificial viscosity Π_ij with switches α=1, β=2.
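For reference, the M₄ cubic spline in the convention where h is the support radius (a sketch: the normalization 8/(π h³) and the kernel vanishing at r = h follow that convention; SPH codes differ on whether the support is h or 2h, so treat the constants as an assumption):

```c
#include <math.h>

#define MC_PI 3.14159265358979323846

/* M4 cubic-spline kernel in 3D, q = r/h, compact support r < h.
 * The two branches meet continuously at q = 1/2. */
static double w_cubic_spline(double r, double h) {
    double q = r / h;
    double norm = 8.0 / (MC_PI * h * h * h);
    if (q < 0.5) return norm * (1.0 + 6.0 * q * q * (q - 1.0));
    if (q < 1.0) { double t = 1.0 - q; return norm * 2.0 * t * t * t; }
    return 0.0;
}
```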
Timesteps follow the Courant–Friedrichs–Lewy condition
Δt_i = C_CFL h_i / (c_s,i + |v_i|) with C_CFL = 0.3.
Vector and tensor operations that appear in the PM solver, in SPH gradient
reconstruction and in the rendering matrix chain are all routed through a
single header, include/macrocosm/linalg.h, compiled with explicit SIMD
intrinsics (AVX2 baseline, AVX-512 where available):
- 3×3 symmetric tensors (stress, velocity gradient ∇v, deformation).
- Batched 4-wide / 8-wide dot products, cross products, fused-multiply-add over SoA arrays.
- 1024³ complex FFT wrapped around FFTW3 with pencil decomposition ready for future MPI parallelization.
Numerical references: Golub & Van Loan, Matrix Computations (4th ed.) for conditioning analysis of the Poisson inversion; Trefethen & Bau, Numerical Linear Algebra for the FFT accuracy bounds used in the convergence tests.
macrocosm/
├── include/macrocosm/ Public C API (opaque handles, PODs only)
│ ├── types.h Fixed-width types, SoA particle views, flags
│ ├── linalg.h SIMD vector/tensor primitives
│ ├── cosmology.h a(t), H(a), D+(a), Friedmann table
│ ├── particles.h Allocation, Morton sort, domain queries
│ ├── octree.h Barnes–Hut tree build & multipole walk
│ ├── pm_solver.h FFT-based long-range Poisson solver
│ ├── treepm.h Force combiner, force splitting
│ ├── sph.h Neighbour search, density, pressure, forces
│ ├── integrator.h KDK Leapfrog, adaptive timesteps
│ └── render.h Splatting, ray march, framebuffer output
├── src/
│ ├── core/ cosmology / particles / integrator
│ ├── gravity/ octree / pm_solver / treepm
│ ├── hydro/ sph
│ ├── parallel/ domain decomposition + thread pools
│ └── render/ volumetric pipeline (GPU dispatch + host-side staging)
└── shaders/ GLSL 4.30+ compute + composite vertex/fragment
Every particle array is a Struct-of-Arrays (SoA). No Array-of-Structures is permitted outside of debug I/O paths.
```c
typedef struct {
    /* hot, streamed in the integrator inner loop */
    double *restrict x,  *restrict y,  *restrict z;   /* comoving position */
    double *restrict vx, *restrict vy, *restrict vz;  /* canonical momentum / m·a² */
    float  *restrict ax, *restrict ay, *restrict az;  /* acceleration */
    /* warm, read by SPH / render */
    float *restrict mass, *restrict h, *restrict rho, *restrict u;
    /* cold, touched once per step */
    uint64_t *restrict morton;
    uint32_t *restrict id, *restrict type;
    size_t n;
    size_t capacity;
} ParticleSoA;
```

Key decisions:
- 64-byte alignment on every pointer returned by the allocator so that AVX-512 loads (`_mm512_load_pd`) are guaranteed aligned — no fallback path.
- Morton (Z-order) sorting of particles every N_sort ≈ 32 steps. This preserves spatial locality, pushes octree-neighbor pairs into the same L2 cache line, and turns the tree walk into an almost-linear streaming load.
- Huge pages (`MADV_HUGEPAGE` / `VirtualAlloc` with `MEM_LARGE_PAGES`) for arrays above 256 MiB to reduce TLB misses.
- Arena allocator per module. No `malloc` in the hot path. All scratch buffers (neighbor lists, FFT workspace, tree nodes) are preallocated at init time from a contiguous arena.
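A minimal sketch of the 64-byte aligned allocation policy, assuming C11 `aligned_alloc` (Windows would go through `_aligned_malloc` instead; the name `mc_alloc64` is illustrative, not the engine's actual API):

```c
#include <stdlib.h>
#include <stdint.h>

/* Round the request up to a multiple of 64, since aligned_alloc
 * requires the size to be a multiple of the alignment, then allocate. */
static void *mc_alloc64(size_t bytes) {
    size_t padded = (bytes + 63) & ~(size_t)63;
    return aligned_alloc(64, padded);
}
```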
Three-tier parallelism, composable and independent:
| Tier | Technology | Scope |
|---|---|---|
| Instruction-level | AVX2 / AVX-512 intrinsics | Inner loops of force, density, SPH gradients |
| Thread-level | OpenMP 5 `parallel for`, plus a custom work-stealing pool for the tree walk | All CPU loops over particles |
| Process-level (optional, Phase 4) | MPI 4 with Peano–Hilbert domain decomposition | Multi-node cluster runs |
The tree walk is the only kernel where the naïve OpenMP loop hits a load imbalance wall (dense halos vs. low-density voids); it uses an explicit work-stealing deque (one per hardware thread), with the queues seeded by a Peano–Hilbert space-filling curve partition of the particle set so that stolen work is cache-adjacent to locally owned work.
All floating-point behavior is deterministic across runs at fixed thread count — no reduction of forces in non-deterministic order. Reductions are performed via a tree of partial sums over fixed-width SIMD lanes.
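The fixed-order reduction can be sketched as a recursive pairwise sum (illustrative: the production version reduces fixed-width SIMD lane partials rather than recursing down to single elements, but the determinism argument is the same, since the association order is fixed by the recursion and not by thread scheduling):

```c
#include <stddef.h>

/* Deterministic pairwise (tree) summation: worst-case rounding error
 * grows like O(log2 n) instead of the O(n) of a left-to-right sum. */
static double tree_sum(const double *x, size_t n) {
    if (n == 1) return x[0];
    size_t half = n / 2;
    return tree_sum(x, half) + tree_sum(x + half, n - half);
}
```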
The visual output is a physically-grounded image of the density and temperature fields. It does not plot particles — particles are only the carriers of the underlying fluid fields.
A GPU compute shader projects every SPH particle into a sparse 3D texture
using its kernel W(r, h) as a footprint. Four channels are accumulated:
gas density ρ, internal energy u (proxy for temperature), mean metallicity
Z (reserved), and dark-matter projected density ρ_DM (one float per voxel,
separate texture).
Splat accumulation uses imageAtomicAdd on 32-bit float representations
(bit-cast), or on a pre-sorted particle list using segment reduction when
strict determinism is required.
A second compute shader ray-marches from the camera through the volume. The transport equation solved along each ray is the classical emission–absorption integral (Kajiya 1986; Wrenninge 2012, Production Volume Rendering):

L(ray) = ∫₀^∞ j(s) · exp(−∫₀^s κ(s') ds') ds

where j(s) is the emission coefficient derived from a physical temperature
mapping of u(s):
- Planck black-body emission at effective temperature T(u), integrated over the Johnson B, V, R bands and color-matched to sRGB with the CIE 1931 2° observer.
- Bremsstrahlung boost in regions with T > 10⁶ K (cluster cores), reproducing the X-ray appearance of intracluster gas.

κ(s) is dominated by Thomson scattering in hot phases and by dust extinction (Calzetti law) in cool high-density regions.
A fullscreen quad samples the HDR output, applies ACES tonemapping and a per-pixel bloom kernel for the brightest halo cores, and writes the final sRGB framebuffer. The fullscreen-quad + compute-texture pattern is inherited directly from the companion projects (see Section 8.1).
- Language: C17 (`-std=c17`), no GNU extensions in public headers.
- Compiler: GCC 13+, Clang 17+, or MSVC 19.37+.
- Build system: CMake ≥ 3.21, single `CMakeLists.txt` at the root.
- Required dependencies: FFTW3 (double precision), OpenGL 4.5, GLEW, GLFW 3, OpenMP.
- Optional dependencies: MPI (Phase 4), CUDA/HIP (Phase 5 — heterogeneous TreePM short-range kernel).
- Aggressive math flags, enabled by default in Release: `-O3 -march=native -ffast-math -funroll-loops -ftree-vectorize -fopenmp` (GCC/Clang) — `/O2 /fp:fast /arch:AVX2 /openmp:llvm` (MSVC).
The project is developed strictly in phases. No phase is allowed to begin before the previous one passes its verification test.
- Directory tree, `CMakeLists.txt`, architectural `README.md`.
- Zero code in hot paths. Placeholder `main.c` that links the toolchain.
Delivered in this phase:
- Public headers under `include/macrocosm/`: `types.h` (fixed-width types, MC_RESTRICT / MC_ALIGN / MC_STATIC_ASSERT, status enum, particle-species enum), `mem.h` (cross-platform 64-byte aligned allocator), `linalg.h` (scalar `vec3`/`mat3`/`sym3` primitives plus batched SoA routines), `particles.h` (SoA container), `morton.h` (21-bit-per-axis Z-order encoder), `radix_sort.h` (parallel LSD radix sort).
- Implementations under `src/core/`: `linalg.c` (axpy, dot, pairwise tree sum — forward error O(log₂ n · ε · Σ|xᵢ|) after Higham 2002 Thm 4.1; deterministic at fixed thread count), `particles.c` (aligned allocator + gather-based permutation over all 16 SoA fields), `morton.c` (BMI2 `pdep`/`pext` fast path with scalar fallback per Raman & Wise 2008), `radix_sort.c` (stable 8-bit-digit LSD sort with per-thread / per-bin exclusive scan per Wassenberg & Sanders 2011).
- Verification driver in `src/main.c`: allocates 2²⁰ particles, fills positions from a deterministic 64-bit LCG (Knuth TAoCP 3.2.1.3), encodes Morton keys, sorts, applies the permutation to every SoA field, and asserts (a) monotonicity of the sorted keys and (b) bit-exact round-trip of `morton` and `id` post-permutation. Exits non-zero on any failure.
- Deferred to Phase 6: huge-page allocation (`MADV_HUGEPAGE`/`MEM_LARGE_PAGES`) and the per-module arena allocator — both are memory-footprint optimizations that only become measurable at the 10⁸-particle production scale and are not required for Phase 1 correctness.
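The scalar fallback path of the Morton encoder can be sketched with the standard bit-spreading constants (the BMI2 `_pdep_u64` fast path produces the same interleaving in one instruction per axis; function names here are illustrative):

```c
#include <stdint.h>

/* Spread the low 21 bits of v so that input bit k lands at bit 3k. */
static uint64_t spread3(uint64_t v) {
    v &= 0x1FFFFFu;
    v = (v | v << 32) & 0x001F00000000FFFFull;
    v = (v | v << 16) & 0x001F0000FF0000FFull;
    v = (v | v <<  8) & 0x100F00F00F00F00Full;
    v = (v | v <<  4) & 0x10C30C30C30C30C3ull;
    v = (v | v <<  2) & 0x1249249249249249ull;
    return v;
}

/* 63-bit Morton key: x bits at 3k, y at 3k+1, z at 3k+2. */
static uint64_t morton3(uint32_t x, uint32_t y, uint32_t z) {
    return spread3(x) | spread3(y) << 1 | spread3(z) << 2;
}
```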
- `a(t)` integrator and `D₊(a)` table.
- Gaussian random field generator with a user-supplied `P(k)`.
- Zel'dovich displacement of a regular grid.
- Verification: the measured `P(k)` of the generated particles matches the input spectrum to within 2% in every bin.
- Barnes–Hut octree with multipole expansion up to quadrupole.
- PM solver on 512³ (dev) / 1024³ (production) mesh.
- KDK Leapfrog integrator with cosmological drift factors.
- Verification: Zel'dovich test (linear growth of a plane wave) matches the analytic `D₊(a)` to < 1% at z = 0.
- Neighbour search piggybacking on the tree.
- Density loop, pressure force, artificial viscosity, adiabatic energy.
- Radiative cooling sub-step.
- Verification: Sod shock tube and Evrard collapse test reproduce published reference solutions.
- Splat compute shader and 3D sparse texture.
- Physical ray-march compute shader with black-body + bremsstrahlung.
- ACES tonemap and bloom pass.
- Verification: render a Gaussian blob with known analytic column density and confirm radiometric correctness to < 5%.
- Peano–Hilbert domain decomposition across OpenMP work-stealing queues.
- Optional MPI transport layer.
- Optional CUDA/HIP short-range tree walk.
- Continuous regression on the Santa Barbara Cluster Comparison setup.
Macrocosm builds on two existing sibling projects in this workspace. Their design decisions and, in specific cases, their code are adapted here:
- `../atoms/` — Hydrogen Quantum Orbital Visualizer.
  - Reused concept: inverse-transform sampling via a precomputed CDF tabulated with `lower_bound`, applied in `atoms/src/atom_realtime.cpp` (`sampleR`, `sampleTheta`). In Macrocosm this pattern is redeployed in the initial-conditions generator to sample displacement amplitudes from an arbitrary `P(k)` without ever storing the full field in memory.
  - Reused concept: the inferno / fire colormap (`heatmap_fire` in `atom_realtime.cpp`) — adapted to temperature-based emission in the render pipeline as a fallback for the black-body path when physical color mapping is disabled.
  - Reused concept: the SSBO-backed fullscreen-quad render pattern used in `atoms/src/atom_raytracer.cpp` — mirrored by our compositing stage.
- `../black_hole/` — Schwarzschild Raytracer.
  - Reused concept: the GLSL compute-shader + UBO + image-store pipeline of `black_hole/geodesic.comp` and `black_hole.cpp` is the direct template for Macrocosm's render compute shaders. Binding points, dispatch strategy (16×16 local size, ceil-divided global size), and adaptive resolution while the camera is moving are all adopted.
  - Reused concept: the classical 4th-order Runge–Kutta integrator used to step rays through curved spacetime is structurally identical to the integrator we use for `a(t)` and for the optional volumetric-ray refraction term.
  - Reused concept: the CPU-side Newtonian N-body loop inside `black_hole.cpp` (pairwise `G·m₁·m₂/r²`) is our reference naïve baseline against which the TreePM force error is measured in Phase 3 verification.
Citations are listed here in the exact form they will appear in code comments at the site where each technique is applied.
- Springel, V. (2005). The cosmological simulation code GADGET-2. MNRAS 364, 1105. — applied in `src/gravity/treepm.c` for the short/long-range force split and in `src/core/integrator.c` for the cosmological Leapfrog drift/kick factors.
- Barnes, J. & Hut, P. (1986). A hierarchical O(N log N) force-calculation algorithm. Nature 324, 446. — applied in `src/gravity/octree.c` (tree construction and multipole acceptance criterion).
- Hockney, R. W. & Eastwood, J. W. (1988). Computer Simulation Using Particles. — applied in `src/gravity/pm_solver.c` (CIC mass assignment and CIC deconvolution in Fourier space).
- Monaghan, J. J. (1992). Smoothed Particle Hydrodynamics. ARAA 30, 543. — applied in `src/hydro/sph.c` (M₄ kernel, pressure force).
- Price, D. J. (2012). Smoothed Particle Hydrodynamics and Magnetohydrodynamics. J. Comp. Phys. 231, 759. — applied in `src/hydro/sph.c` (artificial viscosity switches and energy equation).
- Quinn, T., Katz, N., Stadel, J., Lake, G. (1997). Time stepping N-body simulations. arXiv:astro-ph/9710043. — applied in `src/core/integrator.c` (cosmological drift factor ∫ dt/a²).
- Zel'dovich, Ya. B. (1970). Gravitational instability: An approximate theory for large wavelength perturbations. A&A 5, 84. — applied in `src/core/cosmology.c` (initial displacement field).
- Sutherland, R. S. & Dopita, M. A. (1993). Cooling functions for low-density astrophysical plasmas. ApJS 88, 253. — applied in the `src/hydro/sph.c` cooling sub-step (tabulated Λ(T, Z)).
- Planck Collaboration (2018). Cosmological parameters. A&A 641, A6. — applied in `src/core/cosmology.c` (fiducial Ωₘ, Ω_Λ, h).
- Frigo, M. & Johnson, S. G. (2005). The design and implementation of FFTW3. Proc. IEEE 93, 216. — applied in `src/gravity/pm_solver.c` (FFT backend).
- Warren, M. S. & Salmon, J. K. (1993). A parallel hashed oct-tree N-body algorithm. Proc. Supercomputing '93. — applied in `src/parallel/domain.c` (Peano–Hilbert domain decomposition).
- Golub, G. H. & Van Loan, C. F. Matrix Computations, 4th ed. — conditioning and numerical stability of the discrete Poisson operator in `src/gravity/pm_solver.c`.
- Trefethen, L. N. & Bau, D. Numerical Linear Algebra. — error analysis of the batched SIMD reductions in `src/core/integrator.c`.
- Drepper, U. (2007). What Every Programmer Should Know About Memory. — cache blocking, alignment and NUMA strategies in `src/parallel/threads.c` and `include/macrocosm/particles.h`.
- Kajiya, J. T. (1986). The rendering equation. SIGGRAPH '86. — basis of the emission–absorption integral in `shaders/raymarch.comp`.
- Wrenninge, M. (2012). Production Volume Rendering. — delta-tracking and transfer-function design used in `shaders/raymarch.comp`.
This document is the architectural foundation of the project. All subsequent implementation is required to be justifiable in terms of a section above. Any deviation must be reflected here in the same commit.