
Commit 489cbfa

Copilot and WenyinWei committed
Replace custom parallel classes with standard library examples
Co-authored-by: WenyinWei <[email protected]>
1 parent 66df73f commit 489cbfa

21 files changed (+1448, -3812 lines)

docs/STANDARD_PARALLELISM.md

Lines changed: 245 additions & 0 deletions
# Integrating Standard Parallelism Libraries with diffeq

This document shows how to use standard parallelism libraries with the diffeq library. It addresses review feedback to avoid custom parallel classes and rely on proven standard libraries instead.

## Overview

Instead of creating custom parallel classes, we recommend using established standard libraries:

- **std::execution** - C++17/20 parallel execution policies
- **OpenMP** - Cross-platform shared-memory multiprocessing
- **Intel TBB** - Threading Building Blocks for advanced parallel algorithms
- **NVIDIA Thrust** - GPU acceleration without writing CUDA kernels

## Quick Examples

### 1. Basic Parallel Integration with std::execution

```cpp
#include <examples/standard_parallelism.hpp>
#include <execution>
#include <algorithm>

// Simple harmonic oscillator
struct HarmonicOscillator {
    void operator()(const std::vector<double>& y, std::vector<double>& dydt, double t) const {
        dydt[0] = y[1];   // dx/dt = v
        dydt[1] = -y[0];  // dv/dt = -x (ω = 1)
    }
};

// Multiple initial conditions in parallel
std::vector<std::vector<double>> initial_conditions(1000);
// ... fill with different initial conditions ...

HarmonicOscillator system;
diffeq::examples::StandardParallelODE<std::vector<double>, double>::integrate_multiple_conditions(
    system, initial_conditions, 0.01, 100
);
```
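
If you would rather call the standard algorithms directly, the same batch run can be written with `std::for_each` and `std::execution::par`. This is a minimal sketch; the only diffeq API it assumes is the `RK4Integrator` construction and `step(state, dt)` call used in the other examples in this document:

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Integrate every initial condition in parallel; each element is independent,
// so the loop body needs no synchronization.
void integrate_all(const HarmonicOscillator& system,
                   std::vector<std::vector<double>>& initial_conditions,
                   double dt, int steps) {
    std::for_each(std::execution::par,
                  initial_conditions.begin(), initial_conditions.end(),
                  [&](std::vector<double>& state) {
        auto integrator = diffeq::integrators::ode::RK4Integrator<std::vector<double>, double>(system);
        for (int step = 0; step < steps; ++step) {
            integrator.step(state, dt);
        }
    });
}
```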

### 2. Parameter Sweep (Beyond Initial Conditions)

```cpp
// Vary system parameters in parallel
std::vector<double> frequencies = {0.5, 1.0, 1.5, 2.0, 2.5};
std::vector<std::vector<double>> results;

diffeq::examples::StandardParallelODE<std::vector<double>, double>::parameter_sweep(
    [](const std::vector<double>& y, std::vector<double>& dydt, double t, double omega) {
        dydt[0] = y[1];
        dydt[1] = -omega * omega * y[0];  // parameterized frequency
    },
    {1.0, 0.0}, frequencies, results, 0.01, 100
);
```
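
The same sweep can also be written directly against the standard algorithms, which makes it clear that anything can vary per task, not just the initial condition. A minimal sketch, assuming `RK4Integrator` accepts any callable with the `(y, dydt, t)` signature shown above; the wrapper function and its defaults are illustrative:

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// Sweep over frequencies in parallel; each index owns its own slot in
// `results`, so the parallel loop writes without synchronization.
std::vector<std::vector<double>> sweep_frequencies(const std::vector<double>& frequencies,
                                                   double dt, int steps) {
    std::vector<std::vector<double>> results(frequencies.size(), std::vector<double>{1.0, 0.0});
    std::vector<std::size_t> indices(frequencies.size());
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    std::for_each(std::execution::par, indices.begin(), indices.end(), [&](std::size_t i) {
        const double omega = frequencies[i];
        auto rhs = [omega](const std::vector<double>& y, std::vector<double>& dydt, double /*t*/) {
            dydt[0] = y[1];
            dydt[1] = -omega * omega * y[0];
        };
        auto integrator = diffeq::integrators::ode::RK4Integrator<std::vector<double>, double>(rhs);
        for (int step = 0; step < steps; ++step) {
            integrator.step(results[i], dt);
        }
    });
    return results;
}
```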

### 3. OpenMP for CPU Parallelism

```cpp
#include <omp.h>

std::vector<std::vector<double>> states(1000);
// ... initialize states ...

#pragma omp parallel for
for (size_t i = 0; i < states.size(); ++i) {
    auto integrator = diffeq::integrators::ode::RK4Integrator<std::vector<double>, double>(system);
    for (int step = 0; step < 100; ++step) {
        integrator.step(states[i], 0.01);
    }
}
```

### 4. GPU Acceleration with Thrust (NO Custom Kernels!)

```cpp
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/for_each.h>
#include <vector>

// Device code needs a trivially copyable state type: std::vector<double>
// cannot be used inside a __device__ lambda, so use a small plain struct.
struct State2D { double x; double v; };

std::vector<State2D> host_states(1000, State2D{1.0, 0.0});

// Copy to GPU
thrust::device_vector<State2D> gpu_states(host_states.begin(), host_states.end());

// GPU parallel execution without writing kernels!
// (Compile with nvcc and --extended-lambda so the __device__ lambda is allowed.)
thrust::for_each(thrust::device, gpu_states.begin(), gpu_states.end(),
    [=] __device__ (State2D& state) {
        // Your ODE integration code here
        // (the integrator must be GPU-compatible, i.e. callable from device code)
    });

// Copy back to host
thrust::copy(gpu_states.begin(), gpu_states.end(), host_states.begin());
```
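
To make this end-to-end, here is a minimal sketch in which the integration step is written inline in the device lambda (a classic RK4 step for the harmonic oscillator from section 1), since the library's CPU integrators are not necessarily callable from device code. The state type, step count, and `--extended-lambda` nvcc flag are assumptions of this sketch, not part of the diffeq API:

```cpp
// Compile with: nvcc -std=c++17 --extended-lambda
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/for_each.h>
#include <thrust/host_vector.h>

struct State2D { double x; double v; };

int main() {
    thrust::host_vector<State2D> host_states(1000, State2D{1.0, 0.0});
    thrust::device_vector<State2D> gpu_states = host_states;

    const double dt = 0.01;
    const int steps = 100;

    thrust::for_each(thrust::device, gpu_states.begin(), gpu_states.end(),
        [=] __device__ (State2D& s) {
            // Harmonic oscillator dx/dt = v, dv/dt = -x, advanced with classic RK4.
            for (int i = 0; i < steps; ++i) {
                double k1x = s.v,              k1v = -s.x;
                double k2x = s.v + 0.5*dt*k1v, k2v = -(s.x + 0.5*dt*k1x);
                double k3x = s.v + 0.5*dt*k2v, k3v = -(s.x + 0.5*dt*k2x);
                double k4x = s.v + dt*k3v,     k4v = -(s.x + dt*k3x);
                s.x += dt/6.0 * (k1x + 2*k2x + 2*k3x + k4x);
                s.v += dt/6.0 * (k1v + 2*k2v + 2*k3v + k4v);
            }
        });

    host_states = gpu_states;  // copy results back to the host
    return 0;
}
```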

### 5. Intel TBB for Advanced Parallelism

```cpp
#include <tbb/parallel_for_each.h>

tbb::parallel_for_each(states.begin(), states.end(),
    [&](std::vector<double>& state) {
        auto integrator = diffeq::integrators::ode::RK4Integrator<std::vector<double>, double>(system);
        for (int step = 0; step < 100; ++step) {
            integrator.step(state, 0.01);
        }
    });
```
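
`tbb::parallel_for` over a `blocked_range` is a common alternative when you want control over chunk granularity: TBB hands each worker a contiguous range of indices. A minimal sketch with the same integrator usage as above:

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

tbb::parallel_for(tbb::blocked_range<size_t>(0, states.size()),
    [&](const tbb::blocked_range<size_t>& range) {
        // Each task processes a contiguous chunk of states.
        for (size_t i = range.begin(); i != range.end(); ++i) {
            auto integrator = diffeq::integrators::ode::RK4Integrator<std::vector<double>, double>(system);
            for (int step = 0; step < 100; ++step) {
                integrator.step(states[i], 0.01);
            }
        }
    });
```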

## Benefits of This Approach

### ✅ Advantages
- **Proven Libraries**: Use optimized, well-tested standard libraries
- **No Learning Curve**: No custom classes to learn
- **Flexibility**: Vary parameters, integrators, callbacks, devices
- **Hardware Specific**: Choose the right tool for each use case
- **GPU Support**: Thrust provides GPU acceleration without writing kernels
- **Standard Detection**: Use standard library functions for hardware detection

### ❌ What We Avoid
- Custom "Facade" classes
- Reinventing parallel algorithms
- Custom hardware detection code
- Restricting flexibility to only initial conditions

## Choosing the Right Library

| Use Case | Recommended Library | Why |
|----------|---------------------|-----|
| Simple parallel loops | `std::execution::par` | Built into C++17 (GCC/libstdc++ uses TBB as its backend) |
| CPU-intensive computation | OpenMP | Mature, cross-platform, great CPU scaling |
| Complex task dependencies | Intel TBB | Advanced algorithms, work-stealing scheduler |
| GPU acceleration | NVIDIA Thrust | GPU without custom kernels |
| Mixed workloads | Combination | Use the right tool for each part |

## Real-World Examples

### Monte Carlo Simulations
```cpp
#include <algorithm>
#include <execution>
#include <numeric>   // std::iota
#include <random>

// Parameter sweep with different random seeds
std::vector<int> seeds(1000);
std::iota(seeds.begin(), seeds.end(), 1);

std::for_each(std::execution::par, seeds.begin(), seeds.end(),
    [&](int seed) {
        std::mt19937 rng(seed);
        // Run the simulation with this random number generator;
        // each task gets its own RNG state.
    });
```
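
To make the pattern concrete, one way to fill in the body is to draw a perturbed initial condition per seed and integrate it. This is a sketch only: the `outcomes` container, the 0.1 noise level, and the run length are illustrative, while the integrator calls follow the same `RK4Integrator` API as the rest of this document:

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <random>
#include <vector>

// One independent trajectory per seed; results land in a pre-sized container,
// so parallel writes never touch the same element.
std::vector<std::vector<double>> run_monte_carlo(const HarmonicOscillator& system, int n_runs) {
    std::vector<int> seeds(n_runs);
    std::iota(seeds.begin(), seeds.end(), 1);
    std::vector<std::vector<double>> outcomes(n_runs);

    std::for_each(std::execution::par, seeds.begin(), seeds.end(), [&](int seed) {
        std::mt19937 rng(seed);
        std::normal_distribution<double> noise(0.0, 0.1);

        std::vector<double> state = {1.0 + noise(rng), noise(rng)};
        auto integrator = diffeq::integrators::ode::RK4Integrator<std::vector<double>, double>(system);
        for (int step = 0; step < 100; ++step) {
            integrator.step(state, 0.01);
        }
        outcomes[seed - 1] = state;  // seeds start at 1
    });
    return outcomes;
}
```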

### Robotics Control Systems
```cpp
// Real-time control with different controller parameters
#pragma omp parallel for schedule(static)
for (size_t i = 0; i < control_parameters.size(); ++i) {
    auto controller = create_controller(control_parameters[i]);
    auto integrator = diffeq::integrators::ode::RK4Integrator<State, double>(controller);

    // Simulate the control system
    for (int step = 0; step < simulation_steps; ++step) {
        integrator.step(robot_state[i], dt);
    }
}
```

### Multi-Physics Simulations
```cpp
// Different physics models running simultaneously
#pragma omp parallel sections
{
    #pragma omp section
    {
        // Fluid dynamics
        integrate_fluid_system();
    }

    #pragma omp section
    {
        // Structural mechanics
        integrate_structural_system();
    }

    #pragma omp section
    {
        // Heat transfer
        integrate_thermal_system();
    }
}
```

## Hardware Detection (Standard Way)

Instead of writing custom hardware-detection code, use the standard library and the query functions each runtime already provides:

```cpp
#include <thread>

// C++ standard library: logical core count
unsigned hw_threads = std::thread::hardware_concurrency();

// OpenMP thread count
int num_threads = omp_get_max_threads();

// CUDA device count
int device_count = 0;
cudaGetDeviceCount(&device_count);
bool gpu_available = (device_count > 0);

// TBB concurrency control: oneTBB replaces the deprecated tbb::task_scheduler_init
// with tbb::global_control (by default TBB already uses all available cores).
tbb::global_control limit(tbb::global_control::max_allowed_parallelism, hw_threads);
```
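
If the code has to build with or without these toolkits, the same checks can be guarded at compile time. A small sketch using only the standard predefined macros (`_OPENMP` is defined by the OpenMP compiler switch, `__CUDACC__` by nvcc):

```cpp
#include <algorithm>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// CPU worker count: prefer the OpenMP runtime when it is enabled,
// otherwise fall back to the portable standard-library query.
unsigned cpu_threads() {
#ifdef _OPENMP
    return static_cast<unsigned>(omp_get_max_threads());
#else
    return std::max(1u, std::thread::hardware_concurrency());
#endif
}

// GPU availability: only meaningful when compiled as CUDA code.
bool gpu_available() {
#ifdef __CUDACC__
    int device_count = 0;
    cudaGetDeviceCount(&device_count);
    return device_count > 0;
#else
    return false;
#endif
}
```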

## Integration with Existing diffeq Code

The beauty of this approach is that **your existing diffeq code doesn't change**:

```cpp
// This works exactly as before
auto integrator = diffeq::integrators::ode::RK4Integrator<std::vector<double>, double>(system);
integrator.step(state, dt);

// Just wrap it in standard parallel constructs when you need parallelism
std::for_each(std::execution::par, states.begin(), states.end(),
    [&](auto& state) {
        auto integrator = diffeq::integrators::ode::RK4Integrator<std::vector<double>, double>(system);
        integrator.step(state, dt);
    });
```
224+
225+
## Building and Dependencies
226+
227+
### CMake Integration
228+
```cmake
229+
# For std::execution
230+
target_compile_features(your_target PRIVATE cxx_std_17)
231+
232+
# For OpenMP
233+
find_package(OpenMP REQUIRED)
234+
target_link_libraries(your_target OpenMP::OpenMP_CXX)
235+
236+
# For Intel TBB
237+
find_package(TBB REQUIRED)
238+
target_link_libraries(your_target TBB::tbb)
239+
240+
# For NVIDIA Thrust (comes with CUDA)
241+
find_package(CUDA REQUIRED)
242+
target_link_libraries(your_target ${CUDA_LIBRARIES})
243+
```
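
For reference, a minimal, hypothetical `CMakeLists.txt` that ties these pieces together; the project name, target name, and source path are placeholders to adapt to your setup:

```cmake
cmake_minimum_required(VERSION 3.18)
project(diffeq_parallel_examples CXX)

# Placeholder target; point it at your own example source file.
add_executable(parallel_demo examples/parallel_demo.cpp)
target_compile_features(parallel_demo PRIVATE cxx_std_17)

# OpenMP and TBB are optional here: link them only when found.
find_package(OpenMP)
if(OpenMP_CXX_FOUND)
    target_link_libraries(parallel_demo PRIVATE OpenMP::OpenMP_CXX)
endif()

find_package(TBB)   # also the backend for libstdc++'s std::execution policies
if(TBB_FOUND)
    target_link_libraries(parallel_demo PRIVATE TBB::tbb)
endif()
```
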
This approach gives you maximum flexibility while leveraging proven, optimized libraries instead of reinventing the wheel.
