GPU implementation (almost complete) #46
Open: kartikmandar wants to merge 24 commits into tyler-a-cox:main from kartikmandar:gpu-implementation
Conversation
- Implement direct GPU evaluation for AiryBeam and UVBeam interpolation
- Achieve 40-2000x speedup while maintaining numerical accuracy
- Test that fftvis (CPU/GPU) and matvis (CPU/GPU) produce identical results
- Fix GPU implementation slice handling and loop structure
- All implementations match within numerical precision
TODO: Fix GPU-CPU accuracy differences (max diff ~56.3)
TODO: Fix tutorial typos and remove debugging cells
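A minimal way to quantify the GPU-CPU mismatch mentioned above is the maximum elementwise absolute difference (a generic helper for illustration, not code from this PR):

```python
def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two equal-length sequences.

    abs() of a complex difference gives its magnitude, so this works for
    complex visibilities as well as real arrays.
    """
    return max(abs(x - y) for x, y in zip(a, b))
```

Tracking this number per fix makes it easy to see when the backends converge to within numerical precision.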
- Update test_use_gpu_function to handle cached GPU availability correctly
- Fix CPU simulation tests to match current fftvis-matvis API compatibility
- Update GPU tests to use current CuPy API (OutOfMemoryError, random.random)
- Increase Ray object store memory to meet minimum requirements
Compare fftvis (CPU/GPU) vs matvis performance across varying sources, times, frequencies, and baselines
- Renamed modules: cpu_beams→beams, cpu_nufft→nufft, etc. (keeping cpu_simulate/gpu_simulate)
- Fixed all imports to use the new module names
- Resolved conflicts in 4 test files
- Fixed CoordinateRotation.select_chunk() call to use the correct number of arguments (2 instead of 3)
- Fixed backend parameter compatibility by separating CPU-specific parameters in the wrapper
- Fixed test to use the correct beam function names for CPU vs GPU comparison
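The backend-parameter separation could be sketched like this (the parameter names here are hypothetical placeholders, not the wrapper's real ones):

```python
# Hypothetical CPU-only parameter names; the actual wrapper's set may differ.
CPU_ONLY_PARAMS = {"nthreads", "eps_cpu"}

def split_backend_kwargs(kwargs):
    """Split keyword arguments into CPU-specific and backend-agnostic dicts,
    so the GPU path never receives parameters it does not understand."""
    cpu_kwargs = {k: v for k, v in kwargs.items() if k in CPU_ONLY_PARAMS}
    common = {k: v for k, v in kwargs.items() if k not in CPU_ONLY_PARAMS}
    return cpu_kwargs, common
```

Filtering once in the wrapper keeps the CPU and GPU simulate functions free of each other's keyword arguments.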
Force-pushed from 63bcecc to 17aeb38.
- Fix type checking to handle BeamInterface-wrapped beams
- Add polarized flag to cache key to prevent data collision
- Fix CuPy compatibility with wrap mode for azimuth coordinates
- GPU and CPU now match within numerical precision (3e-9)
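The cache-key and azimuth fixes can be sketched as follows (names are illustrative, not the actual fftvis API): including the polarized flag in the key keeps polarized and unpolarized results from colliding, and azimuth is wrapped into [0, 2π) before interpolation.

```python
import math

def beam_cache_key(beam_id, freq_hz, polarized):
    # Including `polarized` in the key prevents a polarized evaluation from
    # being served a cached unpolarized result (and vice versa).
    return (beam_id, float(freq_hz), bool(polarized))

def wrap_azimuth(az):
    # Wrap azimuth into [0, 2*pi) so interpolation near the 0/2*pi seam
    # behaves identically on CPU and GPU backends.
    return az % (2.0 * math.pi)
```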
… saving
- Add 4-5 more tiers for each benchmark parameter (sources, times, frequencies, antennas)
- Add GPU vs CPU speedup to all speedup comparison plots
- Add system information collection (CPU, RAM, GPU specs)
- Add comprehensive results saving with timestamped directory
- Include visibility consistency checks between all backends
- Fix baseline calculation to match matvis (all N×N baselines)
- Generate summary reports with key findings and speedup statistics
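Timestamped results saving could look like this (the directory layout is an assumption for illustration, not the benchmark script's actual structure):

```python
import datetime
import pathlib

def make_results_dir(root="benchmark_results"):
    # One directory per run, named by timestamp, so repeated benchmark
    # runs never overwrite earlier results.
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    path = pathlib.Path(root) / stamp
    path.mkdir(parents=True, exist_ok=True)
    return path
```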
Simplified the GPU NUFFT fallback hierarchy by removing the incorrect Type 1+2 decomposition. It now raises an informative error with installation instructions for cufinufft==2.4.0b1 and finufft==2.4.0rc1 when the Plan-based Type 3 transform fails.
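The simplified fallback can be sketched as follows (the package pins are taken from the description above, but `plan_fn` is a placeholder standing in for the cufinufft Plan-based entry point, not the real cufinufft API):

```python
def gpu_nufft_type3(*args, plan_fn=None, **kwargs):
    """Run a Plan-based Type 3 NUFFT, or fail with actionable guidance.

    Instead of silently falling back to an incorrect Type 1+2 decomposition,
    the unavailable case raises an error that tells the user exactly which
    pre-release versions to install.
    """
    if plan_fn is None:
        raise ImportError(
            "Plan-based Type 3 NUFFT unavailable. Install the pinned "
            "pre-releases: pip install cufinufft==2.4.0b1 finufft==2.4.0rc1"
        )
    return plan_fn(*args, **kwargs)
```

Failing loudly with the exact pins is preferable here because the silent fallback produced wrong results.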
- Add conftest.py with centralized GPU availability detection
- Skip GPU tests when cupy is not available
- Add fallbacks for optional dependencies (tabulate, matplotlib)
- Fix conditional imports in GPU test modules
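Centralized GPU detection along these lines (a sketch, not the repo's actual conftest.py) lets every GPU test share one cached check:

```python
import functools

@functools.lru_cache(maxsize=1)
def gpu_available():
    """Return True only when cupy imports and a CUDA device responds.

    lru_cache makes the (potentially slow) device probe run once per
    session, matching the cached-availability behavior described above.
    """
    try:
        import cupy
        cupy.cuda.runtime.getDeviceCount()
        return True
    except Exception:
        return False
```

In conftest.py this would back a `pytest.mark.skipif` marker, so GPU tests are skipped rather than errored on CPU-only machines.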
Author (Collaborator):
I am focusing on increasing test coverage and writing better tests now.
GPU was 2.7x slower than CPU for high frequency counts due to sequential processing. Now batch multiple frequencies per NUFFT call.
- Add batch NUFFT functions for 2D and 3D transforms
- Process frequencies in batches (up to 128) based on GPU memory
- Reduce kernel launches from O(n_freq) to O(n_freq/batch_size)
Results: 240 frequencies now 5.78x faster (79.58s → 13.77s); GPU now 2.5x faster than CPU.
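The batching strategy above can be sketched as: pick a batch size from available GPU memory (capped at the 128 mentioned), then sweep frequencies in contiguous chunks so each NUFFT call covers a whole batch. The function name and memory heuristic are illustrative assumptions, not the PR's actual code.

```python
def frequency_batches(n_freq, free_bytes, bytes_per_freq, max_batch=128):
    """Yield (start, stop) index ranges covering all n_freq frequencies.

    Batch size is limited by how many per-frequency work buffers fit in
    free GPU memory, and capped at max_batch. Launching one NUFFT per
    batch reduces kernel launches from O(n_freq) to O(n_freq / batch).
    """
    fit = max(1, free_bytes // max(1, bytes_per_freq))
    batch = min(max_batch, fit, n_freq)
    for start in range(0, n_freq, batch):
        yield start, min(start + batch, n_freq)
```

For the 240-frequency case above, a 128-frequency cap yields two launches instead of 240 sequential ones.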
It has some CUDA conflicts right now that I have to sort out. While testing cupy and cupy-cuda11x and using CUDA directly, I messed up my environment, so I have to fix that before further testing. I will probably try this on Colab/RunPod if the problem persists.
I will request a review once I feel it is error-free and the tutorial is working fine.