Conversation

@kartikmandar
Collaborator

There are some CUDA conflicts right now that I have to sort out. While testing cupy and cupy-cuda11x alongside CUDA directly, I messed up my environment, so I need to fix it before further testing. I will probably try this on Colab/RunPod if the problem persists.

I will request a review once I feel it is error-free and the tutorial is working fine.

@kartikmandar kartikmandar requested a review from Copilot May 30, 2025 10:20

- Implement direct GPU evaluation for AiryBeam and UVBeam interpolation
- Achieve 40-2000x speedup while maintaining numerical accuracy
- Test fftvis (CPU/GPU) and matvis (CPU/GPU) produce identical results
- Fix GPU implementation slice handling and loop structure
- All implementations match within numerical precision
TODO: Fix GPU-CPU accuracy differences (max diff ~56.3)
TODO: Fix tutorial typos and remove debugging cells
- Update test_use_gpu_function to handle cached GPU availability correctly
- Fix CPU simulation tests to match current fftvis-matvis API compatibility
- Update GPU tests to use current CuPy API (OutOfMemoryError, random.random)
- Increase Ray object store memory to meet minimum requirements
Compare fftvis (CPU/GPU) vs matvis performance across varying sources, times, frequencies, and baselines
- Renamed modules: cpu_beams→beams, cpu_nufft→nufft, etc. (keeping cpu_simulate/gpu_simulate)
- Fixed all imports to use new module names
- Resolved conflicts in 4 test files
- Fixed CoordinateRotation.select_chunk() call to use correct number of arguments (2 instead of 3)
- Fixed backend parameter compatibility by separating CPU-specific parameters in wrapper
- Fixed test to use correct beam function names for CPU vs GPU comparison
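The backend-parameter fix above can be sketched as a thin dispatch wrapper that filters CPU-only keyword arguments before calling the GPU path. This is a minimal illustration, not the actual fftvis API: the function names (`simulate_cpu`, `simulate_gpu`) and the `CPU_ONLY_KWARGS` set are assumptions.

```python
# Hypothetical sketch: keep CPU-specific options out of the GPU code path.
# Names below are illustrative stand-ins, not fftvis's real interface.
CPU_ONLY_KWARGS = {"nthreads", "eps_cpu"}  # assumed CPU-only options

def simulate_cpu(**kwargs):
    return ("cpu", sorted(kwargs))  # stand-in for the real CPU backend

def simulate_gpu(**kwargs):
    return ("gpu", sorted(kwargs))  # stand-in for the real GPU backend

def simulate(backend, **kwargs):
    """Dispatch to a backend, dropping kwargs the GPU path can't accept."""
    if backend == "gpu":
        kwargs = {k: v for k, v in kwargs.items() if k not in CPU_ONLY_KWARGS}
        return simulate_gpu(**kwargs)
    return simulate_cpu(**kwargs)
```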


@kartikmandar kartikmandar requested a review from Copilot June 2, 2025 09:40


- Fix type checking to handle BeamInterface-wrapped beams
- Add polarized flag to cache key to prevent data collision
- Fix CuPy compatibility with wrap mode for azimuth coordinates
- GPU and CPU now match within numerical precision (3e-9)
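Two of the fixes above can be sketched in a few lines: adding the `polarized` flag to the cache key so polarized and unpolarized evaluations can never collide, and wrapping azimuth into [0, 2π) so periodic (wrap-mode) interpolation sees consistent coordinates. The key fields and function names here are illustrative assumptions, not fftvis's actual cache layout.

```python
import numpy as np

_beam_cache = {}

def beam_cache_key(beam_id, freq, polarized):
    # Without `polarized` in the tuple, a polarized evaluation could pick up
    # stale unpolarized data cached under the same (beam_id, freq) pair.
    return (beam_id, float(freq), bool(polarized))

def get_or_compute(beam_id, freq, polarized, compute):
    key = beam_cache_key(beam_id, freq, polarized)
    if key not in _beam_cache:
        _beam_cache[key] = compute()
    return _beam_cache[key]

def wrap_azimuth(az):
    """Map azimuth angles into [0, 2*pi) for periodic interpolation
    (the CuPy path uses wrap-mode boundary handling)."""
    return np.mod(az, 2.0 * np.pi)
```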
… saving

- Add 4-5 more tiers for each benchmark parameter (sources, times, frequencies, antennas)
- Add GPU vs CPU speedup to all speedup comparison plots
- Add system information collection (CPU, RAM, GPU specs)
- Add comprehensive results saving with timestamped directory
- Include visibility consistency checks between all backends
- Fix baseline calculation to match matvis (all N×N baselines)
- Generate summary reports with key findings and speedup statistics
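The baseline-count convention mentioned above can be made concrete: matvis computes all N×N antenna pairs (autos and conjugate pairs included), so the benchmark must count N² baselines rather than N(N−1)/2. A minimal sketch, with `all_baselines` as an illustrative helper name:

```python
import numpy as np

def all_baselines(n_ant):
    """Return all n_ant * n_ant (ai, aj) antenna-pair indices, matvis-style:
    autos (i == j) and both orderings (i, j) / (j, i) are included."""
    ai, aj = np.meshgrid(np.arange(n_ant), np.arange(n_ant), indexing="ij")
    return np.stack([ai.ravel(), aj.ravel()], axis=1)
```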
Simplified the GPU NUFFT fallback hierarchy by removing the incorrect Type 1+2
decomposition. Now throws an informative error with installation instructions
for cufinufft==2.4.0b1 and finufft==2.4.0rc1 when Plan-based Type 3 fails.
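The fallback behavior described above can be sketched as a try/except around plan creation that re-raises with the pinned pre-release versions from the commit message. `make_type3_plan` is a stand-in name for the plan factory, not the actual cufinufft entry point:

```python
def gpu_nufft_type3(x, y, z, c, s, t, u, make_type3_plan):
    """Attempt a Plan-based Type 3 GPU NUFFT; fail with install guidance.

    `make_type3_plan` is an injected, hypothetical plan factory here; the
    real code would call into cufinufft's plan interface directly.
    """
    try:
        plan = make_type3_plan(x, y, z, s, t, u)
    except Exception as err:
        # No Type 1+2 decomposition fallback: it was numerically incorrect,
        # so fail loudly with actionable installation instructions instead.
        raise RuntimeError(
            "Plan-based Type 3 GPU NUFFT failed. Install the pre-release "
            "builds: pip install cufinufft==2.4.0b1 finufft==2.4.0rc1"
        ) from err
    return plan.execute(c)
```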
@kartikmandar kartikmandar changed the title GPU implementation (not complete) GPU implementation (almost complete) Jun 3, 2025
- Add conftest.py with centralized GPU availability detection
- Skip GPU tests when cupy is not available
- Add fallbacks for optional dependencies (tabulate, matplotlib)
- Fix conditional imports in GPU test modules
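The conftest.py pattern above can be sketched as a single import-time probe for cupy plus a generic helper for optional dependencies like tabulate and matplotlib. Variable and helper names are illustrative, not necessarily what the PR uses:

```python
import importlib
import importlib.util

# Probe once at import time; GPU tests consult this flag instead of
# attempting `import cupy` themselves.
HAVE_GPU = importlib.util.find_spec("cupy") is not None

# In the real conftest this would feed a pytest marker, e.g.:
# requires_gpu = pytest.mark.skipif(not HAVE_GPU, reason="cupy not available")

def optional_import(name):
    """Return the module if installed, else None (fallback for optional
    dependencies such as tabulate or matplotlib)."""
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)
```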

@kartikmandar
Collaborator Author

I am focusing on increasing test coverage and writing better tests now.

GPU was 2.7x slower than CPU for high frequency counts due to sequential
processing. Now batches multiple frequencies per NUFFT call.

- Add batch NUFFT functions for 2D and 3D transforms
- Process frequencies in batches (up to 128) based on GPU memory
- Reduce kernel launches from O(n_freq) to O(n_freq/batch_size)

Results: 240 frequencies now 5.78x faster (79.58s → 13.77s), GPU now
2.5x faster than CPU.
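The batching scheme above can be sketched as two small helpers: pick a batch size from free GPU memory (capped at 128, per the commit message) and chunk the frequency axis so each NUFFT call transforms a whole batch. The memory model (`bytes_per_freq`) is a simplifying assumption:

```python
def choose_batch_size(n_freq, free_bytes, bytes_per_freq, cap=128):
    """Largest batch that fits in free GPU memory, capped at `cap`.

    `bytes_per_freq` is an assumed per-frequency working-set estimate.
    """
    by_memory = max(1, free_bytes // bytes_per_freq)
    return int(min(n_freq, by_memory, cap))

def freq_batches(n_freq, batch_size):
    """Yield (start, stop) index ranges covering all frequencies, so kernel
    launches drop from O(n_freq) to O(n_freq / batch_size)."""
    for start in range(0, n_freq, batch_size):
        yield start, min(start + batch_size, n_freq)
```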