Skip to content

Speed up build_ellipse_model #2046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

taranu
Copy link

@taranu taranu commented May 20, 2025

Like #1288 (which seems abandoned), this speeds up build_ellipse_model to address #1168 . However, it is entirely written in Cython and does not require a separate shared library. In local testing on Python 3.12, it's about 130x faster than the current pure Python implementation. I also made the tolerance for one of the unit tests much stricter, because I found that it was passing even when the output was incorrect.

There is a potentially premature optimization to specialize the function depending on whether harmonic arrays are passed or not. The C code generated on my system looked correct, but I didn't actually find any difference in performance between high_harmonics=False and True.

I also attempted to parallelize it with Cython's prange. However, in my testing on multiple Linux systems, I found it never actually used more than one thread. Additionally, I couldn't figure out how to allow for a num_threads=None value in the Cython build_ellipse_model_c function, since it's typed as an int. Thus, I'm making this a draft PR to start. If someone more versed in Cython/prange could have a look at that, it would be helpful. Otherwise, the parallel support could simply be dropped.

@taranu taranu force-pushed the cython-build-ellipse-model branch 2 times, most recently from acbef6d to c1f4e2f Compare May 22, 2025 18:50
@taranu
Copy link
Author

taranu commented May 23, 2025

I tested this on two different systems and found that num_threads does nothing because it was compiling without -fopenmp and not linking against libomp (or libgomp). After manually recompiling, parallelism works fine - or at least, runtimes are about ~3.2x faster with 4 threads.

I see that there are openmp helpers in extension-helpers, which is a requirement and enabled in pyproject.toml here, but I'm unclear on how to encourage pip install to add the relevant compiler flags. @astrofrog, could you advise?

@astrofrog
Copy link
Member

Even though extension-helpers includes some OpenMP-related helpers, we haven't really been using them in astropy packages, as we haven't really done much investigation of how this works in terms of building wheels and whether it could cause issues on some platforms. I would personally suggest removing the parallel aspect from this PR for now until we understand better whether we should be using OpenMP or other mechanisms.

@taranu taranu force-pushed the cython-build-ellipse-model branch from c1f4e2f to 5948c24 Compare May 30, 2025 19:17
@taranu taranu force-pushed the cython-build-ellipse-model branch from 5948c24 to 79bd3a6 Compare May 30, 2025 19:57
@taranu
Copy link
Author

taranu commented May 30, 2025

I would personally suggest removing the parallel aspect from this PR for now until we understand better whether we should be using OpenMP or other mechanisms.

Thanks for the reply. I was able to compile and install the parallel version via CFLAGS="${CFLAGS} -fopenmp" LDFLAGS="${LDFLAGS} -lomp" pip install ., but like you said, I have no idea how to test that across platforms, so I dropped the parallel part from the PR, which should be reviewable now. I'll re-add it on a separate branch if anyone wants to use it in the meantime.

@taranu taranu marked this pull request as ready for review May 30, 2025 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants