@zka26
Contributor

@zka26 zka26 commented Oct 31, 2025

Summary

Parallel builds of Example 5 hit a race: the good and bad targets both write ml_mod.mod into the same directory, so when they compile in parallel one target can overwrite or remove the other's .mod file and the build fails.

Reproduction

$ cmake --build . --parallel 16
[ 16%] Building Fortran object CMakeFiles/example5_simplenet_infer_fortran_good.dir/good/fortran_ml_mod.f90.o
[ 33%] Building Fortran object CMakeFiles/example5_simplenet_infer_fortran_bad.dir/bad/fortran_ml_mod.f90.o
f951: Fatal Error: Cannot rename module file ‘ml_mod.mod0’ to ‘ml_mod.mod’: No such file or directory
compilation terminated.
gmake[2]: *** [CMakeFiles/example5_simplenet_infer_fortran_bad.dir/build.make:88: CMakeFiles/example5_simplenet_infer_fortran_bad.dir/bad/fortran_ml_mod.f90.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/example5_simplenet_infer_fortran_bad.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
Error copying Fortran module "ml_mod.mod".  Tried "ML_MOD.mod" and "ml_mod.mod".
gmake[2]: *** [CMakeFiles/example5_simplenet_infer_fortran_good.dir/depend.make:8: CMakeFiles/example5_simplenet_infer_fortran_good.dir/ml_mod.mod.stamp] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/example5_simplenet_infer_fortran_good.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

Fixes #306

Give each target its own Fortran module output directory and add it to the include path to avoid collisions.
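
The idea can be sketched in CMake roughly as follows. This is an illustration, not the merged diff: the target names come from the build log above, while the `mod_files/<target>` directory layout and the `tgt` and `mod_dir` variable names are assumptions, and both targets are assumed to have been created already with add_executable.

```cmake
# Hypothetical sketch: directory layout and variable names are assumptions.
foreach(tgt
    example5_simplenet_infer_fortran_good
    example5_simplenet_infer_fortran_bad)
  # Both targets define a module named ml_mod, so give each one a private
  # output directory to keep the identically named .mod files apart.
  set(mod_dir "${CMAKE_CURRENT_BINARY_DIR}/mod_files/${tgt}")
  set_target_properties(${tgt} PROPERTIES
    Fortran_MODULE_DIRECTORY "${mod_dir}")
  # Add the same directory to the include path so the target's other
  # sources can find the ml_mod.mod it just built.
  target_include_directories(${tgt} PRIVATE "${mod_dir}")
endforeach()
```

With separate module directories, neither compile step writes to a path the other one uses, so the build order between the two targets no longer matters.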

After

Both example5_simplenet_infer_fortran_good and example5_simplenet_infer_fortran_bad now build reliably in parallel.

$ cmake --build . --parallel 16
[ 33%] Building Fortran object CMakeFiles/example5_simplenet_infer_fortran_bad.dir/bad/fortran_ml_mod.f90.o
[ 33%] Building Fortran object CMakeFiles/example5_simplenet_infer_fortran_good.dir/good/fortran_ml_mod.f90.o
[ 50%] Building Fortran object CMakeFiles/example5_simplenet_infer_fortran_bad.dir/bad/simplenet_infer_fortran.f90.o
[ 66%] Building Fortran object CMakeFiles/example5_simplenet_infer_fortran_good.dir/good/simplenet_infer_fortran.f90.o
[ 83%] Linking Fortran executable example5_simplenet_infer_fortran_bad
[100%] Linking Fortran executable example5_simplenet_infer_fortran_good
[100%] Built target example5_simplenet_infer_fortran_good
[100%] Built target example5_simplenet_infer_fortran_bad

Environment

OS: Ubuntu 24.04.3 LTS on WSL2
Kernel: 6.6.87.2-microsoft-standard-WSL2
GNU Fortran (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
cmake version 3.28.3
GNU Make 4.3
Python 3.12.3
PyTorch 2.9.0+cpu cuda: False

Note

On my run, pt2ts.py saved the model as saved_simplenet_cpu.pt, while the Fortran example expected saved_simplenet_model.pt. I worked around this by renaming the file.

@jatkinson1000
Member

Thanks @zka26

I'll do a local check of this now, given it isn't run by the CI tests

Member

@jatkinson1000 jatkinson1000 left a comment

Hi @zka26, I have run this locally and can confirm that it fixes the problem.

I have made 2 suggestions that will allow you to pass the linting which is currently failing.
A small comment in the CMake to note that we need this additional code to place identically named modules in separate directories to avoid conflict might also be a nice touch.

The failing tests are not your fault, and I have opened a patch to resolve them at #461.
Once that is merged you should be able to rebase and pass the tests.

Let me know once you make the style changes (feel free to accept the suggestions).

@jatkinson1000
Member

@zka26 I have also opened a PR at #462 to fix the other issue you raised regarding the name of the TorchScript file (thank you for noting that). Feel free to give it a look.

@jatkinson1000 jatkinson1000 added the hacktoberfest Issues open for Hacktoberfest contributions label Oct 31, 2025
@zka26
Contributor Author

zka26 commented Oct 31, 2025

@jatkinson1000 Thanks! I have accepted the suggestions and added comments about the additional code. I will rebase once #461 is merged.

Also thanks for opening #462, it addresses exactly what I ran into.

@jatkinson1000
Member

Hi @zka26 Thanks for that.
The fixes should now be merged, so if you can rebase onto the latest main, the tests should then pass and we can merge 🤞.

@zka26 zka26 force-pushed the fix/example5-fortran-mod-dir-race branch from a7c8808 to 0c967a3 Compare November 3, 2025 10:57
Member

@jatkinson1000 jatkinson1000 left a comment

Thanks @zka26

You already pre-empted the review I was writing to fix the executable names (I guess you checked locally).

Made a couple of suggestions to keep the close brackets on the final line to be consistent with the rest of the file (though I can see an argument for own-line being clearer).

Member

@jatkinson1000 jatkinson1000 left a comment

Sorry about this @zka26 I'm not sure why the linter doesn't like that particular statement, but testing locally these should now pass!

@jatkinson1000 jatkinson1000 added the hacktoberfest-accepted PRs approved for Hacktoberfest contribution label Nov 3, 2025
Member

@jatkinson1000 jatkinson1000 left a comment

OK, we got there eventually!

Thanks for the submission @zka26 your contribution is really appreciated.

I've marked with hacktoberfest-accepted if you need that and will merge shortly!

@jatkinson1000 jatkinson1000 merged commit a9950c3 into Cambridge-ICCS:main Nov 3, 2025
7 checks passed
@zka26
Contributor Author

zka26 commented Nov 3, 2025

@jatkinson1000 Thanks! I am glad to see the checks go green. Thank you for the opportunity and for the quick suggestions, which made the changes very fast to apply.

@jatkinson1000
Member

@zka26 out of interest can I ask how you found this code?
If you are using it for any work we like to know and are happy to feature it on our website.
If you just found us for hacktoberfest contributions that's also fine!

@zka26
Contributor Author

zka26 commented Nov 3, 2025

@jatkinson1000 I found the repo through Hacktoberfest.
I first noticed the issue about the incorrect example numbers in the example READMEs, then I browsed the open issues and came across this one. Thanks for asking!

Development

Successfully merging this pull request may close these issues.

cmake race condition - looping example
