Public API #85
Just a few comments/questions so far. One of the main things that bothers me in general, and that I don't think there is a solution to, is that I can do e.g.:
```python
from astropy.cosmology.realizations import ScienceState
```
Obviously this isn't public API, and ScienceState isn't in __all__, but there's no real way to prevent users from relying on it, and we can't rename all imports in a module to include a _ prefix. So I think we should also make our API docs an authoritative source of public API and ensure that they are consistent with the other rules. We should also make sure that all public APIs are documented in the docs (we don't check that this is the case right now).
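To make the point above concrete in plain Python (no Astropy needed), the sketch below builds a throwaway module, `demo_realizations`, a hypothetical stand-in rather than the real `astropy.cosmology.realizations`, whose __all__ lists a single name. A star-import honours __all__, but an explicit import of an unlisted name still succeeds, which is why __all__ on its own cannot stop users from relying on private symbols.

```python
import sys
import types

# Hypothetical stand-in module: only "realization" is listed as public.
SOURCE = '''
__all__ = ["realization"]

class ScienceState:  # not in __all__, so not public API
    pass

realization = object()
'''

demo = types.ModuleType("demo_realizations")
exec(SOURCE, demo.__dict__)
sys.modules["demo_realizations"] = demo  # make it importable by name

# A star-import honours __all__ ...
namespace = {}
exec("from demo_realizations import *", namespace)
print("ScienceState" in namespace)  # False

# ... but an explicit import of an unlisted name still works.
from demo_realizations import ScienceState
print(ScienceState is demo.ScienceState)  # True
```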
2. All modules must have an ``__all__`` attribute, even if it is empty. The
``__all__`` attribute defines the public and private interface of the module
in which it is defined. Anything in ``__all__`` is public, including
underscore-prefixed symbols. Anything not in ``__all__`` is private in that
Although I think we should probably disallow underscore-prefixed symbols in __all__?
I agree that practically there should not, but I strongly think __all__ should be definitive, so if we have an underscore-prefixed object that we want to make public, the mandatory steps are (in this order):
1. Put it in __all__.
2. Update the docs to reflect __all__.
3. (Super strongly encouraged) Remove the underscore prefix, updating steps 1 & 2 accordingly.
This adopts SciPy's disambiguation of PEP 8, adding primarily the mandate that an empty __all__ be included in modules with no public API.
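Rule 2 and the steps above are easy to check mechanically. A minimal sketch with a hypothetical helper name, `check_all`: it reports a module with no __all__, names listed in __all__ that are not defined, and public names that still carry an underscore (a nudge toward step 3 rather than a hard error). Exempting dunder names such as `__version__` is an assumption of the sketch, not something decided in this thread.

```python
import json
import types


def check_all(module: types.ModuleType) -> list[str]:
    """Return problems with ``module.__all__``; an empty list means none were found."""
    name = module.__name__
    if not hasattr(module, "__all__"):
        return [f"{name}: no __all__ (use __all__ = [] if nothing is public)"]
    problems = []
    for entry in module.__all__:
        if not hasattr(module, entry):
            problems.append(f"{name}: __all__ lists undefined name {entry!r}")
        # Dunders such as __version__ are exempt (an assumption of this sketch).
        is_dunder = entry.startswith("__") and entry.endswith("__")
        if entry.startswith("_") and not is_dunder:
            problems.append(f"{name}: public name {entry!r} still has an underscore prefix")
    return problems


print(check_all(json))  # []

demo = types.ModuleType("demo")
demo.__all__ = ["_legacy_helper"]
demo._legacy_helper = lambda: None
print(check_all(demo))
# ["demo: public name '_legacy_helper' still has an underscore prefix"]
```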
- Clearly state if a documented object is actually private.
3. **Add prefixes**: 1. Add prefixes to all modules that are not public. 2. Add
prefixes to all classes, functions, and attributes that are not public.
:note:`I'm less enthusiastic on this point.`
I'm in favor of this as long as, as described above, we don't need to add underscores to symbols that are already in an underscore module (so, e.g., nothing in astropy.io.fits._tiled_compression needs to have an underscore prefix).
I am fine with that. I have reservations about going all in on underscore prefixing everything. Not needing to underscore symbols in private modules (which still have an __all__) SGTM.
After further thought, I think what is going to be really important here is to define what public API is from the perspective of a user - that is, a user won't know what
The problem with
What do you propose for submodules that currently define
Maybe in that case .core should technically be private? (._core)
👍. That is the suggestion of PEP 8 -- and that
So this would be one of the largest changes to Astropy. PEP8 and thus Scipy and static typing all say that
```python
# __init__.py
# No __all__ is defined
from .core import *
```
```python
# core.py
__all__ = ["Foo"]

class Foo: ...
```
Means that
```python
# __init__.py
__all__ = ["Foo"]  # Foo is public in this module, even if it is defined elsewhere.
from .core import Foo
```
```python
# core.py
__all__ = []  # nothing is public in this module. Please look elsewhere.

class Foo: ...
```
Essentially we need to move the contents of various
Update: The better option is to rename
```python
# __init__.py
from . import _core, ...
from ._core import *
...
__all__ = [] + _core.__all__ + ...
```
```python
# _core.py (formerly core.py)
__all__ = ["Foo"]

class Foo: ...
```
This still supports
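As a sanity check of the `_core` layout above, the sketch below writes a tiny package to a temporary directory and imports it. The package name `demo_public_api` and the `_helper` function are hypothetical. It shows that the package's __all__ is assembled from `_core.__all__`, that the star-import pulls in only the names listed there, and that `Foo` still records `_core` as its defining module.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

PKG = "demo_public_api"  # hypothetical package name

INIT_PY = """
from . import _core
from ._core import *  # brings in only the names in _core.__all__

__all__ = [] + _core.__all__
"""

CORE_PY = """
__all__ = ["Foo"]


class Foo: ...


def _helper(): ...  # not in __all__, so not re-exported
"""


def build_and_import(root: Path):
    """Write the two-file package under ``root`` and import it."""
    pkg_dir = root / PKG
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(textwrap.dedent(INIT_PY))
    (pkg_dir / "_core.py").write_text(textwrap.dedent(CORE_PY))
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        return importlib.import_module(PKG)
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as tmp:
    pkg = build_and_import(Path(tmp))

print(pkg.__all__)              # ['Foo']
print(pkg.Foo.__module__)       # demo_public_api._core
print(hasattr(pkg, "_helper"))  # False
```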
I think Yes. To make sure we're on the same page, I think communicating it this way to users should be the logical consequence of the deeper rules:
Having all these rules means that a user can only get to public symbols through other public symbols -- that "anything that can be accessed through tab completion in IPython and which does not have _ prefixes anywhere" and "anything in the API docs" (unless explicitly stated) is public. It's not the source or our definition of public API; it's the consequence.
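The "no _ prefixes anywhere" consequence described above can be checked mechanically on a dotted path. A minimal sketch; `is_public_path` is a hypothetical helper, and treating dunder components such as `__version__` as public is an assumption. It judges only the path, so a deep but un-prefixed path such as astropy.cosmology.flrw.lambdacdm.LambdaCDM would still pass, which is why the rules above also call for prefixing internal modules.

```python
def is_public_path(dotted: str) -> bool:
    """True if no component of a dotted import path starts with an underscore."""
    for part in dotted.split("."):
        is_dunder = part.startswith("__") and part.endswith("__")
        if part.startswith("_") and not is_dunder:
            return False
    return True


print(is_public_path("astropy.cosmology.LambdaCDM"))         # True
print(is_public_path("astropy.io.fits._tiled_compression"))  # False
```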
I agree,
If we want to strictly control what's public and private, then yes, basically all submodules should be private (renamed with an underscore) and the public API exported in the subpackages'
With this solution, yes, this would require a lot of changes and may be painful to do. So I don't like this solution. To summarize:
I don't like option 1 because it moves the list of exported functions from the module itself to another place, and requires a lot of changes, which can be prone to errors. Option 2 is more reasonable. Then, as you say, the underscore prefix is also just a convention, but that's the closest thing to a private scope in Python land. And autocomplete respects it, so when users browse the functions in their shell they will see only the public API. As a user I never checked the content of [1] They also kept
(Some of this developed from discussion at the coordination meeting (including @nstarman, @saimn , @astrofrog , @tepickering, @pllim, @nden, @WilliamJamieson), although I don't think I can say that all of my points above are consensus of those folks, it's some mix of that and just straight up my own opinion.) I very much agree with @astrofrog's point here:
Which I think has, up until this point (in an uncodified way), been that whatever the docs say is the public API. So I think it makes sense to codify that as the "true answer". @nstarman's point was that if we follow the rules here, the two are the same, and it's only an aberration if they differ. But I was/am concerned about the inevitable state when something isn't working right. So I think what we settled on is that we start by saying that, as of when this APE is accepted, the docs are the "true" public API, but this APE presents a plan to get to a state where the rules highlighted in this APE lead naturally to the docs reflecting the same thing as these rules. I still personally think the final source of authority should be the documentation, because that's more of a user-facing contract, as @astrofrog says. But I think if we say that's the reality now, and we might revisit it after this APE's plan is implemented, that's a reasonable compromise. Two more opinions to offer:
And one question:
I would say not. Even for things like removing astropy-helpers, it was a tedious campaign with writing up a transition guide and opening some PRs downstream. For things like this that would break API, it is a non-starter.
@astrofrog @eteq @saimn @pllim, I've updated this APE based on the discussions we had at the conference and in this thread. LMK what you think!
@saimn I've been working on this over at
I also like Option 2 a lot. It works because
With this as the template, the example from #85 (comment) becomes
```python
# __init__.py
__all__ = ["Foo"]  # Foo is public in this module, even if it is defined in `_core`, which is private.
from ._core import Foo
```
```python
# _core.py
__all__ = ["Foo"]  # Foo is "public" in this module, but this module is private.

class Foo: ...
```
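One property of this template that a tool can verify is that every name a private submodule lists in its __all__ is re-exported, as the same object, by its public parent. A sketch with a hypothetical helper name, `reexport_problems`, exercised on the standard library's `json` package, where `json.decoder` defines `JSONDecoder` and `JSONDecodeError` and `json` re-exports both (json.decoder is not underscore-prefixed, but the parent/child relationship is the same).

```python
import json
import json.decoder
import types


def reexport_problems(parent: types.ModuleType, child: types.ModuleType) -> list[str]:
    """List names in ``child.__all__`` that ``parent`` does not re-export as the same object."""
    parent_all = set(getattr(parent, "__all__", ()))
    problems = []
    for name in getattr(child, "__all__", ()):
        if name not in parent_all:
            problems.append(f"{name}: missing from {parent.__name__}.__all__")
        elif getattr(parent, name, None) is not getattr(child, name, None):
            problems.append(f"{name}: {parent.__name__}.{name} is a different object")
    return problems


print(reexport_problems(json, json.decoder))  # []
```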
@eteq, I believe @astrofrog's comment largely answers this question.
@eteq, I agree.
I added a section on a pre-commit CI check. It actually looks to be fairly simple to check that a public module has corresponding documentation since we have
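The module-discovery half of such a check can be done with the standard library's `pkgutil.walk_packages`. A sketch under stated assumptions: `public_modules` is a hypothetical helper, a module counts as public if no component of its dotted name starts with `_`, and the comparison against the modules that actually have API pages is left out because it depends on how the docs are built. It is shown on the standard library's `email` package, which contains both kinds of modules.

```python
import importlib
import pkgutil


def public_modules(package_name: str) -> list[str]:
    """Dotted names of all modules in a package that have no underscore-prefixed component."""
    package = importlib.import_module(package_name)
    names = [package_name]
    # walk_packages recurses into subpackages (importing them to do so).
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        if not any(part.startswith("_") for part in info.name.split(".")):
            names.append(info.name)
    return sorted(names)


mods = public_modules("email")
print("email.message" in mods, "email._parseaddr" in mods)  # True False
```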
not have a uniform and systematic approach to communicating what is public vs
internal, then we cannot expect users and especially their tools to know what is
public vs internal.
From astropy/astropy#15169, I recall another source of issues that has bitten astropy in the past: I/O registration. When deep sub-modules look public, e.g. astropy.cosmology.flrw.lambdacdm.LambdaCDM then that is generally used as the class pathway in I/O rather than the better astropy.cosmology.LambdaCDM. When the class is moved it breaks the I/O and needs a whole backwards-compatibility logic to support files using the previous path. Cleaning up public vs internal, e.g. to astropy.cosmology._flrw.lambdacdm.LambdaCDM would make this an obviously bad way to serialize the class and code authors would get it "right" (astropy.cosmology.LambdaCDM) the first time. Clear public API makes for more stable code.
probably deserves a few sentences.
This is slightly orthogonal to the issue here: really, one needs to set __module__ to avoid this - registration just followed that, not a choice made by coders. Note that numpy in fact does this - which partially is annoying, since one no longer knows where the function is defined.
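For reference, setting `__module__` explicitly can be wrapped in a small decorator, much as the comment above says NumPy does. The sketch is illustrative only: the name `set_module` and the package `mypkg` are hypothetical. It also shows the drawback raised above: after the override, `__module__` no longer says where the class was actually defined.

```python
def set_module(module: str):
    """Return a decorator that overrides ``__module__`` with a public location."""
    def decorator(obj):
        obj.__module__ = module
        return obj
    return decorator


@set_module("mypkg")  # pretend Foo is defined in mypkg._core but exported as mypkg.Foo
class Foo:
    pass


print(f"{Foo.__module__}.{Foo.__qualname__}")  # mypkg.Foo
```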
For me the pain point was YAML serialization, because I used f"{cls.__module__}.{cls.__qualname__}", which looks reasonable as code but led to the problems described above. If, when making the tests, I had noticed a private path I would have fixed the problem from the start.
I believe this applies more generally, if something is defined in a private module but imported to a public location then it may be imported from there again, e.g. as part of I/O. Serializing using only the public import location is much better than what I did, which seemed reasonable at the time. Setting __module__ is one way to force public locations in serialization (which I agree has issues); a path map, or str regex / replace operation are other means.
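Another option, besides overriding `__module__` or maintaining a path map, is to derive the serialization path from __all__ itself: check the parent modules of `__module__`, from the top-level package down, and use the first one that exports the object. A sketch, assuming every module in the chain is importable; `public_path` is a hypothetical helper, and the example uses the standard library, where `json.JSONDecoder` is defined in `json.decoder` but exported from `json`.

```python
import importlib
import json


def public_path(obj) -> str:
    """Shortest dotted path from which ``obj`` is exported via ``__all__``."""
    parts = obj.__module__.split(".")
    for i in range(1, len(parts) + 1):
        modname = ".".join(parts[:i])
        try:
            module = importlib.import_module(modname)
        except ImportError:
            continue
        exported = obj.__qualname__ in getattr(module, "__all__", ())
        if exported and getattr(module, obj.__qualname__, None) is obj:
            return f"{modname}.{obj.__qualname__}"
    # Nothing exports it: fall back to where it is defined.
    return f"{obj.__module__}.{obj.__qualname__}"


cls = json.JSONDecoder
print(f"{cls.__module__}.{cls.__qualname__}")  # json.decoder.JSONDecoder
print(public_path(cls))                        # json.JSONDecoder
```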
Yes, that one is an issue for examples and Sphinx too - I really dislike that when we made representations a module (which was definitely a good idea!), we had to make so many updates to the docs. It would also be nice to use regex for things like __construct_mixin_classes
For me, this is still separate from the APE, since this has annoyed me amply already! But I can see that coming from an environment where files with leading underscores have more meaning, you could have avoided the issues.
Agreed, we can leave it out of this APE. I think the point I made, "If, when making the tests, I had noticed a private path I would have fixed the problem from the start.", means adopting this APE will automatically force most of these I/O issues to be fixed.
Thanks for writing this! I think it is good to describe what formalizing the status quo would entail:
Implementing this is very little work and would give consistency, without breaking anything. A few more general points:
Overall, I'm fairly strongly against this. But being consistent within
@mhvk, if I understand this point correctly, this is similar to point 2 of the implementation section. I think where this APE differs is that it aims to be explicit everywhere and not have anything be implicitly private (or public).
This is definitely true in general. For me, the benefits outweigh the costs for 3 reasons:
True. Python's PEP 8 recommendations for how to structure a module to signal public vs private API are not wonderful for people using tab-completion in a terminal-based text editor. However, the tab-completion problem for developers is actually a feature for users, because they won't see private API. (Is there a way to set up tab-completion aliases so that the old paths might point to the new ones on a local machine? A quick Google search found https://www.gnu.org/software/bash/manual/html_node/Programmable-Completion.html, indicating such things are possible, and there might be convenient tools to set this up for devs who use terminal-based editors.)
So
To the broader point, top-level modules, e.g.
becomes
There is an alternative, which is to group related components into sub-modules. Whether this happens is of course up to the maintainers of each module. Personally I like hierarchical organization as it conveys information about a file by dint of its location.
Sounds like a reasonable estimate.
We are more closely aligned with
Fair enough! Thanks for the open mind and valuable discussion!
Note: if we take cues from "upstream" packages,
This was discussed as part of "State of APEs" at Coordination Meeting 2024. I think reactions were mixed and I cannot see any clear action items on how to move this forward (or if we should).
One idea I raised was that at the very least if we cannot reach consensus on changing current code, we should see if we can agree on rules for any new code?
Re: #85 (comment) For completeness, my response to that idea in the meeting was that if it is only a recommendation for new code, I do not think we need an APE; rather, we can modify the dev docs.
There was indeed no consensus on the underscore prefixes, in large part because, contrary to what I thought at least, things like
There also seemed to be consensus that in the end the documentation should be the ultimate arbiter, since that is what users would normally see (and the mistake of documenting the
Finally, I'd say there was no consensus either on new vs old code, or subpackages doing different things, with the latter having the advantage of maintainers being able to set a policy they feel is best, but the disadvantage that then there is no package-wide logic anymore at all, while currently there is (with
It is private, indeed. But the feeling at the coordination meeting was that breaking people's code for code style purity is too big a price to pay. And it is not likely we would ever define
But for astropy as a whole, continuity and consistency are important too. But there is nothing stopping us from making it clearer what is public and not, by defining appropriate
🎉. That would be excellent.
A great thing to do, no matter the outcome of this APE. When first proposed, one of the counter-arguments was that "upstream" libraries haven't done this. But now both
When I first wrote this APE it was to make an argument for "why we should do this". Now that our upstreams have done the same thing, IMO the argument shifts to "why aren't we doing this?". Prima facie we should. Documentation is important, but it is most certainly not how any of our upstream libraries define their public API. The point of public-facing documentation is to document what is public, not to make it public. Just as we generate documentation from docstrings (prioritizing that the code contains its own documentation), Python, our upstream libraries, and most everyone else make public/private a product of the code, not something imposed on it.
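Whichever side ends up authoritative, the failure mode both positions worry about is the docs and __all__ disagreeing. A minimal sketch of that comparison; `api_discrepancies` is an illustrative name, and `documented` is a hypothetical input (how it would be extracted from the built docs is left open).

```python
import types


def api_discrepancies(module: types.ModuleType, documented: set[str]) -> dict[str, set[str]]:
    """Compare a module's ``__all__`` with the names its API docs cover."""
    exported = set(getattr(module, "__all__", ()))
    return {
        "in __all__ but undocumented": exported - documented,
        "documented but not in __all__": documented - exported,
    }


demo = types.ModuleType("demo")
demo.__all__ = ["Foo", "bar"]
report = api_discrepancies(demo, {"Foo", "Baz"})
print(report)
```

An empty pair of sets means the code and the docs agree, which is the end state this APE aims for.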
I see 3 options here
I agree with @nstarman that
Since, as you guys pointed out, we may eventually have to move part of the private code from
Numpy was in a different state, with, e.g., some parts of
Overall, @neutrinoceros made the right list, and I guess the conclusion in the coordination meeting was that
In the meantime, there's nothing stopping us from incrementally ensuring that docstrings and
Also keep in mind that NumPy has the backing of private industry (e.g., NVIDIA). Astronomy does not. I have started seeing pipelines pinning
astrofrog left a comment
One thing we could consider doing in any case: for common cases of misuse (such as astropy.units.quantity), we could use batchpr to search through GitHub and open PRs to fix these incorrect imports.
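For the batchpr idea, the search-and-suggest step can be done with the standard library's `ast` module rather than plain text matching. A sketch only: `PUBLIC_HOME` is a hypothetical hand-maintained table (its single entry comes from the astropy.units.quantity example), the helper name is illustrative, and rewriting files and opening PRs are left to tooling such as batchpr.

```python
import ast

# Hypothetical table: (non-public module, name) -> public module.
PUBLIC_HOME = {
    ("astropy.units.quantity", "Quantity"): "astropy.units",
}


def find_nonpublic_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, suggested import) for known non-public import paths."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only absolute "from X import Y" statements are considered.
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            for alias in node.names:
                target = PUBLIC_HOME.get((node.module, alias.name))
                if target is not None:
                    hits.append((node.lineno, f"from {target} import {alias.name}"))
    return sorted(hits)


code = "import numpy as np\nfrom astropy.units.quantity import Quantity\n"
print(find_nonpublic_imports(code))  # [(2, 'from astropy.units import Quantity')]
```

The source is only parsed, never imported, so Astropy does not need to be installed; note that an `as` alias on the original import is not carried into the suggestion.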
It's also worth considering another option d, which is to do The Right Thing ™️ for new sub-packages and code, to at least not increase the problem.
With the acceptance of the revisions to APE 1, we're going to start following the new APE proposal process. While all of the APE editors (for now exactly the same as the CoCo) will contribute to the process, one of the APE editors will be the point-person (chief APE editor) shepherding your team through the process. Your chief APE editor is Erik Tollerud (@eteq). You can review the next steps in the APE process here, but just to briefly summarize them here:
Please let us (either your chief APE editor or the full CoCo) know if you have any questions!
Pull request overview
This PR introduces a new APE document defining Astropy’s policy for what constitutes the public versus internal API, with an emphasis on consistency between __all__, naming conventions, and documentation, and on making the API machine-detectable for tools.
Changes:
- Adds a full APE text describing the problem of ambiguous public/private interfaces in Python and within Astropy.
- Proposes concrete rules around __all__, underscore prefixes, documentation requirements, and “locally public” vs. truly public symbols, including worked examples and a summary table.
- Outlines a phased implementation plan (documentation snapshot, __all__ updates, deprecations, tooling, and naming changes) plus examples from Astropy, NumPy, and SciPy.
Signed-off-by: nstarman <nathanielstarkman@gmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
@eteq I've updated this APE.
Up for discussion!
Very much a work in progress. Hopefully refined by discussion at the upcoming conference.