Commit 266bca6

Expand the README.
1 parent a746b81 commit 266bca6

1 file changed: +62 -1 lines changed

README.md

Lines changed: 62 additions & 1 deletion
@@ -1,3 +1,64 @@
1-
Coherent deps (dependencies) is a library of tools used by the [Coherent System](https://bit.ly/coherent-system) for providing insights into the dependencies used by a code base, in particular resolving imports to the dependencies that supply those imports. This sophisticated behavior enables Python projects to avoid the need to declare dependencies and to instead focus on the implementation. The Coherent OSS community provides this library separately to make the functionality available for alternative uses.
1+
Coherent deps (dependencies) provides insights into the dependencies used by a code base, resolving imports to the dependencies that supply those imports. The Coherent OSS community presents this library to make the functionality available for a variety of uses.
2+
3+
The [Coherent System](https://bit.ly/coherent-system) implements automatic dependency inference using coherent.deps, allowing Python projects to avoid the need to declare dependencies and to instead focus on the implementation.
4+
5+
[pip-run](https://pypi.org/project/pip-run) leverages this library to enable on-demand installation of dependencies required by scripts, as implied by their imports.
2 6

3 7
See the code documentation for details. See also the [usage in `coherent.build`](https://github.com/coherent-oss/coherent.build/blob/a95e65df11c86658a689a7b7f5f6626321802f7e/discovery.py#L162-L175) for an example of how the build backend uses it.
8+
9+
## Design Overview
10+
11+
`coherent.deps` is implemented primarily by two modules, `imports` and `pypi`.
12+
13+
`imports` statically analyzes a codebase to detect its imports and separates `stdlib` imports from those supplied by third-party dependencies.
14+
15+
`pypi` supplies the package index support: a mapping of imports to package names in PyPI. The mapping is backed by a world-readable MongoDB database hosted in MongoDB Atlas. Each PyPI project gets an entry in the `coherent builder:pypi:distributions` collection.
16+
17+
Each entry has the following structure:
18+
19+
- `_id`: the MongoDB-generated unique ID
20+
- `id`: the normalized name of the package (e.g. `typing-extensions` or `cherrypy`)
21+
- `name`: the package's canonical name (e.g. `typing_extensions` or `CherryPy`)
22+
- `updated`: The last time this package was updated from PyPI
23+
- `downloads`: The number of downloads in the past 30 days as returned by the pypinfo query.
24+
- `roots`: A collection of root importable names presented by the package (e.g. `requests` or `coherent.deps` but not `requests.errors`).
25+
- `error`: Any error that occurred when attempting to process the package.
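To make the structure concrete, here is a hypothetical entry written as a Python dict. All values are invented for illustration, and `_id` is omitted because MongoDB generates it. The `normalize` helper is an assumption: it applies the standard PyPA name normalization (PEP 503), which is consistent with the `id` examples above but is not confirmed to be what the library uses.

```python
import datetime
import re


def normalize(name: str) -> str:
    """Normalize a project name per the PyPA name normalization rules (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical entry; every value here is invented for illustration.
entry = {
    "id": normalize("typing_extensions"),
    "name": "typing_extensions",
    "updated": datetime.datetime(2025, 1, 1),
    "downloads": 1_000_000,
    "roots": ["typing_extensions"],
}

print(entry["id"])  # typing-extensions
print(normalize("CherryPy"))  # cherrypy
```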
26+
27+
Only `id` and `downloads` are required. An entry without `updated` is due to be updated by the `process` routine. An entry without `roots` or `name` has never been processed.
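These rules can be read as a small classification over which optional fields are present. The sketch below is only an illustration of that reading; the `status` function and its labels are hypothetical and not part of the library.

```python
def status(entry: dict) -> str:
    """Classify an entry by which optional fields are present."""
    if "roots" not in entry or "name" not in entry:
        return "never processed"
    if "updated" not in entry:
        return "due for update"
    return "processed"


# Hypothetical entries for illustration.
new = {"id": "cherrypy", "downloads": 10}
stale = {"id": "cherrypy", "downloads": 10, "name": "CherryPy", "roots": ["cherrypy"]}

print(status(new))  # never processed
print(status(stale))  # due for update
```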
28+
29+
## Maintaining the mapping
30+
31+
There is a subpackage, `distributions`, which contains two scripts, `load` and `process` (invoked by `python -m coherent.deps.distributions.{load,process}`) to load the distributions from a "top downloads" summary and then process those by loading their data from PyPI.
32+
33+
To get the full set of "top downloaded" packages with at least one download, run this query:
34+
35+
```
36+
🐚 pipx run --python 3.13 pypinfo --json --indent 0 --limit 800000 --days 30 "" project > ~/Downloads/top-pypi-packages-30-days.json
37+
```
38+
39+
Note that this pypinfo script requires a Google API key and, with a very high limit like 800000, costs several dollars to run, so the maintainer runs it only about twice a year.
40+
41+
Then, to refresh the database with the downloaded dataset:
42+
43+
```
44+
🐚 py -3.13 -m pip-run coherent.deps -- -m coherent.deps.distributions.load ~/Downloads/top-pypi-packages-30-days.json
45+
```
46+
47+
This process will ensure that all packages are up-to-date with their latest download stats.
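One way to picture the refresh, as a sketch rather than the actual `load` implementation: each row of the report becomes an update that sets `downloads` on the matching entry. This assumes the pypinfo JSON has a top-level `rows` list whose items carry `project` and `download_count` keys; the helper names are hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name per PEP 503 (assumed to match the `id` field)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def download_updates(report: dict) -> list[tuple[dict, dict]]:
    """Turn report rows into (filter, update) pairs suitable for an upsert."""
    return [
        (
            {"id": normalize(row["project"])},
            {"$set": {"downloads": row["download_count"]}},
        )
        for row in report["rows"]
    ]


# Hypothetical report contents, shaped like the assumed pypinfo output.
report = {
    "rows": [
        {"project": "requests", "download_count": 500},
        {"project": "Typing_Extensions", "download_count": 300},
    ]
}

for filter_, update in download_updates(report):
    print(filter_, update)
```

Because such an update touches only `downloads`, an entry created this way would have no `updated` field.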
48+
49+
From there, ensure that any newly-added packages are processed:
50+
51+
```
52+
🐚 py -3.13 -m pip-run coherent.deps -- -m coherent.deps.distributions.process
53+
```
54+
55+
Note that only those entries without an `updated` field will be processed. To re-process packages that may have grown stale, clear the `updated` field on those entries. For example, to mark stale any entries older than 6 months:
56+
57+
```
58+
max_age = datetime.timedelta(days=6*30)
59+
filter = {'updated': {'$lt': datetime.datetime.today() - max_age}}
60+
op = {'$unset': {'updated': ''}}
61+
collection.update_many(filter, op)
62+
```
63+
64+
Thereafter, re-run the `process` routine, which will re-process the packages without the `updated` field.
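For reference, a self-contained variant of the stale-marking snippet might look like the following. It is a sketch under assumptions: `collection` is taken to be a PyMongo collection handle for the distributions collection, `updated` is assumed to be stored as a naive datetime comparable with `datetime.datetime.now()`, and the function names are hypothetical.

```python
import datetime


def stale_update(now: datetime.datetime, max_age: datetime.timedelta):
    """Build the filter and update documents that clear `updated` on stale entries."""
    filter_ = {"updated": {"$lt": now - max_age}}
    # $unset takes a document of field names; the values are ignored.
    op = {"$unset": {"updated": ""}}
    return filter_, op


def mark_stale(collection, max_age=datetime.timedelta(days=6 * 30)):
    """Clear `updated` on entries older than max_age and return the update result."""
    filter_, op = stale_update(datetime.datetime.now(), max_age)
    return collection.update_many(filter_, op)
```

Separating the document construction from the database call makes it possible to inspect the filter before applying it to the live collection.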