From f26f5304a9f6958407605e145b2e0f3fb0e0875e Mon Sep 17 00:00:00 2001
From: Alexey Tereshenkov <50622389+AlexTereshenkov@users.noreply.github.com>
Date: Mon, 12 Feb 2024 00:31:15 +0000
Subject: [PATCH 1/2] docs: export dependency graph as adjacency list

---
 .../using-pants/project-introspection.mdx     | 66 +++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/docs/docs/using-pants/project-introspection.mdx b/docs/docs/using-pants/project-introspection.mdx
index a2222714278..5d228a6d231 100644
--- a/docs/docs/using-pants/project-introspection.mdx
+++ b/docs/docs/using-pants/project-introspection.mdx
@@ -127,6 +127,72 @@ To include the original target itself, use `--closed`:
 helloworld/main.py:lib
 ```
 
+## Export dependency graph
+
+Both `dependencies` and `dependents` goals have the `--format` option allowing you to export data in multiple formats.
+Exporting information about the dependencies and dependents in JSON format will produce the
+[adjacency list](https://en.wikipedia.org/wiki/Adjacency_list) of your dependency graph:
+
+```bash
+$ pants dependencies --format=json \
+  helloworld/greet/greeting.py \
+  helloworld/translator/translator_test.py
+
+{
+    "helloworld/greet/greeting.py:lib": [
+        "//:reqs#setuptools",
+        "//:reqs#types-setuptools",
+        "helloworld/greet:translations",
+        "helloworld/translator/translator.py:lib"
+    ],
+    "helloworld/translator/translator_test.py:tests": [
+        "//:reqs#pytest",
+        "helloworld/translator/translator.py:lib"
+    ]
+}
+```
+
+This has various applications, and you could analyze, visualize, and process the data further. Sometimes, a fairly
+straightforward `jq` query would suffice, but for anything more complex, it may make sense to write a small program
+to process the exported graph. For instance, you could:
+
+* find tests with most transitive dependencies
+
+```bash
+$ pants dependencies --filter-target-type=python_test --format=json :: \
+  | jq -r 'to_entries[] | "\(.key)\t\(.value | length)"' \
+  | sort -k2
+```
+
+* find build targets that no one depends on
+
+```bash
+$ pants dependents --filter-target-type=resource --format=json :: \
+  jq -r 'to_entries[] | select(.value | length == 0)'
+```
+
+* find project source files that transitively lead to most tests
+
+```python
+# depgraph.py
+import json
+
+with open("data.json") as fh:
+    data = json.load(fh)
+
+for source, dependents in data.items():
+    print(source, len([d for d in dependents if d.startswith("tests/")]))
+```
+
+```bash
+$ pants dependents --transitive --format=json cheeseshop:: > data.json
+$ python3 depgraph.py | sort -k2
+```
+
+For more sophisticated graph querying, you may want to look into graph libraries such as [`networkx`](https://networkx.org/).
+In a larger repository, it may make sense to track the health of the dependency graph and use the output
+of the graph export to identify parts of your codebase that would benefit from refactoring.
+
 ## `filedeps` - find which files a target owns
 
 `filedeps` outputs all of the files belonging to a target, based on its `sources` field.

From ca5d65b333f4424eb89b271616416d9ef010ba84 Mon Sep 17 00:00:00 2001
From: Alexey Tereshenkov <50622389+AlexTereshenkov@users.noreply.github.com>
Date: Mon, 19 Feb 2024 08:36:50 +0000
Subject: [PATCH 2/2] Respond to review

---
 docs/docs/using-pants/project-introspection.mdx | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/docs/using-pants/project-introspection.mdx b/docs/docs/using-pants/project-introspection.mdx
index 5d228a6d231..b66323f6671 100644
--- a/docs/docs/using-pants/project-introspection.mdx
+++ b/docs/docs/using-pants/project-introspection.mdx
@@ -161,17 +161,17 @@ to process the exported graph. For instance, you could:
 ```bash
 $ pants dependencies --filter-target-type=python_test --format=json :: \
   | jq -r 'to_entries[] | "\(.key)\t\(.value | length)"' \
-  | sort -k2
+  | sort -k2 -n
 ```
 
-* find build targets that no one depends on
+* find resources that only a few other targets depend on
 
 ```bash
 $ pants dependents --filter-target-type=resource --format=json :: \
-  jq -r 'to_entries[] | select(.value | length == 0)'
+  | jq -r 'to_entries[] | select(.value | length < 2)'
 ```
 
-* find project source files that transitively lead to most tests
+* find files within the `src/` directory that transitively lead to the most tests
 
 ```python
 # depgraph.py
@@ -185,8 +185,8 @@ for source, dependents in data.items():
 ```
 
 ```bash
-$ pants dependents --transitive --format=json cheeseshop:: > data.json
-$ python3 depgraph.py | sort -k2
+$ pants dependents --transitive --format=json src:: > data.json
+$ python3 depgraph.py | sort -k2 -n
 ```
 
 For more sophisticated graph querying, you may want to look into graph libraries such as [`networkx`](https://networkx.org/).
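
As a rough illustration of the kind of graph querying the added section points to, the sketch below walks an exported adjacency list using only the Python standard library. The sample data is copied from the `pants dependencies --format=json` example in the patch; the helper names `transitive_dependencies` and `shared_dependencies` are hypothetical and not part of Pants. It assumes that targets missing from the top-level keys (as happens with a non-transitive export) simply have no recorded outgoing edges.

```python
import json

# Sample adjacency list copied from the `pants dependencies --format=json`
# example in the patch; in practice you would json.load() the exported file.
EXPORTED = json.loads("""
{
    "helloworld/greet/greeting.py:lib": [
        "//:reqs#setuptools",
        "//:reqs#types-setuptools",
        "helloworld/greet:translations",
        "helloworld/translator/translator.py:lib"
    ],
    "helloworld/translator/translator_test.py:tests": [
        "//:reqs#pytest",
        "helloworld/translator/translator.py:lib"
    ]
}
""")


def transitive_dependencies(graph, start):
    """Hypothetical helper: all targets reachable from `start`, excluding `start`."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Targets that are not keys in the export contribute no further edges.
        stack.extend(graph.get(node, ()))
    seen.discard(start)  # guard against cycles leading back to the start
    return seen


def shared_dependencies(graph, first, second):
    """Hypothetical helper: targets that both `first` and `second` reach."""
    return sorted(
        transitive_dependencies(graph, first)
        & transitive_dependencies(graph, second)
    )


if __name__ == "__main__":
    for target in EXPORTED:
        print(target, len(transitive_dependencies(EXPORTED, target)))
```

An iterative depth-first walk is used here so that deep graphs do not hit Python's recursion limit; a graph library such as `networkx` would likely be a better fit once queries go beyond simple reachability.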