
Add pathfinding extras module #3427

Open
mgberg wants to merge 3 commits into RDFLib:main from mgberg:mg-pathfinding



mgberg commented Apr 21, 2026

Contributor

Summary of changes

I created this utility for a few different use cases of my own and was curious if there was interest in including it in RDFLib for others to use as part of the extras submodule.

This PR adds rdflib.extras.pathfinding, which provides Dijkstra-based pathfinding over RDFLib Graphs. The primary entry point is find_paths(). It is pure Python with no additional dependencies.

Motivation

SPARQL property paths answer reachability questions, but they cannot filter by or return the intermediate nodes along a path, and they cannot be used to capture per-step context (predicates, variable bindings, weights). Furthermore, SPARQL does not have a built-in capability to find shortest paths; RDFLib users who need this functionality today must extract a subgraph and run a separate algorithm in another library like NetworkX.

Several graph databases offer built-in pathfinding that goes beyond SPARQL property paths and SPARQL queries (e.g. Neo4j, Stardog's path queries, GraphDB's path search service, AllegroGraph's SNA library), but RDFLib currently does not. find_paths brings comparable capabilities directly into RDFLib, combining the expressiveness of SPARQL graph patterns with Dijkstra-style pathfinding in a single call.

What it does

find_paths accepts flexible specifications for start nodes, end nodes, and the path (hop definition), each of which can be a fixed node, an iterable of nodes, a SPARQL WHERE-clause body, or None (unbound). It returns an ordered list of PathResult objects, each containing PathStep entries that carry the node, edge predicate, SPARQL variable bindings, and weighted length for every hop.
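To make the result shape above more concrete, here is a rough sketch of what such result objects might look like as plain dataclasses. The field names (node, predicate, bindings, length) and the helper properties are hypothetical and are not taken from the module; length is assumed here to be the cumulative weighted length up to each step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PathStep:
    """One hop along a path (hypothetical field names, for illustration only)."""
    node: str                        # node reached at this step
    predicate: Optional[str] = None  # edge predicate used to reach it (None for the start node)
    bindings: Dict[str, str] = field(default_factory=dict)  # SPARQL variable bindings for this hop
    length: float = 0.0              # assumed: cumulative weighted length up to this node


@dataclass
class PathResult:
    """An ordered sequence of steps from a start node to an end node."""
    steps: List[PathStep]

    @property
    def start(self) -> str:
        return self.steps[0].node

    @property
    def end(self) -> str:
        return self.steps[-1].node

    @property
    def length(self) -> float:
        # Because step lengths are cumulative in this sketch, the last step holds the total.
        return self.steps[-1].length


result = PathResult([
    PathStep("ex:alice"),
    PathStep("ex:bob", predicate="foaf:knows", length=1.0),
    PathStep("ex:carol", predicate="foaf:knows", length=2.0),
])
print(result.start, result.end, result.length)  # ex:alice ex:carol 2.0
```

Keeping per-step data (predicate, bindings, running length) on each step, rather than only on the path, is what lets callers inspect or filter the intermediate hops that SPARQL property paths hide.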

Weighted shortest-path support is provided via a heapq-based Dijkstra implementation. For unweighted paths (every hop costs the same), this effectively reduces to breadth-first search. SPARQL patterns can be used at every layer (start selection, per-hop expansion, and end validation) and are compiled once and reused across the traversal. The traversal automatically starts from whichever side (start or end) is more constrained, reducing the number of paths to search.
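The general technique described above can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration, not the module's code: triples are plain (subject, predicate, object) string tuples instead of an RDFLib Graph, weights come from a hypothetical per-predicate `weight` function (which must be non-negative, as Dijkstra requires), and "more constrained side" is approximated simply by comparing the sizes of the start and end sets. With the default weight of 1.0 per hop, nodes are settled in nondecreasing hop count, i.e. breadth-first order.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Path = List[Tuple[Optional[str], str]]  # [(predicate used to arrive, node), ...]


def dijkstra(triples: Iterable[Triple], starts: Set[str], ends: Set[str],
             weight: Callable[[str], float]) -> Optional[Tuple[float, Path]]:
    """Cheapest path from any start to any end; weights must be non-negative."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))

    tie = count()  # tie-breaker so heapq never has to compare node names or paths
    heap = [(0.0, next(tie), s, [(None, s)]) for s in starts]
    heapq.heapify(heap)
    settled: Set[str] = set()
    while heap:
        dist, _, node, path = heapq.heappop(heap)
        if node in settled:
            continue  # stale entry: this node was already reached more cheaply
        settled.add(node)
        if node in ends:
            return dist, path
        for pred, nxt in adj.get(node, []):
            if nxt not in settled:
                heapq.heappush(heap, (dist + weight(pred), next(tie), nxt, path + [(pred, nxt)]))
    return None


def shortest_path(triples: Iterable[Triple], starts: Set[str], ends: Set[str],
                  weight: Callable[[str], float] = lambda p: 1.0) -> Optional[Tuple[float, Path]]:
    triples = list(triples)
    if len(ends) >= len(starts):
        return dijkstra(triples, starts, ends, weight)
    # The end side is more constrained: search backwards over reversed edges...
    found = dijkstra([(o, p, s) for s, p, o in triples], ends, starts, weight)
    if found is None:
        return None
    dist, rev = found
    # ...then flip the result so it reads start -> end again.
    nodes = [n for _, n in rev]
    preds = [p for p, _ in rev[1:]]
    forward: Path = [(None, nodes[-1])]
    forward += [(preds[i - 1], nodes[i - 1]) for i in range(len(nodes) - 1, 0, -1)]
    return dist, forward


triples = [("a", "knows", "b"), ("b", "knows", "c"), ("a", "worksWith", "c"), ("c", "knows", "d")]
cost = {"knows": 1.0, "worksWith": 5.0}
print(shortest_path(triples, {"a"}, {"d"}, lambda p: cost[p]))
# (3.0, [(None, 'a'), ('knows', 'b'), ('knows', 'c'), ('knows', 'd')])
print(shortest_path(triples, {"a"}, {"d"}))
# (2.0, [(None, 'a'), ('worksWith', 'c'), ('knows', 'd')])
```

The integer tie-breaker in each heap entry keeps `heapq` from ever comparing paths, and skipping already-settled nodes on pop is the usual "lazy deletion" alternative to a decrease-key operation, which `heapq` does not provide.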

The module docstring in rdflib/extras/pathfinding.py contains the full capability list, parameter reference, data type descriptions, and examples. Please refer to that for more detail. A test file is also included.

I've also added myself to the CONTRIBUTORS file, which I've forgotten to do in prior PRs.

Checklist

  • Checked that there aren't other open pull requests for the same change.
  • Checked that all tests and type checking pass.
  • If the change adds new features or changes the RDFLib public API:
  • Considered granting push permissions to the PR branch, so maintainers can fix minor issues and keep your PR up to date.

@nicholascar

Member

This is some inspired stuff, @mgberg! I've wanted something like this from time to time but haven't ever had the courage to tackle it!

I hope we can work out that Python 3.11 on Ubuntu fail case and get this merged in. Given that it's a single variant failure, I don't think it will be a serious failure!

mgberg commented May 5, 2026

Contributor Author

> I hope we can work out that Python 3.11 on Ubuntu fail case and get this merged in. Given that it's a single variant failure, I don't think it will be a serious failure!

I looked at the log for that job, and it said:

ROOT: tox-gh-actions won't override envlist because envlist is explicitly given via TOXENV or -e option
ROOT: HandledError| provided environments not found in configuration file:
py311-extensive-doc
...
task: Failed to run task "gha:validate": task: Failed to run task "tox": exit status 254

I'm not sure why that would be the case; I'm not familiar enough with GitHub Actions to know what the issue is here.

@gjhiggins

Contributor

> I hope we can work out that Python 3.11 on Ubuntu fail case and get this merged in. Given that it's a single variant failure, I don't think it will be a serious failure!

I seriously doubt this singleton validation check failure is in any way related to the code in this PR. GitHub Actions appears to have been failing validation consistently on Ubuntu/Py3.11 for over 6 weeks now, even on straightforward dependency bumps, as evidenced by the failing checks reported in

#3421 (Mar 23)

and

#3428 (last week)

Cheers,

Graham

@gjhiggins

Contributor

> > I hope we can work out that Python 3.11 on Ubuntu fail case and get this merged in. Given that it's a single variant failure, I don't think it will be a serious failure!

> I seriously doubt this singleton validation check failure is in any way related to the code in this PR.

I can now confirm that the Python 3.11 on Ubuntu failure was caused by an acquired infelicity in RDFLib's own GitHub Actions setup and is completely unrelated to this PR.

Nice job @mgberg, btw.

Cheers,

Graham
