Add pathfinding extras module #3427
Conversation
This is some inspired stuff @mgberg! I've wanted something like this from time to time but haven't ever had the courage to tackle it! I hope we can work out that Python 3.11 on Ubuntu fail case and get this merged in. Given that it's a single variant failure, I don't think it will be a serious failure!
I looked at the log for that job and it said I'm not sure why that would be the case; I'm not familiar enough with GitHub Actions to know what the issue is here.
I seriously doubt this singleton validation check failure is in any way related to the code in this PR. GitHub Actions appears to have been consistently failing validation on Ubuntu/Py3.11 for over 6 weeks now, even on straightforward dependency bumps, as evidenced by the failing checks reported in #3421 (Mar 23) and #3428 (last week). Cheers, Graham
I can now confirm that the Python 3.11 on Ubuntu fail case was caused by an acquired infelicity in RDFLib's own GitHub Actions and is completely unrelated to this PR. Nice job @mgberg, btw. Cheers, Graham
Summary of changes
I created this utility for a few different use cases of my own and was curious if there was interest in including it in RDFLib for others to use as part of the `extras` submodule.
This PR adds `rdflib.extras.pathfinding`, which provides a utility for Dijkstra pathfinding over `Graph`s. The primary entry point is `find_paths()`. It is pure Python with no additional dependencies.
Motivation
SPARQL property paths answer reachability questions, but they cannot filter by or return the intermediate nodes along a path, and they cannot be used to capture per-step context (predicates, variable bindings, weights). Furthermore, SPARQL does not have a built-in capability to find shortest paths; RDFLib users who need this functionality today must extract a subgraph and run a separate algorithm in another library like NetworkX.
Several graph databases offer built-in pathfinding that goes beyond SPARQL property paths and SPARQL queries (e.g. Neo4j, Stardog's path queries, GraphDB path search service, AllegroGraph's SNA library), but RDFLib currently does not.
`find_paths` brings comparable capabilities directly into RDFLib, combining the expressiveness of SPARQL graph patterns with Dijkstra-style pathfinding in a single call.
What it does
`find_paths` accepts flexible specifications for start nodes, end nodes, and the path (hop definition), each of which can be a fixed node, an iterable of nodes, a SPARQL WHERE-clause body, or `None` (unbound). It returns an ordered list of `PathResult` objects, each containing `PathStep` entries that carry the node, edge predicate, SPARQL variable bindings, and weighted length for every hop.
Weighted shortest-path support is provided via a `heapq`-based Dijkstra implementation. For unweighted paths this degrades to effectively breadth-first search. SPARQL patterns can be used at every layer (start selection, per-hop expansion, and end validation) and are compiled once and reused across the traversal. The traversal automatically starts from whichever side (start or end) is more constrained, reducing the number of paths to search.
The module docstring in `rdflib/extras/pathfinding.py` contains the full capability list, parameter reference, data type descriptions, and examples. Please refer to that for more detail. A test file is also included.
I've also added myself to the
`CONTRIBUTORS` file, which I've forgotten to do in prior PRs.
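As a rough illustration of the `heapq`-based Dijkstra traversal described under "What it does", here is a minimal pure-Python sketch over plain `(subject, predicate, object)` tuples. It is not the code from `rdflib/extras/pathfinding.py`: `shortest_path`, its `weight` parameter, and the sample data are hypothetical names made up for this example, and the SPARQL-driven start, hop, and end specifications are left out entirely.

```python
import heapq
from itertools import count


def shortest_path(triples, start, end, weight=lambda s, p, o: 1):
    """Return (total_length, steps) for one cheapest path, or None.

    `steps` holds one (predicate, node, cumulative_length) tuple per hop.
    Hypothetical helper for illustration only; not the PR's API.
    Assumes non-negative weights, as Dijkstra's algorithm requires.
    """
    # Index outgoing edges by subject so each expansion is a dict lookup.
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    tie = count()  # tie-breaker so the heap never has to compare nodes
    frontier = [(0, next(tie), start, [])]
    settled = set()
    while frontier:
        dist, _, node, steps = heapq.heappop(frontier)
        if node == end:
            return dist, steps
        if node in settled:
            continue  # already reached via a path at least as short
        settled.add(node)
        for p, o in outgoing.get(node, ()):
            if o not in settled:
                new_dist = dist + weight(node, p, o)
                new_steps = steps + [(p, o, new_dist)]
                heapq.heappush(frontier, (new_dist, next(tie), o, new_steps))
    return None


# Made-up sample data: two routes from "a" to "d".
TRIPLES = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("a", "worksWith", "c"),
    ("c", "knows", "d"),
]

# With the default weight of 1 per hop, the search behaves like BFS
# and picks the two-hop route through "worksWith".
unweighted = shortest_path(TRIPLES, "a", "d")

# Making "worksWith" expensive steers the search onto the three-hop route.
weighted = shortest_path(
    TRIPLES, "a", "d",
    weight=lambda s, p, o: 5 if p == "worksWith" else 1,
)
```

With every hop weighted 1, the priority queue pops nodes in order of hop count, which is why the unweighted case amounts to breadth-first search. Carrying the per-hop predicate and cumulative length alongside each queue entry is one simple way to return the intermediate steps that a plain SPARQL property path does not expose.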