Commit f9997ba

docs: move Recipes and Graph target docs to docs folder
1 parent 22fafea commit f9997ba

3 files changed

Lines changed: 150 additions & 149 deletions

File tree

README.md

Lines changed: 5 additions & 149 deletions
@@ -24,14 +24,14 @@ Python library for [httpx](https://www.python-httpx.org/)-based SPARQL Query and
2424
## Installation
2525
`sparqlx` is a [PEP 621](https://peps.python.org/pep-0621/)-compliant package and available on PyPI.
2626

27-
```shell
28-
pip install sparqlx
29-
```
3027

28+
## Docs
3129

32-
## Usage
30+
- [RDFLib Integration](docs/rdflib_integration.md)
31+
- [Recipes](docs/recipes.md)
32+
- [SPARQL 1.2 Protocol Client Implementation](docs/sparql_protocol_implementation.md)
3333

34-
> Also see the [Recipes](#Recipes) section below.
34+
## Usage
3535

3636
### SPARQLWrapper.query
3737

@@ -316,147 +316,3 @@ This will run the specified update operations asynchronously with an internally
316316
},
317317
]
318318
```
319-
320-
321-
### `rdflib.Graph` Targets
322-
323-
Apart from targeting remote SPARQL query and update endpoints, `SPARQLWrapper` also supports running SPARQL operations against `rdflib.Graph` objects.
324-
325-
```python
326-
import httpx
327-
from rdflib import Graph
328-
from sparqlx import SPARQLWrapper
329-
330-
query = "select ?x ?y where {values (?x ?y) {(1 2) (3 4)}}"
331-
sparql_wrapper = SPARQLWrapper(sparql_endpoint=Graph())
332-
333-
result: httpx.Response = sparql_wrapper.query(query)
334-
```
335-
336-
This feature treats an `rdflib.Graph` as a SPARQL endpoint: SPARQL operations are delegated to the in-memory graph object through a custom transport that builds and returns an `httpx.Response`.
337-
338-
> Note that response streaming is currently not supported for `rdflib.Graph` targets.
339-
340-
#### RDF Source Constructor
341-
342-
The `SPARQLWrapper` class features an alternative constructor, `sparqlx.SPARQLWrapper.from_rdf_source`, that, given a `sparqlx.types.RDFParseSource`, parses the RDF source into an `rdflib.Graph` and returns a `SPARQLWrapper` instance targeting that graph object.
343-
Keyword arguments are forwarded to `rdflib.Graph.parse`.
344-
345-
```python
346-
from sparqlx import SPARQLWrapper
347-
348-
query = """
349-
select distinct ?s
350-
where {
351-
?s ?p ?o .
352-
filter (contains(str(?s), 'Spacetime'))
353-
}
354-
"""
355-
356-
wrapper = SPARQLWrapper.from_rdf_source(
357-
    rdf_source="https://cidoc-crm.org/rdfs/7.1.3/CIDOC_CRM_v7.1.3.rdf"
358-
)
359-
360-
result = wrapper.query(
361-
    query=query,
362-
    convert=True,
363-
)
364-
365-
print(result) # [{'s': URIRef('http://www.cidoc-crm.org/cidoc-crm/E92_Spacetime_Volume')}]
366-
```
367-
368-
The `sparqlx.types.RDFParseSource` is the exact type expected by the `source` parameter of `rdflib.Graph.parse`.
369-
370-
> `sparqlx.SPARQLWrapper.from_rdf_source` creates an `rdflib.Dataset` internally in order to support RDF Quad sources.
371-
372-
373-
## Recipes
374-
375-
The following is a loose collection of `sparqlx` recipes.
376-
377-
Some of those recipes might become `sparqlx` features in the future.
378-
379-
380-
### JSON Response Streaming
381-
382-
The example below uses [ijson](https://github.com/ICRAR/ijson) to process a `sparqlx.SPARQLWrapper.query_stream` byte stream.
383-
384-
Note that `ijson` currently requires an adapter for Iterator input, see issue [#58](https://github.com/ICRAR/ijson/issues/58#issuecomment-917655522).
385-
386-
```python
387-
from collections.abc import Iterator
388-
389-
import ijson
390-
from sparqlx import SPARQLWrapper
391-
392-
393-
qlever_wikidata_endpoint = "https://qlever.cs.uni-freiburg.de/api/wikidata"
394-
sparql_wrapper = SPARQLWrapper(sparql_endpoint=qlever_wikidata_endpoint)
395-
396-
json_result_stream: Iterator[bytes] = sparql_wrapper.query_stream(
397-
    query="select ?s ?p ?o where {?s ?p ?o} limit 100000"
398-
)
399-
400-
class IJSONIteratorAdapter:
401-
    def __init__(self, byte_stream: Iterator[bytes]):
402-
        self.byte_stream = byte_stream
403-
404-
    def read(self, n):
405-
        if n == 0:
406-
            return b""
407-
        return next(self.byte_stream, b"")
408-
409-
adapter = IJSONIteratorAdapter(byte_stream=json_result_stream)
410-
json_result_iterator: Iterator[dict] = ijson.items(adapter, "results.bindings.item")
411-
412-
print(next(json_result_iterator))
413-
```
414-
415-
The `json_result_iterator` generator yields Python dictionaries holding SPARQL JSON response bindings coming from a byte stream. Buffering and incremental parsing are done by `ijson`.
416-
417-
### Graph Response Streaming
418-
419-
The following example processes a stream of RDF graph data coming from a SPARQL CONSTRUCT response.
420-
421-
It uses an Iterator chunking facility `ichunk` to implement a generator that yields sized sub-graphs from a streamed graph response.
422-
To avoid incremental RDF parsing and possible skolemization, the `ntriples` format is requested and streamed line by line.
423-
424-
425-
```python
426-
from collections.abc import Iterator
427-
from itertools import chain, islice
428-
from typing import cast
429-
430-
import httpx
431-
from rdflib import Graph
432-
from sparqlx import SPARQLWrapper
433-
434-
435-
def ichunk[T](iterator: Iterator[T], size: int) -> Iterator[Iterator[T]]:
436-
    _missing = object()
437-
    chunk = islice(iterator, size)
438-
439-
    if (first := next(chunk, _missing)) is _missing:
440-
        return
441-
442-
    yield chain[T]([cast(T, first)], chunk)
443-
    yield from ichunk(iterator, size=size)
444-
445-
446-
releven_sparql_endpoint = "https://graphdb.r11.eu/repositories/RELEVEN"
447-
sparql_wrapper = SPARQLWrapper(sparql_endpoint=releven_sparql_endpoint)
448-
449-
graph_result_stream: Iterator[bytes] = sparql_wrapper.query_stream(
450-
    query="construct {?s ?p ?o} where {?s ?p ?o} limit 100000",
451-
    response_format="ntriples",
452-
    streaming_method=httpx.Response.iter_lines,
453-
)
454-
455-
def graph_result_iterator(size: int = 1000) -> Iterator[Graph]:
456-
    for chunk in ichunk(graph_result_stream, size=size):
457-
        graph = Graph()
458-
        for ntriple in chunk:
459-
            graph.parse(data=ntriple, format="ntriples")
460-
461-
        yield graph
462-
```

docs/rdflib_integration.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
# RDFLib Integration
2+
3+
## SPARQL Result Conversion
4+
[todo]
5+
6+
## `rdflib.Graph` Targets
7+
8+
Apart from targeting remote SPARQL query and update endpoints, `SPARQLWrapper` also supports running SPARQL operations against `rdflib.Graph` objects.
9+
10+
```python
11+
import httpx
12+
from rdflib import Graph
13+
from sparqlx import SPARQLWrapper
14+
15+
query = "select ?x ?y where {values (?x ?y) {(1 2) (3 4)}}"
16+
sparql_wrapper = SPARQLWrapper(sparql_endpoint=Graph())
17+
18+
result: httpx.Response = sparql_wrapper.query(query)
19+
```
20+
21+
This feature treats an `rdflib.Graph` as a SPARQL endpoint: SPARQL operations are delegated to the in-memory graph object through a custom transport that builds and returns an `httpx.Response`.
22+
23+
> Note that response streaming is currently not supported for `rdflib.Graph` targets.
24+
25+
### RDF Source Constructor
26+
27+
The `SPARQLWrapper` class features an alternative constructor, `sparqlx.SPARQLWrapper.from_rdf_source`, that, given a `sparqlx.types.RDFParseSource`, parses the RDF source into an `rdflib.Graph` and returns a `SPARQLWrapper` instance targeting that graph object.
28+
Keyword arguments are forwarded to `rdflib.Graph.parse`.
29+
30+
```python
31+
from sparqlx import SPARQLWrapper
32+
33+
query = """
34+
select distinct ?s
35+
where {
36+
?s ?p ?o .
37+
filter (contains(str(?s), 'Spacetime'))
38+
}
39+
"""
40+
41+
wrapper = SPARQLWrapper.from_rdf_source(
42+
    rdf_source="https://cidoc-crm.org/rdfs/7.1.3/CIDOC_CRM_v7.1.3.rdf"
43+
)
44+
45+
result = wrapper.query(
46+
    query=query,
47+
    convert=True,
48+
)
49+
50+
print(result) # [{'s': URIRef('http://www.cidoc-crm.org/cidoc-crm/E92_Spacetime_Volume')}]
51+
```
52+
53+
The `sparqlx.types.RDFParseSource` is the exact type expected by the `source` parameter of `rdflib.Graph.parse`.
54+
55+
> `sparqlx.SPARQLWrapper.from_rdf_source` creates an `rdflib.Dataset` internally in order to support RDF Quad sources.

docs/recipes.md

Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,90 @@
1+
# Recipes
2+
3+
The following is a loose collection of `sparqlx` recipes.
4+
5+
Some of those recipes might become `sparqlx` features in the future.
6+
7+
8+
## JSON Response Streaming
9+
10+
The example below uses [ijson](https://github.com/ICRAR/ijson) to process a `sparqlx.SPARQLWrapper.query_stream` byte stream.
11+
12+
Note that `ijson` currently requires an adapter for Iterator input, see issue [#58](https://github.com/ICRAR/ijson/issues/58#issuecomment-917655522).
13+
14+
```python
15+
from collections.abc import Iterator
16+
17+
import ijson
18+
from sparqlx import SPARQLWrapper
19+
20+
21+
qlever_wikidata_endpoint = "https://qlever.cs.uni-freiburg.de/api/wikidata"
22+
sparql_wrapper = SPARQLWrapper(sparql_endpoint=qlever_wikidata_endpoint)
23+
24+
json_result_stream: Iterator[bytes] = sparql_wrapper.query_stream(
25+
    query="select ?s ?p ?o where {?s ?p ?o} limit 100000"
26+
)
27+
28+
class IJSONIteratorAdapter:
29+
    def __init__(self, byte_stream: Iterator[bytes]):
30+
        self.byte_stream = byte_stream
31+
32+
    def read(self, n):
33+
        if n == 0:
34+
            return b""
35+
        return next(self.byte_stream, b"")
36+
37+
adapter = IJSONIteratorAdapter(byte_stream=json_result_stream)
38+
json_result_iterator: Iterator[dict] = ijson.items(adapter, "results.bindings.item")
39+
40+
print(next(json_result_iterator))
41+
```
42+
43+
The `json_result_iterator` generator yields Python dictionaries holding SPARQL JSON response bindings coming from a byte stream. Buffering and incremental parsing are done by `ijson`.
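
The adapter's `read` contract can be checked without a network request or `ijson`. The sketch below repeats the adapter from the recipe and feeds it a hand-written list of byte chunks (made-up data, not an actual endpoint response). Two properties are worth noting: apart from the zero check, `read` ignores the requested size and returns whole chunks, and an empty chunk in the middle of a stream would be indistinguishable from end of input.

```python
from collections.abc import Iterator


class IJSONIteratorAdapter:
    """Expose an iterator of byte chunks through a file-like read() method."""

    def __init__(self, byte_stream: Iterator[bytes]):
        self.byte_stream = byte_stream

    def read(self, n):
        # A zero-size read must return empty bytes without consuming a chunk.
        if n == 0:
            return b""
        # Otherwise hand out the next chunk as-is; b"" signals end of input.
        return next(self.byte_stream, b"")


# Made-up chunks standing in for a streamed SPARQL JSON response.
chunks = iter([b'{"results": {"bindings": [', b'{"s": "x"}', b"]}}"])
adapter = IJSONIteratorAdapter(byte_stream=chunks)

probe = adapter.read(0)  # does not consume anything
first = adapter.read(1024)  # first chunk, regardless of the requested size
rest = b"".join(iter(lambda: adapter.read(1024), b""))  # drain until b""
```

After the stream is drained, every further `read` call keeps returning `b""`.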
44+
45+
## Graph Response Streaming
46+
47+
The following example processes a stream of RDF graph data coming from a SPARQL CONSTRUCT response.
48+
49+
It uses an Iterator chunking facility `ichunk` to implement a generator that yields sized sub-graphs from a streamed graph response.
50+
To avoid incremental RDF parsing and possible skolemization, the `ntriples` format is requested and streamed line by line.
51+
52+
53+
```python
54+
from collections.abc import Iterator
55+
from itertools import chain, islice
56+
from typing import cast
57+
58+
import httpx
59+
from rdflib import Graph
60+
from sparqlx import SPARQLWrapper
61+
62+
63+
def ichunk[T](iterator: Iterator[T], size: int) -> Iterator[Iterator[T]]:
64+
    _missing = object()
65+
    chunk = islice(iterator, size)
66+
67+
    if (first := next(chunk, _missing)) is _missing:
68+
        return
69+
70+
    yield chain[T]([cast(T, first)], chunk)
71+
    yield from ichunk(iterator, size=size)
72+
73+
74+
releven_sparql_endpoint = "https://graphdb.r11.eu/repositories/RELEVEN"
75+
sparql_wrapper = SPARQLWrapper(sparql_endpoint=releven_sparql_endpoint)
76+
77+
graph_result_stream: Iterator[bytes] = sparql_wrapper.query_stream(
78+
    query="construct {?s ?p ?o} where {?s ?p ?o} limit 100000",
79+
    response_format="ntriples",
80+
    streaming_method=httpx.Response.iter_lines,
81+
)
82+
83+
def graph_result_iterator(size: int = 1000) -> Iterator[Graph]:
84+
    for chunk in ichunk(graph_result_stream, size=size):
85+
        graph = Graph()
86+
        for ntriple in chunk:
87+
            graph.parse(data=ntriple, format="ntriples")
88+
89+
        yield graph
90+
```
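
The recursive `ichunk` in the recipe adds one level of generator delegation per chunk, so a very long stream combined with a small `size` may eventually run into the interpreter's recursion limit. A minimal loop-based sketch of the same idea follows. `ichunk_iterative` is a hypothetical name, not part of `sparqlx`, and it uses `TypeVar` instead of the PEP 695 syntax so that it also runs on Python versions before 3.12. As in the recipe, each chunk has to be consumed before the next one is requested, because all chunks share the underlying iterator.

```python
from collections.abc import Iterator
from itertools import chain, islice
from typing import TypeVar, cast

T = TypeVar("T")


def ichunk_iterative(iterator: Iterator[T], size: int) -> Iterator[Iterator[T]]:
    """Yield lazy chunks of at most `size` items without recursion."""
    _missing = object()

    while True:
        chunk = islice(iterator, size)

        # Pull one item eagerly to find out whether the source is exhausted.
        if (first := next(chunk, _missing)) is _missing:
            return

        # Put the eagerly pulled item back in front of the rest of the chunk.
        yield chain([cast(T, first)], chunk)


print([list(chunk) for chunk in ichunk_iterative(iter(range(7)), size=3)])
# [[0, 1, 2], [3, 4, 5], [6]]
```

In the graph streaming recipe this function could take the place of `ichunk` unchanged, since the call signature is the same.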
