Commit bbecb9d

feat: enhance Playwright MCP configuration and update documentation
- Added new configuration files for Playwright MCP in both host and Docker environments.
- Updated `.env.example` with production RDF graph details and improved SPARQL query instructions.
- Revised AGENTS.md and CLAUDE.md to clarify Playwright MCP usage and configuration options.
- Enhanced documentation on Oxigraph named graphs and their querying conventions in PROJECT.md and SKILL.md.
- Removed deprecated Docker Compose file and adjusted devcontainer setup for improved integration with Playwright MCP.
1 parent 1b8e997 commit bbecb9d

28 files changed

Lines changed: 330 additions & 104 deletions

.agents/PROJECT.md

Lines changed: 4 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -44,8 +44,9 @@ When wiring data or running query skills, use `openpulse.epfl.ch`. When citing t
4444
### Oxigraph (SPARQL) — RDF metadata
4545

4646
- **Endpoint:** `:7502`, behind a Caddy proxy that terminates HTTP-Basic auth (`/query` for reads, `/update` for writes)
47-
- **Contents:** ~300k triples across multiple named graphs (e.g. `http://open-pulse/repos`, `http://open-pulse/metadata`)
48-
- **Use it for:** Structured metadata, vocabulary/ontology queries, anything that benefits from `SELECT … WHERE { GRAPH ?g { … } }`
47+
- **Contents:** ~2.45M triples in the current production snapshot (`https://open-pulse.epfl.ch/graph/2026-05/hybrid`), plus utility graphs (`_backup/…`, `_links/identity`) and in-progress snapshots (`2026-06/hybrid`, …). **Default graph mode** — plain `{ ?s ?p ?o }` without a `GRAPH` clause — resolves to that production snapshot. Use explicit `GRAPH <…>` to pin a snapshot or reach non-default graphs. See `query-sparql` skill.
48+
- **Named-graph convention:** production snapshots live at `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`; pipeline `sparql_upload` promotes the current month into both the named graph and the default graph. Inventory: `op-collections stats` → `sparql.named_graphs`.
49+
- **Use it for:** Structured metadata, vocabulary/ontology queries, repo stars/licenses/languages, contributions, ORCID↔GitHub bridges, scholarly articles
4950
- **Skill:** `query-sparql` (SELECT/ASK/CONSTRUCT/DESCRIBE). Updates are intentionally not supported by the skill — use `curl` explicitly if you need to mutate.
5051

5152
### OpenSearch — search & enriched indices
@@ -162,4 +163,4 @@ If a change makes one of these journeys harder (e.g. couples the design system t
162163
- **Data store query skills**: `.agents/skills/query-{neo4j,sparql,opensearch}/SKILL.md`
163164
- **CHAOSS health metrics**: `.agents/skills/query-chaoss/SKILL.md` (featured dashboard slugs above)
164165
- **Publishing**: README → *Publishing to GitHub Pages*
165-
- **Devcontainer**: `.devcontainer/`
166+
- **Devcontainer**: `.devcontainer/` (compose + images in `tools/image/docker/`)

.agents/skills/op-collections/SKILL.md

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -79,6 +79,20 @@ Output is JSON (pretty-printed); `export` always streams the raw body so you can
7979

8080
Each row payload includes `db_path`, `table`, and `columns` so you can see the schema before filtering. `cstats` exposes `search.columns` (what `--q` matches) and `search.examples`.
8181

82+
## Oxigraph named graphs (`stats` → `sparql`)
83+
84+
The `sparql` block lists every named graph Oxigraph holds — authoritative sizes before writing SPARQL:
85+
86+
```json
87+
"named_graphs": [
88+
{ "uri": "https://open-pulse.epfl.ch/graph/2026-05/hybrid", "triples": 2453125 },
89+
{ "uri": "https://open-pulse.epfl.ch/graph/2026-06/hybrid", "triples": 328691 },
90+
91+
]
92+
```
93+
94+
**Convention:** production data lives in `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid` and is also exposed as Oxigraph's **default graph** — plain `{ ?s ?p ?o }` queries work without a `GRAPH` clause. Use explicit `GRAPH <…>` to pin a snapshot or reach utility graphs (`_backup/…`, `_links/identity`). Full modes and gotchas: `query-sparql` skill.
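The convention above can be checked mechanically. The following is a minimal sketch, assuming the `stats` output nests the list under `sparql.named_graphs` as the path suggests; the sample payload reuses only the two entries shown in the example, and `split_graphs` is a hypothetical helper, not part of the skill.

```python
import json
import re

# Sample payload shaped like the `sparql` block shown above. Only the two
# entries from the example are included; the envelope shape is an assumption.
STATS_JSON = """
{
  "sparql": {
    "named_graphs": [
      {"uri": "https://open-pulse.epfl.ch/graph/2026-05/hybrid", "triples": 2453125},
      {"uri": "https://open-pulse.epfl.ch/graph/2026-06/hybrid", "triples": 328691}
    ]
  }
}
"""

# Production snapshots follow .../graph/{YYYY-MM}/hybrid.
SNAPSHOT_RE = re.compile(r"^https://open-pulse\.epfl\.ch/graph/(\d{4}-\d{2})/hybrid$")


def split_graphs(stats):
    """Return (snapshots, others); snapshots maps 'YYYY-MM' to a triple count."""
    snapshots, others = {}, {}
    for entry in stats["sparql"]["named_graphs"]:
        match = SNAPSHOT_RE.match(entry["uri"])
        if match:
            snapshots[match.group(1)] = entry["triples"]
        else:
            # Utility graphs such as backups or identity links land here.
            others[entry["uri"]] = entry["triples"]
    return snapshots, others


snapshots, others = split_graphs(json.loads(STATS_JSON))
for month, triples in sorted(snapshots.items()):
    print(f"{month}: {triples:,} triples")
```

Comparing the counts per month before writing SPARQL makes it easy to spot an in-progress snapshot that is still much smaller than the production one.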
95+
8296
## Notes
8397

8498
- This is a **separate store** from the three query-* skills: the collections are the hub's curated DuckDB indices, not Neo4j/SPARQL/OpenSearch. Use `stats` to see all four side by side (`sparql`, `neo4j`, `opensearch`, `duckdb` blocks).

.agents/skills/op-extractor/SKILL.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -81,11 +81,11 @@ quest:
8181
skip_existing: true # don't re-process repos already done
8282
max_repos: 0 # 0 = no cap
8383
neo4j_upload: { enabled: false } # loads crawler graph → Neo4j (input_dir/input_filename)
84-
sparql_upload: { enabled: false } # uploads extracted RDF → Oxigraph
84+
sparql_upload: { enabled: false } # uploads extracted RDF → Oxigraph named graph (…/graph/{YYYY-MM}/hybrid)
8585
apply_grimoire_projects: { enabled: false }
8686
```
8787
88-
To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples. A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`.
88+
To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples into the Oxigraph named graph for that snapshot (`https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`). A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`. After upload, confirm the graph size via `op-collections stats` → `sparql.named_graphs`.
8989

9090
## Run status fields
9191

.agents/skills/query-chaoss/SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -211,7 +211,7 @@ A **project metric** adds `repo_count`, `truncated`, `cached_at`, and an `aggreg
211211

212212
## Live state (verified 2026-06-10)
213213

214-
- The hub serves the **2026-05 snapshot**. The newest signals (`test_coverage`, `release_frequency`) and the issue/PR-based metrics (`first_response`, `cr_*`, `issues_*`) read `"—"` for most repos until a fresh re-extraction lands — **expect them sparse**.
214+
- The hub serves the **2026-05 snapshot**. SPARQL traces in `--include traces` query `GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid>` (see `query-sparql` named-graph convention). The newest signals (`test_coverage`, `release_frequency`) and the issue/PR-based metrics (`first_response`, `cr_*`, `issues_*`) read `"—"` for most repos until a fresh re-extraction lands — **expect them sparse**.
215215
- Repos are **GitHub-only**: `repo <owner> <repo>` → `/repositories/github.com/...`.
216216
- Projects are discipline/topic buckets of repos. **The set and count change over time, so always read it from `projects` — never hardcode a number.** At time of writing the largest are `info-eng` (~109 repos), `bioeng` (~95), `stats` (~63), with domain-relevant ones like `protein_ai_ecosystem` (~26), `bio` (~42), `chem` (~10). Use the exact `project` slug returned by `projects` (e.g. `protein_ai_ecosystem`, not `protein-ai`). `project-repos <project>` returns the project header plus both a `metrics[]` summary and a `repositories[]` list.
217217
- A browsable UI to explore first: `https://openpulse.epfl.ch/chaoss` (same auth).

.agents/skills/query-neo4j/SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -104,7 +104,7 @@ Neo4j, never OpenSearch.
104104
- **Owner→repos for several orgs at once**: `MATCH (o:Org)-[:OWNS]->(r:Repo) WHERE o.login IN $logins RETURN o.login, r.full_name` — the basis for org-scoped catalogs.
105105
- **PR/issue/review/comment metrics**: always come from the edge types above. Count `DISTINCT u` for "people" and `count(*)` for "events"; there is no event date to bucket by.
106106
- **`DEPENDS_ON` is large** (~259k). Always scope it to a seed set (`WHERE r.full_name IN $urls`) and add `LIMIT`, or it returns the whole ecosystem.
107-
- **Affiliations**: `(:User)-[:AFFILIATED_WITH]->(:RorOrg)` mirrors the SPARQL `org:hasMembership` data; institutions are ROR-identified in both stores.
107+
- **Affiliations**: `(:User)-[:AFFILIATED_WITH]->(:RorOrg)` mirrors the SPARQL `org:hasMembership` data (default graph or `GRAPH <…/graph/{YYYY-MM}/hybrid>`); institutions are ROR-identified in both stores. See `query-sparql` for default vs named-graph modes.
108108

109109
## Conventions
110110

.agents/skills/query-opensearch/SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -82,7 +82,7 @@ when matching.
8282
| `repo_name` | clone URL + `.git` (keyword; use directly in `terms`) |
8383
| `author_uuid` / `Author_uuid` | stable author identity (use for `cardinality`/`terms`) |
8484
| `author_name` | display name |
85-
| `author_org_name` | **almost always `"Unknown"`** — affiliation NOT resolved here; get orgs from SPARQL/Neo4j instead |
85+
| `author_org_name` | **almost always `"Unknown"`** — affiliation NOT resolved here; get orgs from SPARQL (default graph or named graph — see `query-sparql`) or Neo4j instead |
8686
| `grimoire_creation_date` | canonical commit timestamp (use for `date_histogram`, min/max) |
8787
| `author_date`, `commit_date` | raw git dates |
8888
| `lines_added`, `lines_removed`, `lines_changed`, `files` | churn (`sum`-able) |

.agents/skills/query-sparql/SKILL.md

Lines changed: 47 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -34,17 +34,52 @@ node .agents/skills/query-sparql/query.mjs -f query.rq
3434

3535
For `SELECT` the script flattens the SPARQL JSON Results envelope to a plain `[{var: value}, ...]` array. For `ASK`, `CONSTRUCT`, `DESCRIBE`, or any non-`json` accept, the response is passed through.
3636

37-
## Live graph state (verified 2026-06-05)
37+
## Default graph vs named graphs (verified 2026-06-10)
3838

39-
The data you want is in **one big named graph** — always wrap patterns in it:
39+
Oxigraph holds production RDF in **named graphs**, but the hub also configures a **default graph** so plain SPARQL (no `GRAPH` clause) works.
4040

41+
### Two query modes
42+
43+
| Mode | Syntax | When to use |
44+
|---|---|---|
45+
| **Default graph** | `{ ?s ?p ?o }` — no `GRAPH` wrapper | Most ad-hoc queries. Oxigraph resolves this to the **current production snapshot** (~2.45M triples today, same data as `…/graph/2026-05/hybrid`). |
46+
| **Named graph** | `GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> { … }` | Pin a specific snapshot, query utility graphs, or compare graphs side by side. Required for `_backup/…`, `_links/identity`, or in-progress `2026-06/hybrid`. |
47+
48+
```sparql
49+
# Default graph mode — fine for everyday repo/metadata lookups
50+
SELECT ?name WHERE {
51+
<https://github.com/biopython/biopython> schema:name ?name .
52+
}
53+
54+
# Named graph mode — pin a snapshot or reach non-default graphs
55+
SELECT ?name WHERE {
56+
GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> {
57+
<https://github.com/biopython/biopython> schema:name ?name .
58+
}
59+
}
4160
```
42-
GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> { ... } # ~2.12M triples
43-
```
4461

45-
Other graphs: a small per-study `…/graph/authors/protein-ai` (~1.1k), plus a
46-
default (unnamed) graph. Querying without the `GRAPH` wrapper across everything
47-
is slow and mixes studies — scope to the hybrid graph.
62+
Default mode does **not** union every named graph — backups and in-progress snapshots are invisible unless you name them explicitly.
63+
64+
### Named-graph IRI pattern
65+
66+
| Kind | Pattern | Example |
67+
|---|---|---|
68+
| **Production snapshot** | `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid` | `…/graph/2026-05/hybrid` |
69+
| **Utility / backup** | `https://open-pulse.epfl.ch/graph/_…` | `…/_backup/2026-05-hybrid-prenorm`, `…/_links/identity` |
70+
71+
Pipeline `sparql_upload` (op-extractor) lands triples in the named graph for that month; the hub also promotes the current snapshot into the default graph. CHAOSS SPARQL traces may use either form.
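The IRI pattern and the two query modes can be combined in a small query builder. This is a sketch only: `snapshot_iri` and `pin_to_graph` are hypothetical helper names, not functions of the `query-sparql` scripts, and the base IRI is taken from the table above.

```python
import re

GRAPH_BASE = "https://open-pulse.epfl.ch/graph"


def snapshot_iri(month):
    """Build the production snapshot IRI for a 'YYYY-MM' string."""
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", month):
        raise ValueError(f"expected YYYY-MM, got {month!r}")
    return f"{GRAPH_BASE}/{month}/hybrid"


def pin_to_graph(pattern, graph_iri=None):
    """Wrap a graph pattern in GRAPH <...>; None keeps default-graph mode."""
    if graph_iri is None:
        return f"{{ {pattern} }}"
    return f"{{ GRAPH <{graph_iri}> {{ {pattern} }} }}"


pattern = "?s ?p ?o"
default_query = f"SELECT (COUNT(*) AS ?n) WHERE {pin_to_graph(pattern)}"
pinned_query = (
    "SELECT (COUNT(*) AS ?n) WHERE "
    + pin_to_graph(pattern, snapshot_iri("2026-05"))
)
print(default_query)
print(pinned_query)
```

Keeping the graph IRI as an optional argument lets the same pattern run against the default graph for everyday lookups and against a pinned snapshot when results must be reproducible.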
72+
73+
### Current named graphs (live)
74+
75+
| Named graph | Triples | Role |
76+
|---|---|---|
77+
| `https://open-pulse.epfl.ch/graph/2026-05/hybrid` | ~2.45M | **Current production snapshot** — also what default-graph queries see |
78+
| `https://open-pulse.epfl.ch/graph/_backup/2026-05-hybrid-prenorm` | ~2.12M | Pre-normalisation backup — named graph only |
79+
| `https://open-pulse.epfl.ch/graph/2026-06/hybrid` | ~329k | In-progress next snapshot — named graph only |
80+
| `https://open-pulse.epfl.ch/graph/_links/identity` | ~204 | Cross-store identity links — named graph only |
81+
82+
Refresh sizes: `python .agents/skills/op-collections/query.py stats` → `sparql.named_graphs`, or the inventory query below.
4883

4984
## Prefixes used in this graph
5085

@@ -94,10 +129,11 @@ institution is `<https://ror.org/…>`. Match the full URL literal.
94129

95130
| Goal | SPARQL |
96131
|---|---|
97-
| Total triple count | `SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }` |
98132
| Named graphs + sizes | `SELECT ?g (COUNT(*) AS ?n) WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g ORDER BY DESC(?n)` |
99-
| Predicates on a repo | `SELECT DISTINCT ?p WHERE { GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> { <https://github.com/biopython/biopython> ?p ?o } }` |
100-
| Stars/forks for repos | `… { VALUES ?r { <…/repo1> <…/repo2> } ?r op:githubRepoStars ?s ; op:githubRepoForks ?f }` |
133+
| Triple count (default graph) | `SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }` |
134+
| Triple count (named graph) | `SELECT (COUNT(*) AS ?n) WHERE { GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> { ?s ?p ?o } }` |
135+
| Predicates on a repo | `SELECT DISTINCT ?p WHERE { <https://github.com/biopython/biopython> ?p ?o }` |
136+
| Stars/forks for repos | `{ VALUES ?r { <…/repo1> <…/repo2> } ?r op:githubRepoStars ?s ; op:githubRepoForks ?f }` |
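The stars/forks row relies on a `VALUES` block to batch several repositories into one request. A minimal sketch of generating that query follows; `stars_forks_query` is a hypothetical helper, the second repository IRI is a made-up placeholder, and the `op:` prefix declaration still has to be prepended from the prefixes section of the skill.

```python
import re

# Characters that may not appear inside a SPARQL IRIREF (<...>).
_BAD_IRI_CHARS = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def stars_forks_query(repo_iris):
    """Build one SELECT that fetches stars and forks for a batch of repo IRIs."""
    if not repo_iris:
        raise ValueError("need at least one repository IRI")
    for iri in repo_iris:
        # Reject anything that could break out of the <...> delimiters.
        if _BAD_IRI_CHARS.search(iri):
            raise ValueError(f"not a safe IRI: {iri!r}")
    values = " ".join(f"<{iri}>" for iri in repo_iris)
    return (
        "SELECT ?r ?s ?f WHERE { "
        f"VALUES ?r {{ {values} }} "
        "?r op:githubRepoStars ?s ; op:githubRepoForks ?f }"
    )


query = stars_forks_query([
    "https://github.com/biopython/biopython",
    "https://github.com/example/repo",  # hypothetical placeholder repository
])
print(query)
```

Because no `GRAPH` wrapper is added, the generated query runs in default-graph mode and therefore sees the current production snapshot.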
101137

102138
## Gotchas learned the hard way
103139

@@ -107,7 +143,7 @@ institution is `<https://ror.org/…>`. Match the full URL literal.
107143

108144
## Conventions
109145

110-
- Always include `LIMIT` on exploratory queries, and wrap in the hybrid `GRAPH`.
146+
- Always include `LIMIT` on exploratory queries. **Default graph mode** (`{ … }` without `GRAPH`) is fine for the current production snapshot; use an explicit `GRAPH <https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid>` when you need a specific snapshot or a non-default graph.
111147
- Updates require `SPARQL_AUTH` admin role and are destructive — never run them unless the user explicitly asks. Use curl, not these scripts.
112148
- Oxigraph default response is SPARQL XML; the scripts always set `Accept` explicitly.
113149
- A 504 from the proxy means the query timed out — reduce the result set, tighten the pattern, or switch to fetch-and-join.
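For callers that bypass the scripts, the `Accept` and auth conventions above translate into a request like the one sketched here. It uses only the Python standard library and never sends anything; the endpoint URL and credentials are placeholders, and `build_select_request` is a hypothetical name rather than part of the skill.

```python
import base64
import urllib.parse
import urllib.request


def build_select_request(endpoint, query, user, password):
    """Prepare (but do not send) a SPARQL Protocol POST for a read query."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            # Ask for JSON explicitly; the default response is SPARQL XML.
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
            # HTTP Basic auth is handled by the proxy in front of Oxigraph.
            "Authorization": f"Basic {token}",
        },
    )


req = build_select_request(
    "https://sparql.example.invalid/query",  # placeholder; reads go to /query
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
    "user",    # placeholder credentials
    "secret",
)
print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute it; an `HTTPError` with status 504 then corresponds to the proxy timeout described in the last bullet.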

.claude/PROJECT.md

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -44,8 +44,9 @@ When wiring data or running query skills, use `openpulse.epfl.ch`. When citing t
4444
### Oxigraph (SPARQL) — RDF metadata
4545

4646
- **Endpoint:** `:7502`, behind a Caddy proxy that terminates HTTP-Basic auth (`/query` for reads, `/update` for writes)
47-
- **Contents:** ~300k triples across multiple named graphs (e.g. `http://open-pulse/repos`, `http://open-pulse/metadata`)
48-
- **Use it for:** Structured metadata, vocabulary/ontology queries, anything that benefits from `SELECT … WHERE { GRAPH ?g { … } }`
47+
- **Contents:** ~2.45M triples in the current production snapshot (`https://open-pulse.epfl.ch/graph/2026-05/hybrid`), plus utility graphs (`_backup/…`, `_links/identity`) and in-progress snapshots (`2026-06/hybrid`, …). **Default graph mode** — plain `{ ?s ?p ?o }` without a `GRAPH` clause — resolves to that production snapshot. Use explicit `GRAPH <…>` to pin a snapshot or reach non-default graphs. See `query-sparql` skill.
48+
- **Named-graph convention:** production snapshots live at `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`; pipeline `sparql_upload` promotes the current month into both the named graph and the default graph. Inventory: `op-collections stats` → `sparql.named_graphs`.
49+
- **Use it for:** Structured metadata, vocabulary/ontology queries, repo stars/licenses/languages, contributions, ORCID↔GitHub bridges, scholarly articles
4950
- **Skill:** `query-sparql` (SELECT/ASK/CONSTRUCT/DESCRIBE). Updates are intentionally not supported by the skill — use `curl` explicitly if you need to mutate.
5051

5152
### OpenSearch — search & enriched indices
@@ -162,4 +163,4 @@ If a change makes one of these journeys harder (e.g. couples the design system t
162163
- **Data store query skills**: `.claude/skills/query-{neo4j,sparql,opensearch}/SKILL.md`
163164
- **CHAOSS health metrics**: `.claude/skills/query-chaoss/SKILL.md` (featured dashboard slugs above)
164165
- **Publishing**: README → *Publishing to GitHub Pages*
165-
- **Devcontainer**: `.devcontainer/`
166+
- **Devcontainer**: `.devcontainer/` (compose + images in `tools/image/docker/`)

.claude/skills/op-collections/SKILL.md

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -79,6 +79,20 @@ Output is JSON (pretty-printed); `export` always streams the raw body so you can
7979

8080
Each row payload includes `db_path`, `table`, and `columns` so you can see the schema before filtering. `cstats` exposes `search.columns` (what `--q` matches) and `search.examples`.
8181

82+
## Oxigraph named graphs (`stats` → `sparql`)
83+
84+
The `sparql` block lists every named graph Oxigraph holds — authoritative sizes before writing SPARQL:
85+
86+
```json
87+
"named_graphs": [
88+
{ "uri": "https://open-pulse.epfl.ch/graph/2026-05/hybrid", "triples": 2453125 },
89+
{ "uri": "https://open-pulse.epfl.ch/graph/2026-06/hybrid", "triples": 328691 },
90+
91+
]
92+
```
93+
94+
**Convention:** production data lives in `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid` and is also exposed as Oxigraph's **default graph** — plain `{ ?s ?p ?o }` queries work without a `GRAPH` clause. Use explicit `GRAPH <…>` to pin a snapshot or reach utility graphs (`_backup/…`, `_links/identity`). Full modes and gotchas: `query-sparql` skill.
95+
8296
## Notes
8397

8498
- This is a **separate store** from the three query-* skills: the collections are the hub's curated DuckDB indices, not Neo4j/SPARQL/OpenSearch. Use `stats` to see all four side by side (`sparql`, `neo4j`, `opensearch`, `duckdb` blocks).

.claude/skills/op-extractor/SKILL.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -81,11 +81,11 @@ quest:
8181
skip_existing: true # don't re-process repos already done
8282
max_repos: 0 # 0 = no cap
8383
neo4j_upload: { enabled: false } # loads crawler graph → Neo4j (input_dir/input_filename)
84-
sparql_upload: { enabled: false } # uploads extracted RDF → Oxigraph
84+
sparql_upload: { enabled: false } # uploads extracted RDF → Oxigraph named graph (…/graph/{YYYY-MM}/hybrid)
8585
apply_grimoire_projects: { enabled: false }
8686
```
8787
88-
To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples. A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`.
88+
To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples into the Oxigraph named graph for that snapshot (`https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`). A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`. After upload, confirm the graph size via `op-collections stats` → `sparql.named_graphs`.
8989

9090
## Run status fields
9191
