feat: enhance Playwright MCP configuration and update documentation
- Added new configuration files for Playwright MCP in both host and Docker environments.
- Updated `.env.example` with production RDF graph details and improved SPARQL query instructions.
- Revised AGENTS.md and CLAUDE.md to clarify Playwright MCP usage and configuration options.
- Enhanced documentation on Oxigraph named graphs and their querying conventions in PROJECT.md and SKILL.md.
- Removed deprecated Docker Compose file and adjusted devcontainer setup for improved integration with Playwright MCP.
.agents/PROJECT.md (4 additions, 3 deletions)
@@ -44,8 +44,9 @@ When wiring data or running query skills, use `openpulse.epfl.ch`. When citing t
 ### Oxigraph (SPARQL) — RDF metadata

 - **Endpoint:** `:7502`, behind a Caddy proxy that terminates HTTP-Basic auth (`/query` for reads, `/update` for writes)
-- **Contents:** ~300k triples across multiple named graphs (e.g. `http://open-pulse/repos`, `http://open-pulse/metadata`)
-- **Use it for:** Structured metadata, vocabulary/ontology queries, anything that benefits from `SELECT … WHERE { GRAPH ?g { … } }`
+- **Contents:** ~2.45M triples in the current production snapshot (`https://open-pulse.epfl.ch/graph/2026-05/hybrid`), plus utility graphs (`_backup/…`, `_links/identity`) and in-progress snapshots (`2026-06/hybrid`, …). **Default graph mode** — plain `{ ?s ?p ?o }` without a `GRAPH` clause — resolves to that production snapshot. Use explicit `GRAPH <…>` to pin a snapshot or reach non-default graphs. See `query-sparql` skill.
+- **Named-graph convention:** production snapshots live at `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`; pipeline `sparql_upload` promotes the current month into both the named graph and the default graph. Inventory: `op-collections stats` → `sparql.named_graphs`.
 - **Skill:** `query-sparql` (SELECT/ASK/CONSTRUCT/DESCRIBE). Updates are intentionally not supported by the skill — use `curl` explicitly if you need to mutate.

 ### OpenSearch — search & enriched indices
@@ -162,4 +163,4 @@ If a change makes one of these journeys harder (e.g. couples the design system t
 - **Data store query skills**: `.agents/skills/query-{neo4j,sparql,opensearch}/SKILL.md`
 - **CHAOSS health metrics**: `.agents/skills/query-chaoss/SKILL.md` (featured dashboard slugs above)
 - **Publishing**: README → *Publishing to GitHub Pages*
-- **Devcontainer**: `.devcontainer/`
+- **Devcontainer**: `.devcontainer/` (compose + images in `tools/image/docker/`)
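The endpoint line in this file places reads at `/query` behind an HTTP-Basic proxy. Below is a minimal sketch of how such a read request could be built with the Python standard library, using the SPARQL 1.1 Protocol's form-encoded POST. The base URL and credentials are placeholders, not the hub's real values. The function only builds the request; sending it with `urllib.request.urlopen(req)` is left to the caller.

```python
import base64
import urllib.parse
import urllib.request


def build_sparql_request(base_url: str, query: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a read-only SPARQL query request.

    Assumes the proxy accepts HTTP Basic credentials and serves reads at /query.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # SPARQL 1.1 Protocol: POST with a form-encoded "query" parameter.
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
            # Oxigraph's default response is SPARQL XML, so request JSON explicitly.
            "Accept": "application/sparql-results+json",
        },
    )


# Hypothetical local endpoint and credentials, for illustration only.
req = build_sparql_request("http://localhost:7502", "ASK { ?s ?p ?o }", "reader", "secret")
```

Writes go to `/update` and are deliberately out of scope here, in line with the skill's read-only stance.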
.agents/skills/op-collections/SKILL.md (14 additions, 0 deletions)
@@ -79,6 +79,20 @@ Output is JSON (pretty-printed); `export` always streams the raw body so you can

 Each row payload includes `db_path`, `table`, and `columns` so you can see the schema before filtering. `cstats` exposes `search.columns` (what `--q` matches) and `search.examples`.

+## Oxigraph named graphs (`stats` → `sparql`)
+
+The `sparql` block lists every named graph Oxigraph holds — authoritative sizes before writing SPARQL:
+**Convention:** production data lives in `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid` and is also exposed as Oxigraph's **default graph** — plain `{ ?s ?p ?o }` queries work without a `GRAPH` clause. Use explicit `GRAPH <…>` to pin a snapshot or reach utility graphs (`_backup/…`, `_links/identity`). Full modes and gotchas: `query-sparql` skill.
+
 ## Notes

 - This is a **separate store** from the three query-* skills: the collections are the hub's curated DuckDB indices, not Neo4j/SPARQL/OpenSearch. Use `stats` to see all four side by side (`sparql`, neo4j, `opensearch`, `duckdb` blocks).
-To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples. A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`.
+To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples into the Oxigraph named graph for that snapshot (`https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`). A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`. After upload, confirm the graph size via `op-collections stats` → `sparql.named_graphs`.
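The `stats` → `sparql.named_graphs` block is the documented place to read graph sizes, including after an upload. Below is a minimal sketch of turning it into a sorted inventory. It assumes each entry looks like `{"graph": <IRI>, "triples": <count>}`, and that shape is an assumption, so check the real JSON before relying on those key names. The helper `snapshot_month` is hypothetical; it only encodes the `{YYYY-MM}/hybrid` naming convention.

```python
import json
import re
from typing import List, Optional, Tuple

# Naming convention from the docs: snapshots live at .../graph/{YYYY-MM}/hybrid.
SNAPSHOT_RE = re.compile(r"^https://open-pulse\.epfl\.ch/graph/(\d{4}-\d{2})/hybrid$")


def named_graph_sizes(stats_json: str) -> List[Tuple[str, int]]:
    """Return (graph IRI, triple count) pairs, largest graph first.

    ASSUMPTION: entries under sparql.named_graphs have "graph" and "triples" keys.
    """
    entries = json.loads(stats_json)["sparql"]["named_graphs"]
    pairs = [(entry["graph"], int(entry["triples"])) for entry in entries]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def snapshot_month(iri: str) -> Optional[str]:
    """Return "YYYY-MM" for a {YYYY-MM}/hybrid snapshot graph, or None for utility graphs."""
    match = SNAPSHOT_RE.match(iri)
    return match.group(1) if match else None
```

Note that `snapshot_month` matches any month following the pattern, including an in-progress snapshot. It does not tell you which graph is currently promoted to the default graph.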
.agents/skills/query-chaoss/SKILL.md (1 addition, 1 deletion)
@@ -211,7 +211,7 @@ A **project metric** adds `repo_count`, `truncated`, `cached_at`, and an `aggreg

 ## Live state (verified 2026-06-10)

-- The hub serves the **2026-05 snapshot**. The newest signals (`test_coverage`, `release_frequency`) and the issue/PR-based metrics (`first_response`, `cr_*`, `issues_*`) read `"—"` for most repos until a fresh re-extraction lands — **expect them sparse**.
+- The hub serves the **2026-05 snapshot**. SPARQL traces in `--include traces` query `GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid>` (see `query-sparql` named-graph convention). The newest signals (`test_coverage`, `release_frequency`) and the issue/PR-based metrics (`first_response`, `cr_*`, `issues_*`) read `"—"` for most repos until a fresh re-extraction lands — **expect them sparse**.
 - Repos are **GitHub-only**: `repo <owner> <repo>` → `/repositories/github.com/...`.
 - Projects are discipline/topic buckets of repos. **The set and count change over time, so always read it from `projects` — never hardcode a number.** At time of writing the largest are `info-eng` (~109 repos), `bioeng` (~95), `stats` (~63), with domain-relevant ones like `protein_ai_ecosystem` (~26), `bio` (~42), `chem` (~10). Use the exact `project` slug returned by `projects` (e.g. `protein_ai_ecosystem`, not `protein-ai`). `project-repos <project>` returns the project header plus both a `metrics[]` summary and a `repositories[]` list.
 - A browsable UI to explore first: `https://openpulse.epfl.ch/chaoss` (same auth).
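CHAOSS traces can pin a snapshot with `GRAPH <…>`, so it can help to check which graphs a given trace actually touched. Below is a minimal sketch that assumes you already have the trace's SPARQL text as a string; the structure of `--include traces` output isn't shown in this diff. It is a regex scan, not a SPARQL parser, so `GRAPH ?g` variables and graphs written as prefixed names are not reported. `pinned_graphs` is a hypothetical helper.

```python
import re
from typing import List

# Matches GRAPH <iri>; GRAPH ?var and prefixed names (GRAPH ex:g) are deliberately ignored.
GRAPH_IRI_RE = re.compile(r"\bGRAPH\s*<([^>]+)>", re.IGNORECASE)


def pinned_graphs(sparql: str) -> List[str]:
    """Return graph IRIs pinned with GRAPH <...>, in first-seen order; [] suggests default-graph mode."""
    seen: List[str] = []
    for iri in GRAPH_IRI_RE.findall(sparql):
        if iri not in seen:
            seen.append(iri)
    return seen
```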
.agents/skills/query-neo4j/SKILL.md (1 addition, 1 deletion)
@@ -104,7 +104,7 @@ Neo4j, never OpenSearch.
 - **Owner→repos for several orgs at once**: `MATCH (o:Org)-[:OWNS]->(r:Repo) WHERE o.login IN $logins RETURN o.login, r.full_name` — the basis for org-scoped catalogs.
 - **PR/issue/review/comment metrics**: always come from the edge types above. Count `DISTINCT u` for "people" and `count(*)` for "events"; there is no event date to bucket by.
 - **`DEPENDS_ON` is large** (~259k). Always scope it to a seed set (`WHERE r.full_name IN $urls`) and add `LIMIT`, or it returns the whole ecosystem.
-- **Affiliations**: `(:User)-[:AFFILIATED_WITH]->(:RorOrg)` mirrors the SPARQL `org:hasMembership` data; institutions are ROR-identified in both stores.
+- **Affiliations**: `(:User)-[:AFFILIATED_WITH]->(:RorOrg)` mirrors the SPARQL `org:hasMembership` data (default graph or `GRAPH <…/graph/{YYYY-MM}/hybrid>`); institutions are ROR-identified in both stores. See `query-sparql` for default vs named-graph modes.
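The SPARQL side of the affiliation mirror can be queried in either graph mode. Below is a sketch that assumes memberships follow the W3C Organization Ontology pattern (`?user org:hasMembership ?m . ?m org:organization ?org`). Only `org:hasMembership` is named in this diff, so treat `org:organization` as an assumption to verify against the prefixes section of `query-sparql`. `affiliation_query` is a hypothetical helper, not part of the skill scripts.

```python
from typing import Optional

# Production snapshot IRI convention documented for Oxigraph.
GRAPH_TEMPLATE = "https://open-pulse.epfl.ch/graph/{month}/hybrid"


def affiliation_query(limit: int = 50, month: Optional[str] = None) -> str:
    """Build a SPARQL SELECT for user -> organisation affiliations.

    month=None uses default-graph mode; month="2026-05" pins that named graph.
    ASSUMPTION: memberships use ?user org:hasMembership ?m . ?m org:organization ?org .
    """
    pattern = "?user org:hasMembership ?m . ?m org:organization ?org ."
    if month is not None:
        # Named-graph mode: wrap the pattern so only that snapshot is matched.
        pattern = f"GRAPH <{GRAPH_TEMPLATE.format(month=month)}> {{ {pattern} }}"
    return (
        "PREFIX org: <http://www.w3.org/ns/org#>\n"
        f"SELECT ?user ?org WHERE {{ {pattern} }} LIMIT {limit}"
    )
```

The default-graph form is enough for the current production snapshot. Passing a month is only needed to pin a specific snapshot.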
.agents/skills/query-opensearch/SKILL.md (1 addition, 1 deletion)
@@ -82,7 +82,7 @@ when matching.
 | `repo_name` | clone URL + `.git` (keyword; use directly in `terms`) |
 | `author_uuid` / `Author_uuid` | stable author identity (use for `cardinality`/`terms`) |
 | `author_name` | display name |
-| `author_org_name` | **almost always `"Unknown"`** — affiliation NOT resolved here; get orgs from SPARQL/Neo4j instead |
+| `author_org_name` | **almost always `"Unknown"`** — affiliation NOT resolved here; get orgs from SPARQL (default graph or named graph — see `query-sparql`) or Neo4j instead |
 | `grimoire_creation_date` | canonical commit timestamp (use for `date_histogram`, min/max) |
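Because `author_org_name` is almost always `"Unknown"`, a common first step is to collect the author identities that still need an org lookup in SPARQL or Neo4j. Below is a minimal sketch over a standard OpenSearch search response (`hits.hits[]._source`). The helper name is hypothetical, and it reads both `author_uuid` casings listed in the table.

```python
from typing import Dict


def unresolved_authors(response: dict) -> Dict[str, str]:
    """Map author UUID -> display name for hits whose org is missing or "Unknown"."""
    unresolved: Dict[str, str] = {}
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        # The index exposes the stable id under both casings.
        uuid = source.get("author_uuid") or source.get("Author_uuid")
        if uuid and source.get("author_org_name", "Unknown") == "Unknown":
            unresolved[uuid] = source.get("author_name", "")
    return unresolved
```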
 For `SELECT` the script flattens the SPARQL JSON Results envelope to a plain `[{var: value}, ...]` array. For `ASK`, `CONSTRUCT`, `DESCRIBE`, or any non-`json` accept, the response is passed through.

-## Live graph state (verified 2026-06-05)
+## Default graph vs named graphs (verified 2026-06-10)

-The data you want is in **one big named graph** — always wrap patterns in it:
+Oxigraph holds production RDF in **named graphs**, but the hub also configures a **default graph** so plain SPARQL (no `GRAPH` clause) works.

+### Two query modes
+
+| Mode | Syntax | When to use |
+|---|---|---|
+| **Default graph** | `{ ?s ?p ?o }` — no `GRAPH` wrapper | Most ad-hoc queries. Oxigraph resolves this to the **current production snapshot** (~2.45M triples today, same data as `…/graph/2026-05/hybrid`). |
+| **Named graph** | `GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> { … }` | Pin a specific snapshot, query utility graphs, or compare graphs side by side. Required for `_backup/…`, `_links/identity`, or in-progress `2026-06/hybrid`. |
+
+```sparql
+# Default graph mode — fine for everyday repo/metadata lookups
+Pipeline `sparql_upload` (op-extractor) lands triples in the named graph for that month; the hub also promotes the current snapshot into the default graph. CHAOSS SPARQL traces may use either form.
+
+### Current named graphs (live)
+
+| Named graph | Triples | Role |
+|---|---|---|
+| `https://open-pulse.epfl.ch/graph/2026-05/hybrid` | ~2.45M | **Current production snapshot** — also what default-graph queries see |
+| `https://open-pulse.epfl.ch/graph/_backup/2026-05-hybrid-prenorm` | ~2.12M | Pre-normalisation backup — named graph only |
+| `https://open-pulse.epfl.ch/graph/2026-06/hybrid` | ~329k | In-progress next snapshot — named graph only |
+| `https://open-pulse.epfl.ch/graph/_links/identity` | ~204 | Cross-store identity links — named graph only |
+
+Refresh sizes: `python .agents/skills/op-collections/query.py stats` → `sparql.named_graphs`, or the inventory query below.

 ## Prefixes used in this graph

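The flattening described at the top of this hunk can be sketched as follows. This is not the skill script's code, which the diff doesn't show. It assumes the standard SPARQL 1.1 Query Results JSON layout and keeps only each term's `value`, dropping `type`, `datatype`, and language tags.

```python
from typing import Dict, List


def flatten_bindings(envelope: dict) -> List[Dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON Results envelope into [{var: value}, ...].

    Unbound variables are absent from a binding, so they are absent from the row too.
    """
    bindings = envelope.get("results", {}).get("bindings", [])
    return [{var: term["value"] for var, term in binding.items()} for binding in bindings]
```

An `ASK` response carries a top-level `boolean` instead of `results`, so this sketch returns `[]` for it. The documented script avoids that case by passing such responses through unchanged.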
@@ -94,10 +129,11 @@ institution is `<https://ror.org/…>`. Match the full URL literal.

 | Goal | SPARQL |
 |---|---|
-| Total triple count | `SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }` |
 | Named graphs + sizes | `SELECT ?g (COUNT(*) AS ?n) WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g ORDER BY DESC(?n)` |
-| Predicates on a repo | `SELECT DISTINCT ?p WHERE { GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> { <https://github.com/biopython/biopython> ?p ?o } }` |
@@ -107,7 +143,7 @@ institution is `<https://ror.org/…>`. Match the full URL literal.

 ## Conventions

-- Always include `LIMIT` on exploratory queries, and wrap in the hybrid `GRAPH`.
+- Always include `LIMIT` on exploratory queries. **Default graph mode** (`{ … }` without `GRAPH`) is fine for the current production snapshot; use an explicit `GRAPH <https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid>` when you need a specific snapshot or a non-default graph.
 - Updates require `SPARQL_AUTH` admin role and are destructive — never run them unless the user explicitly asks. Use curl, not these scripts.
 - Oxigraph default response is SPARQL XML; the scripts always set `Accept` explicitly.
 - A 504 from the proxy means the query timed out — reduce the result set, tighten the pattern, or switch to fetch-and-join.
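Two of these conventions can be enforced with a small pre-flight check: always `LIMIT` exploratory queries, and never send updates through these scripts. Below is a deliberately conservative sketch. The keyword scan is not a SPARQL parser, so it may reject a harmless query that mentions a word like `copy` or `add` in an IRI or literal. It also only checks that *some* `LIMIT` is present, not that one exists at the outer level. `prepare_exploratory` is a hypothetical helper.

```python
import re

# Operation keywords from SPARQL 1.1 Update; any hit is treated as a mutation attempt.
UPDATE_RE = re.compile(r"\b(INSERT|DELETE|LOAD|CLEAR|CREATE|DROP|COPY|MOVE|ADD)\b", re.IGNORECASE)
LIMIT_RE = re.compile(r"\bLIMIT\s+\d+", re.IGNORECASE)


def prepare_exploratory(query: str, limit: int = 100) -> str:
    """Reject likely SPARQL Update text and append a LIMIT when none is present."""
    if UPDATE_RE.search(query):
        raise ValueError("looks like SPARQL Update; use curl explicitly, and only if asked")
    cleaned = query.rstrip()
    if not LIMIT_RE.search(cleaned):
        cleaned += f" LIMIT {limit}"
    return cleaned
```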
.claude/PROJECT.md (4 additions, 3 deletions)
@@ -44,8 +44,9 @@ When wiring data or running query skills, use `openpulse.epfl.ch`. When citing t
 ### Oxigraph (SPARQL) — RDF metadata

 - **Endpoint:** `:7502`, behind a Caddy proxy that terminates HTTP-Basic auth (`/query` for reads, `/update` for writes)
-- **Contents:** ~300k triples across multiple named graphs (e.g. `http://open-pulse/repos`, `http://open-pulse/metadata`)
-- **Use it for:** Structured metadata, vocabulary/ontology queries, anything that benefits from `SELECT … WHERE { GRAPH ?g { … } }`
+- **Contents:** ~2.45M triples in the current production snapshot (`https://open-pulse.epfl.ch/graph/2026-05/hybrid`), plus utility graphs (`_backup/…`, `_links/identity`) and in-progress snapshots (`2026-06/hybrid`, …). **Default graph mode** — plain `{ ?s ?p ?o }` without a `GRAPH` clause — resolves to that production snapshot. Use explicit `GRAPH <…>` to pin a snapshot or reach non-default graphs. See `query-sparql` skill.
+- **Named-graph convention:** production snapshots live at `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`; pipeline `sparql_upload` promotes the current month into both the named graph and the default graph. Inventory: `op-collections stats` → `sparql.named_graphs`.
 - **Skill:** `query-sparql` (SELECT/ASK/CONSTRUCT/DESCRIBE). Updates are intentionally not supported by the skill — use `curl` explicitly if you need to mutate.

 ### OpenSearch — search & enriched indices
@@ -162,4 +163,4 @@ If a change makes one of these journeys harder (e.g. couples the design system t
 - **Data store query skills**: `.claude/skills/query-{neo4j,sparql,opensearch}/SKILL.md`
 - **CHAOSS health metrics**: `.claude/skills/query-chaoss/SKILL.md` (featured dashboard slugs above)
 - **Publishing**: README → *Publishing to GitHub Pages*
-- **Devcontainer**: `.devcontainer/`
+- **Devcontainer**: `.devcontainer/` (compose + images in `tools/image/docker/`)
.claude/skills/op-collections/SKILL.md (14 additions, 0 deletions)
@@ -79,6 +79,20 @@ Output is JSON (pretty-printed); `export` always streams the raw body so you can

 Each row payload includes `db_path`, `table`, and `columns` so you can see the schema before filtering. `cstats` exposes `search.columns` (what `--q` matches) and `search.examples`.

+## Oxigraph named graphs (`stats` → `sparql`)
+
+The `sparql` block lists every named graph Oxigraph holds — authoritative sizes before writing SPARQL:
+**Convention:** production data lives in `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid` and is also exposed as Oxigraph's **default graph** — plain `{ ?s ?p ?o }` queries work without a `GRAPH` clause. Use explicit `GRAPH <…>` to pin a snapshot or reach utility graphs (`_backup/…`, `_links/identity`). Full modes and gotchas: `query-sparql` skill.
+
 ## Notes

 - This is a **separate store** from the three query-* skills: the collections are the hub's curated DuckDB indices, not Neo4j/SPARQL/OpenSearch. Use `stats` to see all four side by side (`sparql`, neo4j, `opensearch`, `duckdb` blocks).
-To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples. A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`.
+To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples into the Oxigraph named graph for that snapshot (`https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`). A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`. After upload, confirm the graph size via `op-collections stats` → `sparql.named_graphs`.