sdsc-ordes
diff --git a/.agents/PROJECT.md b/.agents/PROJECT.md
Lines changed: 4 additions & 3 deletions
diff --git a/.agents/skills/op-collections/SKILL.md b/.agents/skills/op-collections/SKILL.md
Lines changed: 14 additions & 0 deletions
diff --git a/.agents/skills/op-extractor/SKILL.md b/.agents/skills/op-extractor/SKILL.md
Lines changed: 2 additions & 2 deletions
diff --git a/.agents/skills/query-chaoss/SKILL.md b/.agents/skills/query-chaoss/SKILL.md
Lines changed: 1 addition & 1 deletion
diff --git a/.agents/skills/query-neo4j/SKILL.md b/.agents/skills/query-neo4j/SKILL.md
Lines changed: 1 addition & 1 deletion
diff --git a/.agents/skills/query-opensearch/SKILL.md b/.agents/skills/query-opensearch/SKILL.md
Lines changed: 1 addition & 1 deletion
diff --git a/.agents/skills/query-sparql/SKILL.md b/.agents/skills/query-sparql/SKILL.md
Lines changed: 47 additions & 11 deletions
diff --git a/.claude/PROJECT.md b/.claude/PROJECT.md
Lines changed: 4 additions & 3 deletions
diff --git a/.claude/skills/op-collections/SKILL.md b/.claude/skills/op-collections/SKILL.md
Lines changed: 14 additions & 0 deletions
diff --git a/.claude/skills/op-extractor/SKILL.md b/.claude/skills/op-extractor/SKILL.md
Lines changed: 2 additions & 2 deletions
@@ -44,8 +44,9 @@ When wiring data or running query skills, use `openpulse.epfl.ch`. When citing t
 ### Oxigraph (SPARQL) — RDF metadata
 
 - **Endpoint:** `:7502`, behind a Caddy proxy that terminates HTTP-Basic auth (`/query` for reads, `/update` for writes)
-- **Contents:** ~300k triples across multiple named graphs (e.g. `http://open-pulse/repos`, `http://open-pulse/metadata`)
-- **Use it for:** Structured metadata, vocabulary/ontology queries, anything that benefits from `SELECT … WHERE { GRAPH ?g { … } }`
+- **Contents:** ~2.45M triples in the current production snapshot (`https://open-pulse.epfl.ch/graph/2026-05/hybrid`), plus utility graphs (`_backup/…`, `_links/identity`) and in-progress snapshots (`2026-06/hybrid`, …). **Default graph mode** — plain `{ ?s ?p ?o }` without a `GRAPH` clause — resolves to that production snapshot. Use explicit `GRAPH <…>` to pin a snapshot or reach non-default graphs. See `query-sparql` skill.
+- **Named-graph convention:** production snapshots live at `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`; pipeline `sparql_upload` promotes the current month into both the named graph and the default graph. Inventory: `op-collections stats` → `sparql.named_graphs`.
+- **Use it for:** Structured metadata, vocabulary/ontology queries, repo stars/licenses/languages, contributions, ORCID↔GitHub bridges, scholarly articles
 - **Skill:** `query-sparql` (SELECT/ASK/CONSTRUCT/DESCRIBE). Updates are intentionally not supported by the skill — use `curl` explicitly if you need to mutate.
 
 ### OpenSearch — search & enriched indices
@@ -162,4 +163,4 @@ If a change makes one of these journeys harder (e.g. couples the design system t
 - **Data store query skills**: `.agents/skills/query-{neo4j,sparql,opensearch}/SKILL.md`
 - **CHAOSS health metrics**: `.agents/skills/query-chaoss/SKILL.md` (featured dashboard slugs above)
 - **Publishing**: README → *Publishing to GitHub Pages*
-- **Devcontainer**: `.devcontainer/`
+- **Devcontainer**: `.devcontainer/` (compose + images in `tools/image/docker/`)
@@ -79,6 +79,20 @@ Output is JSON (pretty-printed); `export` always streams the raw body so you can
 
 Each row payload includes `db_path`, `table`, and `columns` so you can see the schema before filtering. `cstats` exposes `search.columns` (what `--q` matches) and `search.examples`.
 
+## Oxigraph named graphs (`stats` → `sparql`)
+
+The `sparql` block lists every named graph Oxigraph holds — authoritative sizes before writing SPARQL:
+
+```json
+"named_graphs": [
+  { "uri": "https://open-pulse.epfl.ch/graph/2026-05/hybrid", "triples": 2453125 },
+  { "uri": "https://open-pulse.epfl.ch/graph/2026-06/hybrid", "triples": 328691 },
+  …
+]
+```
+
+**Convention:** production data lives in `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid` and is also exposed as Oxigraph's **default graph** — plain `{ ?s ?p ?o }` queries work without a `GRAPH` clause. Use explicit `GRAPH <…>` to pin a snapshot or reach utility graphs (`_backup/…`, `_links/identity`). Full modes and gotchas: `query-sparql` skill.
+
 ## Notes
 
 - This is a **separate store** from the three query-* skills: the collections are the hub's curated DuckDB indices, not Neo4j/SPARQL/OpenSearch. Use `stats` to see all four side by side (`sparql`, neo4j, `opensearch`, `duckdb` blocks).
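
Reviewer sketch (not part of the patch): the `named_graphs` list documented in this hunk could be filtered down to the monthly snapshots with a few lines of Python. This is only an illustration under stated assumptions: the helper name `production_snapshots` is hypothetical, the input is assumed to have the same shape as the JSON excerpt, and the sample values are copied or rounded from the figures quoted in this PR.

```python
import re

# Matches the documented pattern https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid.
# Utility graphs such as _backup/... or _links/identity do not match.
SNAPSHOT_RE = re.compile(r"^https://open-pulse\.epfl\.ch/graph/(\d{4}-\d{2})/hybrid$")


def production_snapshots(named_graphs):
    """Return (month, triples) pairs for snapshot graphs, newest month first.

    `named_graphs` is assumed to be the list found under sparql.named_graphs
    in the stats output (hypothetical helper, not part of the skill).
    """
    found = []
    for graph in named_graphs:
        match = SNAPSHOT_RE.match(graph["uri"])
        if match:
            found.append((match.group(1), graph["triples"]))
    # YYYY-MM strings sort chronologically, so a plain reverse sort is enough.
    return sorted(found, reverse=True)


# Illustrative sample only; real numbers come from the stats command.
sample = [
    {"uri": "https://open-pulse.epfl.ch/graph/2026-05/hybrid", "triples": 2453125},
    {"uri": "https://open-pulse.epfl.ch/graph/2026-06/hybrid", "triples": 328691},
    {"uri": "https://open-pulse.epfl.ch/graph/_links/identity", "triples": 204},
]

for month, triples in production_snapshots(sample):
    print(month, triples)
```

Note that the newest month is not necessarily the one served by the default graph: in the figures quoted here, `2026-06/hybrid` is still in progress while `2026-05/hybrid` is the production snapshot.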
 
@@ -81,11 +81,11 @@ quest:
       skip_existing: true                         # don't re-process repos already done
       max_repos: 0                                # 0 = no cap
     neo4j_upload:           { enabled: false }    # loads crawler graph → Neo4j (input_dir/input_filename)
-    sparql_upload:          { enabled: false }    # uploads extracted RDF → Oxigraph
+    sparql_upload:          { enabled: false }    # uploads extracted RDF → Oxigraph named graph (…/graph/{YYYY-MM}/hybrid)
     apply_grimoire_projects: { enabled: false }
 ```
 
-To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples. A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`.
+To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples into the Oxigraph named graph for that snapshot (`https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`). A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`. After upload, confirm the graph size via `op-collections stats` → `sparql.named_graphs`.
 
 ## Run status fields
 
 
@@ -211,7 +211,7 @@ A **project metric** adds `repo_count`, `truncated`, `cached_at`, and an `aggreg
 
 ## Live state (verified 2026-06-10)
 
-- The hub serves the **2026-05 snapshot**. The newest signals (`test_coverage`, `release_frequency`) and the issue/PR-based metrics (`first_response`, `cr_*`, `issues_*`) read `"—"` for most repos until a fresh re-extraction lands — **expect them sparse**.
+- The hub serves the **2026-05 snapshot**. SPARQL traces in `--include traces` query `GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid>` (see `query-sparql` named-graph convention). The newest signals (`test_coverage`, `release_frequency`) and the issue/PR-based metrics (`first_response`, `cr_*`, `issues_*`) read `"—"` for most repos until a fresh re-extraction lands — **expect them sparse**.
 - Repos are **GitHub-only**: `repo <owner> <repo>` → `/repositories/github.com/...`.
 - Projects are discipline/topic buckets of repos. **The set and count change over time, so always read it from `projects` — never hardcode a number.** At time of writing the largest are `info-eng` (~109 repos), `bioeng` (~95), `stats` (~63), with domain-relevant ones like `protein_ai_ecosystem` (~26), `bio` (~42), `chem` (~10). Use the exact `project` slug returned by `projects` (e.g. `protein_ai_ecosystem`, not `protein-ai`). `project-repos <project>` returns the project header plus both a `metrics[]` summary and a `repositories[]` list.
 - A browsable UI to explore first: `https://openpulse.epfl.ch/chaoss` (same auth).
 
@@ -104,7 +104,7 @@ Neo4j, never OpenSearch.
 - **Owner→repos for several orgs at once**: `MATCH (o:Org)-[:OWNS]->(r:Repo) WHERE o.login IN $logins RETURN o.login, r.full_name` — the basis for org-scoped catalogs.
 - **PR/issue/review/comment metrics**: always come from the edge types above. Count `DISTINCT u` for "people" and `count(*)` for "events"; there is no event date to bucket by.
 - **`DEPENDS_ON` is large** (~259k). Always scope it to a seed set (`WHERE r.full_name IN $urls`) and add `LIMIT`, or it returns the whole ecosystem.
-- **Affiliations**: `(:User)-[:AFFILIATED_WITH]->(:RorOrg)` mirrors the SPARQL `org:hasMembership` data; institutions are ROR-identified in both stores.
+- **Affiliations**: `(:User)-[:AFFILIATED_WITH]->(:RorOrg)` mirrors the SPARQL `org:hasMembership` data (default graph or `GRAPH <…/graph/{YYYY-MM}/hybrid>`); institutions are ROR-identified in both stores. See `query-sparql` for default vs named-graph modes.
 
 ## Conventions
 
 
@@ -82,7 +82,7 @@ when matching.
 | `repo_name` | clone URL + `.git` (keyword; use directly in `terms`) |
 | `author_uuid` / `Author_uuid` | stable author identity (use for `cardinality`/`terms`) |
 | `author_name` | display name |
-| `author_org_name` | **almost always `"Unknown"`** — affiliation NOT resolved here; get orgs from SPARQL/Neo4j instead |
+| `author_org_name` | **almost always `"Unknown"`** — affiliation NOT resolved here; get orgs from SPARQL (default graph or named graph — see `query-sparql`) or Neo4j instead |
 | `grimoire_creation_date` | canonical commit timestamp (use for `date_histogram`, min/max) |
 | `author_date`, `commit_date` | raw git dates |
 | `lines_added`, `lines_removed`, `lines_changed`, `files` | churn (`sum`-able) |
 
@@ -34,17 +34,52 @@ node .agents/skills/query-sparql/query.mjs -f query.rq
 
 For `SELECT` the script flattens the SPARQL JSON Results envelope to a plain `[{var: value}, ...]` array. For `ASK`, `CONSTRUCT`, `DESCRIBE`, or any non-`json` accept, the response is passed through.
 
-## Live graph state (verified 2026-06-05)
+## Default graph vs named graphs (verified 2026-06-10)
 
-The data you want is in **one big named graph** — always wrap patterns in it:
+Oxigraph holds production RDF in **named graphs**, but the hub also configures a **default graph** so plain SPARQL (no `GRAPH` clause) works.
 
+### Two query modes
+
+| Mode | Syntax | When to use |
+|---|---|---|
+| **Default graph** | `{ ?s ?p ?o }` — no `GRAPH` wrapper | Most ad-hoc queries. Oxigraph resolves this to the **current production snapshot** (~2.45M triples today, same data as `…/graph/2026-05/hybrid`). |
+| **Named graph** | `GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> { … }` | Pin a specific snapshot, query utility graphs, or compare graphs side by side. Required for `_backup/…`, `_links/identity`, or in-progress `2026-06/hybrid`. |
+
+```sparql
+# Default graph mode — fine for everyday repo/metadata lookups
+SELECT ?name WHERE {
+  <https://github.com/biopython/biopython> schema:name ?name .
+}
+
+# Named graph mode — pin a snapshot or reach non-default graphs
+SELECT ?name WHERE {
+  GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> {
+    <https://github.com/biopython/biopython> schema:name ?name .
+  }
+}
 ```
-GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> { ... }   # ~2.12M triples
-```
 
-Other graphs: a small per-study `…/graph/authors/protein-ai` (~1.1k), plus a
-default (unnamed) graph. Querying without the `GRAPH` wrapper across everything
-is slow and mixes studies — scope to the hybrid graph.
+Default mode does **not** union every named graph — backups and in-progress snapshots are invisible unless you name them explicitly.
+
+### Named-graph IRI pattern
+
+| Kind | Pattern | Example |
+|---|---|---|
+| **Production snapshot** | `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid` | `…/graph/2026-05/hybrid` |
+| **Utility / backup** | `https://open-pulse.epfl.ch/graph/_…` | `…/_backup/2026-05-hybrid-prenorm`, `…/_links/identity` |
+
+Pipeline `sparql_upload` (op-extractor) lands triples in the named graph for that month; the hub also promotes the current snapshot into the default graph. CHAOSS SPARQL traces may use either form.
+
+### Current named graphs (live)
+
+| Named graph | Triples | Role |
+|---|---|---|
+| `https://open-pulse.epfl.ch/graph/2026-05/hybrid` | ~2.45M | **Current production snapshot** — also what default-graph queries see |
+| `https://open-pulse.epfl.ch/graph/_backup/2026-05-hybrid-prenorm` | ~2.12M | Pre-normalisation backup — named graph only |
+| `https://open-pulse.epfl.ch/graph/2026-06/hybrid` | ~329k | In-progress next snapshot — named graph only |
+| `https://open-pulse.epfl.ch/graph/_links/identity` | ~204 | Cross-store identity links — named graph only |
+
+Refresh sizes: `python .agents/skills/op-collections/query.py stats` → `sparql.named_graphs`, or the inventory query below.
 
 ## Prefixes used in this graph
 
@@ -94,10 +129,11 @@ institution is `<https://ror.org/…>`. Match the full URL literal.
 
 | Goal | SPARQL |
 |---|---|
-| Total triple count | `SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }` |
 | Named graphs + sizes | `SELECT ?g (COUNT(*) AS ?n) WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g ORDER BY DESC(?n)` |
-| Predicates on a repo | `SELECT DISTINCT ?p WHERE { GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> { <https://github.com/biopython/biopython> ?p ?o } }` |
-| Stars/forks for repos | `… { VALUES ?r { <…/repo1> <…/repo2> } ?r op:githubRepoStars ?s ; op:githubRepoForks ?f }` |
+| Triple count (default graph) | `SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }` |
+| Triple count (named graph) | `SELECT (COUNT(*) AS ?n) WHERE { GRAPH <https://open-pulse.epfl.ch/graph/2026-05/hybrid> { ?s ?p ?o } }` |
+| Predicates on a repo | `SELECT DISTINCT ?p WHERE { <https://github.com/biopython/biopython> ?p ?o }` |
+| Stars/forks for repos | `{ VALUES ?r { <…/repo1> <…/repo2> } ?r op:githubRepoStars ?s ; op:githubRepoForks ?f }` |
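
Reviewer sketch (not part of the patch): the two triple-count rows above differ only in the `GRAPH` wrapper, so a caller could generate either form from one helper. The function names `snapshot_iri` and `count_query` are hypothetical, and the month validation is an added assumption rather than something the skill does.

```python
import re

GRAPH_BASE = "https://open-pulse.epfl.ch/graph"
MONTH_RE = re.compile(r"^\d{4}-\d{2}$")


def snapshot_iri(year_month):
    """Build the snapshot IRI for a YYYY-MM string (hypothetical helper)."""
    if not MONTH_RE.match(year_month):
        raise ValueError(f"expected YYYY-MM, got {year_month!r}")
    return f"{GRAPH_BASE}/{year_month}/hybrid"


def count_query(graph_iri=None):
    """Return a triple-count query for the default graph or one named graph."""
    pattern = "?s ?p ?o"
    if graph_iri is not None:
        # Named-graph mode: wrap the pattern in an explicit GRAPH clause.
        pattern = f"GRAPH <{graph_iri}> {{ {pattern} }}"
    return f"SELECT (COUNT(*) AS ?n) WHERE {{ {pattern} }}"


print(count_query())                          # default graph mode
print(count_query(snapshot_iri("2026-05")))   # pinned snapshot
```

The printed strings match the two table rows, which makes it easy to compare the default graph against a pinned snapshot with the same query body.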
 
 ## Gotchas learned the hard way
 
@@ -107,7 +143,7 @@ institution is `<https://ror.org/…>`. Match the full URL literal.
 
 ## Conventions
 
-- Always include `LIMIT` on exploratory queries, and wrap in the hybrid `GRAPH`.
+- Always include `LIMIT` on exploratory queries. **Default graph mode** (`{ … }` without `GRAPH`) is fine for the current production snapshot; use an explicit `GRAPH <https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid>` when you need a specific snapshot or a non-default graph.
 - Updates require `SPARQL_AUTH` admin role and are destructive — never run them unless the user explicitly asks. Use curl, not these scripts.
 - Oxigraph default response is SPARQL XML; the scripts always set `Accept` explicitly.
 - A 504 from the proxy means the query timed out — reduce the result set, tighten the pattern, or switch to fetch-and-join.
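
Reviewer sketch (not part of the patch): one way to "reduce the result set" after a 504 is to split a long repo list into several small `VALUES` batches and merge the results client-side. This is an assumption about how a caller might apply the convention, not the skill's own implementation; `values_batches` and the default batch size of 50 are hypothetical, and the `op:` prefix is assumed to be declared as in the skill's prefix section.

```python
def values_batches(repo_iris, batch_size=50):
    """Yield WHERE-clause bodies, each covering at most `batch_size` repos.

    Each body has the same shape as the stars/forks row in the table above.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(repo_iris), batch_size):
        chunk = repo_iris[start:start + batch_size]
        values = " ".join(f"<{iri}>" for iri in chunk)
        yield (
            f"{{ VALUES ?r {{ {values} }} "
            "?r op:githubRepoStars ?s ; op:githubRepoForks ?f }"
        )


# The example/* IRIs are placeholders for illustration only.
repos = [
    "https://github.com/biopython/biopython",
    "https://github.com/example/repo-a",
    "https://github.com/example/repo-b",
]

for body in values_batches(repos, batch_size=2):
    print("SELECT ?r ?s ?f WHERE", body, "LIMIT 100")
```

Each generated query stays small enough to add a `LIMIT`, in line with the first convention in this list.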
@@ -44,8 +44,9 @@ When wiring data or running query skills, use `openpulse.epfl.ch`. When citing t
 ### Oxigraph (SPARQL) — RDF metadata
 
 - **Endpoint:** `:7502`, behind a Caddy proxy that terminates HTTP-Basic auth (`/query` for reads, `/update` for writes)
-- **Contents:** ~300k triples across multiple named graphs (e.g. `http://open-pulse/repos`, `http://open-pulse/metadata`)
-- **Use it for:** Structured metadata, vocabulary/ontology queries, anything that benefits from `SELECT … WHERE { GRAPH ?g { … } }`
+- **Contents:** ~2.45M triples in the current production snapshot (`https://open-pulse.epfl.ch/graph/2026-05/hybrid`), plus utility graphs (`_backup/…`, `_links/identity`) and in-progress snapshots (`2026-06/hybrid`, …). **Default graph mode** — plain `{ ?s ?p ?o }` without a `GRAPH` clause — resolves to that production snapshot. Use explicit `GRAPH <…>` to pin a snapshot or reach non-default graphs. See `query-sparql` skill.
+- **Named-graph convention:** production snapshots live at `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`; pipeline `sparql_upload` promotes the current month into both the named graph and the default graph. Inventory: `op-collections stats` → `sparql.named_graphs`.
+- **Use it for:** Structured metadata, vocabulary/ontology queries, repo stars/licenses/languages, contributions, ORCID↔GitHub bridges, scholarly articles
 - **Skill:** `query-sparql` (SELECT/ASK/CONSTRUCT/DESCRIBE). Updates are intentionally not supported by the skill — use `curl` explicitly if you need to mutate.
 
 ### OpenSearch — search & enriched indices
@@ -162,4 +163,4 @@ If a change makes one of these journeys harder (e.g. couples the design system t
 - **Data store query skills**: `.claude/skills/query-{neo4j,sparql,opensearch}/SKILL.md`
 - **CHAOSS health metrics**: `.claude/skills/query-chaoss/SKILL.md` (featured dashboard slugs above)
 - **Publishing**: README → *Publishing to GitHub Pages*
-- **Devcontainer**: `.devcontainer/`
+- **Devcontainer**: `.devcontainer/` (compose + images in `tools/image/docker/`)
@@ -79,6 +79,20 @@ Output is JSON (pretty-printed); `export` always streams the raw body so you can
 
 Each row payload includes `db_path`, `table`, and `columns` so you can see the schema before filtering. `cstats` exposes `search.columns` (what `--q` matches) and `search.examples`.
 
+## Oxigraph named graphs (`stats` → `sparql`)
+
+The `sparql` block lists every named graph Oxigraph holds — authoritative sizes before writing SPARQL:
+
+```json
+"named_graphs": [
+  { "uri": "https://open-pulse.epfl.ch/graph/2026-05/hybrid", "triples": 2453125 },
+  { "uri": "https://open-pulse.epfl.ch/graph/2026-06/hybrid", "triples": 328691 },
+  …
+]
+```
+
+**Convention:** production data lives in `https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid` and is also exposed as Oxigraph's **default graph** — plain `{ ?s ?p ?o }` queries work without a `GRAPH` clause. Use explicit `GRAPH <…>` to pin a snapshot or reach utility graphs (`_backup/…`, `_links/identity`). Full modes and gotchas: `query-sparql` skill.
+
 ## Notes
 
 - This is a **separate store** from the three query-* skills: the collections are the hub's curated DuckDB indices, not Neo4j/SPARQL/OpenSearch. Use `stats` to see all four side by side (`sparql`, neo4j, `opensearch`, `duckdb` blocks).
 
@@ -81,11 +81,11 @@ quest:
       skip_existing: true                         # don't re-process repos already done
       max_repos: 0                                # 0 = no cap
     neo4j_upload:           { enabled: false }    # loads crawler graph → Neo4j (input_dir/input_filename)
-    sparql_upload:          { enabled: false }    # uploads extracted RDF → Oxigraph
+    sparql_upload:          { enabled: false }    # uploads extracted RDF → Oxigraph named graph (…/graph/{YYYY-MM}/hybrid)
     apply_grimoire_projects: { enabled: false }
 ```
 
-To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples. A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`.
+To run a GME-only pass: enable just `metadata_extractor` (point `input_dir` at a crawl's output), then usually `sparql_upload` to land the triples into the Oxigraph named graph for that snapshot (`https://open-pulse.epfl.ch/graph/{YYYY-MM}/hybrid`). A full ingest enables `crawler` → `metadata_extractor` → `neo4j_upload` + `sparql_upload`. After upload, confirm the graph size via `op-collections stats` → `sparql.named_graphs`.
 
 ## Run status fields