Commit 6b5e78f

Merge branch 'develop' into feat/hub-knowledge
Conflict: src/open_pulse/gui/hub/main.py

Both branches added new routers / imports to the FastAPI app:

  · develop → ``admin`` router (the /admin Resources panel from PR #51,
    commit 915151f)
  · this branch → ``hub`` + ``chaoss_routes`` routers + the
    _propagate_globals helper that mirrors the shared template env to
    instances with their own filters

Resolved by keeping all three. New combined wiring:

    app.include_router(crawler.router)
    app.include_router(admin.router)          # develop
    app.include_router(hub.router)            # ours
    app.include_router(hub.api)               # ours
    app.include_router(chaoss_routes.router)  # ours

base.html merged cleanly (the new ``/admin`` nav entry from develop sits
alongside the new ``/chaoss`` nav entry we already added).
2 parents 1127cd5 + 1ca7d12 commit 6b5e78f

24 files changed

Lines changed: 3199 additions & 24 deletions

docs-site/docs/architecture/index.md

Lines changed: 27 additions & 0 deletions
@@ -5,6 +5,33 @@ slug: /architecture
# Architecture

Open Pulse is built around a single unified Python package and a small
set of cooperating services. The pipeline crawls software ecosystems
from GitHub, stores the resulting graph in Neo4j, extracts richer
metadata about each repository, lands that metadata in a SPARQL store,
and feeds development-activity signals into a GrimoireLab + OpenSearch
stack for CHAOSS-style time-series metrics. A FastAPI hub stitches
operations together for humans.

```mermaid
flowchart LR
    CLI[open-pulse CLI] -->|quest run| Q[Quest pipeline]
    HUB[open-pulse-hub] -. docker socket .-> CLI
    Q --> CR[open-pulse-crawler]
    CR --> N[(Neo4j)]
    Q --> GME[git-metadata-extractor]
    GME -->|JSON-LD| S[(sparql_store<br/>Oxigraph)]
    Q -->|projects.json| GL[GrimoireLab<br/>Mordred + SortingHat]
    GL --> OS[(OpenSearch)]
```

Two query layers expose the data, each tuned for a different shape of
question — see
[Concepts → Graph & Semantic Data](../concepts/graph-and-semantic-data.md)
for the Neo4j ↔ SPARQL split, and
[Concepts → Metrics & CHAOSS](../concepts/metrics-and-chaoss.md) for
the GrimoireLab side.

## Repository boundaries

- `src/open_pulse/` — CLI and runtime code (src-layout, hatchling-built,

docs-site/docs/community/index.md

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
---
title: Community
slug: /community
---

# Community

Open Pulse is built in the open by **SDSC** and **EPFL Open Science**, and
is meant to be reusable by any institution or research programme that
wants to surface the activity around its open-source software outputs.

## About the project

Open Pulse automates the discovery and monitoring of open-source software
produced by a research institution and makes community vitality and
engagement visible and measurable. Traditional metrics — paper citations,
GitHub stars — only capture a fraction of open-science impact; many
valuable projects stay invisible because they are niche, early-stage, or
low-visibility. Open Pulse aims to map and surface those hidden
contributions across the full continuum of community engagement.

## People

The roster below mirrors the live landing page
([sdsc-ordes.github.io/open-pulse](https://sdsc-ordes.github.io/open-pulse/)).

- [Carlos Vivar Rios](https://github.com/caviri) — SDSC Project Lead
- [Aruni Senaratne](https://fr.linkedin.com/in/aruni-p-senaratne-2a591b1ba) —
  EPFL Open Science Project Lead
- [Laure Vancauwenberghe](https://github.com/vancauwe) — Senior Data
  Engineer, SDSC
- [Robin Franken](https://github.com/rmfranken) — Senior Knowledge and
  Data Engineer, SDSC
- [Eisha Mazhar](https://github.com/EishaMazhar) — UNIGE-SDSC Data
  Science Intern
- [Oksana Riba](https://ch.linkedin.com/in/oksana80) — Head of ORDES Team
- [Gilles Dubochet](https://people.epfl.ch/gilles.dubochet?lang=en&cvlang=en) —
  Head of Open Science
- [Noémie Mazaré](https://ch.linkedin.com/in/noemie-mazare-2b960699) —
  Former EPFL Open Science Project Lead

### Collaborations

- [EPFL ENAC-IT4R](https://www.epfl.ch/schools/enac/about/data-at-enac/enac-it4research/)
- [EPFL C4DT](https://c4dt.epfl.ch/)

## Institutions and funding

A project by **SDSC** and **EPFL**, funded by **swissuniversities** and
the **ETH Board**.

A
[preliminary GitHub analysis](https://github.com/EPFL-Open-Science/EPFL_OS_Analysis)
sponsored by EPFL Open Science seeded the work that became Open Pulse.

## Hosted nodes

Anyone running an Open Pulse instance can register it as a hosted node
so that the project landing page shows a discoverable card for it. See
[Register a node](../operations/register-a-node.md) for the schema and
the browser-only node-builder form. The first listed node is
[openpulse.epfl.ch](https://openpulse.epfl.ch), the EPFL instance.

## Events

- **Open Pulse Mini-Hackathon** — November 2025. Hands-on workshop on
  community health, license impact and cross-institutional collaboration
  patterns using Open Pulse data.
- **Open Pulse Webinar** — January 2026. Introduction to the
  architecture, the data access surfaces (SPARQL, Cypher, REST) and
  research use-cases.

## Get involved

- **Code** — open an issue or PR at
  [sdsc-ordes/open-pulse](https://github.com/sdsc-ordes/open-pulse).
- **Run your own node** — see
  [Register a node](../operations/register-a-node.md).
- **Share research** — Jupyter notebooks, papers and case studies built
  on Open Pulse data are welcome.

## Contact

- carlos.vivarrios@epfl.ch (SDSC)
- aruni.senaratne@epfl.ch (EPFL Open Science)
- GitHub: [sdsc-ordes/open-pulse](https://github.com/sdsc-ordes/open-pulse)

docs-site/docs/concepts/graph-and-semantic-data.md

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
---
title: Graph & Semantic Data
slug: /concepts/graph-and-semantic-data
---

# Graph & Semantic Data

Open Pulse stores its data in two complementary layers, each tuned for a
different kind of question:

- **Neo4j property graph** — fast traversal of the collaboration
  network. Users, repositories and organizations connected by
  contribution, ownership, membership and fork links.
- **SPARQL store** (`sparql_store` — typically Oxigraph, but any
  SPARQL 1.1 + Graph Store HTTP Protocol backend works) — semantically
  rich metadata about each repository, person and organization, modelled
  with [the Open Pulse vocabulary](metadata-and-ontology.md).

Both layers are produced by the same pipeline; the two run in parallel
so the right shape of question can hit the right backend.

```mermaid
flowchart LR
    C[Open Pulse Crawler] --> N[(Neo4j<br/>property graph)]
    N --> M[git-metadata-extractor]
    M -->|JSON-LD| Q[Quest step:<br/>sparql_upload]
    Q --> S[(SPARQL store<br/>sparql_store)]
    N -. fast traversal .-> A1[Network analysis<br/>centrality, communities]
    S -. semantic query .-> A2[Metadata queries<br/>license, discipline, FAIR]
```

## Neo4j: the community network

### Schema

```mermaid
graph LR
    U((User)) -- CONTRIBUTES_TO --> R((Repo))
    O((Org)) -- OWNS --> R
    U -- MEMBER_OF --> O
    R -- FORK_OF --> R2((Repo))
```

Three node labels and four relationship types. Property keys on the
nodes:

| Label  | Properties                                                                 |
| ------ | -------------------------------------------------------------------------- |
| `Repo` | `id`, `name`, `full_name`, `owner`, `is_explored`, `exploration_timestamp` |
| `User` | `id`, `login`, `name`, `type`, `is_explored`, `exploration_timestamp`      |
| `Org`  | `id`, `login`, `name`, `type`, `is_explored`, `exploration_timestamp`      |

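The schema above can also be written down as plain data, which is handy for sanity-checking records in a notebook. The sketch below is illustrative only: `NODE_PROPERTIES`, `RELATIONSHIPS` and `missing_properties` are hypothetical names, not part of the Open Pulse package, and the relationship endpoints are read off the schema diagram.

```python
# Hypothetical helper, not part of Open Pulse: the schema table as plain data.
NODE_PROPERTIES = {
    "Repo": {"id", "name", "full_name", "owner", "is_explored", "exploration_timestamp"},
    "User": {"id", "login", "name", "type", "is_explored", "exploration_timestamp"},
    "Org": {"id", "login", "name", "type", "is_explored", "exploration_timestamp"},
}

# (start label, relationship type, end label), taken from the schema diagram.
RELATIONSHIPS = [
    ("User", "CONTRIBUTES_TO", "Repo"),
    ("Org", "OWNS", "Repo"),
    ("User", "MEMBER_OF", "Org"),
    ("Repo", "FORK_OF", "Repo"),
]


def missing_properties(label: str, record: dict) -> set:
    """Return the documented property keys that `record` does not carry."""
    return NODE_PROPERTIES[label] - set(record)
```

Calling `missing_properties("Repo", node_dict)` on a node fetched from Neo4j gives a quick view of which documented keys have not been populated yet.
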
### Cypher examples

Each snippet below runs against the live Neo4j instance
(`bolt://localhost:7504` from the host, `bolt://neo4j:7687` inside the
stack).

**Top contributors by repository breadth.**

```cypher
MATCH (u:User)-[:CONTRIBUTES_TO]->(r:Repo)
RETURN u.login AS user, count(r) AS repos
ORDER BY repos DESC
LIMIT 10
```

**All repositories an organization owns.**

```cypher
MATCH (o:Org {login: "sdsc-ordes"})-[:OWNS]->(r:Repo)
RETURN r.full_name AS repo
ORDER BY repo
```

**Other repositories that contributors to one repo also contribute to (co-contribution).**

```cypher
MATCH (u:User)-[:CONTRIBUTES_TO]->(r1:Repo {full_name: "sdsc-ordes/gimie"}),
      (u)-[:CONTRIBUTES_TO]->(r2:Repo)
WHERE r2.full_name <> r1.full_name
RETURN u.login AS user, collect(DISTINCT r2.full_name) AS also_contributes_to
ORDER BY size(also_contributes_to) DESC
LIMIT 10
```

**Repositories with the most forks in the store.**

```cypher
MATCH (fork:Repo)-[:FORK_OF]->(parent:Repo)
RETURN parent.full_name AS repo, count(fork) AS forks
ORDER BY forks DESC
LIMIT 10
```

**Shortest collaboration path between two users.**

```cypher
MATCH p = shortestPath(
  (a:User {login: "caviri"})-[:CONTRIBUTES_TO|MEMBER_OF*..6]-(b:User {login: "cmdoret"})
)
RETURN [n IN nodes(p) | coalesce(n.login, n.full_name)] AS hops
```

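When the shortest-path query is issued from Python, the logins are better passed as query parameters than spliced into the Cypher text. A minimal sketch, assuming the official `neo4j` driver executes the result; `shortest_path_query` is a hypothetical helper, and the hop limit is interpolated after an integer check on the assumption that a variable-length bound cannot be supplied as a parameter.

```python
def shortest_path_query(login_a: str, login_b: str, max_hops: int = 6):
    """Build a parameterised shortest-path query mirroring the one in this section."""
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(max_hops, int) or isinstance(max_hops, bool) or max_hops < 1:
        raise ValueError("max_hops must be a positive integer")
    query = (
        "MATCH p = shortestPath(\n"
        "  (a:User {login: $a})"
        f"-[:CONTRIBUTES_TO|MEMBER_OF*..{max_hops}]-"
        "(b:User {login: $b})\n"
        ")\n"
        "RETURN [n IN nodes(p) | coalesce(n.login, n.full_name)] AS hops"
    )
    return query, {"a": login_a, "b": login_b}


query, params = shortest_path_query("caviri", "cmdoret")
# With a live driver session, this would be executed as: session.run(query, params)
```

Keeping the logins out of the query string avoids quoting problems and lets the same query text be reused for different pairs of users.
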
### Neo4j Browser

A graphical Cypher console is available at
[http://localhost:7503](http://localhost:7503). Authentication uses the
`NEO4J_AUTH` credentials from `infra/.env` (default user: `neo4j`).

## SPARQL store: semantic queries

The same entities exist in the SPARQL store as RDF resources, modelled
with a small custom vocabulary plus schema.org and the W3C Organization
and Time ontologies. See
[Metadata & Ontology](metadata-and-ontology.md) for the vocabulary
reference and SPARQL examples.

### When to use which layer

| Question shape                                              | Best layer          |
| ----------------------------------------------------------- | ------------------- |
| "Shortest path between two contributors"                    | Neo4j (graph algos) |
| "Centrality / community detection / PageRank"               | Neo4j + GDS plugin  |
| "Which repos are MIT-licensed and written in Python?"       | SPARQL store        |
| "All people whose membership in `sdsc-ordes` is still open" | SPARQL store        |
| "Repositories enriched with linked external IDs (ORCID, …)" | SPARQL store        |
| "Aggregate contribution counts per discipline"              | SPARQL store        |

The same repository appears in both layers: a `Repo` node in Neo4j
(identified by `full_name`) maps to a `schema:SoftwareSourceCode`
resource in the SPARQL store (identified by
`op:githubRepositoryHandle`).

## Cross-layer joins from Python

Pipeline steps and notebooks talk to both layers through the
[Services](../services/index.md) container. Outside the pipeline, the
two endpoints can be queried side-by-side from a notebook:

```python
from neo4j import GraphDatabase
from SPARQLWrapper import SPARQLWrapper, JSON

neo = GraphDatabase.driver("bolt://localhost:7504", auth=("neo4j", "<password>"))
sparql = SPARQLWrapper("http://localhost:7502/query")
sparql.setReturnFormat(JSON)

# 1. Graph traversal in Neo4j
with neo.session() as s:
    repos = [r["full_name"] for r in s.run(
        "MATCH (:Org {login: 'sdsc-ordes'})-[:OWNS]->(r:Repo) RETURN r.full_name AS full_name"
    )]
neo.close()  # release the driver's connection pool once the handles are collected

# 2. Enrich with semantic metadata from the SPARQL store
# Each handle becomes a quoted literal in the VALUES clause.
values = " ".join(f'"{h}"' for h in repos)
sparql.setQuery(f"""
PREFIX op: <https://open-pulse.epfl.ch/ontology#>
PREFIX schema: <http://schema.org/>
SELECT ?handle ?license ?language WHERE {{
  VALUES ?handle {{ {values} }}
  ?repo op:githubRepositoryHandle ?handle .
  OPTIONAL {{ ?repo schema:license ?license }}
  OPTIONAL {{ ?repo schema:programmingLanguage ?language }}
}}
""")
rows = sparql.query().convert()["results"]["bindings"]
```

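The `rows` returned by the join follow the SPARQL 1.1 JSON results layout: each binding maps a variable name to an object with a `value` key, and unbound `OPTIONAL` variables are simply absent. A minimal sketch of folding them back onto the Neo4j repo list; `merge_metadata` and the sample bindings are hypothetical, made up for illustration only.

```python
def merge_metadata(repos, rows):
    """Map each repo handle to the licenses and languages found for it."""
    merged = {handle: {"license": set(), "language": set()} for handle in repos}
    for row in rows:
        handle = row["handle"]["value"]
        if handle not in merged:
            continue  # ignore handles that Neo4j did not return
        for key in ("license", "language"):
            if key in row:  # OPTIONAL variables are absent when unbound
                merged[handle][key].add(row[key]["value"])
    return merged


# Made-up sample shaped like sparql.query().convert()["results"]["bindings"]
sample_rows = [
    {"handle": {"type": "literal", "value": "sdsc-ordes/gimie"},
     "language": {"type": "literal", "value": "Python"}},
]
result = merge_metadata(["sdsc-ordes/gimie", "sdsc-ordes/open-pulse"], sample_rows)
```

Sets are used because two `OPTIONAL` patterns can yield several rows per repository (one per license and language combination), so the same value may appear more than once. Repositories known to Neo4j but missing from the SPARQL store keep empty sets, which makes gaps between the two layers easy to spot.
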
## Where each backend runs

| Service         | Inside the stack                 | From the host                 |
| --------------- | -------------------------------- | ----------------------------- |
| Neo4j Bolt      | `bolt://neo4j:7687`              | `bolt://localhost:7504`       |
| Neo4j Browser   | `http://neo4j:7474`              | `http://localhost:7503`       |
| SPARQL endpoint | `http://sparql-proxy:7878/query` | `http://localhost:7502/query` |

Host ports can shift if you customise `infra/.env`; `op deploy ps`
shows the live mapping.
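
For notebooks that may run either on the host or inside the stack, the endpoint table can be captured as a small lookup. This is a sketch under stated assumptions: `ENDPOINTS` and `endpoint` are hypothetical names, and the host-side values are the defaults listed in the table, which may differ if `infra/.env` was customised.

```python
# Hypothetical lookup, not part of Open Pulse: default endpoints from the table.
ENDPOINTS = {
    "neo4j_bolt": {"stack": "bolt://neo4j:7687", "host": "bolt://localhost:7504"},
    "neo4j_browser": {"stack": "http://neo4j:7474", "host": "http://localhost:7503"},
    "sparql": {"stack": "http://sparql-proxy:7878/query", "host": "http://localhost:7502/query"},
}


def endpoint(service: str, inside_stack: bool = False) -> str:
    """Return the URL for `service` as seen from the stack or from the host."""
    return ENDPOINTS[service]["stack" if inside_stack else "host"]
```

For example, `endpoint("sparql")` gives the host-side URL, while `endpoint("neo4j_bolt", inside_stack=True)` gives the address a container on the stack network would use.
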
