Commit f8a4b44

deeenes and claude committed
Add architecture plan and AGENTS.md for AI assistants
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0f242b2 commit f8a4b44

2 files changed

Lines changed: 304 additions & 0 deletions

AGENTS.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# OmniPath Client: Instructions for AI Assistants

You are working on `omnipath-client`, the new Python client for the OmniPath
molecular biology prior-knowledge web API.

## Architecture plan

Read `planning/architecture.md` for the full architecture plan: module
structure, data flow, component descriptions, and implementation order.
The initial specifications are in `planning/initial_specs.md`.

## The web API

- **Production**: https://dev.omnipathdb.org/
- **API docs** (rendered): https://dev.omnipathdb.org/api-docs
- **Server repo**: https://github.com/saezlab/omnipath-present
- **Run locally**:
  ```
  git clone git@github.com:saezlab/omnipath-present.git
  cd omnipath-present/api-service
  uv sync
  uv run uvicorn api_service.main:app --reload --port 8081
  # in a second terminal, once the server is up:
  curl http://localhost:8081/openapi.json
  ```

The API uses POST endpoints returning Parquet files. A standard
`openapi.json` will be available on the deployed server soon (a locally run
server already serves it, as in the `curl` call above); until then the HTML
API docs page is the reference.
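
As a rough illustration of the request shape described above, the sketch below builds (but does not send) a POST request with a JSON body, using only the standard library. The `exports/interactions` path and the `entity_ids` filter are taken from the architecture plan; how they map onto the URL and the body schema, and the `build_post_request` helper itself, are assumptions made for illustration.

```python
import json
import urllib.request

BASE_URL = 'https://dev.omnipathdb.org/'


def build_post_request(
    endpoint: str,
    base_url: str = BASE_URL,
    **filters,
) -> urllib.request.Request:
    """
    Build a POST request carrying the filters as a JSON body.

    Hypothetical helper: the URL layout and body schema are assumptions.
    """

    url = base_url.rstrip('/') + '/' + endpoint.lstrip('/')
    body = json.dumps(filters, sort_keys = True).encode('utf-8')

    return urllib.request.Request(
        url,
        data = body,
        headers = {'Content-Type': 'application/json'},
        method = 'POST',
    )


# Nothing is sent here; the request object is only constructed.
req = build_post_request('exports/interactions', entity_ids = ['Q9Y6K9'])
```

Passing `req` to `urllib.request.urlopen` would perform the call; the response bytes would then be Parquet and need a Parquet reader such as pyarrow.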

## Related local repositories

| Package | Local path | Purpose |
|---------|-----------|---------|
| **saezverse** | `/home/denes/saezverse/` | Architecture repo: coding conventions, package descriptions, ADRs, plans |
| **pkg_infra** (saezlab_core) | `/home/denes/pypath-new/pkg_infra/` | Config, logging, and session infrastructure for all saezlab packages |
| **download-manager** | | Cache-aware download manager ([GitHub](https://github.com/saezlab/download-manager)) |
| **cache-manager** | | SQLite-backed file caching ([GitHub](https://github.com/saezlab/cache-manager)) |
| **annnet** | | Annotated network/graph library, Polars-backed ([GitHub](https://github.com/saezlab/annnet)) |
| **omnipath** (old client) | | Legacy Python client being replaced ([GitHub](https://github.com/saezlab/omnipath)) |

## Key dependencies

- **pkg_infra (`saezlab_core`)**: all config, logging, and session management
  must go through this package. Source at
  `/home/denes/pypath-new/pkg_infra/saezlab_core/`. Uses OmegaConf YAML
  hierarchy for config, Python `dictConfig` for logging, and a singleton
  `Session` object. If it lacks features needed by this client, contribute
  upstream rather than building parallel solutions.
- **download-manager**: wraps HTTP downloads with cache-manager integration.
  Async support is a goal (not yet implemented).
- **narwhals**: dataframe compatibility layer. Default backend is **polars**.
- **annnet**: for converting interaction/association data to graph objects.

## Coding conventions

Follow the saezlab Python coding style documented in
`/home/denes/saezverse/human/guidelines/python-coding-style.md`. Key points:

- Spaces around `=` in keyword arguments and default values
- Blank lines inside functions before/after blocks and between logical segments
- Argument lists on multiple lines: opening paren on first line, each arg on
  its own line, trailing comma, closing paren at original indentation
- Single quotes for strings
- Google (Napoleon) docstring style with triple quotes on separate lines
- Resource names as single words without underscores
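
The key points above might look roughly like this in practice. This is one reading of the bullet list, not an excerpt from the style guide, which remains authoritative; `format_resource` is a made-up function used only to show the layout.

```python
def format_resource(
    name: str,
    version: str = 'latest',
    uppercase: bool = False,
) -> str:
    """
    Build a display label for a resource.

    Args:
        name: Resource name, a single word without underscores.
        version: Version tag appended to the name.
        uppercase: Whether to upper-case the resulting label.

    Returns:
        The label in the form `name@version`.
    """

    label = f'{name}@{version}'

    if uppercase:

        label = label.upper()

    return label


label = format_resource(
    name = 'omnipath',
    version = '1.0',
    uppercase = True,
)
```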

## Package description

The saezverse package description for this client is at
`/home/denes/saezverse/human/packages/omnipath-client.md`.

planning/architecture.md

Lines changed: 234 additions & 0 deletions
@@ -0,0 +1,234 @@
# OmniPath Client: Architecture Plan

## Context

New Python client for the OmniPath molecular biology web API
(`https://dev.omnipathdb.org/`), replacing the old `omnipath` package. The API
currently serves Parquet data via POST endpoints, but the client must be
designed to accommodate other formats and endpoints in the future. The client
must provide validated queries, multi-backend DataFrame output, graph
conversion, and self-updating endpoint introspection.

## Module Structure

```
omnipath_client/
    __init__.py      # Re-exports public API from _client; triggers inventory load
    _metadata.py     # (exists) version, author
    _client.py       # OmniPath class + module-level convenience functions
    _session.py      # Session via pkg_infra (get_session, config, logging)
    _constants.py    # Base URL, static fallback inventory, defaults
    _types.py        # BackendType literal, enums, type aliases
    _inventory.py    # Fetch + parse API schema -> endpoint registry
    _endpoints.py    # EndpointDef + ParamDef dataclasses
    _query.py        # QueryBuilder + Query (validation against inventory)
    _download.py     # Downloader wrapping download-manager (async-capable)
    _response.py     # Response dispatch: Parquet, JSON, etc. + backend conversion
    _graph.py        # DataFrame -> annnet.Graph conversion
    _errors.py       # Exception hierarchy
```

## Data Flow

```
op.interactions(entity_ids=['Q9Y6K9'])
  -> OmniPath (lazy singleton or explicit)
    -> Inventory (loaded at import, non-blocking on failure)
    -> QueryBuilder.build(): validate endpoint, params, values
      -> Query object
    -> Downloader.fetch(query): async POST via download-manager (cache-aware)
      -> Path to cached response file (.parquet, .json, etc.)
    -> ResponseHandler.parse(path, format, backend): dispatch by format
      -> polars/pandas/pyarrow DataFrame
    -> (optional) interactions_to_graph(df) -> annnet.Graph
```

## Key Components

### 1. Public API (`_client.py`)

Two interfaces: OO client for control, module-level functions for convenience.

```python
import omnipath_client as op

# OO
client = op.OmniPath(backend = 'pandas', base_url = '...')
df = client.interactions(entity_ids = ['Q9Y6K9'])

# Convenience (lazy default singleton)
df = op.interactions(entity_ids = ['Q9Y6K9'])
g = op.interactions(as_graph = True, entity_ids = ['Q9Y6K9'])

# Introspection
client.endpoints
client.params('exports/interactions')
client.values('exports/entities', 'entity_types')
```

Methods mirror the 6 API endpoints:
- `entities(**filters)`, `interactions(**filters)`, `associations(**filters)`
- `entity_lookup(identifiers)`, `ontology_terms(term_ids)`,
  `ontology_tree(term_ids)`

`interactions()` and `associations()` accept `as_graph=True` to return
`annnet.Graph`.

### 2. Inventory (`_inventory.py`)

Auto-populates endpoint/param/value definitions at **import time**.

**Phase 1 (now):** Parse the rendered HTML from `/api-docs` to extract
endpoints, parameters, and allowed values. Runs at import time but **failure
must not block import**: on any error (network, parse), log a warning and
fall back to static definitions in `_constants.py`.

**Phase 2 (soon):** When the standard FastAPI/Swagger `openapi.json` becomes
available (the server already supports it locally via
`GET /openapi.json`), switch to fetching and parsing that. Same
load-at-import + warn-and-fall-back pattern.
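
For Phase 2, extracting an endpoint registry from an OpenAPI 3 document could look roughly like the sketch below. It relies only on the standard OpenAPI layout (`paths`, `requestBody`, `components/schemas`, `$ref`); the sample document, the schema name `EntitiesRequest`, and the shape of the returned registry are invented for illustration and do not describe the real server.

```python
def resolve_ref(spec: dict, schema: dict) -> dict:
    """
    Follow a local `$ref` such as '#/components/schemas/Name', if present.
    """

    ref = schema.get('$ref')

    if ref is None:
        return schema

    node = spec

    for part in ref.lstrip('#/').split('/'):
        node = node[part]

    return node


def endpoints_from_openapi(spec: dict) -> dict:
    """
    Map each POST path to its JSON body parameters.
    """

    registry = {}

    for path, operations in spec.get('paths', {}).items():

        post = operations.get('post')

        if post is None:
            continue

        schema = (
            post.get('requestBody', {})
            .get('content', {})
            .get('application/json', {})
            .get('schema', {})
        )
        schema = resolve_ref(spec, schema)
        required = set(schema.get('required', []))

        registry[path.strip('/')] = {
            name: {
                'type': prop.get('type'),
                'enum': prop.get('enum'),
                'required': name in required,
            }
            for name, prop in schema.get('properties', {}).items()
        }

    return registry


# Made-up sample document, not the real API schema.
SAMPLE_SPEC = {
    'paths': {
        '/exports/entities': {
            'post': {
                'requestBody': {
                    'content': {
                        'application/json': {
                            'schema': {
                                '$ref': '#/components/schemas/EntitiesRequest',
                            },
                        },
                    },
                },
            },
        },
    },
    'components': {
        'schemas': {
            'EntitiesRequest': {
                'properties': {
                    'entity_types': {
                        'type': 'array',
                        'items': {'type': 'string'},
                    },
                },
            },
        },
    },
}

registry = endpoints_from_openapi(SAMPLE_SPEC)
```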

Both phases share:
1. Check for cached inventory (via cache-manager, with TTL).
2. Attempt to fetch and parse the API schema.
3. On failure, fall back to static definitions in `_constants.py`.
4. Expose: `endpoints()`, `params(endpoint)`,
   `allowed_values(endpoint, param)`.
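
The first three shared steps can be sketched as a loader that never raises. The `fetch` callable, the `cached` argument and the contents of `STATIC_INVENTORY` are placeholders chosen for this example; the real cache lookup would go through cache-manager.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical static fallback; the real one would live in `_constants.py`.
STATIC_INVENTORY = {'exports/interactions': {'entity_ids': 'string[]'}}


def load_inventory(
    fetch: Callable[[], dict],
    cached: Optional[dict] = None,
) -> tuple:
    """
    Return `(inventory, source)` without ever raising.
    """

    # 1. A cached inventory that is still valid wins.
    if cached is not None:
        return cached, 'cache'

    # 2. Otherwise try the remote schema.
    try:
        return fetch(), 'remote'

    except Exception as exc:
        # 3. Any failure (network, parsing) degrades to static definitions.
        logger.warning(
            'Inventory fetch failed (%s); using static fallback.',
            exc,
        )
        return STATIC_INVENTORY, 'static'
```

Returning the source alongside the inventory is a design choice of this sketch; it makes it easy to tell in tests and logs which path was taken.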

### 3. Query Validation (`_query.py`)

`QueryBuilder.build(endpoint, **params)` validates against inventory:
- Endpoint exists
- Param names recognized
- Param types correct (string[], string, bool, enum)
- Values in allowed set (if constrained)
- Required params present

Raises specific exceptions from `_errors.py`.
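
A minimal sketch of these checks follows, with `EndpointDef` and `ParamDef` as plain dataclasses. The field names, the `PYTHON_TYPES` mapping, the allowed values `'protein'` and `'complex'`, and the `build_query` function (standing in for `QueryBuilder.build`) are all assumptions; only the exception names come from the plan.

```python
from dataclasses import dataclass, field


class ValidationError(Exception):
    """Base class for query validation failures."""


class UnknownEndpointError(ValidationError):
    pass


class UnknownParameterError(ValidationError):
    pass


class InvalidParameterValueError(ValidationError):
    pass


class MissingParameterError(ValidationError):
    pass


@dataclass(frozen = True)
class ParamDef:
    name: str
    type: str = 'string'
    required: bool = False
    allowed: tuple = ()


@dataclass(frozen = True)
class EndpointDef:
    name: str
    params: dict = field(default_factory = dict)


# Python types accepted for each declared parameter type (assumed mapping).
PYTHON_TYPES = {
    'string': str,
    'enum': str,
    'bool': bool,
    'string[]': (list, tuple),
}

# Made-up inventory entry, for illustration only.
INVENTORY = {
    'exports/entities': EndpointDef(
        name = 'exports/entities',
        params = {
            'entity_types': ParamDef(
                name = 'entity_types',
                type = 'string[]',
                required = True,
                allowed = ('protein', 'complex'),
            ),
        },
    ),
}


def build_query(inventory: dict, endpoint: str, **params) -> dict:
    """
    Validate `params` against the inventory and return a plain query dict.
    """

    if endpoint not in inventory:
        raise UnknownEndpointError(endpoint)

    definition = inventory[endpoint]

    for key, value in params.items():

        if key not in definition.params:
            raise UnknownParameterError(key)

        param = definition.params[key]

        if not isinstance(value, PYTHON_TYPES[param.type]):
            raise InvalidParameterValueError(f'{key}: expected {param.type}')

        values = value if isinstance(value, (list, tuple)) else [value]
        bad = [v for v in values if param.allowed and v not in param.allowed]

        if bad:
            raise InvalidParameterValueError(f'{key}: {bad}')

    missing = [
        name
        for name, param in definition.params.items()
        if param.required and name not in params
    ]

    if missing:
        raise MissingParameterError(', '.join(missing))

    return {'endpoint': endpoint, 'params': params}
```

Because every failure derives from `ValidationError`, callers can catch the whole family with one `except` clause or handle individual cases.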
106+
107+
### 4. Downloads (`_download.py`)
108+
109+
Wraps `download_manager.DownloadManager` with **async support as a goal**:
110+
- POST with JSON body (the API uses POST endpoints)
111+
- Cache keyed on URL + body
112+
- Returns path to cached response file
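
"Cache keyed on URL + body" could be realized by hashing the URL together with a canonical JSON form of the body, as in the sketch below. The `cache_key` function and the choice of SHA-256 are assumptions for illustration; cache-manager may derive its keys differently.

```python
import hashlib
import json


def cache_key(url: str, body: dict) -> str:
    """
    Derive a stable key from the URL and a canonical JSON form of the body.
    """

    # Sorting keys makes the key independent of dict insertion order.
    canonical = json.dumps(body, sort_keys = True, separators = (',', ':'))

    digest = hashlib.sha256()
    digest.update(url.encode('utf-8'))
    digest.update(b'\n')
    digest.update(canonical.encode('utf-8'))

    return digest.hexdigest()
```

Note that list order inside the body still affects the key, so `['a', 'b']` and `['b', 'a']` would be cached separately unless the caller normalizes them first.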

**Async strategy:** download-manager currently uses synchronous
requests/pycurl backends. Plan:
1. Add an async backend to download-manager (e.g. `httpx.AsyncClient`).
2. Update cache-manager for async-compatible file I/O where needed.
3. Expose both sync and async interfaces in omnipath-client:
   `client.interactions()` (sync) and `await client.ainteractions()` or an
   async context manager.
4. Initial implementation is sync-first; async added incrementally to
   download-manager and cache-manager.
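
Until a native async backend exists, the paired sync/async interface from step 3 could be prototyped by running the blocking call in a worker thread. This is a swapped-in interim technique (`asyncio.to_thread`), not the planned httpx backend, and the `Client` class below is a stand-in that performs no network I/O; the second identifier is an arbitrary example value.

```python
import asyncio


class Client:
    """
    Stand-in client exposing the same call in sync and async form.
    """

    def interactions(self, **filters) -> dict:
        # Placeholder for the blocking download + parse.
        return {'endpoint': 'exports/interactions', 'filters': filters}

    async def ainteractions(self, **filters) -> dict:
        # Run the blocking call in a thread so the event loop stays free.
        return await asyncio.to_thread(self.interactions, **filters)


async def main() -> list:

    client = Client()

    # Two queries issued concurrently; results come back in call order.
    return await asyncio.gather(
        client.ainteractions(entity_ids = ['Q9Y6K9']),
        client.ainteractions(entity_ids = ['P04637']),
    )


results = asyncio.run(main())
```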

### 5. Response Handling (`_response.py`)

**Format-agnostic dispatch**, designed to accommodate future response formats:

```python
from io import BytesIO
from pathlib import Path
from typing import Any
from ._types import BackendType  # literal type defined in `_types.py`

def parse_response(
    source: Path | BytesIO,
    format: str = 'parquet',  # future: 'json', 'csv', 'arrow_ipc', ...
    backend: BackendType = 'polars',
) -> Any: ...  # signature only; body to be implemented
```

- **Parquet** (current default): read via `pyarrow.parquet.read_table()`,
  convert to backend.
- **Future formats**: JSON, CSV, Arrow IPC, etc.; each gets a reader
  function, dispatched by `format`.
- Default backend: **polars** (Arrow-native, fast, matches annnet's Polars
  backend).
- narwhals used as compatibility bridge for DataFrame operations.
- The `format` is determined from the endpoint definition in the inventory, so
  adding a new format only requires a reader function and updating the
  endpoint metadata.
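
The dispatch idea can be sketched with a small reader registry. The `reader` decorator and the JSON/CSV readers are illustrative stand-ins that return plain Python objects rather than DataFrames, and backend conversion is left out; only the Parquet reader uses pyarrow, imported lazily so the other formats work without it.

```python
import csv
import io
import json

# Registry mapping a format name to its reader function.
READERS = {}


def reader(fmt: str):
    """
    Register the decorated function as the reader for `fmt`.
    """

    def decorator(func):
        READERS[fmt] = func
        return func

    return decorator


@reader('json')
def read_json(data: bytes):
    return json.loads(data.decode('utf-8'))


@reader('csv')
def read_csv(data: bytes) -> list:
    return list(csv.DictReader(io.StringIO(data.decode('utf-8'))))


@reader('parquet')
def read_parquet(data: bytes):
    # Imported lazily: pyarrow is only needed for this format.
    import pyarrow.parquet as pq

    return pq.read_table(io.BytesIO(data))


def parse_response(data: bytes, format: str = 'parquet'):
    """
    Dispatch to the reader registered for `format`.
    """

    try:
        read = READERS[format]

    except KeyError:
        raise ValueError(f'Unsupported response format: {format!r}') from None

    return read(data)
```

With this layout, supporting a new format means adding one decorated function; `parse_response` itself does not change.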

### 6. Graph Conversion (`_graph.py`)

For interactions: map `member_a_id`/`member_b_id` to source/target edges,
create `annnet.Graph`. For associations: parent-member relationships as
hyperedges.
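
The two mappings can be illustrated without annnet, using plain Python structures in place of `annnet.Graph`. Rows are shown as dicts; `member_a_id` and `member_b_id` come from the text above, while the association column names `parent_id` and `member_id` and the placeholder identifiers are assumptions.

```python
from collections import defaultdict


def interaction_edges(rows: list) -> list:
    """
    Turn interaction rows into (source, target) pairs.
    """

    return [(row['member_a_id'], row['member_b_id']) for row in rows]


def association_hyperedges(rows: list) -> dict:
    """
    Group members by parent: each parent yields one hyperedge.
    """

    groups = defaultdict(set)

    for row in rows:
        groups[row['parent_id']].add(row['member_id'])

    return {
        parent: frozenset(members)
        for parent, members in groups.items()
    }


# Placeholder rows; the identifiers are not real database entries.
interactions = [
    {'member_a_id': 'A', 'member_b_id': 'B'},
    {'member_a_id': 'B', 'member_b_id': 'C'},
]
associations = [
    {'parent_id': 'X', 'member_id': 'A'},
    {'parent_id': 'X', 'member_id': 'B'},
    {'parent_id': 'Y', 'member_id': 'C'},
]

edges = interaction_edges(interactions)
hyperedges = association_hyperedges(associations)
```

A real implementation would hand these edges and member sets to annnet instead of returning them directly.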

### 7. Error Hierarchy (`_errors.py`)

```
OmniPathError
+-- OmniPathAPIError (HTTP 4xx/5xx)
+-- OmniPathConnectionError
+-- ValidationError
|   +-- UnknownEndpointError
|   +-- UnknownParameterError
|   +-- InvalidParameterValueError
|   +-- MissingParameterError
+-- BackendNotAvailableError
```
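
Expressed as Python classes, the tree might look like the sketch below. The `status` attribute on `OmniPathAPIError` and the `check_status` helper are additions made for this example and are not specified in the plan.

```python
class OmniPathError(Exception):
    """Root of all client errors."""


class OmniPathAPIError(OmniPathError):
    """The server answered with HTTP 4xx/5xx."""

    def __init__(self, status: int, message: str = ''):
        super().__init__(f'HTTP {status}: {message}')
        self.status = status


class OmniPathConnectionError(OmniPathError):
    """The server could not be reached."""


class ValidationError(OmniPathError):
    """A query failed validation before any request was made."""


class UnknownEndpointError(ValidationError):
    pass


class UnknownParameterError(ValidationError):
    pass


class InvalidParameterValueError(ValidationError):
    pass


class MissingParameterError(ValidationError):
    pass


class BackendNotAvailableError(OmniPathError):
    """The requested DataFrame backend is not installed."""


def check_status(status: int, message: str = '') -> None:
    """
    Raise `OmniPathAPIError` for HTTP error codes (hypothetical helper).
    """

    if status >= 400:
        raise OmniPathAPIError(status, message)
```

A single `except OmniPathError` then covers every failure the client can raise, while narrower clauses remain available for specific handling.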

### 8. Config, Logging & Session (`_session.py`)

**All configuration and logging uses pkg_infra (`saezlab_core`).**

- **Config**: `saezlab_core.config.ConfigLoader.load_config()` provides
  hierarchical OmegaConf/YAML config merged from: ecosystem -> package
  defaults -> user dir -> workdir -> env vars. omnipath-client ships a
  `default_settings.yaml` (bundled as package data under
  `omnipath_client/data/`) with its own settings under a dedicated section
  (e.g. `omnipath_client:` with keys `base_url`, `backend`, `cache_ttl`,
  `timeout`, `retries`).

- **Logging**: `saezlab_core.logger.configure_loggers_from_omegaconf()` +
  standard `logging.getLogger(__name__)` in each module. Logging config lives
  in the same YAML hierarchy.

- **Session**: `saezlab_core.session.get_session()` singleton ties config +
  logger + runtime metadata. The client calls this once at initialization.
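
The layered precedence described for the config (later sources override earlier ones, section by section) can be illustrated with a plain-dict deep merge. This is a stand-in for what OmegaConf does inside pkg_infra, not its API; the `timeout` values are invented, and the layer contents only reuse key names mentioned above.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """
    Return a copy of `base` with `override` applied recursively.
    """

    merged = dict(base)

    for key, value in override.items():

        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)

        else:
            merged[key] = value

    return merged


def resolve_config(*layers: dict) -> dict:
    """
    Merge layers in order; later layers take precedence.
    """

    config = {}

    for layer in layers:
        config = deep_merge(config, layer)

    return config


# Illustrative layers: package defaults -> user dir -> env vars.
package_defaults = {
    'omnipath_client': {
        'base_url': 'https://dev.omnipathdb.org/',
        'backend': 'polars',
        'timeout': 30,
    },
}
user_config = {'omnipath_client': {'backend': 'pandas'}}
env_overrides = {'omnipath_client': {'timeout': 60}}

config = resolve_config(package_defaults, user_config, env_overrides)
```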

**pkg_infra considerations**: The current `Settings` schema in
`saezlab_core.schema` uses `extra="forbid"` on all models. For
omnipath-client to add its own config section, the schema needs to either:
(a) allow extra fields at the top level, or (b) support a generic
plugin/package config namespace. This is an upstream change to pkg_infra that
should be designed to work for all saezlab client packages, not just
omnipath-client.

Similarly, `ConfigLoader.read_package_default()` currently reads from
`saezlab_core.data`; it needs to be parameterized to also read defaults
from the calling package's data directory. This upstream generalization
benefits all packages using pkg_infra.

## Cross-cutting: Upstream Work Required

| Package | Work needed |
|---------|-------------|
| **pkg_infra** | Generalize config schema for per-package sections; parameterize `read_package_default()` for caller package; evaluate session API for client-library use |
| **download-manager** | Add async download backend (httpx); ensure POST+JSON support |
| **cache-manager** | Async-compatible file I/O if needed by async download |

## Testing Strategy

- **Unit tests**: mock inventory, query validation, response conversion, graph
  conversion
- **Integration tests**: mock HTTP server returning sample Parquet, full
  round-trip; also tests against local server
  (`uv run uvicorn api_service.main:app --port 8081`)
- **Fixtures**: pre-generated small Parquet files, mock DownloadManager

## Implementation Order

1. `_errors.py`, `_types.py`, `_constants.py`: foundations
2. `_endpoints.py`: dataclasses for endpoint/param definitions
3. `_session.py`: integrate pkg_infra config/logging/session (upstream
   updates to pkg_infra as needed)
4. `_inventory.py`: HTML parsing now, `openapi.json` soon; load at import,
   log a warning and fall back on failure (never block import)
5. `_query.py`: query builder with validation
6. `_download.py`: sync wrapper first; async in download-manager as follow-up
7. `_response.py`: Parquet reader first, format dispatch for future
   extensibility
8. `_graph.py`: annnet conversion
9. `_client.py`: orchestrator + public API
10. `__init__.py`: re-exports + import-time inventory load
11. Tests throughout
