# OmniPath Client: Architecture Plan

## Context

New Python client for the OmniPath molecular biology web API
(`https://dev.omnipathdb.org/`), replacing the old `omnipath` package. The API
currently serves Parquet data via POST endpoints, but the client must be
designed to accommodate other formats and endpoints in the future. The client
must provide validated queries, multi-backend DataFrame output, graph
conversion, and self-updating endpoint introspection.

## Module Structure

```
omnipath_client/
  __init__.py      # Re-exports public API from _client; triggers inventory load
  _metadata.py     # (exists) version, author
  _client.py       # OmniPath class + module-level convenience functions
  _session.py      # Session via pkg_infra (get_session, config, logging)
  _constants.py    # Base URL, static fallback inventory, defaults
  _types.py        # BackendType literal, enums, type aliases
  _inventory.py    # Fetch + parse API schema -> endpoint registry
  _endpoints.py    # EndpointDef + ParamDef dataclasses
  _query.py        # QueryBuilder + Query (validation against inventory)
  _download.py     # Downloader wrapping download-manager (async-capable)
  _response.py     # Response dispatch: Parquet, JSON, etc. + backend conversion
  _graph.py        # DataFrame -> annnet.Graph conversion
  _errors.py       # Exception hierarchy
  data/
    default_settings.yaml  # Package config defaults (see Config, Logging & Session)
```

## Data Flow

```
op.interactions(entity_ids=['Q9Y6K9'])
  -> OmniPath (lazy singleton or explicit)
  -> Inventory (loaded at import, non-blocking on failure)
  -> QueryBuilder.build(): validate endpoint, params, values
  -> Query object
  -> Downloader.fetch(query): POST via download-manager (cache-aware; sync first, async later)
  -> Path to cached response file (.parquet, .json, etc.)
  -> parse_response(path, format, backend): dispatch by format
  -> polars/pandas DataFrame or pyarrow Table
  -> (optional) interactions_to_graph(df) -> annnet.Graph
```

## Key Components

### 1. Public API (`_client.py`)

Two interfaces: an object-oriented client for full control, and module-level
functions for convenience.

```python
import omnipath_client as op

# OO
client = op.OmniPath(backend='pandas', base_url='...')
df = client.interactions(entity_ids=['Q9Y6K9'])

# Convenience (lazy default singleton)
df = op.interactions(entity_ids=['Q9Y6K9'])
g = op.interactions(as_graph=True, entity_ids=['Q9Y6K9'])

# Introspection
client.endpoints
client.params('exports/interactions')
client.values('exports/entities', 'entity_types')
```

Methods mirror the six API endpoints:
- `entities(**filters)`, `interactions(**filters)`, `associations(**filters)`
- `entity_lookup(identifiers)`, `ontology_terms(term_ids)`,
  `ontology_tree(term_ids)`

`interactions()` and `associations()` accept `as_graph=True` to return an
`annnet.Graph` instead of a DataFrame.

### 2. Inventory (`_inventory.py`)

Auto-populates endpoint/param/value definitions at **import time**.

**Phase 1 (now):** Parse the rendered HTML from `/api-docs` to extract
endpoints, parameters, and allowed values. This runs at import time, but
**failure must not block import**: on any error (network, parse), log a
warning and fall back to the static definitions in `_constants.py`.

**Phase 2 (soon):** When the standard FastAPI/Swagger `openapi.json` becomes
available (the server already serves it when run locally, via
`GET /openapi.json`), switch to fetching and parsing that instead. Same
pattern: load at import, warn and fall back on failure.

Both phases share these steps:
1. Check for a cached inventory (via cache-manager, with a TTL).
2. Attempt to fetch and parse the API schema.
3. On failure, fall back to the static definitions in `_constants.py`.
4. Expose `endpoints()`, `params(endpoint)`, and
   `allowed_values(endpoint, param)`.

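The shared fetch-or-fall-back step can be sketched independently of how the schema is parsed. This is a minimal sketch, assuming the fetch-and-parse step is passed in as a callable and the cache lookup happens before it; `load_inventory()`, `STATIC_INVENTORY` and the nested-dict layout are hypothetical stand-ins for `_inventory.py` and `_constants.py`.

```python
from __future__ import annotations

import logging
from collections.abc import Callable, Mapping
from typing import Any

logger = logging.getLogger(__name__)

# Hypothetical shape: endpoint -> param -> allowed values (None = unconstrained).
Inventory = Mapping[str, Mapping[str, Any]]

# Stand-in for the static fallback kept in _constants.py (contents illustrative).
STATIC_INVENTORY: Inventory = {
    'exports/interactions': {'entity_ids': None},
}


def load_inventory(fetch_and_parse: Callable[[], Inventory]) -> Inventory:
    """Try the live API schema; on any failure, warn and use the static copy.

    Never raises, so a network or parsing problem cannot break
    ``import omnipath_client``.
    """
    try:
        inventory = fetch_and_parse()
        if not inventory:
            raise ValueError('parsed inventory is empty')
        return inventory
    except Exception as exc:  # deliberately broad: import must not fail
        logger.warning('Could not load API inventory (%s); using static fallback.', exc)
        return STATIC_INVENTORY
```

Because the parser is injected, the Phase 1 HTML parser and a later `openapi.json` parser could be swapped without changing callers.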
### 3. Query Validation (`_query.py`)

`QueryBuilder.build(endpoint, **params)` validates against the inventory:
- Endpoint exists
- Param names recognized
- Param types correct (string[], string, bool, enum)
- Values in allowed set (if constrained)
- Required params present

Raises the matching `ValidationError` subclass from `_errors.py`.

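The checks above can be sketched against a small in-memory inventory. This is a minimal sketch, assuming hypothetical field names for the `_endpoints.py` dataclasses (`type`, `required`, `allowed`) and a plain dict keyed by endpoint path; it re-declares only the error classes it raises so that it runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Minimal stand-ins for the ValidationError branch of _errors.py.
class ValidationError(Exception): ...
class UnknownEndpointError(ValidationError): ...
class UnknownParameterError(ValidationError): ...
class InvalidParameterValueError(ValidationError): ...
class MissingParameterError(ValidationError): ...


@dataclass(frozen=True)
class ParamDef:
    name: str
    type: str  # 'string[]', 'string', 'bool' or 'enum'
    required: bool = False
    allowed: frozenset[str] | None = None  # None: any value accepted


@dataclass
class EndpointDef:
    path: str
    params: dict[str, ParamDef] = field(default_factory=dict)


@dataclass
class Query:
    endpoint: str
    params: dict[str, Any]


def _type_ok(param: ParamDef, value: Any) -> bool:
    """Check a value against the declared parameter type."""
    if param.type == 'bool':
        return isinstance(value, bool)
    if param.type in ('string', 'enum'):
        return isinstance(value, str)
    if param.type == 'string[]':
        return isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value)
    return False


class QueryBuilder:
    def __init__(self, inventory: dict[str, EndpointDef]) -> None:
        self.inventory = inventory

    def build(self, endpoint: str, **params: Any) -> Query:
        ep = self.inventory.get(endpoint)
        if ep is None:
            raise UnknownEndpointError(endpoint)
        for name, value in params.items():
            param = ep.params.get(name)
            if param is None:
                raise UnknownParameterError(f'{endpoint}: unknown parameter {name!r}')
            if not _type_ok(param, value):
                raise InvalidParameterValueError(f'{name}: expected {param.type}')
            if param.allowed is not None:
                values = value if isinstance(value, (list, tuple)) else [value]
                bad = [v for v in values if v not in param.allowed]
                if bad:
                    raise InvalidParameterValueError(f'{name}: not allowed: {bad}')
        missing = [n for n, p in ep.params.items() if p.required and n not in params]
        if missing:
            raise MissingParameterError(f'{endpoint}: missing {missing}')
        return Query(endpoint, dict(params))
```

With this shape, a misspelled parameter name or a bare string passed where `string[]` is expected fails locally, before any request is sent.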
### 4. Downloads (`_download.py`)

Wraps `download_manager.DownloadManager` with **async support as a goal**:
- POST with JSON body (the API uses POST endpoints)
- Cache keyed on URL + request body
- Returns path to cached response file

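Because every request is a POST, the URL alone cannot identify a cached response. A minimal sketch of such a key, assuming the JSON body is canonicalized (sorted keys, compact separators) so that equivalent queries hash identically; `cache_key()` is a hypothetical helper, and download-manager/cache-manager may key their entries differently.

```python
import hashlib
import json
from typing import Any


def cache_key(url: str, body: dict[str, Any]) -> str:
    """Stable key for a POST request: SHA-256 of the URL plus canonical JSON body."""
    canonical = json.dumps(body, sort_keys=True, separators=(',', ':'), ensure_ascii=False)
    digest = hashlib.sha256()
    digest.update(url.encode('utf-8'))
    digest.update(b'\n')  # separator so the URL/body boundary cannot blur
    digest.update(canonical.encode('utf-8'))
    return digest.hexdigest()
```

Dict key order no longer matters, but list order still does: `['A', 'B']` and `['B', 'A']` produce different keys, so callers might sort list-valued filters where order is irrelevant.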
**Async strategy:** download-manager currently uses synchronous
requests/pycurl backends. Plan:
1. Add an async backend to download-manager (e.g. `httpx.AsyncClient`).
2. Update cache-manager for async-compatible file I/O where needed.
3. Expose both sync and async interfaces in omnipath-client:
   `client.interactions()` (sync) and `await client.ainteractions()` or an
   async context manager.
4. Initial implementation is sync-first; async is added incrementally to
   download-manager and cache-manager.

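Once an async backend exists, one way to offer both interfaces from a single async core is sketched below. It assumes a hypothetical `_afetch()` coroutine standing in for the future async download-manager backend (here it only yields to the loop and echoes the request), and a hypothetical `Client` class. `asyncio.run()` raises `RuntimeError` when an event loop is already running (e.g. inside Jupyter), so in such environments callers would use the `await` form.

```python
import asyncio
from typing import Any


class Client:
    """Hypothetical client showing a sync facade over an async core."""

    async def _afetch(self, endpoint: str, body: dict[str, Any]) -> dict[str, Any]:
        # Stand-in for an async POST (e.g. via httpx.AsyncClient) plus caching.
        await asyncio.sleep(0)
        return {'endpoint': endpoint, 'body': body}

    async def ainteractions(self, **filters: Any) -> dict[str, Any]:
        return await self._afetch('exports/interactions', filters)

    def interactions(self, **filters: Any) -> dict[str, Any]:
        # Sync entry point: drive the coroutine on a fresh event loop.
        return asyncio.run(self.ainteractions(**filters))

    async def __aenter__(self) -> 'Client':
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        # A real client would close its HTTP connection pool here.
        return None
```

Keeping the request logic in one coroutine means the sync and async methods cannot drift apart; the cost is one event loop per sync call.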
### 5. Response Handling (`_response.py`)

**Format-agnostic dispatch**, designed to accommodate future response formats:

```python
from io import BytesIO
from pathlib import Path
from typing import Any
from omnipath_client._types import BackendType

def parse_response(
    source: Path | BytesIO,
    format: str = 'parquet',  # future: 'json', 'csv', 'arrow_ipc', ...
    backend: BackendType = 'polars',
) -> Any: ...
```

- **Parquet** (current default): read via `pyarrow.parquet.read_table()`,
  then convert to the requested backend.
- **Future formats**: JSON, CSV, Arrow IPC, etc.; each gets a reader
  function, dispatched by `format`.
- Default backend: **polars** (Arrow-native, fast, matches annnet's Polars
  backend).
- narwhals is used as a compatibility bridge for DataFrame operations across
  backends.
- The `format` is determined from the endpoint definition in the inventory, so
  adding a new format only requires a reader function and an update to the
  endpoint metadata.

### 6. Graph Conversion (`_graph.py`)

For interactions: map `member_a_id`/`member_b_id` to edge source/target and
build an `annnet.Graph`. For associations: represent parent-member
relationships as hyperedges.

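The row-to-edge mapping can be prepared independently of annnet. A minimal sketch, assuming rows arrive as dicts (e.g. from polars' `DataFrame.to_dicts()`) and, for associations, hypothetical `parent_id`/`member_id` column names; the final `annnet.Graph` construction is left out because its API is not specified here.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable, Mapping
from typing import Any


def interaction_edges(rows: Iterable[Mapping[str, Any]]) -> list[tuple[str, str]]:
    """One (source, target) edge per interaction row."""
    return [(row['member_a_id'], row['member_b_id']) for row in rows]


def association_hyperedges(
    rows: Iterable[Mapping[str, Any]],
    parent_col: str = 'parent_id',  # hypothetical column name
    member_col: str = 'member_id',  # hypothetical column name
) -> dict[str, set[str]]:
    """Group members by parent: each parent becomes one hyperedge."""
    groups: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        groups[row[parent_col]].add(row[member_col])
    return dict(groups)
```

Keeping this step backend-neutral means the graph module only needs a way to iterate rows, whichever DataFrame backend produced them.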
### 7. Error Hierarchy (`_errors.py`)

```
OmniPathError
 +-- OmniPathAPIError (HTTP 4xx/5xx)
 +-- OmniPathConnectionError
 +-- ValidationError
 |    +-- UnknownEndpointError
 |    +-- UnknownParameterError
 |    +-- InvalidParameterValueError
 |    +-- MissingParameterError
 +-- BackendNotAvailableError
```

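The tree maps directly onto Python classes. A minimal sketch; the `status_code` attribute on `OmniPathAPIError` and its message format are assumptions, not something the plan specifies.

```python
from __future__ import annotations


class OmniPathError(Exception):
    """Base class: `except OmniPathError` catches everything the client raises."""


class OmniPathAPIError(OmniPathError):
    """The server answered with HTTP 4xx/5xx."""

    def __init__(self, status_code: int, message: str = '') -> None:
        super().__init__(f'HTTP {status_code}: {message}' if message else f'HTTP {status_code}')
        self.status_code = status_code  # assumption: keep the status for callers


class OmniPathConnectionError(OmniPathError):
    """The server could not be reached."""


class ValidationError(OmniPathError):
    """A query was rejected locally, before any request was sent."""


class UnknownEndpointError(ValidationError): ...
class UnknownParameterError(ValidationError): ...
class InvalidParameterValueError(ValidationError): ...
class MissingParameterError(ValidationError): ...


class BackendNotAvailableError(OmniPathError):
    """The requested DataFrame backend is not installed."""
```

Grouping the four query errors under `ValidationError` lets callers handle bad input separately from network or server failures.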
### 8. Config, Logging & Session (`_session.py`)

**All configuration and logging uses pkg_infra (`saezlab_core`).**

- **Config**: `saezlab_core.config.ConfigLoader.load_config()` provides
  hierarchical OmegaConf/YAML config merged from: ecosystem -> package
  defaults -> user dir -> workdir -> env vars. omnipath-client ships a
  `default_settings.yaml` (bundled as package data under
  `omnipath_client/data/`) with its own settings under a dedicated section
  (e.g. `omnipath_client:` with keys `base_url`, `backend`, `cache_ttl`,
  `timeout`, `retries`).

- **Logging**: `saezlab_core.logger.configure_loggers_from_omegaconf()` plus
  the standard `logging.getLogger(__name__)` in each module. Logging config
  lives in the same YAML hierarchy.

- **Session**: the `saezlab_core.session.get_session()` singleton ties together
  config, logger, and runtime metadata. The client calls it once at
  initialization.

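The layering can be pictured as a recursive merge in which each later source overrides earlier ones, which is how `OmegaConf.merge` behaves. A stdlib-only sketch, assuming that precedence and the `omnipath_client:` section described above; `deep_merge` is a hypothetical helper, not the pkg_infra API, and the values for `cache_ttl`, `timeout` and `retries` are placeholders, not agreed defaults.

```python
from __future__ import annotations

import copy
from typing import Any


def deep_merge(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge nested dicts left to right; later layers win on conflicts."""
    result: dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = deep_merge(result[key], value)
            else:
                result[key] = copy.deepcopy(value)  # never alias the input layers
    return result


# Package defaults, as default_settings.yaml might define them.
package_defaults = {
    'omnipath_client': {
        'base_url': 'https://dev.omnipathdb.org/',
        'backend': 'polars',
        'cache_ttl': 86400,  # placeholder (seconds)
        'timeout': 60,       # placeholder (seconds)
        'retries': 3,        # placeholder
    },
}
# A user-level override touching a single key.
user_config = {'omnipath_client': {'backend': 'pandas'}}

settings = deep_merge(package_defaults, user_config)
```

Only `backend` changes in `settings`; every other key keeps its package default, and the input layers are left untouched.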
**pkg_infra considerations**: The current `Settings` schema in
`saezlab_core.schema` uses `extra="forbid"` on all models. For
omnipath-client to add its own config section, the schema needs to either
(a) allow extra fields at the top level, or (b) support a generic
plugin/package config namespace. This is an upstream change to pkg_infra that
should be designed to work for all saezlab client packages, not just
omnipath-client.

Similarly, `ConfigLoader.read_package_default()` currently reads from
`saezlab_core.data`; it needs to be parameterized to also read defaults
from the calling package's data directory. This upstream generalization
benefits all packages using pkg_infra.

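A parameterized reader could locate the defaults with `importlib.resources.files()`, which resolves resources inside installed packages. A sketch under assumptions: `package_default_path()` and this `read_package_default()` signature are hypothetical, not the current pkg_infra API, and every package is assumed to keep its defaults at `<package>/data/default_settings.yaml`, mirroring the planned `omnipath_client/data/`.

```python
from __future__ import annotations

from importlib.resources import files


def package_default_path(package: str, filename: str = 'default_settings.yaml'):
    """Locate `<package>/data/<filename>` inside an installed package."""
    return files(package) / 'data' / filename


def read_package_default(package: str, filename: str = 'default_settings.yaml') -> str | None:
    """Return the bundled YAML text, or None if the package ships no defaults."""
    path = package_default_path(package, filename)
    return path.read_text(encoding='utf-8') if path.is_file() else None
```

With this shape, `read_package_default('omnipath_client')` would return the client's bundled YAML for the loader to merge, while packages without a defaults file simply contribute nothing.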
## Cross-cutting: Upstream Work Required

| Package              | Work needed |
|----------------------|-------------|
| **pkg_infra**        | Generalize config schema for per-package sections; parameterize `read_package_default()` for the caller package; evaluate session API for client-library use |
| **download-manager** | Add async download backend (httpx); ensure POST + JSON body support |
| **cache-manager**    | Async-compatible file I/O if needed by async downloads |

## Testing Strategy

- **Unit tests**: mock inventory, query validation, response conversion, graph
  conversion
- **Integration tests**: mock HTTP server returning sample Parquet for a full
  round-trip; also tests against a local server
  (`uv run uvicorn api_service.main:app --port 8081`)
- **Fixtures**: pre-generated small Parquet files, mock DownloadManager

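For the integration tests, a throwaway server from the standard library is enough to exercise the POST round-trip without network access. A minimal sketch; `mock_api()` is a hypothetical helper, the payload stands in for a pre-generated Parquet fixture, and in a pytest suite it would typically be wrapped in a fixture.

```python
from __future__ import annotations

import contextlib
import threading
from collections.abc import Iterator
from http.server import BaseHTTPRequestHandler, HTTPServer


@contextlib.contextmanager
def mock_api(payload: bytes) -> Iterator[tuple[str, list[bytes]]]:
    """Serve `payload` for every POST; yield (base_url, received request bodies)."""
    received: list[bytes] = []

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get('Content-Length', 0))
            received.append(self.rfile.read(length))  # record the JSON body
            self.send_response(200)
            self.send_header('Content-Type', 'application/octet-stream')
            self.send_header('Content-Length', str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep test output quiet

    server = HTTPServer(('127.0.0.1', 0), Handler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address[:2]
        yield f'http://{host}:{port}', received
    finally:
        server.shutdown()
        server.server_close()
```

Recording the request bodies lets a test assert both sides of the round-trip: the JSON the client sent and the bytes it received.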
## Implementation Order

1. `_errors.py`, `_types.py`, `_constants.py`: foundations
2. `_endpoints.py`: dataclasses for endpoint/param definitions
3. `_session.py`: integrate pkg_infra config/logging/session (upstream
   updates to pkg_infra as needed)
4. `_inventory.py`: HTML parsing now, `openapi.json` soon; load at import,
   warn and fall back on failure
5. `_query.py`: query builder with validation
6. `_download.py`: sync wrapper first; async in download-manager as follow-up
7. `_response.py`: Parquet reader first, format dispatch for future
   extensibility
8. `_graph.py`: annnet conversion
9. `_client.py`: orchestrator + public API
10. `__init__.py`: re-exports + import-time inventory load
11. Tests throughout