All notable changes to this project will be documented in this file. The format is based on Keep a Changelog.
- Curation decisions are now sorted by cluster size — clusters with more entities surface first in browsing responses
- `cluster_size` and `reviewed_since_placement` fields on decision rows: enables per-cluster review-progress display and filtering by review status
- Review counter in decision summary responses: reports how many decisions in each request set have been reviewed since placement
- Cluster-size index: a materialised index tracking the size of every cluster, updated atomically on accept/reject actions
- Decision browsing filter by `reviewed_since_placement` status
- User-action idempotency: repeated curator actions on the same decision are detected and silently ignored
- Reject action now excludes the current cluster placement from the candidate set
- Operational scripts: `backfill_cluster_sizes` and `verify_cluster_sizes` for upgrading existing deployments; `backfill_previous_review_count` for the review-counter backfill; all scripts documented in `INSTALL.md` and `Makefile`
- Decision store: decision document is now rewritten on any material outcome change, preventing stale decisions surviving ERE re-evaluation rounds
- Decision store: `$addFields` stage in cluster-size aggregation pipeline now runs before the cursor predicate (query correctness fix)
- Curation: TOCTOU race condition on concurrent curator actions closed via atomic claim
- Cluster-size index: entries with a zero count are deleted rather than stored
- Backfill script: `reviewed_since_placement` is derived from the maximum action date during backfill
- Seed script: `seed_db` now populates `cluster_sizes` after seeding decisions
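
The cluster-size index entries above (sizes updated atomically on accept/reject, zero-count entries deleted rather than stored) can be illustrated with a minimal in-memory sketch. It is not the project's MongoDB implementation: the class name `ClusterSizeIndex`, its methods, and the use of a lock in place of the store's atomic update are all assumptions made for illustration.

```python
import threading
from typing import Dict, List, Optional, Tuple


class ClusterSizeIndex:
    """Hypothetical in-memory stand-in for a materialised cluster-size index."""

    def __init__(self) -> None:
        self._sizes: Dict[str, int] = {}
        # The lock stands in for whatever atomic update the real store provides.
        self._lock = threading.Lock()

    def move(self, source: Optional[str], target: Optional[str]) -> None:
        """Move one entity out of `source` and into `target` as a single step."""
        with self._lock:
            if source is not None:
                self._adjust(source, -1)
            if target is not None:
                self._adjust(target, +1)

    def _adjust(self, cluster_id: str, delta: int) -> None:
        new_size = self._sizes.get(cluster_id, 0) + delta
        if new_size < 0:
            raise ValueError(f"cluster {cluster_id!r} would have a negative size")
        if new_size == 0:
            # Zero-count entries are deleted rather than stored.
            self._sizes.pop(cluster_id, None)
        else:
            self._sizes[cluster_id] = new_size

    def size(self, cluster_id: str) -> int:
        return self._sizes.get(cluster_id, 0)

    def by_size_desc(self) -> List[Tuple[str, int]]:
        """Largest clusters first, ties broken by cluster ID."""
        return sorted(self._sizes.items(), key=lambda kv: (-kv[1], kv[0]))
```

Deleting empty entries keeps the index proportional to the number of live clusters, and `by_size_desc` mirrors the "larger clusters surface first" ordering in spirit only.
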
- Removed Meaningfy contact references and attributions from source files, documentation, and licence
- Immediate provisional mode: setting `ERS_COORDINATOR_SINGLE_REQUEST_TIME_BUDGET=0` returns a provisional identifier without waiting for ERE
- Stateless multi-instance deployment via Redis Pub/Sub cross-instance notification: coordinators on separate instances signal each other when a resolution outcome arrives
- Optional TLS support for Redis connections via `REDIS_TLS` environment variable
- `ServiceUnavailableError` propagated as HTTP 503 on MongoDB and Redis infrastructure outages: `resolve`, curation, and lookup endpoints return a structured error response instead of crashing
- Configurable entity display name field per entity type; returned by the `GET /curation/entity-types` discovery endpoint
- Exponential backoff with jitter for `OutcomeIntegrationWorker` reconnect on Redis failures
- Socket connect timeout for the Redis subscriber worker
- Black-box end-to-end ERSys test suite migrated from er-ops
- `ERS_SUBSCRIBER_READY_TIMEOUT` environment variable for startup readiness gate
- Notable environment variables documented in README with defaults and descriptions
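
The immediate provisional mode listed above can be pictured with the following sketch of a time-budgeted wait. It is only an illustration under stated assumptions: the function name `resolve_with_budget`, the `provisional-` identifier format, and the interpretation of the budget as seconds are hypothetical and not taken from the project.

```python
import asyncio
import uuid
from typing import Tuple


async def resolve_with_budget(
    outcome: "asyncio.Future[str]", time_budget: float
) -> Tuple[str, bool]:
    """Return (cluster_id, is_provisional) for one request.

    `outcome` is a future that some other task completes when the
    resolution outcome arrives; `time_budget` is assumed to be in seconds.
    """
    provisional_id = f"provisional-{uuid.uuid4().hex}"  # hypothetical format
    if time_budget <= 0:
        # Budget of 0: answer immediately without waiting for the outcome.
        return provisional_id, True
    try:
        # shield() keeps the underlying future alive if the wait times out,
        # so a late outcome can still be integrated afterwards.
        cluster_id = await asyncio.wait_for(
            asyncio.shield(outcome), timeout=time_budget
        )
        return cluster_id, False
    except asyncio.TimeoutError:
        return provisional_id, True
```

With a budget of `0` the caller never awaits the outcome at all, which is the behaviour the entry describes; any positive budget falls back to a provisional identifier only after the timeout expires.
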
- `ERE_REQUEST_CHANNEL` / `ERE_RESPONSE_CHANNEL` renamed to `ERSYS_REQUEST_QUEUE` / `ERSYS_RESPONSE_QUEUE`: update `.env` files accordingly
- `display_name_field` property renamed to `entity_label_field` in entity-type configuration
- Error field in REST API responses renamed from `error` to `message`; `SERVICE_UNAVAILABLE` added to all error enums
- `POST /refresh-bulk` now returns only decisions whose cluster placement changed, reducing payload size for unchanged entities
- Startup readiness gate and MongoDB-fallback safety net added to coordinator startup
- Dependency versions pinned to exact values for reproducible builds
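
For the environment variable rename listed above, existing `.env` files need their keys updated. The helper below is a small sketch of one way to do that; the function name `migrate_env_text` is hypothetical, and it assumes plain `KEY=value` lines (no `export` prefix), which may not match every deployment.

```python
RENAMED_KEYS = {
    "ERE_REQUEST_CHANNEL": "ERSYS_REQUEST_QUEUE",
    "ERE_RESPONSE_CHANNEL": "ERSYS_RESPONSE_QUEUE",
}


def migrate_env_text(text: str) -> str:
    """Rewrite renamed keys in .env-style text, leaving other lines untouched."""
    migrated = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        new_key = RENAMED_KEYS.get(key.strip())
        if sep and new_key is not None:
            # Keep the value exactly as written; only the key changes.
            migrated.append(f"{new_key}={value}")
        else:
            # Comments, blank lines, and unrelated keys pass through as-is.
            migrated.append(line)
    trailing_newline = "\n" if text.endswith("\n") else ""
    return "\n".join(migrated) + trailing_newline
```

Reading the file, passing its contents through `migrate_env_text`, and writing the result back is enough for simple files; anything more elaborate is better edited by hand.
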
- Redis subscriber: reconnect backoff now resets on successful subscribe; subscribed event cleared on `stop()`; `aclose()` guaranteed to run when unsubscribe is cancelled; malformed Pub/Sub payloads skipped instead of crashing
- Redis client: `TimeoutError` in `publish_notification` now wrapped correctly
- Configuration: identifier property path and `full_address` field corrected for real TED data
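
The reconnect behaviour mentioned in this release (exponential backoff with jitter, reset on a successful subscribe) can be sketched as follows. This uses the common "full jitter" variant, which is an assumption; the class name `ReconnectBackoff` and the `base` and `cap` defaults are hypothetical and are not the worker's actual settings.

```python
import random
from typing import Optional


class ReconnectBackoff:
    """Exponential backoff with full jitter (illustrative sketch only)."""

    def __init__(
        self,
        base: float = 0.5,
        cap: float = 30.0,
        rng: Optional[random.Random] = None,
    ) -> None:
        self._base = base
        self._cap = cap
        self._rng = rng or random.Random()
        self._attempt = 0

    def next_delay(self) -> float:
        """Return the delay in seconds to sleep before the next reconnect attempt."""
        # Clamp the exponent so very long outages cannot overflow the float.
        exponent = min(self._attempt, 32)
        ceiling = min(self._cap, self._base * (2 ** exponent))
        self._attempt += 1
        # Full jitter: pick uniformly between 0 and the current ceiling.
        return self._rng.uniform(0, ceiling)

    def reset(self) -> None:
        """Call after a successful subscribe so the next failure starts small."""
        self._attempt = 0
```

Without the `reset()` call, a worker that has recovered from a long outage would keep using the largest delay on its next, unrelated failure.
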
- Entity resolution pipeline — six core components implemented end-to-end: Request Registry (immutable intake records, idempotency enforcement, snapshot watermarks), RDF Mention Parser (SPARQL-based, configuration-driven extraction), ERE Contract Client (async Redis publisher/subscriber), Resolution Decision Store (MongoDB, optimistic concurrency), ERE Result Integrator (async outcome consumer with at-least-once delivery handling), and Resolution Coordinator (time-budget enforcement, provisional identifier issuance, bulk decomposition)
- ERS REST API (FastAPI): `POST /resolve` for entity mention intake with canonical or provisional cluster ID response (Spine A); `GET /lookup` and `POST /lookup-bulk` for current cluster assignment retrieval (Spine C); `POST /refreshBulk` for the delta of changed assignments since last snapshot (Spine C); `GET /entity-types` for supported entity type discovery
- Curation application: human-in-the-loop decision review interface with `POST /accept` / `POST /reject` bulk operations, role-based user management, user action audit log, user search by email, and automatic ERE re-evaluation request on every curation action
- Resolution lookup context enrichment: optional `context` field on `/lookup` and `/refreshBulk` delta entries carrying the original request context from the Request Registry
- OpenTelemetry tracing: SDK integration with structured span naming, auto-instrumentation for FastAPI and async Redis calls, and configurable OTLP exporter
- CI/CD pipeline: GitHub Actions workflow with SonarCloud quality gate, `ruff` linting, `mypy` strict type checking, `importlinter` architecture contract enforcement, and staging environment deployment dispatch