Add semantic search() to TripleStoreService via per-subject vectorization #916

## Summary

Add a `search(query: str, ...)` method to `TripleStoreService` that performs
semantic search across the knowledge graph by vectorizing each subject's TTL
representation and querying via a vector store. Results return matched
subjects with their types, ranked by similarity, optionally filtered by graph,
type, or schema/instance kind.

## Motivation

Today, finding things in the graph requires knowing exactly what you're looking
for and writing SPARQL. We need a way to discover entities, classes, and
properties from natural-language queries — e.g. searching "Border Collie"
should surface the `Dog` class even when no instance literally contains that
string, by leveraging an LLM-based embedding model.

This will also unlock downstream features (agent retrieval, UI search, ontology
exploration) that don't have a fixed query shape.

## Blocked by

- #879 (per-subject TTL files in versionstore)
- Predecessor of #879

The vectorizer consumes per-subject TTL documents and reacts to subject-level
change events. #879 is the natural source of both, and waiting avoids
duplicating that responsibility inside the vectorizer.

## Design

### Granularity: per-subject TTL

Each subject's full Concise Bounded Description (CBD), serialized as Turtle (TTL), is embedded as a single document.
This uniformly covers instances, classes (`owl:Class`), and properties
(`owl:DatatypeProperty`, `owl:ObjectProperty`) — a class is just another
subject whose TTL contains `rdfs:label`, `rdfs:comment`, etc. No separate
schema-term pipeline.

The class/instance/property distinction is encoded in **metadata**, not in
separate code paths.

Rejected alternatives:
- **Per-triple**: too shallow. Embeddings of single SPO triples lack context,
  so recall is poor even though re-indexing would be fine-grained.
- **Hybrid per-subject + per-schema-term**: redundant once schema terms are
  themselves subjects with TTL.

### Indexing topology: async worker

A separate `vectorizer-worker` container, same image as `abi`, different
command. Subscribes to subject-level change events from the versionstore
(post-#879), debounces per-subject, embeds, upserts to Qdrant.

The triple store path stays untouched — `insert()` / `remove()` do not block
on embedding.

### Embedder configuration

Embedder config lives in the TripleStoreService engine configuration. The
collection name (or its stored fingerprint metadata) encodes
`{embedder_id, model_version, dim, normalization}`. On mismatch at startup,
a new collection is created and a full reindex is triggered as a Dagster job;
the old collection serves until the new one is warm, then the swap happens.
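A minimal sketch of how the fingerprint could map to a collection name, assuming the name-encoding option rather than stored metadata. The `EmbedderFingerprint` class, the `subjects` prefix, and the 12-character SHA-256 digest are illustrative choices, not part of the existing codebase.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EmbedderFingerprint:
    # Fields mirror the tuple named above; the class itself is hypothetical.
    embedder_id: str
    model_version: str
    dim: int
    normalization: str


def collection_name(fp: EmbedderFingerprint, prefix: str = "subjects") -> str:
    """Derive a stable collection name from the embedder fingerprint."""
    # Canonical JSON (sorted keys, fixed separators) so equal configs hash equally.
    canonical = json.dumps(asdict(fp), sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}_{digest}"


def needs_reindex(active_collection: str | None, fp: EmbedderFingerprint) -> bool:
    """True when the active collection was built with a different embedder config."""
    return active_collection != collection_name(fp)
```

At startup the worker would compare the name of the collection currently serving queries against `collection_name(current_config)` and schedule the reindex only when `needs_reindex` returns `True`.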

### Vector metadata schema

```
{
  "subject_uri":  str,
  "graph_name":   str,
  "types":        list[str],   # rdf:type URIs
  "is_schema":    bool,        # derived: types ∩ {owl:Class, owl:*Property} ≠ ∅
                               # or graph == schema graph
  "namespace":    str,
  "lang":         str | None,
}
```

These fields are hard to add later without re-indexing, so they are included from day one.
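The `is_schema` derivation could look roughly like this. It is a sketch that reads `owl:*Property` literally (any OWL-namespace type whose local name ends in `Property`); `derive_is_schema` and the `schema_graph` argument are hypothetical names.

```python
from collections.abc import Iterable

OWL = "http://www.w3.org/2002/07/owl#"


def _is_schema_type(type_uri: str) -> bool:
    """True for owl:Class and for any owl:*Property type."""
    if not type_uri.startswith(OWL):
        return False
    local_name = type_uri[len(OWL):]
    return local_name == "Class" or local_name.endswith("Property")


def derive_is_schema(types: Iterable[str], graph_name: str, schema_graph: str) -> bool:
    """A subject is schema if it lives in the schema graph or has a schema type."""
    return graph_name == schema_graph or any(_is_schema_type(t) for t in types)
```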

### Search API

```python
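from __future__ import annotations

from dataclasses import dataclass

from rdflib import URIRef


@dataclass
class SearchHit:
    subject: URIRef
    types: list[URIRef]
    score: float
    graph: URIRef


class TripleStoreService:
    def search(
        self,
        query: str,
        *,
        graph: URIRef | None = None,       # restrict to one named graph
        types: list[URIRef] | None = None,  # restrict by rdf:type
        is_schema: bool | None = None,      # True: schema only, False: instances only
        k: int = 10,                        # maximum number of hits
        score_threshold: float | None = None,
    ) -> list[SearchHit]:
        """Return up to `k` subjects ranked by similarity to `query`."""
        ...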
```

Implementation: embed query → vector store search with metadata filters →
SPARQL hydration of `?s a ?type` for each hit → return.
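The pipeline above can be sketched end to end with in-memory stand-ins. This is not the Qdrant or SPARQL integration: `embed`, `points`, and `hydrate_types` are hypothetical injection points, URIs are plain strings, and the `types` filter is assumed to mean "matches any of the given types".

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hit:
    subject: str
    types: list[str]
    score: float
    graph: str


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: str,
    *,
    embed: Callable[[str], list[float]],
    points: list[tuple[list[float], dict]],
    hydrate_types: Callable[[str], list[str]],
    graph: str | None = None,
    types: list[str] | None = None,
    is_schema: bool | None = None,
    k: int = 10,
    score_threshold: float | None = None,
) -> list[Hit]:
    query_vector = embed(query)  # step 1: embed the query
    scored: list[tuple[float, dict]] = []
    for vector, payload in points:  # step 2: similarity search with metadata filters
        if graph is not None and payload["graph_name"] != graph:
            continue
        if types is not None and not set(types) & set(payload["types"]):
            continue
        if is_schema is not None and payload["is_schema"] != is_schema:
            continue
        score = cosine(query_vector, vector)
        if score_threshold is not None and score < score_threshold:
            continue
        scored.append((score, payload))
    scored.sort(key=lambda item: item[0], reverse=True)
    # step 3: hydrate types for the top-k hits (SPARQL in the real service)
    return [
        Hit(p["subject_uri"], hydrate_types(p["subject_uri"]), s, p["graph_name"])
        for s, p in scored[:k]
    ]
```

In the real service the loop would be replaced by a single filtered vector-store query, and `hydrate_types` by the `?s a ?type` lookup.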

## Phases

### Phase 1 — `ISubjectDocumentSource` port + versionstore adapter

- Define port: `get_subject_document(s) -> str` (turtle),
  `subscribe_changes(callback)`.
- Implement `VersionStoreSubjectDocumentSource` reading per-subject TTL files
  from #879's versionstore.
- Tests: generic adapter test suite + versionstore-specific tests.
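One possible shape for the port, with an in-memory adapter that the generic test suite could run against. The callback signature `(subject, deleted)` and the `InMemorySubjectDocumentSource` class are assumptions for illustration; the issue only fixes the two method names.

```python
from __future__ import annotations

from typing import Callable, Protocol

# Hypothetical event shape: subject URI plus a flag marking deletions.
ChangeCallback = Callable[[str, bool], None]


class ISubjectDocumentSource(Protocol):
    def get_subject_document(self, s: str) -> str:
        """Return the subject's document serialized as Turtle."""
        ...

    def subscribe_changes(self, callback: ChangeCallback) -> None:
        """Register a callback invoked on every subject-level change."""
        ...


class InMemorySubjectDocumentSource:
    """Test double that satisfies the port structurally."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}
        self._callbacks: list[ChangeCallback] = []

    def get_subject_document(self, s: str) -> str:
        return self._docs[s]

    def subscribe_changes(self, callback: ChangeCallback) -> None:
        self._callbacks.append(callback)

    def put(self, s: str, turtle: str) -> None:
        self._docs[s] = turtle
        for callback in self._callbacks:
            callback(s, False)

    def delete(self, s: str) -> None:
        self._docs.pop(s, None)
        for callback in self._callbacks:
            callback(s, True)
```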

### Phase 2 — Vectorizer worker

- New app: `naas_abi_core/apps/workers/vectorizer/`.
- Subscribes to `ISubjectDocumentSource`, debounces per-subject changes
  (configurable window, default ~500ms), embeds full TTL, upserts to Qdrant
  with metadata schema above.
- Handles delete events: removes the subject's vector from the collection.
- New `vectorizer-worker` service in `abi/docker-compose.yml` — same image,
  different command, no inbound ports, healthcheck overridden to `pgrep`.
- Crash-only design; relies on `restart: unless-stopped` + RabbitMQ acks for
  redelivery.
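The per-subject debounce could be as small as the sketch below. The `SubjectDebouncer` class and its method names are hypothetical; the clock is passed in explicitly so the behaviour is deterministic under test, and a worker loop would supply `time.monotonic()`.

```python
from __future__ import annotations


class SubjectDebouncer:
    """Collapse bursts of changes to one subject into a single embed."""

    def __init__(self, window_s: float = 0.5) -> None:
        self.window_s = window_s
        self._last_change: dict[str, float] = {}

    def mark(self, subject: str, now: float) -> None:
        """Record a change; a repeated change restarts that subject's window."""
        self._last_change[subject] = now

    def pop_due(self, now: float) -> list[str]:
        """Return and forget the subjects that have been quiet for a full window."""
        due = [s for s, t in self._last_change.items() if now - t >= self.window_s]
        for subject in due:
            del self._last_change[subject]
        return due
```

The worker would call `mark` from the change callback and poll `pop_due` on a short timer, embedding and upserting each returned subject. Because pending state lives only in memory, the message for a subject should be acknowledged after its upsert succeeds, so that a crash leads to redelivery rather than a lost update.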

### Phase 3 — Embedder config + collection fingerprinting

- Embedder config added to TripleStoreService engine config.
- Collection fingerprint stored as Qdrant collection metadata (or encoded in
  name). Mismatch at worker startup → create new collection, schedule full
  reindex via Dagster, swap on completion.
- Dagster job: full reindex (iterates all subjects from the document source).

### Phase 4 — `TripleStoreService.search(...)`

- Implement search method as specified above.
- SPARQL hydration step uses existing `query_view` / `query`.
- Tests covering: text match, filter by graph, filter by types, `is_schema`,
  threshold, empty results.

## Out of scope

- Re-ranking (cross-encoder) — can come later if recall@k is fine but ranking
  isn't.
- Multi-language query routing.
- Hybrid BM25 + vector — possible follow-up if pure vector search has gaps on
  exact identifier matches.

## Acceptance criteria

- [ ] `TripleStoreService.search("Border Collie")` returns the `Dog` class
      (assuming standard `rdfs:label` / `rdfs:comment` are present) with a
      reasonable score.
- [ ] Inserting a new triple via `TripleStoreService.insert()` results in the
      affected subject's vector being updated shortly after the debounce
      window elapses, without blocking the insert call.
- [ ] Changing the embedder config triggers a full reindex into a new
      collection without downtime on existing search queries.
- [ ] `vectorizer-worker` container restarts cleanly and resumes processing
      from the bus without duplicating embeddings (idempotent upsert by
      subject URI).
- [ ] Filtering by `graph`, `types`, and `is_schema` works as documented.
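For the idempotent-upsert criterion, one option is to derive the point ID deterministically from the subject URI. Qdrant point IDs are unsigned integers or UUIDs, so a name-based UUID (version 5) fits. The `point_id_for` helper and the dict standing in for the collection are illustrative assumptions.

```python
import uuid


def point_id_for(subject_uri: str) -> str:
    """Deterministic UUIDv5, so re-embedding a subject overwrites its own point."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, subject_uri))


def upsert(collection: dict, subject_uri: str, vector: list, payload: dict) -> None:
    """In-memory stand-in for a vector-store upsert keyed by point ID."""
    collection[point_id_for(subject_uri)] = (vector, payload)
```

With this keying, a redelivered message after a worker restart rewrites the same point instead of adding a second one, and a delete event can compute the ID to remove without a lookup.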

## Notes on embedder choice

The "Border Collie → Dog" semantic match relies on the embedder having
sufficient world knowledge from pretraining. API-based models (OpenAI
`text-embedding-3-*`, Voyage, Cohere) handle this comfortably; smaller local
models (`all-MiniLM`, `embeddinggemma`) work, but recall and ranking will be
weaker. The embedder is configurable through the engine config added in Phase 3.
