QE — Questions Écrites

Ingests French parliamentary written questions (Assemblée Nationale + Sénat) from DILA open data into a local PostgreSQL database.

Installation

```bash
# Start Postgres locally
docker compose up postgres -d

# Install dependencies
poetry install

# Run database migrations
poetry run alembic upgrade head
```

The Postgres service is available at `postgresql://qe:qe@localhost:5433/qe` by default. Override it with the `PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, and `PGDATABASE` environment variables.
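A minimal sketch of how a script could assemble that URL from the `PG*` variables, falling back to the documented defaults. `build_dsn` is a hypothetical helper, not the project's actual code. Note that libpq-based drivers also read these variables natively.

```python
import os
from urllib.parse import quote

# Defaults taken from the documented connection string.
DEFAULTS = {
    "PGHOST": "localhost",
    "PGPORT": "5433",
    "PGUSER": "qe",
    "PGPASSWORD": "qe",
    "PGDATABASE": "qe",
}


def build_dsn(env=None):
    """Build a PostgreSQL URL from PG* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Percent-encode credentials so characters like '@' or ':' stay valid in the URL.
    user = quote(cfg["PGUSER"], safe="")
    password = quote(cfg["PGPASSWORD"], safe="")
    return f"postgresql://{user}:{password}@{cfg['PGHOST']}:{cfg['PGPORT']}/{cfg['PGDATABASE']}"
```

For example, `build_dsn({"PGPORT": "5432"})` keeps every default except the port.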

Download open data archives

Downloads `.taz` archives from the DILA open data server. Use `--years 2` to include the current year plus the 2 prior calendar years:

```bash
poetry run python scripts/download_opendata.py --dir data/opendata/ --years 2
```

Files already present are skipped automatically (idempotent).
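The year window and the skip-if-present rule can be sketched as follows. The function names are hypothetical, the sketch does not download anything, and real DILA archive names are not reproduced here.

```python
from datetime import date
from pathlib import Path


def years_to_download(n_prior, today=None):
    """Current calendar year plus the n_prior previous ones, oldest first."""
    current = (today or date.today()).year
    return list(range(current - n_prior, current + 1))


def files_to_fetch(archive_names, target_dir):
    """Keep only archives not already present locally, so re-runs are idempotent."""
    target = Path(target_dir)
    return [name for name in archive_names if not (target / name).exists()]
```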

Ingest into PostgreSQL

Parses the downloaded archives and upserts questions into the `questions` table:

```bash
poetry run python scripts/ingest_opendata.py --dir data/opendata/
```

Add `--dry-run` to parse the archives without writing to the database.
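In PostgreSQL an upsert is usually an `INSERT ... ON CONFLICT ... DO UPDATE`. The helper below builds such a statement. It assumes the conflict column has a unique constraint, and the column names in the usage note are hypothetical, not the real `questions` schema.

```python
def build_upsert(table, columns, key):
    """Build an INSERT ... ON CONFLICT statement with psycopg-style %s placeholders.

    table, columns and key are interpolated as identifiers, so they must come
    from trusted code, never from user input. key needs a unique constraint.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # On conflict, overwrite every non-key column with the incoming value.
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({key}) {action}"
    )
```

Usage: `cur.execute(build_upsert("questions", ["question_id", "status", "text"], "question_id"), row)`. A dry run in this style would simply skip the `execute` call.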

Compute question clusters

Embeds all questions into Qdrant, then clusters them by semantic similarity and saves the results to PostgreSQL.

```bash
# 1. Start Qdrant
docker compose up qdrant -d

# 2. Embed questions (all statuses, so the UI can filter later)
poetry run python scripts/embed_questions.py

# 3. Cluster and persist to DB
poetry run python scripts/cluster_questions.py
```

Results are stored in the `question_cluster_runs` and `question_cluster_members` tables. Re-run step 3 at any time to refresh them; each run creates a new set of rows.
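This README does not say which clustering algorithm `cluster_questions.py` uses. As a stand-in, the sketch below groups embedding vectors with a simple greedy cosine-similarity threshold and emits rows shaped like cluster members. The threshold, function names, and row layout are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_clusters(vectors, threshold=0.8):
    """Give each vector the first cluster whose seed is at least `threshold` similar."""
    seeds = []  # index of the vector that opened each cluster
    labels = []
    for i, vec in enumerate(vectors):
        for cluster_id, seed in enumerate(seeds):
            if cosine(vec, vectors[seed]) >= threshold:
                labels.append(cluster_id)
                break
        else:  # no existing cluster is close enough: open a new one
            seeds.append(i)
            labels.append(len(seeds) - 1)
    return labels


def member_rows(run_id, question_ids, labels):
    """(run_id, question_id, cluster_id) tuples, one per question in the run."""
    return [(run_id, qid, cid) for qid, cid in zip(question_ids, labels)]
```

This greedy pass is order-dependent, which is one reason a dedicated clustering algorithm is usually preferred on real embeddings.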

Assign questions to offices

Routes each QE to the most relevant office based on office responsibility descriptions.

1. Ingest office responsibilities

Place XLSX files in `data/office_responsibilities/`. Each file must have the columns `direction`, `office_id`, `office_name`, `responsibilities`, and `keywords`.

```bash
poetry run python scripts/ingest_office_responsibilities.py
```

Re-run at any time — unchanged files are skipped automatically.
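One common way to skip unchanged files is to compare a content hash with the one recorded on the previous run. This mechanism is an assumption, and `known_digests` stands in for wherever such a script would store previous hashes.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(folder, known_digests):
    """XLSX files whose digest differs from the one recorded for that file name."""
    changed = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        if known_digests.get(path.name) != file_digest(path):
            changed.append(path)
    return changed
```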

2. Assign a question

```bash
poetry run python scripts/assign_qe_to_office.py --question "Quel est le montant du RSA ?"
```

Returns a ranked JSON list of offices. Options:

```text
--top-k 20        candidates retrieved per query unit (default: 20)
--top-offices 5   offices to return (default: 5)
--collection      Qdrant collection name (default: office_responsibilities)
```
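The ranked output can be pictured as collapsing chunk-level search hits into one entry per office. Keeping each office's best chunk score is an assumption made for illustration; the script may aggregate or rerank differently.

```python
def rank_offices(hits, top_offices=5):
    """Collapse (office_id, score) chunk hits into one entry per office, then rank.

    Each office keeps its best (maximum) chunk score; the highest score ranks first.
    """
    best = {}
    for office_id, score in hits:
        if score > best.get(office_id, float("-inf")):
            best[office_id] = score
    ordered = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [
        {"rank": rank, "office_id": office_id, "score": round(score, 4)}
        for rank, (office_id, score) in enumerate(ordered[:top_offices], start=1)
    ]
```

Here `top_offices` plays the role of `--top-offices`, while `hits` would be the union of the `--top-k` candidates retrieved for each query unit.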

Evaluate assignment quality

Measures Hit@1/3/5 and MRR against a ground-truth XLSX file with the columns `question_id`, `question_text`, and `expected_office_id`:

```bash
poetry run python scripts/eval_office_assignment.py --input data/qe_attributions_DGCS.xlsx
```

Options:

```text
--top-k 20          candidates retrieved per question (default: 20)
--top-offices 10    offices to rank per question (default: 10)
--chunks all        chunk types to search: all, responsibilities, keywords (default: all)
```
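For reference, Hit@k is the share of questions whose expected office appears among the top k suggestions, and MRR is the mean of 1/rank of the expected office, counting 0 when it is missing from the returned list (for example, when it falls beyond `--top-offices`). A self-contained sketch of both; the input shapes are assumptions, not the script's internals.

```python
def evaluate(rankings, expected, ks=(1, 3, 5)):
    """Hit@k and MRR over a non-empty ground-truth set.

    rankings: question_id -> list of office_ids, best first.
    expected: question_id -> ground-truth office_id.
    """
    n = len(expected)
    hits = {k: 0 for k in ks}
    reciprocal_sum = 0.0
    for qid, gold in expected.items():
        ranked = rankings.get(qid, [])
        if gold in ranked:
            position = ranked.index(gold) + 1  # 1-based rank of the expected office
            reciprocal_sum += 1.0 / position
            for k in ks:
                if position <= k:
                    hits[k] += 1
    metrics = {f"hit@{k}": hits[k] / n for k in ks}
    metrics["mrr"] = reciprocal_sum / n
    return metrics
```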

Reset

```bash
poetry run python scripts/reset_dbs.py
```

Attribution API

Exposes office attribution suggestions over HTTP for the frontend (qe-front).

Prerequisites

Both Qdrant collections must be populated before starting the server:

```bash
# 1. Office responsibilities
poetry run python scripts/ingest_office_responsibilities.py

# 2. Questions (embed into questions_opendata)
poetry run python scripts/embed_questions.py
```

Start the server

```bash
poetry run uvicorn api.main:app --reload
```

The server starts on http://localhost:8000 by default.

`GET /api/questions/{question_id}/attributions`

Returns the top 3 office suggestions for a question. The question's embedding is read directly from Qdrant — no call to Socle IA is made.

```bash
curl http://localhost:8000/api/questions/AN-17-QE-12345/attributions
```

```json
{
  "question_id": "AN-17-QE-12345",
  "attributions": [
    {
      "rank": 1,
      "office_id": "...",
      "office_name": "Sous-direction des affaires sociales",
      "direction": "Direction générale du travail",
      "score": 1.8432,
      "confidence": 0.87
    }
  ]
}
```

`confidence` is a calibrated 0–1 value (the sigmoid of the Albert cross-encoder logit). It is meaningful in absolute terms: values above ~0.7 indicate a strong match, while values below ~0.3 indicate that the question is likely outside the office's scope.

Optional query parameter: `top_k`, the number of suggestions to return (default: 3).
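A minimal standard-library client for this endpoint. It also shows how a logit maps to a 0–1 value through the sigmoid and applies the ~0.7 / ~0.3 reading given above. The helper names and the "uncertain" middle band are assumptions, and `fetch_attributions` needs the server to be running.

```python
import json
import math
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default uvicorn address


def attributions_url(question_id, top_k=None, base_url=BASE_URL):
    """Build the endpoint URL, adding ?top_k=... only when it is given."""
    url = f"{base_url}/api/questions/{quote(question_id, safe='')}/attributions"
    if top_k is not None:
        url += "?" + urlencode({"top_k": top_k})
    return url


def sigmoid(logit):
    """Logistic function, for illustration only: the API already returns `confidence`."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # stable form for large negative logits
    return z / (1.0 + z)


def label(confidence, strong=0.7, weak=0.3):
    """Rough reading of a confidence value using the ~0.7 / ~0.3 rule of thumb."""
    if confidence > strong:
        return "strong match"
    if confidence < weak:
        return "likely out of scope"
    return "uncertain"


def summarize(payload):
    """(office_id, confidence, label) tuples ordered by rank."""
    items = sorted(payload["attributions"], key=lambda a: a["rank"])
    return [(a["office_id"], a["confidence"], label(a["confidence"])) for a in items]


def fetch_attributions(question_id, top_k=None):
    """Call the running API server and summarize its JSON response."""
    with urlopen(attributions_url(question_id, top_k), timeout=10) as resp:
        return summarize(json.load(resp))
```

Usage, with the server running: `fetch_attributions("AN-17-QE-12345", top_k=5)`.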

Environment variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `ALBERT_API_KEY` | Yes | (none) | Albert reranking API key |
| `QDRANT_URL` | No | `http://localhost:6333` | Qdrant base URL |
| `CORS_ORIGINS` | No | `http://localhost:3000` | Comma-separated allowed origins |
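A sketch of reading these variables with the documented defaults. `load_settings` and `ApiSettings` are hypothetical names, since the API's actual configuration code is not shown here. With FastAPI, a sequence such as `cors_origins` is what `CORSMiddleware` takes for its `allow_origins` argument.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiSettings:
    albert_api_key: str
    qdrant_url: str
    cors_origins: tuple


def load_settings(env=None):
    """Read API settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("ALBERT_API_KEY")
    if not api_key:
        raise RuntimeError("ALBERT_API_KEY is required")
    origins = env.get("CORS_ORIGINS", "http://localhost:3000")
    return ApiSettings(
        albert_api_key=api_key,
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        # Split the comma-separated list, trimming spaces and dropping empty entries.
        cors_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
    )
```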
