refactor: replace YAML config loading with pydantic-settings for env var support (#6)

wadhah101 · web-flow · commit ffc5459551a3 · 2026-05-27T10:38:09.000-04:00
* refactor: implement Pydantic-based configuration management

- Create `config.py` with Pydantic models for structured settings validation
- Replace manual YAML parsing and dotenv loading with `BaseSettings`
- Update `api_url` in `braintest.yaml` to the staging internal endpoint
- Integrate `load_config` helper across `main.py`, `evaltest`, `loadtest`, and `functional_test`
- Add `pydantic-settings[yaml]` to project dependencies

Configuration is now centralized and typed with Pydantic, replacing the ad hoc
YAML loading scattered across the codebase. This adds schema validation,
supports environment variable overrides through a nested delimiter (`__`), and
points the Braintrust API URL at the staging environment.
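
The nested-delimiter override can be pictured with a stdlib-only sketch. This is not how pydantic-settings is implemented; `apply_env_overrides` is a hypothetical helper that only shows how a name such as `BRAINTRUST__API_URL` is split on `__` and mapped onto a nested key path.

```python
import copy
from typing import Any, Mapping


def apply_env_overrides(
    config: dict[str, Any], environ: Mapping[str, str], delimiter: str = "__"
) -> dict[str, Any]:
    """Return a copy of config with matching environment variables applied.

    Hypothetical helper: it approximates the effect of a nested delimiter,
    without the type coercion and validation that pydantic-settings performs.
    """
    result = copy.deepcopy(config)
    for name, value in environ.items():
        path = name.lower().split(delimiter)
        # Skip unrelated variables (PATH, HOME, ...) and un-nested names.
        if len(path) < 2 or path[0] not in result:
            continue
        node = result
        for key in path[:-1]:
            child = node.setdefault(key, {})
            if not isinstance(child, dict):
                break  # the path runs through a scalar; ignore this variable
            node = child
        else:
            # Values stay strings here; a typed model would convert them.
            node[path[-1]] = value
    return result
```

Calling it with `{"LOADTEST__PROCESSES": "8"}` against a config containing `loadtest.processes` replaces that one leaf and leaves sibling keys untouched. The value remains the string `"8"` in this sketch, which is the gap the typed models in `config.py` are meant to close.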

* chore: remove python-dotenv dependency

- Remove `python-dotenv` from `pyproject.toml`
- Delete `load_dotenv` import in `mock_conversation_task.py`
- Remove `load_dotenv()` call from main execution block

Removes the redundant dependency and its call sites, since environment
variables are expected to be managed externally or by the runtime environment.

* refactor(config): update braintrust api endpoint

- Replace internal staging URL with official production API URL in `braintest.yaml`

* docs: update README installation and configuration guide

- Correct typo from "virutal" to "virtual"
- Update environment variable instructions to be platform agnostic
- Add Configuration section describing Pydantic settings behavior
- Include table and examples for environment variable overrides

Standardize environment variable documentation and introduce guidance on nested configuration overrides while fixing typographical errors.
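
The priority order documented in the new Configuration section (environment variables over YAML values, YAML values over model defaults) amounts to a layered deep merge. The sketch below uses plain dictionaries to show the idea; `deep_merge` and the sample values are illustrative assumptions and are not part of this change.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Made-up sample layers, listed from lowest to highest priority.
defaults = {"loadtest": {"processes": 4, "headless": False}}
yaml_values = {"loadtest": {"headless": True}}
env_values = {"loadtest": {"processes": 8}}

# Apply the lowest-priority layer first so later layers override it.
effective = deep_merge(deep_merge(defaults, yaml_values), env_values)
```

Merging per key rather than replacing whole sections is what lets a single variable such as `LOADTEST__PROCESSES` change one field without discarding the rest of the `loadtest` block from `braintest.yaml`.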

* docs: update environment variable setup instructions in README

- Replace manual environment variable export step with .env file creation
- Add reference to example.env for configuration template

Updates the setup documentation to recommend using a .env file instead of manual exports for easier environment configuration management.

* build: add python-dotenv dependency and initialize in scripts

- Add `python-dotenv` to `pyproject.toml` dependencies
- Import and call `load_dotenv()` in `evaltest/run.py`
- Import and call `load_dotenv()` in `functional_test/run.py`
- Import and call `load_dotenv()` in `loadtest/mock_conversation_task.py`
- Import and call `load_dotenv()` in `loadtest/run.py`

Enable automatic loading of environment variables from .env files across all test suites and scripts. This keeps sensitive configuration values and API keys consistently available in local and CI environments without manual exports.
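
Settings are read from the process environment at the moment `load_config()` runs, so the `.env` file has to be loaded first; `evaltest/run.py` does this by calling `load_dotenv()` before `config = load_config()`. The stdlib-only sketch below approximates the default behavior of `load_dotenv()`, where variables already present in the environment are not overwritten. `load_env_file` is a hypothetical stand-in and handles only simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Load simple KEY=VALUE lines into os.environ without overwriting.

    Hypothetical stand-in for python-dotenv's load_dotenv(); it ignores
    quoting rules, `export` prefixes, and variable interpolation.
    Returns only the variables that were actually set.
    """
    env_path = Path(path)
    applied: dict[str, str] = {}
    if not env_path.is_file():
        return applied  # a missing .env file is not treated as an error
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Values exported by the shell or CI runner take precedence.
        if key and key not in os.environ:
            os.environ[key] = value
            applied[key] = value
    return applied
```

With this ordering, a value exported in the shell or injected by CI still wins over the same key in `.env`, and both are visible to the settings loader as ordinary environment variables.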

* refactor: centralize configuration loading logic

- Remove redundant `load_config` definitions from task files
- Import `load_config` from shared `config` module
- Remove unused `yaml` import in load test tasks

Shared configuration logic reduces code duplication and ensures consistent
loading of the braintest.yaml file across different mock task modules.
diff --git a/README.md b/README.md
@@ -27,7 +27,7 @@ Each test is highly configurable via the `braintest.yaml` config file. The tests
    ```bash
    uv sync
    ```
-3. Activate the virutal env uv creates if it isn't already activated
+3. Activate the virtual env uv creates if it isn't already activated
    ```bash
    source .venv/bin/activate
    ```
@@ -50,5 +50,24 @@ Each test is highly configurable via the `braintest.yaml` config file. The tests
    nohup python main.py > loadtest.out 2>&1 &
    ```
 
+## Configuration
+
+Configuration is loaded from `braintest.yaml` using [pydantic-settings](https://docs.pydantic.dev/latest/concepts/pydantic_settings/). Environment variables take priority over YAML values.
+
+To override any config value via environment variable, use `__` (double underscore) as the nested separator. For example:
+
+| YAML path | Environment variable |
+|---|---|
+| `braintrust.api_url` | `BRAINTRUST__API_URL` |
+| `braintrust.project_name` | `BRAINTRUST__PROJECT_NAME` |
+| `loadtest.processes` | `LOADTEST__PROCESSES` |
+| `evaltest.trial_count` | `EVALTEST__TRIAL_COUNT` |
+| `functionaltest.name_prefix` | `FUNCTIONALTEST__NAME_PREFIX` |
+
+Example:
+```bash
+BRAINTRUST__API_URL=https://my-api.example.com LOADTEST__PROCESSES=8 python main.py
+```
+
 ## Important Notes
 - No actual LLM calls are made in any of these tests. Everything is mocked. The purpose is to load test Braintrust infra, not the LLM provider.
diff --git a/config.py b/config.py
@@ -0,0 +1,96 @@
+from pydantic import BaseModel, Field
+from pydantic_settings import BaseSettings, SettingsConfigDict, YamlConfigSettingsSource
+
+
+class BraintrustConfig(BaseModel):
+    project_name: str = "load-testing-project"
+    api_url: str = ""
+
+
+class FunctionalTestConfig(BaseModel):
+    run: bool = False
+    name_prefix: str = "functional-test"
+
+
+class DatasetConfig(BaseModel):
+    name: str = "test-large-dataset"
+    description: str = ""
+    size: int = 100
+    flush_batch_size: int = 25
+
+
+class EvalTestConfig(BaseModel):
+    run: bool = False
+    project_id: str | None = None
+    name: str = "test-large"
+    trial_count: int = 1
+    dataset: DatasetConfig = DatasetConfig()
+
+
+class WaitTimeConfig(BaseModel):
+    min: int = 5
+    max: int = 10
+
+
+class ReadTrafficConfig(BaseModel):
+    peak_concurrency: int = 2
+    btql_calls_per_min: float = 10
+
+
+class LoadTestParams(BaseModel):
+    faker_pool_size: int = 20
+    max_tokens: int = 1000
+    peak_concurrency: int = 20
+    ramp_up: int = 2
+    run_time: str = "1m"
+    wait_time: WaitTimeConfig = WaitTimeConfig()
+    read_traffic: ReadTrafficConfig = ReadTrafficConfig()
+
+
+class BraintrustLoggerConfig(BaseModel):
+    flush_size: int = 100
+    queue_size: int = 25000
+
+
+class LogsConfig(BaseModel):
+    model_config = {"populate_by_name": True}
+
+    html: bool = True
+    csv: bool = False
+    json_log: bool = Field(False, alias="json")
+
+
+class LoadTestConfig(BaseModel):
+    run: bool = False
+    locustfile_path: str = "loadtest/run.py"
+    headless: bool = False
+    web_ui_port: int = 8089
+    processes: int = 4
+    connection_pool_size: int = 10
+    braintrust_logger: BraintrustLoggerConfig = BraintrustLoggerConfig()
+    params: LoadTestParams = LoadTestParams()
+    logs: LogsConfig = LogsConfig()
+
+
+class Settings(BaseSettings):
+    model_config = SettingsConfigDict(
+        yaml_file="braintest.yaml",
+        env_nested_delimiter="__",
+    )
+
+    braintrust: BraintrustConfig = BraintrustConfig()
+    functionaltest: FunctionalTestConfig = FunctionalTestConfig()
+    evaltest: EvalTestConfig = EvalTestConfig()
+    loadtest: LoadTestConfig = LoadTestConfig()
+
+    @classmethod
+    def settings_customise_sources(cls, settings_cls, **kwargs):
+        return (
+            kwargs["env_settings"],
+            YamlConfigSettingsSource(settings_cls),
+            kwargs["init_settings"],
+        )
+
+
+def load_config() -> dict:
+    return Settings().model_dump(by_alias=True)
diff --git a/evaltest/run.py b/evaltest/run.py
@@ -2,22 +2,14 @@
 from braintrust import init_logger, init_dataset, Eval
 from autoevals import Levenshtein, ExactMatch
 from dotenv import load_dotenv
-import yaml
 from faker import Faker
 import random
+from config import load_config
 from util import http_client
 
+load_dotenv()
 fake = Faker()
 
-
-def load_config() -> dict:
-    load_dotenv()
-    with open("./braintest.yaml", "r") as f:
-        config = yaml.safe_load(f)
-
-    return config
-
-
 config = load_config()
 
 
diff --git a/functional_test/run.py b/functional_test/run.py
@@ -5,9 +5,9 @@
 from urllib.parse import urlencode
 
 import requests
-import yaml
 from dotenv import load_dotenv
 
+from config import load_config
 from util import http_client
 
 
@@ -905,15 +905,6 @@ def _unique_env_var_name(self, prefix: str) -> str:
         )
 
 
-def load_config() -> dict[str, Any]:
-    with open("./braintest.yaml", "r") as file_handle:
-        loaded = yaml.safe_load(file_handle)
-    if not isinstance(loaded, dict):
-        print(
-            "[functionaltest] braintest.yaml did not parse to an object. Using empty config."
-        )
-        return {}
-    return loaded
 
 
 def run() -> bool:
diff --git a/loadtest/mock_conversation_task.py b/loadtest/mock_conversation_task.py
@@ -1,10 +1,10 @@
 import json
 import os
 import random
-import yaml
 from dotenv import load_dotenv
 from faker import Faker
 from braintrust import traced, current_span, start_span, JSONAttachment, init_logger
+from config import load_config
 
 fake = Faker()
 
@@ -62,12 +62,6 @@
 ]
 
 
-def load_config() -> dict:
-    with open("./braintest.yaml", "r") as f:
-        config = yaml.safe_load(f)
-    return config
-
-
 config = load_config()
 
 
diff --git a/loadtest/mock_default_task.py b/loadtest/mock_default_task.py
@@ -2,21 +2,14 @@
 import time
 from braintrust import traced, current_span, JSONAttachment, init_logger
 from faker import Faker
-import yaml
+from config import load_config
 
 fake = Faker()
 
 MAX_SPAN_SIZE = 5 * 1024 * 1024  # 5MB
 QUERY_TYPES = ["factual", "coding", "analytical", "creative", "conversational"]
 
 
-def load_config() -> dict:
-    with open("./braintest.yaml", "r") as f:
-        config = yaml.safe_load(f)
-
-    return config
-
-
 config = load_config()
 
 
diff --git a/loadtest/run.py b/loadtest/run.py
@@ -2,10 +2,10 @@
 import requests
 import os
 import random
-import yaml
 from faker import Faker
 from loadtest.mock_conversation_task import mock_multiturn_conversation
 from loadtest.braintrust_http_metrics import BraintrustMetricsAdapter, BraintrustMetricsEmitter
+from config import load_config
 from util import http_client
 from dotenv import load_dotenv
 from urllib.parse import urlparse
@@ -16,12 +16,6 @@
 fake = Faker()
 
 
-def load_config():
-    with open("./braintest.yaml", "r") as f:
-        config = yaml.safe_load(f)
-    return config
-
-
 config = load_config()
 _LOGGER_INITIALIZED = False
 _BT_METRICS_EMITTER = None
diff --git a/main.py b/main.py
@@ -3,18 +3,13 @@
 Main script to orchestrate functionaltest, evaltest, and loadtest execution
 based on braintest.yaml config.
 """
-import yaml
 import subprocess
 import sys
 import os
 import signal
 from datetime import datetime
 
-
-def load_config(config_path="braintest.yaml"):
-    with open(config_path, "r") as f:
-        config = yaml.safe_load(f)
-    return config
+from config import load_config
 
 
 def run_evaltest(config):
@@ -252,9 +247,6 @@ def main():
     except FileNotFoundError as e:
         print(f"Error: Configuration file not found - {e}")
         sys.exit(1)
-    except yaml.YAMLError as e:
-        print(f"Error: Failed to parse YAML configuration - {e}")
-        sys.exit(1)
     except Exception as e:
         print(f"Fatal error: {e}")
         sys.exit(1)
diff --git a/pyproject.toml b/pyproject.toml
@@ -9,6 +9,7 @@ dependencies = [
     "braintrust>=0.10.0",
     "faker>=40.1.2",
     "locust>=2.43.2",
+    "pydantic-settings[yaml]>=2.6.0",
     "python-dotenv>=1.2.1",
     "pyyaml>=6.0.3",
 ]
diff --git a/uv.lock b/uv.lock
