Commit 20c603c

fix: specify UTF-8 encoding for all text file I/O (#769)
## Summary

Fixes #765 and prevents the same class of bug on every other text I/O path in the codebase.

Calling `open()` / `Path.read_text()` / `Path.write_text()` without an explicit `encoding=` argument falls back to the system locale encoding. On Windows that is typically CP-1252, which fails on common UTF-8 bytes (smart quotes, em dashes, accented characters, CJK). #674 already fixed this for `manifest.json` after #594.

This PR extends the same one-line fix to the remaining unprotected readers/writers found by auditing the `src/` tree:

- `prompts/prompts.py` -- `server_instructions.md` (the file that triggers #765 on startup)
- `oauth/context_manager.py` -- OAuth context yaml read and write
- `config/dbt_yaml.py` -- `dbt_project.yml` / `.user.yml` read
- `lsp/lsp_binary_manager.py` -- LSP `.version` file read
- `tracking/tracking.py` -- `.user.yml` write

It also normalizes the smart quotes, em dashes, and ellipses in `server_instructions.md` to plain ASCII. The file is the only "user-visible" prompt asset and previously contained the byte (`0x9d`) that crashed the server on startup. Making it pure ASCII means it can't trip the same bug again, and the rendered prompt looks identical to a model.

## Test plan

- [x] `task check` passes (ruff, mypy, formatter, doc generation)
- [x] `task test:unit` -- 634 tests pass
- [x] `server_instructions.md` is verified pure ASCII (no bytes > 127)
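To illustrate the failure mode described in the summary, here is a minimal sketch (not code from this PR). It writes a smart-quoted string as UTF-8 and reads it back with two explicit encodings. Passing `cp1252` explicitly stands in for what a Windows locale default would select; the file name `sample.md` and the helper `try_read` are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def try_read(path: Path, encoding: str) -> str | None:
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return path.read_text(encoding=encoding)
    except UnicodeDecodeError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.md"  # hypothetical file name
    # U+201D (right double quotation mark) is stored as E2 80 9D in UTF-8.
    sample.write_text("say \u201chello\u201d", encoding="utf-8")
    raw = sample.read_bytes()

    as_utf8 = try_read(sample, "utf-8")      # decodes cleanly
    as_cp1252 = try_read(sample, "cp1252")   # 0x9d is unmapped in CP-1252
```

The byte `0x9d` has no mapping in Python's `cp1252` codec, so the second read raises `UnicodeDecodeError` and the helper returns `None`. Passing `encoding="utf-8"` makes the result independent of the machine's locale.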
1 parent 6eaaf88 commit 20c603c

7 files changed

Lines changed: 21 additions & 16 deletions

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kind: Bug Fix
+body: 'Specify UTF-8 encoding for all text file reads/writes to fix UnicodeDecodeError on Windows (closes #765)'
+time: 2026-05-11T13:25:28.435703+02:00

src/dbt_mcp/config/dbt_yaml.py

Lines changed: 2 additions & 2 deletions
@@ -11,9 +11,9 @@ def try_read_yaml(file_path: Path) -> dict | None:
         alternate_suffix = ".yaml" if suffix == ".yml" else ".yml"
         alternate_path = file_path.with_suffix(alternate_suffix)
         if file_path.exists():
-            return yaml.safe_load(file_path.read_text())
+            return yaml.safe_load(file_path.read_text(encoding="utf-8"))
         if alternate_path.exists():
-            return yaml.safe_load(alternate_path.read_text())
+            return yaml.safe_load(alternate_path.read_text(encoding="utf-8"))
     except Exception:
         return None
     return None

src/dbt_mcp/lsp/lsp_binary_manager.py

Lines changed: 1 addition & 1 deletion
@@ -226,7 +226,7 @@ def get_lsp_binary_version(path: str) -> str:
     """
     version_file = Path(path).parent / ".version"
     if version_file.exists():
-        return version_file.read_text().strip()
+        return version_file.read_text(encoding="utf-8").strip()
     else:
         return subprocess.run(
             [path, "--version"], capture_output=True, text=True

src/dbt_mcp/oauth/context_manager.py

Lines changed: 3 additions & 2 deletions
@@ -21,7 +21,7 @@ def read_context(self) -> DbtPlatformContext | None:
         if not self.config_location.exists():
             return None
         try:
-            content = self.config_location.read_text()
+            content = self.config_location.read_text(encoding="utf-8")
             if not content.strip():
                 return None
             parsed_content = yaml.safe_load(content)
@@ -61,5 +61,6 @@ def write_context_to_file(self, context: DbtPlatformContext) -> None:
         """Write context to file with proper locking."""
         self._ensure_config_location_exists()
         self.config_location.write_text(
-            yaml.dump(context.model_dump(), default_flow_style=False)
+            yaml.dump(context.model_dump(), default_flow_style=False),
+            encoding="utf-8",
         )
Lines changed: 9 additions & 9 deletions
@@ -1,17 +1,17 @@
 Use these tools to interact with dbt resources (typically via dbt Platform):
 
 - Understand how the project is structured: what exists in the environment, how objects depend on one another, and where to look when something looks wrong or slow.
-- Reason over governed business meaningnamed measures and breakdowns the project maintainsand answer questions with validated aggregates or the warehouse logic behind them when that helps.
+- Reason over governed business meaning -- named measures and breakdowns the project maintains -- and answer questions with validated aggregates or the warehouse logic behind them when that helps.
 - Work with platform automation when available: see what runs on a schedule, inspect past outcomes, act on failures or reruns, and dig into logs and outputs.
 - Assist with engineering work on dbt projects: take action on the project and reason about the underlying SQL.
 
 Example data-oriented questions (users may not use dbt vocabulary):
 
-- Revenue, ARR, bookings, pipeline, or quota: how are we doing this quarter?”; “split it by region or product”; “this total doesnt match finance.
-- Customers, users, signups, or retention: how many active …?” “is churn getting worse?” “cohorts or segmentswhatever we already track.
-- Orders, inventory, SKUs, or fulfillment: whats selling?” “stock or backorder questions when they only describe the business problem.
-- Funnel, marketing, or web activity: conversion from visit to purchase” “campaign performance with loose wording and no metric names.
-- Trust and definitions: where does this dashboard number come from?” “two reports disagreehelp me find why” “whats the official definition of …?”
-- Quality and freshness: is this table up to date?” “missing yesterdays data” “something looks stale in our reporting.
-- Orchestration tied to data: did last nights refresh finish?” “prod load failed and Finance is blocked without run or job IDs.
-- Engineering on a dbt project: help me with this model,” “whats wrong with my project?
+- Revenue, ARR, bookings, pipeline, or quota: "how are we doing this quarter?"; "split it by region or product"; "this total doesn't match finance."
+- Customers, users, signups, or retention: "how many active ...?" "is churn getting worse?" "cohorts or segments -- whatever we already track."
+- Orders, inventory, SKUs, or fulfillment: "what's selling?" "stock or backorder questions" when they only describe the business problem.
+- Funnel, marketing, or web activity: "conversion from visit to purchase" "campaign performance" with loose wording and no metric names.
+- Trust and definitions: "where does this dashboard number come from?" "two reports disagree -- help me find why" "what's the official definition of ...?"
+- Quality and freshness: "is this table up to date?" "missing yesterday's data" "something looks stale in our reporting."
+- Orchestration tied to data: "did last night's refresh finish?" "prod load failed and Finance is blocked" without run or job IDs.
+- Engineering on a dbt project: "help me with this model," "what's wrong with my project?"
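The PR does not show how the prompt file was normalized or how the "no bytes > 127" check was run. The following is a hedged sketch of one way to do both. The replacement table and the function names `normalize_to_ascii` and `is_pure_ascii` are assumptions for illustration, not code from this repository.

```python
# Hypothetical mapping from common typographic characters to ASCII.
ASCII_REPLACEMENTS = str.maketrans(
    {
        "\u201c": '"',    # left double quotation mark
        "\u201d": '"',    # right double quotation mark
        "\u2018": "'",    # left single quotation mark
        "\u2019": "'",    # right single quotation mark / apostrophe
        "\u2014": "--",   # em dash
        "\u2026": "...",  # horizontal ellipsis
    }
)


def normalize_to_ascii(text: str) -> str:
    """Replace the typographic characters listed above with ASCII equivalents."""
    return text.translate(ASCII_REPLACEMENTS)


def is_pure_ascii(data: bytes) -> bool:
    """Return True when no byte in `data` is greater than 127."""
    return all(byte <= 127 for byte in data)


before = "\u201cwhat\u2019s selling?\u201d \u2026"
after = normalize_to_ascii(before)
```

Characters outside the table pass through unchanged, so the byte check is still needed after normalizing. Spacing around the replaced dashes, as in the hunk above, would have to be adjusted by hand.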

src/dbt_mcp/prompts/prompts.py

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
 
 
 def get_prompt(name: str) -> str:
-    return (Path(__file__).parent / f"{name}.md").read_text()
+    return (Path(__file__).parent / f"{name}.md").read_text(encoding="utf-8")

src/dbt_mcp/tracking/tracking.py

Lines changed: 2 additions & 1 deletion
@@ -77,7 +77,8 @@ def _get_local_user_id(self, settings: DbtMcpSettings) -> str:
             self._local_user_id = str(uuid.uuid4())
             with suppress(Exception):
                 Path(user_yaml_path).write_text(
-                    yaml.dump({"id": self._local_user_id})
+                    yaml.dump({"id": self._local_user_id}),
+                    encoding="utf-8",
                 )
         return self._local_user_id
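Every hunk in this commit applies the same change: adding `encoding="utf-8"` to a text I/O call. The PR does not say how the audit of `src/` was done. As one possible approach, the sketch below uses the standard `ast` module to flag `open()`, `.read_text()`, and `.write_text()` calls that lack an `encoding=` keyword. The function name `find_calls_missing_encoding` is hypothetical, and the check is a heuristic: it also flags binary-mode `open()` calls and misses encodings passed positionally.

```python
import ast

TEXT_IO_METHODS = {"read_text", "write_text"}


def find_calls_missing_encoding(source: str) -> list[tuple[int, str]]:
    """Return (line number, call name) for text I/O calls without `encoding=`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute) and func.attr in TEXT_IO_METHODS:
            name = func.attr
        elif isinstance(func, ast.Name) and func.id == "open":
            name = "open"
        else:
            continue
        # Only an explicit keyword counts; **kwargs entries have arg=None.
        if not any(kw.arg == "encoding" for kw in node.keywords):
            findings.append((node.lineno, name))
    return sorted(findings)


example_source = (
    "from pathlib import Path\n"
    'Path("a").read_text()\n'
    'Path("b").read_text(encoding="utf-8")\n'
    'open("c")\n'
)
missing = find_calls_missing_encoding(example_source)
```

On Python 3.10 and later, running with `-X warn_default_encoding` is a runtime alternative: it emits `EncodingWarning` whenever the locale default encoding is used implicitly.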
