qol: exception formatting #2715

zilto · 2025-06-04T15:09:51Z

Description

The majority of exceptions use f-strings to interpolate variables into the exception message. However, they neither highlight where values are interpolated nor name the entities they refer to. This makes some user-facing messages confusing.

Example

func_name, resource_name = "source", "transformer"
# current exception
f"A data source {func_name} of a transformer {resource_name} is an undecorated function."
# renders as
"A data source source of a transformer transformer is an undecorated function."

# after changes
f"A data source `{func_name}` of a transformer `{resource_name}` is an undecorated function."
# renders as
"A data source `source` of a transformer `transformer` is an undecorated function."

Changes

Use backticks (`) to enclose references to code objects and interpolated variables

Use the f-string format {variable_name=:} for clearer exception messages. This also simplifies maintenance when renaming arguments, because the variable name in the message comes from the expression itself

column_mode = "freeze"
# from
f"{column_mode} column mode not implemented for Pydantic validation"
# to 
f"`{column_mode=:}` not implemented for Pydantic validation"
# interpolates to
"`column_mode='freeze'` not implemented for Pydantic validation"

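A minimal sketch of the `=` specifier inside an exception message. According to the Python documentation, `=` renders the expression text followed by the `repr()` of the value by default; when a format spec is present, `str()` is used unless `!r` is given. Whether `{column_mode=:}` renders `freeze` with or without quotes is therefore worth confirming on the target Python version. The sketch uses a bare `{column_mode=}`, and `check_column_mode` with its "freeze is unsupported" rule is hypothetical, not dlt code.

```python
def check_column_mode(column_mode: str) -> None:
    # hypothetical rule for illustration: treat "freeze" as unsupported
    if column_mode == "freeze":
        # `{column_mode=}` renders the expression text plus repr(value),
        # so renaming the parameter also renames it in the message
        raise NotImplementedError(
            f"`{column_mode=}` not implemented for Pydantic validation"
        )


try:
    check_column_mode("freeze")
except NotImplementedError as exc:
    message = str(exc)

print(message)  # `column_mode='freeze'` not implemented for Pydantic validation
```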
Add TypeErrorWithKnownTypes and ValueErrorWithKnownValues to dlt.common.exceptions to streamline two common error messages

compression = "foo"
# from
ValueError('The argument `compression` must have one of the following values: "auto", "enable", "disable".')
# to
ValueErrorWithKnownValues("compression", compression, ["auto", "enable", "disable"])
# renders as
'Received invalid value `compression="foo"`.  Valid values are: ["auto", "enable", "disable"]'
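A minimal sketch of what such helpers could look like, assuming the positional arguments used above (argument name, received value, valid values). The real classes in `dlt.common.exceptions` may differ; `TypeErrorWithKnownTypes` here simply follows the same pattern by assumption. Subclassing the built-in `ValueError`/`TypeError` keeps existing `except ValueError` handlers and `pytest.raises(ValueError)` checks working.

```python
from typing import Any, Sequence


class ValueErrorWithKnownValues(ValueError):
    """Hypothetical sketch: a ValueError that lists the accepted values."""

    def __init__(self, key: str, value_received: Any, valid_values: Sequence[Any]) -> None:
        self.key = key
        self.value_received = value_received
        self.valid_values = list(valid_values)
        super().__init__(
            f"Received invalid value `{key}={value_received!r}`. "
            f"Valid values are: {self.valid_values}"
        )


class TypeErrorWithKnownTypes(TypeError):
    """Hypothetical sketch: a TypeError that lists the accepted types."""

    def __init__(self, key: str, value_received: Any, valid_types: Sequence[type]) -> None:
        self.key = key
        self.value_received = value_received
        self.valid_types = list(valid_types)
        type_names = [t.__name__ for t in self.valid_types]
        super().__init__(
            f"Received invalid type `{key}: {type(value_received).__name__}`. "
            f"Valid types are: {type_names}"
        )


compression = "foo"
try:
    raise ValueErrorWithKnownValues("compression", compression, ["auto", "enable", "disable"])
except ValueError as exc:  # still caught by a plain `except ValueError`
    print(exc)
```

Because this sketch uses `repr()`, strings render with single quotes (`compression='foo'`) rather than the double quotes shown in the example above.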

netlify · 2025-06-04T15:09:55Z

✅ Deploy Preview for dlt-hub-docs canceled.

Name	Link
🔨 Latest commit	`b117aa5`
🔍 Latest deploy log	https://app.netlify.com/projects/dlt-hub-docs/deploys/68494f73e4daa00008dc7cbd

rudolfix · 2025-06-06T08:25:03Z

I briefly looked at the failed tests and, except for `SELECT o,t,h,e,r,_,d,e,c,i,m,a,l FROM items`, which fails the lineage test, I do not see anything suspicious.
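That query looks like a single column name split into characters. The thread doesn't state the cause; one plausible cause, assumed here, is a single `str` reaching code that expects a sequence of column names, since a `str` is itself a sequence of one-character strings. Both `build_select` helpers are hypothetical, not dlt code.

```python
from typing import Sequence, Union


def build_select_buggy(columns: Sequence[str], table: str) -> str:
    # a str is also a Sequence[str], so "other_decimal" is joined character by character
    return f"SELECT {','.join(columns)} FROM {table}"


def build_select(columns: Union[str, Sequence[str]], table: str) -> str:
    # normalize a single column name to a one-element list before joining
    if isinstance(columns, str):
        columns = [columns]
    return f"SELECT {','.join(columns)} FROM {table}"


print(build_select_buggy("other_decimal", "items"))  # SELECT o,t,h,e,r,_,d,e,c,i,m,a,l FROM items
print(build_select("other_decimal", "items"))  # SELECT other_decimal FROM items
```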

FAILED tests/destinations/test_utils.py::test_get_resource_for_adapter - Failed: DID NOT RAISE <class 'ValueError'>
This is pretty weird. You added a new ValueError, but it's derived correctly, so all tests should still work...

zilto · 2025-06-06T13:49:51Z

FAILED tests/destinations/test_utils.py::test_get_resource_for_adapter - Failed: DID NOT RAISE <class 'ValueError'>
This is pretty weird. You added a new ValueError, but it's derived correctly, so all tests should still work...

That's a fun one. It seems that the code doesn't raise any exception at all (it took me several minutes of reading the code to see it)!

    if isinstance(data, DltSource):
        if len(data.selected_resources.keys()) == 1:
            return list(data.selected_resources.values())[0]
        else:
            # no `raise` keyword
            ValueError(
                "You are trying to use an adapter on a `DltSource` with multiple resources. You can"
                " only use adapters on: pure data, a `DltResouce` or a `DltSource` with a single"
                " `DltResource`."
            )
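This class of bug (an exception object constructed but never raised) can be caught mechanically, and some linters ship a rule for it, for example pylint's `pointless-exception-statement`. A minimal sketch of the same idea with the standard-library `ast` module; the heuristic that a discarded call to a name ending in `Error` or `Exception` is a mistake is an assumption and can produce false positives.

```python
import ast
from typing import List


def find_unraised_exceptions(source: str) -> List[int]:
    """Return line numbers of statements that build an exception but never raise it."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        # an `Expr` node is a statement whose value is computed and then discarded
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            func = node.value.func
            if isinstance(func, ast.Name):
                name = func.id
            elif isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = ""
            # heuristic (assumption): exception classes end in "Error" or "Exception"
            if name.endswith(("Error", "Exception")):
                lines.append(node.lineno)
    return sorted(lines)


SNIPPET = '''
def get_resource(data):
    if len(data) == 1:
        return data[0]
    else:
        ValueError("multiple resources")
'''

print(find_unraised_exceptions(SNIPPET))  # [6]
```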

Good reminder for writing unit tests: red -> green -> refactor. This test never failed!
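A minimal sketch of the fix together with a test that goes red against the version without `raise` and green once it is added. `FakeSource` and `get_resource_for_adapter_sketch` are hypothetical, simplified stand-ins for `DltSource` and the real helper.

```python
from typing import Any, Dict


class FakeSource:
    """Hypothetical stand-in for `DltSource`: only exposes `selected_resources`."""

    def __init__(self, selected_resources: Dict[str, Any]) -> None:
        self.selected_resources = selected_resources


def get_resource_for_adapter_sketch(data: Any) -> Any:
    """Hypothetical, simplified helper with the missing `raise` added."""
    if isinstance(data, FakeSource):
        if len(data.selected_resources) == 1:
            return next(iter(data.selected_resources.values()))
        raise ValueError(  # the fix: the exception is now actually raised
            "You are trying to use an adapter on a `DltSource` with multiple resources."
        )
    # simplification (assumption): anything else is passed through unchanged
    return data


def test_multiple_resources_raise() -> None:
    # red -> green: this fails with AssertionError against the version without `raise`
    try:
        get_resource_for_adapter_sketch(FakeSource({"a": 1, "b": 2}))
    except ValueError:
        return
    raise AssertionError("DID NOT RAISE ValueError")


test_multiple_resources_raise()
```

With pytest, the same check is usually written as `with pytest.raises(ValueError): ...`.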

djudjuu · 2025-06-07T13:56:17Z

dlt/common/configuration/exceptions.py

@@ -23,8 +23,6 @@ class ConfigurationValueError(ConfigurationException, ValueError):
 class ContainerException(DltException):
    """base exception for all exceptions related to injectable container"""



learning question: why not `pass`?

The `pass` keyword is not needed since the class body already contains a docstring.


These are valid ways to define an "empty" class or function in Python (an "empty" function returns None):

# with a pass
class MyClass:
    pass

def my_fn():
    pass

# with a docstring
class MyClass:
    """This is my class"""

def my_fn():
    """This is my function"""

# with ellipsis
class MyClass:
    ...

def my_fn():
    ...

# ellipsis is typically used inline
class MyClass: ...

def my_fn(): ...

Those are all equivalent and can be combined. Personally, I find `pass` the most ambiguous because it is also used for other purposes, e.g., in loops.
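A quick check of that equivalence: each body below is valid and each function returns `None`. The one practical difference is that the docstring variant also populates `__doc__`.

```python
def with_pass():
    pass


def with_docstring():
    """This is my function"""


def with_ellipsis(): ...


for fn in (with_pass, with_docstring, with_ellipsis):
    # all three return None; only the docstring version sets __doc__
    print(fn.__name__, fn(), fn.__doc__)
```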

"Empty" classes and functions are mostly used in typing stubs and @typing.overload to define type signatures.

# dlt.common.destination.dataset
class SupportsReadableRelation:
    # ...
    @overload
    def __getitem__(self, column: str) -> Self: ...

    @overload
    def __getitem__(self, columns: Sequence[str]) -> Self: ...

    def __getitem__(self, columns: Union[str, Sequence[str]]) -> Self:
        """Returns a new relation with the given columns selected.

        Args:
            columns (Union[str, Sequence[str]]): The columns to select.

        Returns:
            Self: The relation with the columns selected.
        """
        raise NotImplementedError("`__getitem__()` method is not supported for this relation")
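A runnable variant of the same overload pattern, assuming a toy in-memory `Relation` (hypothetical, not dlt's `SupportsReadableRelation`). The `@overload` stubs have `...` bodies and exist only for type checkers; the final undecorated definition does the work at runtime. Checking `isinstance(columns, str)` first matters because a `str` is itself a `Sequence[str]`.

```python
from typing import Dict, List, Sequence, Union, overload


class Relation:
    """Hypothetical toy relation backed by a dict of column name -> values."""

    def __init__(self, data: Dict[str, List[object]]) -> None:
        self.data = data

    @overload
    def __getitem__(self, column: str) -> "Relation": ...

    @overload
    def __getitem__(self, columns: Sequence[str]) -> "Relation": ...

    def __getitem__(self, columns: Union[str, Sequence[str]]) -> "Relation":
        # normalize a single name first: a str would otherwise be iterated per character
        names = [columns] if isinstance(columns, str) else list(columns)
        return Relation({name: self.data[name] for name in names})


rel = Relation({"id": [1, 2], "name": ["a", "b"], "value": [0.5, 1.5]})
print(list(rel["id"].data))  # ['id']
print(list(rel[["id", "value"]].data))  # ['id', 'value']
```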

dlt/common/configuration/exceptions.py

tests/sources/rest_api/configurations/test_resolve_config.py

dlt/common/data_writers/exceptions.py

dlt/common/data_writers/writers.py

djudjuu · 2025-06-08T15:09:33Z

dlt/common/libs/pyarrow.py

+        pivoted_rows = np.asarray(rows, dtype="object", order="K").T
    return {


praise: nice fix! Apparently 'k' didn't do anything, but maybe it then defaulted to 'K'...
Was there no test for this? Maybe we should consider adding one, if it's easy.

oh, I completely missed this. Good catch!

The NumPy docs for versions 2.2 (latest) and 1.26 indicate that order="K" is the default value. So I think the change from "k" (which did nothing and fell back to "K") to "K" is safe.

dlt/common/libs/pydantic.py

dlt/common/runtime/run_context.py

dlt/common/schema/schema.py

djudjuu · 2025-06-08T15:31:46Z

dlt/common/schema/utils.py

@@ -440,7 +441,7 @@ def merge_columns(
    * incomplete columns in `columns_a` that got completed in `columns_b` are removed to preserve order
    """
    if columns_partial is False:
-        raise NotImplementedError("columns_partial must be False for merge_columns")
+        raise NotImplementedError("`columns_partial` must be `False` for `merge_columns`")


nitpick: I don't think the error message is helpful here, since `columns_partial` being `False` is exactly what raises the error, isn't it?

Not a problem this PR created, but it could fix it.

I'm unsure. I think it's legacy code from a refactoring. The developer didn't want to change the function signature by removing columns_partial (this would break downstream code), but instead added an explicit exception with a more useful error message.
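Following the reviewer's point, a sketch of a message that names the unsupported value instead of restating the inverted condition. `merge_columns_sketch`, its default, and its merge body are hypothetical; only the `columns_partial is False` guard comes from the diff above.

```python
from typing import Any, Dict


def merge_columns_sketch(
    columns_a: Dict[str, Any], columns_b: Dict[str, Any], columns_partial: bool = True
) -> Dict[str, Any]:
    if columns_partial is False:
        # state what was received and what is supported, not the opposite condition
        raise NotImplementedError(
            f"`{columns_partial=}` is not supported by `merge_columns()`. "
            "Only `columns_partial=True` is implemented."
        )
    # hypothetical merge: entries from `columns_b` update those in `columns_a`
    merged = dict(columns_a)
    merged.update(columns_b)
    return merged
```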

Tip: with git or GitHub, you can use `git blame` to find who made the changes :). It points to the specific commit where, hopefully, you'll find a useful commit message or PR.

dlt/common/storages/exceptions.py

dlt/common/storages/normalize_storage.py

djudjuu · 2025-06-08T15:40:59Z

dlt/common/storages/transactional_file.py

@@ -114,8 +114,8 @@ def _sync_locks(self) -> t.List[str]:
            output.append(name)
        if not output:
            raise RuntimeError(
-                f"When syncing locks for path {self.path} and lock {self.lock_path} no lock file"
-                " was found"
+                f"Lock syncing failed. No lock file found for path `{self.path}` and lock"


praise: I like this format much better. Is there a word for it?
In emails or conversations, I call it frontloading.

I don't know, but "frontloading" is a good name for it! I don't have a rigorous process, but I try to write exceptions that:

state what's wrong in the fewest words possible. Here, it would be even better to raise a LockSyncingError

explain why the error is raised

include hints to start debugging

make it explicit when a variable is interpolated

keep long variables at the end (e.g., self.path)

I didn't want to rewrite all exceptions, but this one was hiding the key message, "no lock file was found", at the very end.
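A sketch of those guidelines applied to the lock-syncing case with the dedicated exception suggested above. `LockSyncingError` and its fields are hypothetical; the message wording follows the new message in the diff. Subclassing `RuntimeError`, which the original code raised, keeps existing `except RuntimeError` handlers working.

```python
class LockSyncingError(RuntimeError):
    """Hypothetical dedicated error for a missing lock file during lock syncing."""

    def __init__(self, path: str, lock_path: str) -> None:
        self.path = path
        self.lock_path = lock_path
        # frontload the failure, backtick interpolated values, keep the long paths last
        super().__init__(
            f"Lock syncing failed. No lock file found for path `{path}` and lock `{lock_path}`"
        )


try:
    raise LockSyncingError("/data/pipeline/file.jsonl", "/data/pipeline/file.jsonl.lock")
except RuntimeError as exc:
    print(exc)
```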

dlt/common/validation.py

dlt/helpers/dbt_cloud/client.py

dlt/load/exceptions.py

dlt/pipeline/configuration.py

dlt/sources/helpers/rest_client/paginators.py

djudjuu

Hey Thierry,
what a ride. Reading nothing but exception messages is kinda fun; I never did that before.

I found only two or three places that really need changes, IMO (I prefixed those comments with `issue:`). The rest is just a bunch of small things to adjust, like missing backticks or `'` that failed to get replaced.
I corrected or marked many of them, though not all, but I think the codebase will certainly be more unified after this PR (achieving 100% consistency wouldn't be a wise use of our time).

The other main issue is that you probably need to adjust the expected messages in a couple more tests to match the new messages.

I'm on vacation until next week, so I won't mark this as "changes requested", so as not to block it from being merged because I'm slow to respond. My requested changes would be: address the issues I marked and make the tests pass. Then it's green from me.

dlt/sources/rest_api/config_setup.py

rudolfix

LGTM! tests are passing

* first pass formatting raise statements
* added custom exception
* updated Exception classes
* linting & formatting
* fixed tests
* fix imports
* capitalize argument to please mypy
* add raise keyword
* applied review comments
* passing tests
* fixed test
* fixed broken check logic
* format
* fixes bigquery test

---------

Co-authored-by: Marcin Rudolf <[email protected]>

zilto force-pushed the qol/clearer-exceptions branch from b09a52b to f48df5a June 4, 2025 15:12

zilto added the QoL Quality of Life label Jun 4, 2025

zilto self-assigned this Jun 4, 2025

zilto marked this pull request as ready for review June 4, 2025 19:54

sh-rp requested a review from djudjuu June 5, 2025 16:26

zilto force-pushed the qol/clearer-exceptions branch from 094ce2f to a39b802 June 5, 2025 20:02

djudjuu reviewed Jun 7, 2025

dlt/common/configuration/exceptions.py (resolved)

djudjuu reviewed Jun 7, 2025

dlt/common/configuration/exceptions.py (resolved)

djudjuu reviewed Jun 7, 2025

tests/sources/rest_api/configurations/test_resolve_config.py (outdated, resolved)

djudjuu reviewed Jun 7, 2025

dlt/common/data_writers/exceptions.py (resolved)

djudjuu reviewed Jun 7, 2025

dlt/common/data_writers/writers.py (outdated, resolved)