
qol: exception formatting #2715


Merged
merged 14 commits into from
Jun 11, 2025


zilto
Collaborator

@zilto zilto commented Jun 4, 2025

Description

Most exceptions use f-strings to interpolate variables into the exception message. However, they neither mark where values are interpolated nor name the entities those values refer to. This makes some user-facing messages confusing.

Example

# current exception
f"A data source {func_name} of a transformer {resource_name} is an undecorated function."
# could render as
"A data source source of a transformer transformer is an undecorated function." 

# after changes
"A data source `source` of a transformer `transformer` is an undecorated function."

Changes

  • Use backticks (`) to enclose references to code objects and interpolated variables
  • Use f-string formatting {variable_name=:} for clearer exception messages. The = specifier renders the variable name next to its value, which also simplifies maintenance when renaming args
    column_mode = "freeze"
    # from
    f"{column_mode} column mode not implemented for Pydantic validation"
    # to 
    f"`{column_mode=:}` not implemented for Pydantic validation"
    # interpolates to
    "`column_mode=freeze` not implemented for Pydantic validation"
  • Add TypeErrorWithKnownTypes and ValueErrorWithKnownValues to dlt.common.exceptions to streamline two common error messages
    compression = "foo"
    # from
    ValueError('The argument `compression` must have one of the following values: "auto", "enable", "disable".')
    # to
    ValueErrorWithKnownValues("compression", compression, ["auto", "enable", "disable"])
    # renders as
    'Received invalid value `compression="foo"`.  Valid values are: ["auto", "enable", "disable"]'
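The `=` specifier (Python 3.8+) renders the expression text followed by its value. Per the Python docs, it uses `repr()` by default and switches to `str()` when a `!s` conversion or a format spec is given, which decides whether quotes appear around string values. A minimal comparison using the `column_mode` example:

```python
column_mode = "freeze"

# `=` alone renders the expression text plus repr(): quotes are kept
with_repr = f"`{column_mode=}` not implemented for Pydantic validation"

# `=!s` forces str(): the value is rendered without quotes
with_str = f"`{column_mode=!s}` not implemented for Pydantic validation"

print(with_repr)  # `column_mode='freeze'` not implemented for Pydantic validation
print(with_str)   # `column_mode=freeze` not implemented for Pydantic validation
```

Because the name in the message comes from the expression itself, an IDE rename of the variable also updates the message text.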
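The PR text only shows how ValueErrorWithKnownValues is called and how its message renders. A minimal sketch of how both helpers could be built; the constructor signatures, the `json.dumps`-based quoting, and the TypeErrorWithKnownTypes wording are assumptions for illustration, not dlt's actual implementation:

```python
import json
from typing import Any, Sequence


class ValueErrorWithKnownValues(ValueError):
    """Hypothetical sketch: a ValueError that lists the accepted values."""

    def __init__(self, key: str, value_received: Any, valid_values: Sequence[Any]) -> None:
        self.key = key
        self.value_received = value_received
        self.valid_values = list(valid_values)
        # json.dumps renders strings with double quotes; repr() is the fallback for other objects
        received = json.dumps(value_received, default=repr)
        valid = json.dumps(self.valid_values, default=repr)
        super().__init__(f"Received invalid value `{key}={received}`. Valid values are: {valid}")


class TypeErrorWithKnownTypes(TypeError):
    """Hypothetical sketch: a TypeError that lists the accepted types."""

    def __init__(self, key: str, value_received: Any, valid_types: Sequence[type]) -> None:
        self.key = key
        self.value_received = value_received
        self.valid_types = list(valid_types)
        received = type(value_received).__name__
        valid = ", ".join(f"`{t.__name__}`" for t in self.valid_types)
        super().__init__(f"Received invalid type `{received}` for `{key}`. Valid types are: {valid}")


err = ValueErrorWithKnownValues("compression", "foo", ["auto", "enable", "disable"])
print(isinstance(err, ValueError))  # True
print(err)
# Received invalid value `compression="foo"`. Valid values are: ["auto", "enable", "disable"]
```

Subclassing the built-in ValueError and TypeError keeps existing `except ValueError` handlers and `pytest.raises(ValueError)` tests working.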


netlify bot commented Jun 4, 2025

Deploy Preview for dlt-hub-docs canceled.

🔨 Latest commit: b117aa5
🔍 Latest deploy log: https://app.netlify.com/projects/dlt-hub-docs/deploys/68494f73e4daa00008dc7cbd

@zilto zilto force-pushed the qol/clearer-exceptions branch from b09a52b to f48df5a June 4, 2025 15:12
@zilto zilto added the QoL Quality of Life label Jun 4, 2025
@zilto zilto self-assigned this Jun 4, 2025
@zilto zilto marked this pull request as ready for review June 4, 2025 19:54
@sh-rp sh-rp requested a review from djudjuu June 5, 2025 16:26
@zilto zilto force-pushed the qol/clearer-exceptions branch from 094ce2f to a39b802 June 5, 2025 20:02
@rudolfix
Collaborator

rudolfix commented Jun 6, 2025

I briefly looked at the failed tests and, apart from
SELECT o,t,h,e,r,_,d,e,c,i,m,a,l FROM items, which fails the lineage test, I do not see anything suspicious.

FAILED tests/destinations/test_utils.py::test_get_resource_for_adapter - Failed: DID NOT RAISE <class 'ValueError'>
This is pretty weird: you added a new ValueError, but it derives correctly, so all tests should still work...
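The column list in that lineage query, o,t,h,e,r,_,d,e,c,i,m,a,l, is exactly what joining a single column name character by character produces. A sketch of the likely mechanism, assuming the query builder joins column names with "," and received the string "other_decimal" instead of a list; `build_select` is a hypothetical name, not a dlt function:

```python
from typing import Sequence, Union


def build_select(table: str, columns: Union[str, Sequence[str]]) -> str:
    """Hypothetical query builder accepting one column name or a sequence of names."""
    # a str is itself a Sequence[str], so wrap it before joining
    if isinstance(columns, str):
        columns = [columns]
    return f"SELECT {','.join(columns)} FROM {table}"


# without the isinstance() guard, join() iterates over the characters
print(",".join("other_decimal"))                # o,t,h,e,r,_,d,e,c,i,m,a,l
print(build_select("items", "other_decimal"))   # SELECT other_decimal FROM items
```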

@zilto
Collaborator Author

zilto commented Jun 6, 2025

FAILED tests/destinations/test_utils.py::test_get_resource_for_adapter - Failed: DID NOT RAISE <class 'ValueError'>
This is pretty weird: you added a new ValueError, but it derives correctly, so all tests should still work...

That's a fun one. It turns out the code doesn't raise any exception at all (it took me several minutes of reading the code to see it)!

    if isinstance(data, DltSource):
        if len(data.selected_resources.keys()) == 1:
            return list(data.selected_resources.values())[0]
        else:
            # no `raise` keyword
            ValueError(
                "You are trying to use an adapter on a `DltSource` with multiple resources. You can"
                " only use adapters on: pure data, a `DltResource` or a `DltSource` with a single"
                " `DltResource`."
            )

Good reminder for writing unit tests: red -> green -> refactor. This test never failed!
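A minimal, self-contained reproduction of the missing `raise` and its fix. The function names and the plain dict standing in for `data.selected_resources` are hypothetical simplifications of the dlt code above. In the real suite the check is `pytest.raises(ValueError)`, which is what reports `DID NOT RAISE` when nothing is raised:

```python
from typing import Any, Dict, Optional


def pick_single_resource_buggy(selected_resources: Dict[str, Any]) -> Optional[Any]:
    if len(selected_resources) == 1:
        return next(iter(selected_resources.values()))
    else:
        # the exception object is created and immediately discarded: no `raise`
        ValueError("Adapters need a source with a single resource.")
    # execution falls through and the function implicitly returns None


def pick_single_resource(selected_resources: Dict[str, Any]) -> Any:
    if len(selected_resources) == 1:
        return next(iter(selected_resources.values()))
    raise ValueError("Adapters need a source with a single resource.")


two_resources = {"a": 1, "b": 2}

# "red": a test expecting ValueError fails against the buggy version
buggy_result = pick_single_resource_buggy(two_resources)
print(buggy_result)  # None

# "green": the fixed version raises as intended
try:
    pick_single_resource(two_resources)
    fixed_error = None
except ValueError as exc:
    fixed_error = exc
print(repr(fixed_error))
```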

@@ -23,8 +23,6 @@ class ConfigurationValueError(ConfigurationException, ValueError):
class ContainerException(DltException):
"""base exception for all exceptions related to injectable container"""

Collaborator


learning question: why not pass?

Collaborator Author


The pass keyword is not needed since there's a docstring.


These are valid ways to define an "empty" class or function in Python (an "empty" function returns None):

# with a pass
class MyClass:
  pass
  
def my_fn():
  pass
  
# with a docstring
class MyClass:
  """This is my class"""

def my_fn():
  """This is my function"""
  
# with ellipsis
class MyClass:
  ...

def my_fn():
  ...

# ellipsis is typically used inline
class MyClass: ...

def my_fn(): ...

Those are all equivalent and can be combined. Personally, I find pass to be the most ambiguous because it can be used for other purposes in loops.
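A quick check of the claim above: each variant defines a working class or a function that returns None, and the only observable difference is that the docstring variant also populates `__doc__`. The class and function names are illustrative:

```python
class WithPass:
    pass


class WithDocstring:
    """This is my class"""


class WithEllipsis: ...


def fn_pass():
    pass


def fn_docstring():
    """This is my function"""


def fn_ellipsis(): ...


# every "empty" function returns None when called
print([f() for f in (fn_pass, fn_docstring, fn_ellipsis)])  # [None, None, None]

# only the docstring variant carries documentation
print(WithPass.__doc__, WithDocstring.__doc__, WithEllipsis.__doc__)  # None This is my class None
```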

"Empty" classes and functions are mostly used in typing stubs and @typing.overload to define type signatures.

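# dlt.common.destination.dataset
from typing import Sequence, Union, overload

from typing_extensions import Self  # `typing.Self` on Python 3.11+
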
class SupportsReadableRelation:
    # ...
    @overload
    def __getitem__(self, column: str) -> Self: ...

    @overload
    def __getitem__(self, columns: Sequence[str]) -> Self: ...

    def __getitem__(self, columns: Union[str, Sequence[str]]) -> Self:
        """Returns a new relation with the given columns selected.

        Args:
            columns (Union[str, Sequence[str]]): The columns to select.

        Returns:
            Self: The relation with the columns selected.
        """
        raise NotImplementedError("`__getitem__()` method is not supported for this relation")

Comment on lines +815 to 839
pivoted_rows = np.asarray(rows, dtype="object", order="K").T
return {
Collaborator


praise: nice fix! Apparently 'k' didn't do anything, but maybe it then defaulted to 'K'...
Was there no test for this? Maybe we should consider adding one, if it's easy.

Collaborator Author


oh, I completely missed this. Good catch!

The NumPy docs for versions 2.2 (latest) and 1.26 indicate that order="K" is the default value, so I think the change from "k" (which did nothing and fell back to "K") to "K" is safe.

@@ -440,7 +441,7 @@ def merge_columns(
* incomplete columns in `columns_a` that got completed in `columns_b` are removed to preserve order
"""
if columns_partial is False:
- raise NotImplementedError("columns_partial must be False for merge_columns")
+ raise NotImplementedError("`columns_partial` must be `False` for `merge_columns`")
Collaborator

@djudjuu djudjuu Jun 8, 2025


nitpick: I don't think the error message is helpful here, since columns_partial being False is exactly what raises the error, isn't it?

Collaborator


Not a problem this PR created, but it could fix it.

Collaborator Author

@zilto zilto Jun 9, 2025


I'm unsure. I think it's legacy code from a refactoring. The developer didn't want to change the function signature by removing columns_partial (this would break downstream code), but instead added an explicit exception with a more useful error message.

Tip: with git or GitHub, you can use git blame to find who made the changes :). It points to the specific commit, where you'll hopefully find a useful commit message or PR.
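On the nitpick above: the check raises when `columns_partial` is False, yet the message says it must be False. A sketch of a message that names the rejected value and the supported one. The signature, the `TColumns` alias, and the placeholder merge are simplified assumptions; the real `merge_columns()` takes more parameters and does more work:

```python
from typing import Any, Dict

# simplified stand-in for dlt's column schema type (assumption)
TColumns = Dict[str, Dict[str, Any]]


def merge_columns(
    columns_a: TColumns, columns_b: TColumns, columns_partial: bool = True
) -> TColumns:
    """Hypothetical, simplified merge_columns() used only to show the message."""
    if columns_partial is False:
        # state the rejected value and the supported alternative
        raise NotImplementedError(
            f"`{columns_partial=}` is not supported by `merge_columns()`. "
            "Only `columns_partial=True` is implemented."
        )
    # placeholder merge: entries in `columns_b` override those in `columns_a`
    return {**columns_a, **columns_b}


try:
    merge_columns({}, {}, columns_partial=False)
    error_message = ""
except NotImplementedError as exc:
    error_message = str(exc)
print(error_message)
# `columns_partial=False` is not supported by `merge_columns()`. Only `columns_partial=True` is implemented.
```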

@@ -114,8 +114,8 @@ def _sync_locks(self) -> t.List[str]:
output.append(name)
if not output:
raise RuntimeError(
- f"When syncing locks for path {self.path} and lock {self.lock_path} no lock file"
- " was found"
+ f"Lock syncing failed. No lock file found for path `{self.path}` and lock"
Collaborator


praise: I like this format much better. Is there a word for it?
In emails or conversations, I call it frontloading.

Collaborator Author


I don't know, but "frontloading" is a good name for it! I don't have a rigorous process, but when writing exceptions I try to:

  1. state what's wrong in as few words as possible. Here, it would be even better to raise a LockSyncingError
  2. explain why the error is raised
  3. include hints to start debugging
  4. make it explicit when a variable is interpolated
  5. keep long variables at the end (e.g., self.path)

I didn't want to rewrite all exceptions, but this one was hiding the key message, "no lock file was found", at the very end.
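A sketch of the dedicated exception suggested in point 1, applying the other points in order: the key fact first, then the cause, a debugging hint, backticked interpolations, and the long path values last. `LockSyncingError` and the hint wording are hypothetical; only the opening "Lock syncing failed." comes from the PR's new message:

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


class LockSyncingError(RuntimeError):
    """Hypothetical exception for a missing lock file while syncing locks."""

    def __init__(self, path: PathLike, lock_path: PathLike) -> None:
        self.path = path
        self.lock_path = lock_path
        super().__init__(
            # 1. what's wrong, in as few words as possible
            "Lock syncing failed. "
            # 2. why it was raised
            "No lock file was found. "
            # 3. a hint to start debugging (hypothetical wording)
            "Check that the lock file exists and was not removed by another process. "
            # 4. + 5. backticked interpolations, long values at the end
            f"Path: `{path}`, lock: `{lock_path}`"
        )


err = LockSyncingError("data/pipeline", "data/pipeline.lock")
print(err)
```

Subclassing RuntimeError keeps any existing `except RuntimeError` handlers around the current `raise RuntimeError(...)` working.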

Collaborator

@djudjuu djudjuu left a comment


Hey Thierry,
what a ride. Reading nothing but exception messages is kinda fun. Never did that before.

I found only two or three places that really need changes, imo (I prefixed those comments with issue:). The rest is just a bunch of small things to adjust, like missing backticks or ' quotes that weren't replaced.
I corrected or marked many of them, not all, but I think the codebase will be more unified after this PR for sure (achieving 100% consistency wouldn't be a wise use of our time).

The other main issue is that you probably need to adjust a couple more test messages to match the new messages.

I'm on vacation until next week, so I won't mark this as "changes requested", to not block it from being merged because I'm slow to respond. My requested changes would be: fix the issues I marked and make the tests pass. Then it's green from me.

@rudolfix rudolfix moved this from Todo to In Progress in dlt core library Jun 10, 2025
@zilto zilto force-pushed the qol/clearer-exceptions branch from e0f6450 to 79eb192 June 10, 2025 14:57
Collaborator

@rudolfix rudolfix left a comment


LGTM! tests are passing

@rudolfix rudolfix merged commit d2e29fa into devel Jun 11, 2025
82 of 86 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in dlt core library Jun 11, 2025
@rudolfix rudolfix deleted the qol/clearer-exceptions branch June 11, 2025 09:43
zilto added a commit that referenced this pull request Jun 11, 2025
* first pass formatting raise statements

* added custom exception

* updated Exception classes

* linting & formatting

* fixed tests

* fix imports

* capitalize argument to please mypy

* add raise keyword

* applied review comments

* passing tests

* fixed test

* fixed broken check logic

* format

* fixes bigquery test

---------

Co-authored-by: Marcin Rudolf <[email protected]>
Labels
QoL Quality of Life
Projects
Status: Done

3 participants