Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype of Dev/catalog interface #499

Closed
wants to merge 5 commits into from
Closed

Conversation

anshumankomawar
Copy link
Collaborator

Summary

This is a WIP progress implementation of the v2 catalog interface table functions.

Checklist

  • Unit tests covering the changes have been added

    • If this is a bugfix, regression tests have been added
  • E2E testing has been performed

Additional Notes

Any additional information or context relevant to this PR.

Comment on lines +1356 to +1363
namespace: Optional[Namespace] = get_namespace(
*args,
namespace=namespace,
**kwargs,
)

if not namespace:
raise ValueError(f"Namespace `{namespace}` does not exist.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this get_namespace call and just catch/reraise a more clear ValueError from the commit itself?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the transaction expects a metafile so we would need to do a get_namespace using the id to get the metafile.

Copy link
Member

@pdames pdames Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... can we just create the namespace to delete in-memory, since it will already contain the unique locator that the transaction needs to find the delete target?

namespace = Namespace.of(NamespaceLocator.of(namespace))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed this comment, going to see if I can get it to work.


table_version = storage_impl.get_table_version(namespace, table_name, table.latest_table_version, catalog=catalog)

stream = storage_impl.get_stream(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future when we have multiple streams, how are users going to specify which stream they want in get / list / etc?

Copy link
Member

@pdames pdames Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for example, I'm imagining they'll explicitly need to ask for the iceberg stream if they want to query the Iceberg variant, so it seems like we'll need a way for them to explicitly pass in at least StreamFormat for all table-level catalog APIs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave this out for my PR but will add a TODO to address this.

)

# Verify the error message
assert f"Table {self.test_namespace}.{table_name} already exists" in str(excinfo.value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you could put the expected error string in the with pytest.raises line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we have a lot of better custom error types in exceptions.py that we should start making better use of (and adding new exception types as needed) to avoid any cases that otherwise require string inspection like this (we probably need a corresponding high-level issue to comb through all of the high-level base Python errors we're throwing in the native DeltaCAT catalog/storage impl, and turn them into more appropriately typed DeltaCAT exceptions).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the ones directly included in my changes but there are still some errors thrown in the storage level that need to be updated. Don't want to dilute this cr will track a separate ticket for it.

@anshumankomawar anshumankomawar marked this pull request as ready for review March 11, 2025 22:12
Comment on lines +1322 to +1330
table: Optional[Table] = get_table(
*args,
namespace=namespace,
table_name=name,
**kwargs,
)

if not table:
raise TableNotFoundError(f"Table `{namespace}.{name}` does not exist.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to delete_namespace below, can we remove this get_table call and just catch/reraise a more clear ValueError from the commit itself?

Copy link
Member

@pdames pdames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Remaining comments are relatively low-priority, but would be great if they can also be addressed prior to merge (or, if not, feel free to submit a small fast-follow PR). Also, let's just ensure that all failing approval workflows have succeeded prior to merge.

@anshumankomawar anshumankomawar changed the title Dev/catalog interface Prototype of Dev/catalog interface Mar 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants