Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement catalog table functions #504

Merged
merged 2 commits into from
Mar 12, 2025

Conversation

anshumankomawar
Copy link
Collaborator

Summary

This PR continues work on the catalog interface layer for Deltacat. It primarily focuses on implementing table related functions.

Functions Implemented:

  • alter_table
  • create_table
  • drop_table
  • list_tables
  • get_table
  • rename_table

Also includes minor bugfixes and better exception handling in the storage interface layer.

Testing

  • ran existing tests using make test
  • added new tests for catalog table functions

Checklist

  • Unit tests covering the changes have been added

    • If this is a bugfix, regression tests have been added
  • E2E testing has been performed

Additional Notes

There are some follow up tasks that have been added as TODO's.

@anshumankomawar anshumankomawar requested a review from pdames March 12, 2025 01:27
@@ -82,12 +82,12 @@ make test
Runs all unit tests.

> [!NOTE]
> To run an individual unit test where `my_deltacat_test` exists in either the test file, class,
> To run an individual unit test where `my_deltacat_test` exists in either the test file, class,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linter errors from previous commit.

@@ -899,7 +901,7 @@ def update_table(
**kwargs,
)
if not old_table:
raise ValueError(f"Table `{namespace}.{table_name}` does not exist.")
raise TableNotFoundError(f"Table `{namespace}.{table_name}` does not exist.")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed out by @pdames update the rest of the storage implementation to use Deltacat exceptions instead of throwing ValueErrors.


TODO: Honor purge once garbage collection is implemented.
"""
table: Optional[Table] = get_table(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by @pdames

Can we remove this get_namespace call and just catch/reraise a more clear ValueError from the commit itself?

Copy link
Collaborator Author

@anshumankomawar anshumankomawar Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to create an in memory Table/TableLocator instead of calling get_table.

Drops the given table namespace and all its contents. Raises an error if the
given namespace does not exist.
"""
namespace: Optional[Namespace] = get_namespace(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous get_table call. Try to use an in memory solution.

name: str,
*args,
namespace: Optional[str] = None,
table_version: Optional[str] = None,
Copy link
Collaborator Author

@anshumankomawar anshumankomawar Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested by @mcember

Should we also optionally accept a version?

Should namespace here go after *args? It feels weird to have positional arguments in the order [table, namespace], since people will think of it as the hierarchy from left to right [namespace, table]. At least in create table I thought to demote it from positional argument into keyword only argument for this reason

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an optional TableVersion parameter. Also decided to ensure consistency with param ordering.

positional params,
*args,
optional params,
**kwargs

*args,
namespace: Optional[str] = None,
table_version: Optional[str] = None,
stream_format: StreamFormat = StreamFormat.DELTACAT,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested by @mattcember

In the future when we have multiple streams, how are users going to specify which stream they want in get / list / etc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an explicit stream_format parameter to specify the stream. It defaults to the Deltacat stream.

@anshumankomawar anshumankomawar marked this pull request as ready for review March 12, 2025 01:47
@anshumankomawar anshumankomawar requested a review from mcember March 12, 2025 01:47
@pdames pdames merged commit 618d98f into 2.0 Mar 12, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants