Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#5202] feat(client-python): Support Column and its default value part1 #6542

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

tsungchih
Copy link

What changes were proposed in this pull request?

This is first part (totally 4 planned) of implementation to the following classes from Java to support Column and its default value, including:

  • Column.java
  • SupportsTags.java
  • Tag.java
  • NoSuchTagException.java
  • TagAlreadyExistsException.java

Why are the changes needed?

We need to support Column and its default value in python client.

#5202

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests in test_column.py

add NoSuchTagException

Signed-off-by: George T. C. Lai <[email protected]>
add TagAlreadyExistsException

Signed-off-by: George T. C. Lai <[email protected]>
add interfaces Tag and AssociatedObjects

Signed-off-by: George T. C. Lai <[email protected]>
add interface SupportsTags

Signed-off-by: George T. C. Lai <[email protected]>
add unit tests for class Column

Signed-off-by: George T. C. Lai <[email protected]>
add implementation of class Column

apache#5202

Signed-off-by: George T. C. Lai <[email protected]>
@tsungchih tsungchih changed the title [#5201] feat(client-python): Support Column and its default value part1 [#5202] feat(client-python): Support Column and its default value part1 Feb 26, 2025
@tengqm
Copy link
Contributor

tengqm commented Feb 26, 2025

I'm wondering if the Python code is auto-generated or manually written.

@tsungchih
Copy link
Author

@tengqm The Python code is manually written according to the corresponding Java code. Only the unit tests were auto-generated and revised by myself. Are there concerns with the auto-generated unit tests?

@tengqm
Copy link
Contributor

tengqm commented Feb 27, 2025

@tengqm The Python code is manually written according to the corresponding Java code. Only the unit tests were auto-generated and revised by myself. Are there concerns with the auto-generated unit tests?

No. I don't have concerns about the unit tests. The reason I asked this question is that I am (still) wondering whether we can write the code in a more Python way. If, however, we have an automation to generate these Python files out of Java source, it should be fine. Otherwise, we don't need to copy them.

For example, in class ColumnImpl, we can write the code as:

class ColumnImpl(Column):
  """The implementation of `Column` for users to use API."""
  def __init__(
      self, name: str, data_type: Type, comment: Optional[str],
      nullable: bool, auto_increment: bool,
      default_value: Optional[Expression]):

      Precondition.check_string_not_empty(name, "Column name cannot be null")
      Precondition.check_argument(
          data_type is not None, "Column data type cannot be null")
      self.name = name
      self.data_type = data_type
      self.comment = comment
      self.nullable = nullable
      self.auto_increment = auto_increment
      self.default_value = default_value

Isn't this pretty straightforward?

@tsungchih
Copy link
Author

@tengqm The Python code is manually written according to the corresponding Java code. Only the unit tests were auto-generated and revised by myself. Are there concerns with the auto-generated unit tests?

No. I don't have concerns about the unit tests. The reason I asked this question is that I am (still) wondering whether we can write the code in a more Python way. If, however, we have an automation to generate these Python files out of Java source, it should be fine. Otherwise, we don't need to copy them.

For example, in class ColumnImpl, we can write the code as:

class ColumnImpl(Column):
  """The implementation of `Column` for users to use API."""
  def __init__(
      self, name: str, data_type: Type, comment: Optional[str],
      nullable: bool, auto_increment: bool,
      default_value: Optional[Expression]):

      Precondition.check_string_not_empty(name, "Column name cannot be null")
      Precondition.check_argument(
          data_type is not None, "Column data type cannot be null")
      self.name = name
      self.data_type = data_type
      self.comment = comment
      self.nullable = nullable
      self.auto_increment = auto_increment
      self.default_value = default_value

Isn't this pretty straightforward?

Allow me to clarify myself. In the above example you demonstrated, we don't have to mimic how Java makes the data members, like name, comment, etc., private/protected and creates getter/setter to access them. Just make them public for direct access would be more straightforward. Is that correct?

@tsungchih
Copy link
Author

tsungchih commented Feb 27, 2025

Here I got another question when I was implementing the xxxDTO classes. Most of them have a factory method of or builder class inside. I was wondering if I should implement those builder classes since I feel we don't need them. As you suggested, perhaps I should avoid copying them side by side from Java source but implement these classes in a more straight forward ways. Do I get your point?

@tengqm
Copy link
Contributor

tengqm commented Feb 28, 2025

Yes. There are two extremes here. Either you are auto-generating the Python code from Java, you only care about the correctness of the generator, you don't care if the code is complicated because they will be overwritten next time when you run the generator.

Or you may want to focus on producing usable, useful, working and MAINTAINABLE code with the least lines of code. One rule of thumb for me is that it is always easy to add something to an existing code base, but it is never so easy to get rid of them once they are merged.

My overall suggestion is to drastically simplify the code without sacrificing the its functionality. It helps save our lives for something else more interesting/challenging.

WDYT?

@tsungchih
Copy link
Author

Yes. There are two extremes here. Either you are auto-generating the Python code from Java, you only care about the correctness of the generator, you don't care if the code is complicated because they will be overwritten next time when you run the generator.

Or you may want to focus on producing usable, useful, working and MAINTAINABLE code with the least lines of code. One rule of thumb for me is that it is always easy to add something to an existing code base, but it is never so easy to get rid of them once they are merged.

My overall suggestion is to drastically simplify the code without sacrificing the its functionality. It helps save our lives for something else more interesting/challenging.

WDYT?

Yes, totally agree with you as to producing usable, useful, working and MAINTAINABLE code. Therefore, based on your rule of thumb, we will have the following class Column rather than interface Column and its implementation ColumnImpl in column.py.

class Column:
    """The `Column` for users to use API."""

    def __init__(
        self,
        name: str,
        data_type: Type,
        comment: Optional[str],
        nullable: bool,
        auto_increment: bool,
        default_value: Optional[Expression],
    ):
        Precondition.check_string_not_empty(name, "Column name cannot be null")
        Precondition.check_argument(
            data_type is not None, "Column data type cannot be null"
        )
        self.name = name
        self.data_type = data_type
        self.comment = comment
        self.nullable = nullable
        self.auto_increment = auto_increment
        self.default_value = default_value

How do you think of this way?

@xunliu
Copy link
Member

xunliu commented Feb 28, 2025

@tsungchih Thank you for your contributions. :-)

hi @tengqm

The reason I asked this question is that I am (still) wondering whether we can write the code in a more Python way.

I've thought about this as well, and I think we should code the Python client the same way as Python, but we want to keep the interface functions of the Python client consistent with the JAVA client.

rename method to follow snak_case convention

apache#5202

Signed-off-by: George T. C. Lai <[email protected]>
@tsungchih
Copy link
Author

@tsungchih Thank you for your contributions. :-)

hi @tengqm

The reason I asked this question is that I am (still) wondering whether we can write the code in a more Python way.

I've thought about this as well, and I think we should code the Python client the same way as Python, but we want to keep the interface functions of the Python client consistent with the JAVA client.

In doing so, I'll have to keep the interface Column consistent with the Java client that encapsulates the data members as private and exposes getter methods to access them. This goes to the implementation in the PR. I'm confused here. Which strategy for implementing Python client do we have?

add licence comment to unit test file

Signed-off-by: George T. C. Lai <[email protected]>
@tengqm
Copy link
Contributor

tengqm commented Feb 28, 2025

@tsungchih Thank you for your contributions. :-)
hi @tengqm

The reason I asked this question is that I am (still) wondering whether we can write the code in a more Python way.

I've thought about this as well, and I think we should code the Python client the same way as Python, but we want to keep the interface functions of the Python client consistent with the JAVA client.

In doing so, I'll have to keep the interface Column consistent with the Java client that encapsulates the data members as private and exposes getter methods to access them. This goes to the implementation in the PR. I'm confused here. Which strategy for implementing Python client do we have?

One way of reasoning this is that we consider the use cases. Does the interface consistency matter for the users of client-python? Who are the users of this library? What do they care most, the APIs and utility functions or the implementation structure? There will never be a deterministic answer to this question. It all depends on the use cases.

@tsungchih
Copy link
Author

Looked into the failed messages in CI. It showed that gravitino.exceptions.base.InternalError: HDFS container startup failed!. But I can successfully pass tests by running ./gradlew build command. I was wondering how I could fix this error in CI.

@xunliu
Copy link
Member

xunliu commented Mar 5, 2025

hi @tsungchih. Thank you for your contributions.
I helped you re-trigger Python CI check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants