Conversation

@acrylJonny
Collaborator

No description provided.

@github-actions bot added the "ingestion" label (PR or Issue related to the ingestion of metadata), Dec 19, 2025
@github-actions
Contributor

Linear: ING-1303

@datahub-cyborg bot added the "needs-review" label (Label for PRs that need review from a maintainer), Dec 19, 2025
@codecov
codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 74.00000% with 13 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
...c/datahub/ingestion/source/dremio/dremio_source.py | 66.66% | 12 Missing ⚠️
...adata-ingestion/src/datahub/sql_parsing/_models.py | 92.85% | 1 Missing ⚠️

database: Optional[str] = None
db_schema: Optional[str] = None
table: str
parts: Optional[tuple] = None
Contributor

database, db_schema and table are self-explanatory

⚠️ however, this new parts field needs some docs about its purpose

💡 additionally, we could turn some comments into actual code to emphasize the identity constraint

e.g.

@property
def identity(self) -> tuple:
    return (self.database, self.db_schema, self.table) # parts excluded on purpose
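
A minimal sketch of the suggestion above, assuming a plain frozen dataclass in place of the real _TableName (whose base class is not shown in this thread). The name TableNameSketch and the field order are hypothetical; the point is only that equality and hashing both go through identity, so parts cannot affect them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True, eq=False)
class TableNameSketch:
    """Hypothetical stand-in for _TableName; not the real DataHub class."""

    table: str
    database: Optional[str] = None
    db_schema: Optional[str] = None
    # Raw identifier parts, kept only for connector-specific handling.
    parts: Optional[Tuple[str, ...]] = None

    @property
    def identity(self) -> Tuple[Optional[str], Optional[str], str]:
        # parts is excluded on purpose: it must not affect equality or hashing.
        return (self.database, self.db_schema, self.table)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, TableNameSketch):
            return NotImplemented
        return self.identity == other.identity

    def __hash__(self) -> int:
        return hash(self.identity)


# Two names that differ only in parts compare equal and share a hash.
a = TableNameSketch(table="t", database="db", db_schema="s")
b = TableNameSketch(
    table="t", database="db", db_schema="s", parts=("db", "s", "folder", "t")
)
```

Routing both __eq__ and __hash__ through one property keeps the constraint in a single place instead of relying on a comment.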

@datahub-cyborg bot added the "pending-submitter-response" label (Issue/request has been reviewed but requires a response from the submitter) and removed the "needs-review" label (Label for PRs that need review from a maintainer), Dec 19, 2025

# Store parts as string tuple for connector-specific handling
parts_tuple = None
if hasattr(table, "parts") and table.parts:
Contributor

Why the hasattr? In which cases does sqlglot.exp.Table have or lack this parts field?

Same with the getattr below; it is quite confusing.

Maybe the AI agent is being excessively or unnecessarily conservative?


logger = logging.getLogger(__name__)

# Dremio uses 'dremio' as the default database name in all SQL contexts
Contributor

Where does this constraint come from? Us, sqlglot, or Dremio itself?

Comment on lines 93 to 94
table_name_parts = [DREMIO_DATABASE_NAME] + list(table.parts)
table_name = ".".join(filter(None, table_name_parts))
Contributor

Would it make sense to have a method in _TableName providing this logic, or part of it?

# _TableName
@property
def name_with_parts(self) -> str: # I'm sure you can find a better name
    # parts is Optional, so fall back to an empty tuple when it is None
    return ".".join(filter(None, self.parts or ()))

# here
table_name = DREMIO_DATABASE_NAME + "." + table.name_with_parts

I also think part of the following could be pushed into _TableName:

                table_name_parts = [
                    DREMIO_DATABASE_NAME,
                    table.database,
                    table.db_schema,
                    table.table,
                ]
            else:
                table_name_parts = [
                    table.database or DREMIO_DATABASE_NAME,
                    table.db_schema,
                    table.table,
                ]
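
A rough sketch of how the joining logic from both branches could live in one place. The helper name dotted_name is hypothetical, and the value "dremio" for DREMIO_DATABASE_NAME is an assumption taken from the comment in the diff, not from the constant's definition.

```python
from typing import Optional, Sequence

# Assumed value, based on the diff comment about Dremio's default database name.
DREMIO_DATABASE_NAME = "dremio"


def dotted_name(prefix: str, parts: Optional[Sequence[Optional[str]]]) -> str:
    """Hypothetical helper: join a prefix and the non-empty parts with dots."""
    # filter(None, ...) drops None and empty strings, matching the diff's behavior.
    return ".".join(filter(None, [prefix, *(parts or ())]))


# Multi-part case: prefix plus every part.
multi_part = dotted_name(DREMIO_DATABASE_NAME, ("space", "folder", "tbl"))

# Three-field case: missing fields are skipped instead of producing "..".
three_field = dotted_name(DREMIO_DATABASE_NAME, ("my_db", None, "tbl"))
```

With a helper like this on _TableName, the caller would only decide which prefix applies, not how the pieces are joined.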

self.database == other.database
and self.db_schema == other.db_schema
and self.table == other.table
)
Contributor

Bart also found this

BLOCKERS - Must Fix Before Merge:

  1. qualified() method loses the parts field (lines 74-78 in _models.py)
    • When creating a new _TableName instance with defaults applied, the parts field isn't preserved
    • This breaks multi-part table handling - the entire feature falls apart here
    • Fix: Add parts=self.parts when returning the new instance
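
The fix can be illustrated with a small stand-in class. QualifiableName and the qualified() signature below are hypothetical (the real method is not shown in this thread); dataclasses.replace is used here because it copies every field that is not explicitly overridden, which has the same effect as passing parts=self.parts.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class QualifiableName:
    """Hypothetical stand-in for _TableName; not the real DataHub class."""

    table: str
    database: Optional[str] = None
    db_schema: Optional[str] = None
    parts: Optional[Tuple[str, ...]] = None

    def qualified(
        self, default_db: Optional[str], default_schema: Optional[str]
    ) -> "QualifiableName":
        # replace() carries over all fields not named here, so parts survives.
        return replace(
            self,
            database=self.database or default_db,
            db_schema=self.db_schema or default_schema,
        )


name = QualifiableName(table="tbl", parts=("space", "folder", "tbl"))
qualified_name = name.qualified(default_db="db", default_schema="sch")
```

Building the new instance field by field works too, but then every newly added field has to be remembered at each construction site, which is how parts was dropped here.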

Contributor

@sgomezvillamor left a comment

Left some comments

Main concern: given the new fields in _TableName, let's keep the logic around them encapsulated in the _TableName class.

@datahub-cyborg bot added the "needs-review" label (Label for PRs that need review from a maintainer) and removed the "pending-submitter-response" label (Issue/request has been reviewed but requires a response from the submitter), Dec 19, 2025
database: Optional[str] = None
db_schema: Optional[str] = None
table: str
parts: Optional[Tuple[str, ...]] = None
Contributor

Please add docs for this field.

Contributor

@sgomezvillamor left a comment

Approving to unblock.

@datahub-cyborg bot added the "pending-submitter-merge" label and removed the "needs-review" label (Label for PRs that need review from a maintainer), Dec 19, 2025
Labels

ingestion (PR or Issue related to the ingestion of metadata), pending-submitter-merge

3 participants