Skip to content

Conversation

MiguelElGallo
Copy link
Owner

  • Renew token automatically
  • get token using AzureDefaultCredential()
  • Improved Token caching

MiguelElGallo and others added 12 commits September 26, 2025 14:26
- Remove problematic _ensure_arrow_table method and its dependencies
- Restore direct .arrow() calls without additional materialization wrappers
- Remove collections.abc.Iterable and typing.Any imports
- Revert all DuckDB 1.4.0 compatibility changes that caused issues

This rollback restores the duckdb_backend.py file to commit 5910739,
which was the last known working state before the problems started.
…scripts

- Updated client.sh to enhance usability and provide clearer connection setup.
- Removed client_fixed.py, demo.py, and run_client.py as they are no longer needed.
- Improved error handling and configuration loading in client.sh.
- Added usage examples and connection configuration display in client.sh.
- Simplified command structure and removed unnecessary commands from the client.
- Implemented `execute_sql_file` method in `client.py` to execute SQL commands from a specified file.
- Updated command-line interface in `client.sh` to include an option for executing SQL files.
- Enhanced error handling and logging during SQL file execution.
- Added comprehensive test scripts for schema detection and SQL operations.
- Created multiple SQL test files to validate various functionalities including DDL, DML, and analytical queries.
- Introduced a new demo script to showcase write and read operations in the DuckLake database.
…s; clean up SQL test files and verification scripts for improved clarity and maintainability.
- Implemented `run_insertsmall_simple.sh` for batch inserts with basic monitoring and error detection.
- Introduced `run_quick_schema_test.sh` for rapid validation of schema detection.
- Enhanced `client_debug.py` with memory monitoring, error handling, and connection management.
- Created SQL test scripts for listing and merging Ducklake files (`01_lakefiles.sql` and `02_lakemergfiles.sql`).
…483-8a89-e37b16bf98ed

Prepare test suite for release: Fix all failing tests and improve coverage
@Copilot Copilot AI review requested due to automatic review settings September 27, 2025 17:36
Copy link

codecov bot commented Sep 27, 2025

Codecov Report

❌ Patch coverage is 49.00000% with 204 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/mpzsql/cli.py 46.66% 144 Missing ⚠️
src/mpzsql/backends/duckdb_backend.py 53.84% 60 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements automatic token renewal for PostgreSQL with Azure authentication and includes comprehensive improvements to the token management system.

Key changes:

  • Automatic token renewal for PostgreSQL using DefaultAzureCredential()
  • Enhanced token caching and refresh mechanisms
  • Clean up of obsolete Azure CLI authentication tests
  • Comprehensive test suite improvements and fixes

Reviewed Changes

Copilot reviewed 40 out of 42 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/mpzsql/cli.py Added AzureCredentialManager class for token renewal and caching
src/mpzsql/backends/duckdb_backend.py Fixed RecordBatchReader handling and schema detection
tests/test_duckdb_backend_improvements.py Added new tests for backend improvements
tests/test_duckdb_backend_arrow_utilities.py Removed obsolete utility tests
tests/test_cli.py Removed deprecated Azure CLI authentication tests
pyproject.toml Added psutil dependency
TEST_STATUS_REPORT.md Comprehensive test improvement documentation

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +295 to +305
def _clean_sql_query(self, query: str) -> str:
"""Clean SQL query by removing comments and normalizing whitespace."""
lines = []
for line in query.split("\n"):
# Remove single-line comments (-- comments)
if "--" in line:
line = line[: line.index("--")]
# Strip whitespace and skip empty lines
line = line.strip()
if line:
lines.append(line)
Copy link

Copilot AI Sep 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment removal logic may incorrectly remove '--' that appears within string literals. For example, the query SELECT 'hello--world' would be incorrectly truncated to SELECT 'hello. Consider using a proper SQL parser or implementing string literal detection to avoid this issue.

Copilot uses AI. Check for mistakes.

Comment on lines +60 to +66
def __init__(self):
self._credential = None
self._cached_token = None
self._token_expires_at = None
self._postgresql_scope = "https://ossrdbms-aad.database.windows.net/.default"
self._duckdb_connection = None
self._server_config = None
Copy link

Copilot AI Sep 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AzureCredentialManager class doesn't implement thread safety. If accessed from multiple threads simultaneously, race conditions could occur when updating cached tokens, connection objects, or configuration. Consider adding locking mechanisms or documenting that this class should only be used from a single thread.

Copilot uses AI. Check for mistakes.

Comment on lines +89 to +90
Raises:
Exception: If token acquisition fails
Copy link

Copilot AI Sep 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method raises a generic 'Exception' which doesn't provide specific error information for callers. Consider creating specific exception types or at least documenting the specific exceptions that can be raised (e.g., CredentialUnavailableError, ClientAuthenticationError) to help callers handle different error scenarios appropriately.

Suggested change
Raises:
Exception: If token acquisition fails

Copilot uses AI. Check for mistakes.

Comment on lines +403 to +410
"SELECT * FROM duckdb_prepared_statements()"
).fetchall()

except Exception as param_error:
duckdb_log.error(
f"Query-based schema detection failed: {param_error}"
)
# Try the PREPARE approach as fallback
try:
# Replace parameters with NULL for PREPARE
schema_query = query
our_stmt = None
for row in result:
if row[0] == stmt_name:
our_stmt = row
break
Copy link

Copilot AI Sep 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Querying all prepared statements with SELECT * when only looking for a specific statement by name is inefficient. Consider filtering the query with a WHERE clause: SELECT * FROM duckdb_prepared_statements() WHERE name = ? to reduce unnecessary data transfer and processing.

Copilot uses AI. Check for mistakes.

PORT {port},
DATABASE '{escape_sql_string(database or "postgres")}',
USER '{escape_sql_string(user)}',
PASSWORD '{password}'
Copy link

Copilot AI Sep 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The password (Azure token) is not escaped when creating the SQL secret, while other parameters are. This inconsistency could lead to SQL injection if the token contains special characters like single quotes. Either escape the password consistently or document why Azure tokens are safe from SQL injection.

Suggested change
PASSWORD '{password}'
PASSWORD '{escape_sql_string(password)}'

Copilot uses AI. Check for mistakes.

@MiguelElGallo
Copy link
Owner Author

@copilot

Can you add more test to fix this:

❌ Patch coverage is 49.00000% with 204 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines -- | -- | -- src/mpzsql/cli.py | 46.66% | 144 Missing ⚠️ src/mpzsql/backends/duckdb_backend.py | 53.84% | 60 Missing ⚠️

❌ Patch coverage is 49.00000% with 204 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/mpzsql/cli.py 46.66% 144 Missing ⚠️
src/mpzsql/backends/duckdb_backend.py 53.84% 60 Missing ⚠️

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 40 out of 42 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

src/mpzsql/cli.py:1

  • Consider consolidating the Azure credential imports to avoid confusion. The async version is only used once, so you could import it directly at the point of use or use a more descriptive alias.
"""CLI interface for MPZSQL server using typer.

src/mpzsql/cli.py:1

  • The password is inserted directly into the SQL string without proper escaping. While this is for Azure tokens which should be safe, consider using parameterized queries or a proper SQL builder to prevent potential injection issues.
"""CLI interface for MPZSQL server using typer.

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +1527 to +1529
raise SystemExit(
f"FATAL: Azure container verification failed: {e}"
) from e
Copy link

Copilot AI Sep 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message could be more helpful by including specific troubleshooting steps or configuration details about the Azure container that failed.

Copilot uses AI. Check for mistakes.

Comment on lines 779 to +785
try:
result_table = self._ensure_arrow_table(
self.connection.execute(query, params).arrow()
)
result = self.connection.execute(query, params).arrow()
# Convert RecordBatchReader to Table if needed
if isinstance(result, pa.RecordBatchReader):
result_table = result.read_all()
else:
result_table = result
Copy link

Copilot AI Sep 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RecordBatchReader to Table conversion pattern is repeated multiple times throughout the file. Consider extracting this into a helper method like _ensure_arrow_table() to reduce code duplication and improve maintainability.

Copilot uses AI. Check for mistakes.

@MiguelElGallo
Copy link
Owner Author

@copilot

Can you add more tests to fix this:

❌ Patch coverage is 49.00000% with 204 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines -- | -- | -- src/mpzsql/cli.py | 46.66% | 144 Missing ⚠️ src/mpzsql/backends/duckdb_backend.py | 53.84% | 60 Missing ⚠️
❌ Patch coverage is 49.00000% with 204 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/mpzsql/cli.py 46.66% 144 Missing ⚠️
src/mpzsql/backends/duckdb_backend.py 53.84% 60 Missing ⚠️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants