Test by evb123 · Pull Request #1 · octoenergy/databricks-sql-python

evb123 · 2024-01-26T11:00:02Z

Test

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com>

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

--------- Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

## Summary

Support OAuth flow for Databricks Azure

## Background

Some OAuth endpoints (e.g. OpenID Configuration) and scopes differ between Databricks on Azure and Databricks on AWS. The current code only supports the OAuth flow for Databricks on AWS.

## What changes are proposed in this pull request?

- Change `OAuthManager` to decouple Databricks AWS-specific configuration from the OAuth flow
- Add `sql/auth/endpoint.py`, which implements cloud-specific OAuth endpoint configuration
- Change `DatabricksOAuthProvider` to work with the OAuth configurations of the different Databricks clouds (AWS, Azure)
- Add the corresponding unit tests

--------- Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

* Cloud Fetch download handler Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Issue fix: final result link compressed data has multiple LZ4 end-of-frame markers Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Addressing PR comments - Linting - Type annotations - Use response.ok - Log exception - Remove semaphore and only use threading.event - reset() flags method - Fix tests after removing semaphore - Link expiry logic should be in secs - Decompress data static function - link_expiry_buffer and static public methods - Docstrings and comments Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Changing logger.debug to remove url Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * _reset() comment to docstring Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * link_expiry_buffer -> link_expiry_buffer_secs Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> --------- Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com>
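The "link expiry logic should be in secs" item above can be illustrated with a minimal sketch. The helper name `is_link_expired` and its parameters are hypothetical and not the download handler's actual API; only the `link_expiry_buffer_secs` name comes from the commit messages.

```python
import time
from typing import Optional


def is_link_expired(
    expiry_time_secs: float,
    link_expiry_buffer_secs: float = 0,
    now_secs: Optional[float] = None,
) -> bool:
    """Return True if a result link has expired or expires within the buffer.

    Hypothetical helper: all arguments are Unix timestamps or durations
    expressed in seconds, never milliseconds.
    """
    if now_secs is None:
        now_secs = time.time()
    # Treat the link as expired slightly early so that a download does not
    # start on a link that is about to lapse.
    return expiry_time_secs <= now_secs + link_expiry_buffer_secs
```

Passing `now_secs` explicitly keeps the check deterministic in unit tests; production callers would normally omit it.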

* Cloud Fetch download manager Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Bug fix: submit handler.run Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Type annotations Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Namedtuple -> dataclass Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Shutdown thread pool and clear handlers Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Docstrings and comments Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * handler.run is the correct call Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Link expiry buffer in secs Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Adding type annotations for download_handlers and downloadable_result_settings Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Move DownloadableResultSettings to downloader.py to avoid circular import Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Black linting Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Timeout is never None Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> --------- Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com>

* Cloud fetch queue and integration Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Enable cloudfetch with direct results Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Typing and style changes Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Client-settable max_download_threads Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Docstrings and comments Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Increase default buffer size bytes to 104857600 Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Move max_download_threads to kwargs of ThriftBackend, fix unit tests Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Fix tests: staticmethod make_arrow_table mock not callable Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * cancel_futures in shutdown() only available in python >=3.9.0 Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Black linting Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Fix typing errors Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> --------- Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com>

* Cloud Fetch e2e tests Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Test case works for e2-dogfood shared unity catalog Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Moving test to LargeQueriesSuite and setting catalog to hive_metastore Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Align default value of buffer_size_bytes in driver tests Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> * Adding comment to specify what's needed to run successfully Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com> --------- Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com>

Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com>

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Signed-off-by: Sebastian Eckweiler <sebastian.eckweiler@mercedes-benz.com> Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> Co-authored-by: Sebastian Eckweiler <sebastian.eckweiler@mercedes-benz.com> Co-authored-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Signed-off-by: Daniel Segesdi <daniel.segesdi@turbine.ai> Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> Co-authored-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

--------- Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> Co-authored-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

--------- Signed-off-by: Bogdan Kyryliuk <b.kyryliuk@gmail.com> Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> Co-authored-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Signed-off-by: William Gentry <william.barr.gentry@gmail.com> Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> Co-authored-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

--------- Co-authored-by: Jesse <jesse.whitehouse@databricks.com>

Resolves #187 Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Behaviour is gated behind `enable_v3_retries` config. This will be removed and become the default behaviour in a subsequent release. Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* query tags integration Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> * query tags with sea flow Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> * optimized tests Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> --------- Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>

…688)

* token federation for python driver * address comment * address comments * lint * lint fix * nit * change import

release-4.1.4

* Added a workflow to parallelise the E2E tests. Updated E2E tests to create new table names for each run to avoid issues in parallelisation * Modified parallel code coverage workflow to fail when e2e tests fail * Fixed parallel coverage check pytest command * Modified e2e tests to support parallel execution * Added fallbacks for exit code 5 where we find no test for SEA * Fixed coverage artifact for parallel test workflow * Debugging coverage check merge * Improved coverage report merge and removed the test_driver test for faster testing * Debug commit for coverage merge * Debugging coverage merge 2 * Debugging coverage merge 3 * Removed unnecessary debug statements from the parallel code coverage workflow * Added unit test and common e2e tests * Added null checks for coverage workflow * Improved the null check for test list * Improved the visibility for test list * Added check for exit code 5 * Updated the workflow for coverage check to use pytest-xdist to run the tests in parallel * Enforced that the e2e tests should pass * Changed name for workflow job * Updated poetry * Removed integration and previous code coverage workflow * Added the integration workflow again

* Added driver connection params Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * Added model fields for chunk/result latency Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixed linting issues Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * lint issue fixing Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> --------- Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>

Implement PEP 249-compliant transaction control with extensions for manual commit/rollback operations. This enables atomic multi-table operations with REPEATABLE_READ isolation semantics.

Core API additions:
- connection.autocommit property for enabling/disabling auto-commit mode
- connection.commit() to commit active transactions
- connection.rollback() to rollback active transactions
- connection.get_transaction_isolation() returns current isolation level
- connection.set_transaction_isolation() validates isolation level
- TransactionError exception for transaction-specific failures

Implementation details:
- Added autocommit state caching in Session with optional server query
- Added TRANSACTION_ISOLATION_LEVEL_REPEATABLE_READ constant
- All transaction operations include proper error handling and telemetry
- Supports fetch_autocommit_from_server connection parameter

Testing:
- Unit tests covering all transaction methods and error scenarios
- e2e integration tests validating transaction behavior including multi-table atomicity, sequential transactions, and isolation semantics

Documentation:
- Comprehensive TRANSACTIONS.md guide with examples and best practices
- Updated README.md with basic usage and reference to detailed docs

Requires MST-enabled Databricks SQL warehouse and Delta tables with 'delta.feature.catalogOwned-preview' table property.

--------- Signed-off-by: Jayant Singh <jayant.singh@databricks.com>
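As a rough illustration of how the API additions listed in this commit might be used together, the sketch below wraps several statements in one manual transaction. The helper `run_atomically` is hypothetical and not part of the driver; it only assumes a connection object that exposes `autocommit`, `cursor()`, `commit()` and `rollback()` as described in this commit and in PEP 249.

```python
from typing import Any, Iterable


def run_atomically(connection: Any, statements: Iterable[str]) -> None:
    """Execute all statements in one transaction; roll back if any fails.

    Hypothetical helper. `connection` is assumed to expose `autocommit`,
    `cursor()`, `commit()` and `rollback()`.
    """
    previous_autocommit = connection.autocommit
    connection.autocommit = False  # switch to manual transaction control
    try:
        cursor = connection.cursor()
        try:
            for statement in statements:
                cursor.execute(statement)
        finally:
            cursor.close()
        connection.commit()
    except Exception:
        # Undo every statement executed so far, then re-raise for the caller.
        connection.rollback()
        raise
    finally:
        # Restore whatever mode the caller had before.
        connection.autocommit = previous_autocommit
```

Restoring the previous `autocommit` value in the `finally` block keeps the helper from leaking manual-commit mode into unrelated code that shares the connection.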

Bump to version 4.2.0 Signed-off-by: Jayant Singh <jayant.singh@databricks.com>

Introduces a new `ignore_transactions` configuration parameter (default: True) to control transaction-related behavior in the Connection class.

When ignore_transactions=True (default):
- commit(): no-op, returns immediately
- rollback(): raises NotSupportedError with message "Transactions are not supported on Databricks"
- autocommit setter: no-op, returns immediately

When ignore_transactions=False:
- All transaction methods execute normally

Changes:
- Added ignore_transactions parameter to Connection.__init__() with default value True
- Modified commit(), rollback(), and autocommit setter to check ignore_transactions flag
- Updated unit tests to pass ignore_transactions=False when testing transaction functionality
- Updated e2e transaction tests to pass ignore_transactions=False
- Added three new unit tests to verify ignore_transactions
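The behavior described for the two flag values can be modelled with a small toy class. This is only a sketch of the semantics stated in the commit message: `ToyConnection` and the local `NotSupportedError` are stand-ins defined here for illustration, not the driver's `Connection` class or exception type.

```python
class NotSupportedError(Exception):
    """Local stand-in for the PEP 249 NotSupportedError exception."""


class ToyConnection:
    """Toy model of the described flag; not the driver's Connection class."""

    def __init__(self, ignore_transactions: bool = True) -> None:
        self.ignore_transactions = ignore_transactions
        self._autocommit = True
        self.commits_run = 0  # counts commits that were not ignored

    @property
    def autocommit(self) -> bool:
        return self._autocommit

    @autocommit.setter
    def autocommit(self, value: bool) -> None:
        if self.ignore_transactions:
            return  # no-op, returns immediately
        self._autocommit = value

    def commit(self) -> None:
        if self.ignore_transactions:
            return  # no-op, returns immediately
        self.commits_run += 1

    def rollback(self) -> None:
        if self.ignore_transactions:
            raise NotSupportedError("Transactions are not supported on Databricks")
        # A real connection would roll back the active transaction here.
```

Note the asymmetry the commit describes: with the default flag value, `commit()` silently succeeds while `rollback()` raises, so code that depends on rollback cannot mistake an ignored transaction for a real one.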

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

This changes the default value of use_hybrid_disposition from True to False in the SEA backend, disabling hybrid disposition by default.

* Added driver connection params Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * Added model fields for chunk/result latency Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixed linting issues Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * lint issue fixing Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * circuit breaker changes using pybreaker Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * Added interface layer on top of http client to use circuit breaker Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * Added test cases to validate circuit breaker Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixing broken tests Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixed linting issues Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixed failing test cases Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixed urllib3 issue Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * added more test cases for telemetry Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * simplified CB config Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * poetry lock Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fix minor issues & improvement Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * improved circuit breaker for handling only 429/503 Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * linting issue fixed Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * raise CB only for 429/503 Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fix broken test cases Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixed untyped references Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * added more test to verify the changes Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * description changed Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * remove cb config class to constants Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * removed mocked response and use a new excluded exception in CB Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixed broken test Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * added e2e test to verify circuit breaker Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * lower log level for telemetry Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixed broken test, removed tests on log assertions Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * modified unit to reduce the noise and follow dry principle Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> --------- Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
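The idea of a breaker that trips only on HTTP 429/503 can be sketched with the standard library alone. The commits above use pybreaker; the class below is a hand-rolled stand-in that does not reproduce pybreaker's API, and every name and default in it (`SimpleCircuitBreaker`, `fail_max`, `reset_timeout_secs`) is an assumption made for illustration.

```python
import time
from typing import Callable, Optional

# Only these statuses count as failures, per "raise CB only for 429/503".
RATE_LIMIT_STATUSES = {429, 503}


class CircuitOpenError(Exception):
    """Raised when a call is short-circuited because the breaker is open."""


class SimpleCircuitBreaker:
    def __init__(
        self,
        fail_max: int = 3,
        reset_timeout_secs: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fail_max = fail_max
        self.reset_timeout_secs = reset_timeout_secs
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, send: Callable[[], int]) -> int:
        """Run `send`, which returns an HTTP status code, through the breaker."""
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_secs:
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.fail_max - 1
        status = send()
        if status in RATE_LIMIT_STATUSES:
            self._failures += 1
            if self._failures >= self.fail_max:
                self._opened_at = self._clock()
        elif status < 400:
            self._failures = 0  # a success closes the failure streak
        # Other error statuses neither count toward nor reset the streak.
        return status
```

Injecting the clock makes the open/half-open transitions testable without sleeping.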

perf: Optimize telemetry latency logging to reduce overhead

Optimizations implemented:
1. Eliminated extractor pattern - replaced wrapper classes with direct attribute access functions, removing object creation overhead
2. Added feature flag early exit - checks cached telemetry_enabled flag to skip heavy work when telemetry is disabled
3. Simplified code structure with early returns for better readability

Signed-off-by: Samikshya Chand <samikshya.chand@databricks.com>
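The early-exit optimization could look roughly like the decorator below. This is a sketch under assumptions: the decorator name `log_latency`, the `record` callback, and the way the cached flag is read via a `telemetry_enabled` attribute are all hypothetical and are not taken from the driver's code.

```python
import functools
import time
from typing import Any, Callable


def log_latency(record: Callable[[str, float], None]) -> Callable:
    """Decorate a method so its latency is passed to `record` when enabled."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            # Early exit: skip all timing work when telemetry is disabled.
            if not getattr(self, "telemetry_enabled", False):
                return func(self, *args, **kwargs)
            start = time.monotonic()
            try:
                return func(self, *args, **kwargs)
            finally:
                # Read duration directly; no wrapper/extractor objects needed.
                record(func.__name__, time.monotonic() - start)

        return wrapper

    return decorator
```

When the flag is off, the wrapped call costs one attribute lookup beyond the original method, which is the point of checking the cached flag first.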

* basic e2e test for force telemetry verification Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * Added more integration test scenarios Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * default on telemetry + logs to investigate failing test Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixed linting issue Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * added more logs to identify server side flag evaluation Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * remove unused logs Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fix broken test case for default enable telemetry Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * reduce test length and made more reusable code Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * removed telemetry e2e to daily single run Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> --------- Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>

…#718) * feat: Implement host-level telemetry batching to reduce rate limiting Changes telemetry client architecture from per-session to per-host batching, matching the JDBC driver implementation. This reduces the number of HTTP requests to the telemetry endpoint and prevents rate limiting in test environments. Key changes: - Add _TelemetryClientHolder with reference counting for shared clients - Change TelemetryClientFactory to key clients by host_url instead of session_id - Add getHostUrlSafely() helper for defensive null handling - Update all callers (client.py, exc.py, latency_logger.py) to pass host_url Before: 100 connections to same host = 100 separate TelemetryClients After: 100 connections to same host = 1 shared TelemetryClient (refcount=100) This fixes rate limiting issues seen in e2e tests where 300+ parallel connections were overwhelming the telemetry endpoint with 429 errors. * chore: Change all telemetry logging to DEBUG level Reduces log noise by changing all telemetry-related log statements (info, warning, error) to debug level. Telemetry operations are background tasks and should not clutter logs with operational messages. Changes: - Circuit breaker state changes: info/warning -> debug - Telemetry send failures: error -> debug - All telemetry operations now consistently use debug level * chore: Fix remaining telemetry warning log to debug Changes remaining logger.warning in telemetry_push_client.py to debug level for consistency with other telemetry logging. * fix: Update tests to use host_url instead of session_id_hex - Update circuit breaker test to check logger.debug instead of logger.info - Replace all session_id_hex test parameters with host_url - Apply Black formatting to exc.py and telemetry_client.py This fixes test failures caused by the signature change from session_id_hex to host_url in the Error class and TelemetryClientFactory. 
* fix: Revert session_id_hex in tests for functions that still use it Only Error classes changed from session_id_hex to host_url. Other classes (TelemetryClient, ResultSetDownloadHandler, etc.) still use session_id_hex. Reverted: - test_telemetry.py: TelemetryClient and initialize_telemetry_client - test_downloader.py: ResultSetDownloadHandler - test_download_manager.py: ResultFileDownloadManager Kept as host_url: - test_client.py: Error class instantiation * fix: Update all Error raises and test calls to use host_url Changes: 1. client.py: Changed all error raises from session_id_hex to host_url - Connection class: session_id_hex=self.get_session_id_hex() -> host_url=self.session.host - Cursor class: session_id_hex=self.connection.get_session_id_hex() -> host_url=self.connection.session.host 2. test_telemetry.py: Updated get_telemetry_client() and close() calls - get_telemetry_client(session_id) -> get_telemetry_client(host_url) - close(session_id) -> close(host_url=host_url) 3. test_telemetry_push_client.py: Changed logger.warning to logger.debug - Updated test assertion to match debug logging level These changes complete the migration from session-level to host-level telemetry client management. * fix: Update thrift_backend.py to use host_url instead of session_id_hex Changes: 1. Added self._host attribute to store server_hostname 2. Updated all error raises to use host_url=self._host 3. Changed method signatures from session_id_hex to host_url: - _check_response_for_error - _hive_schema_to_arrow_schema - _col_to_description - _hive_schema_to_description - _check_direct_results_for_error 4. Updated all method calls to pass self._host instead of self._session_id_hex This completes the migration from session-level to host-level error reporting. * Fix Black formatting by adjusting fmt directive placement Moved the `# fmt: on` directive to the except block level instead of inside the if statement to resolve Black parsing confusion. 
* Fix telemetry feature flag tests to set mock session host The tests were failing because they called get_telemetry_client("test") but the mock session didn't have .host set, so the telemetry client was registered under a different key (likely None or MagicMock). This caused the factory to return NoopTelemetryClient instead of the expected client. Fixed by setting mock_session_instance.host = "test" in all three tests. * Add teardown_method to clear telemetry factory state between tests Without this cleanup, tests were sharing telemetry clients because they all used the same host key ("test"), causing test pollution. The first test would create an enabled client, and subsequent tests would reuse it even when they expected a disabled client. * Clear feature flag context cache in teardown to fix test pollution The FeatureFlagsContextFactory caches feature flag contexts per session, causing tests to share the same feature flag state. This resulted in the first test creating a context with telemetry enabled, and subsequent tests incorrectly reusing that enabled state even when they expected disabled. * fix: Access actual client from holder in flush worker The flush worker was calling _flush() on _TelemetryClientHolder objects instead of the actual TelemetryClient. Fixed by accessing holder.client before calling _flush(). Fixes AttributeError in e2e tests: '_TelemetryClientHolder' object has no attribute '_flush' * Clear telemetry client cache in e2e test teardown Added _clients.clear() to the teardown fixture to prevent telemetry clients from persisting across e2e tests, which was causing session ID pollution in test_concurrent_queries_sends_telemetry. * Pass session_id parameter to telemetry export methods With host-level telemetry batching, multiple connections share one TelemetryClient. Each client stores session_id_hex from the first connection that created it. This caused all subsequent connections' telemetry events to use the wrong session ID. 
Changes: - Modified telemetry export method signatures to accept optional session_id - Updated Connection.export_initial_telemetry_log() to pass session_id - Updated latency_logger.py export_latency_log() to pass session_id - Updated Error.__init__() to accept optional session_id_hex and pass it - Updated all error raises in Connection and Cursor to pass session_id_hex 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Black formatting in telemetry_client.py * Use 'test-host' instead of 'test' for mock host in telemetry tests * Replace test-session-id with test-host in test_client.py * Fix telemetry client lookup to use test-host in tests * Make session_id_hex keyword-only parameter in Error.__init__ --------- Co-authored-by: Claude <noreply@anthropic.com>
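The per-host sharing with reference counting described in this commit can be sketched as follows. The registry and client classes here are simplified, hypothetical stand-ins (the commit's real names are `_TelemetryClientHolder` and `TelemetryClientFactory`, whose actual code is not shown in this page); the sketch only illustrates the keying-by-host and refcount logic.

```python
import threading
from typing import Dict


class FakeTelemetryClient:
    """Placeholder for a real telemetry client (hypothetical)."""

    def __init__(self, host_url: str) -> None:
        self.host_url = host_url
        self.closed = False

    def close(self) -> None:
        self.closed = True


class _Holder:
    """Pairs a shared client with the number of connections using it."""

    def __init__(self, client: FakeTelemetryClient) -> None:
        self.client = client
        self.refcount = 1


class HostKeyedClientRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._holders: Dict[str, _Holder] = {}

    def acquire(self, host_url: str) -> FakeTelemetryClient:
        """Return the shared client for a host, creating it on first use."""
        with self._lock:
            holder = self._holders.get(host_url)
            if holder is None:
                holder = _Holder(FakeTelemetryClient(host_url))
                self._holders[host_url] = holder
            else:
                holder.refcount += 1
            return holder.client

    def release(self, host_url: str) -> None:
        """Drop one reference; close the client when the last one is gone."""
        with self._lock:
            holder = self._holders.get(host_url)
            if holder is None:
                return
            holder.refcount -= 1
            if holder.refcount == 0:
                del self._holders[host_url]
                # Call close() on the client inside the holder, not the holder.
                holder.client.close()
```

Because the client is shared, anything specific to one connection (such as the session ID mentioned in the last fix above) has to be passed per call rather than stored on the client.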

* Prepare for a release with telemetry on by default Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com> * Make edits Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com> * Update version Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com> * Fix CHANGELOG formatting to match previous style Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com> * Fix telemetry e2e tests for default-enabled behavior - Update test expectations to reflect telemetry being enabled by default - Add feature flags cache cleanup in teardown to prevent state leakage between tests - This ensures each test runs with fresh feature flag state * Add wait after connection close for async telemetry submission * Remove debug logging from telemetry tests * Mark telemetry e2e tests as serial - must not run in parallel Root cause: Telemetry tests share host-level client across pytest-xdist workers, causing test isolation issues with patches. Tests pass serially but fail with -n auto. Solution: Add @pytest.mark.serial marker. CI needs to run these separately without -n auto. * Split test execution to run serial tests separately Telemetry e2e tests must run serially due to shared host-level telemetry client across pytest-xdist workers. Running with -n auto causes test isolation issues where futures aren't properly captured. Changes: - Run parallel tests with -m 'not serial' -n auto - Run serial tests with -m 'serial' without parallelization - Use --cov-append for serial tests to combine coverage - Mark telemetry e2e tests with @pytest.mark.serial - Update test expectations for default telemetry behavior - Add feature flags cache cleanup in test teardown * Mark telemetry e2e tests as serial - must not run in parallel The concurrent telemetry e2e test globally patches telemetry methods to capture events. 
When run in parallel with other tests via pytest-xdist, it captures telemetry events from other concurrent tests, causing assertion failures (expected 60 events, got 88). All telemetry e2e tests must run serially to avoid cross-test interference with the shared host-level telemetry client. --------- Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>

* added pandas 2.3.3 support and tests for py 3.14 Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> * generated poetry.lock Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> * lz4 version update for py 3.14 Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> * dependency selection based on py version Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> * pyarrow version update for py 3.14 Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> * poetry.lock with latest poetry version Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> --------- Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>

* pandas 2.3.3 support for py < 3.14 Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> * poetry lock Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> --------- Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>

Fixed the exception handler, which called close() on _TelemetryClientHolder objects instead of on the client inside them.

* created util method to normalise http protocol in http path Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * Added impacted files using util method Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * Fixed linting issues Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * fixed broken test with mock host string Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * mocked http client Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * made case sensitive check in url utils Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * linting issue resolved Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * removed unnecessary md files Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * made test readable Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> * changes done in auth util as well as sea http Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com> --------- Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>

New minor version release

* query tags telemetry Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> * code linting fix Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com> --------- Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>

…heck failures in the repo (#735) * Fix 60 seconds delay in gov cloud connections * keep it simple :) * Add fix for krb error * pin poetry * Pin for publish flow too * Fix failing tests * Edit order for pypi * One last fix : pls work

* Fix #729 and #731: Telemetry lifecycle management Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Address review comments: revert timeout and telemetry_enabled changes Per reviewer feedback on PR #734: 1. Revert timeout from 30s back to 900s (line 299) - Reviewer noted that with wait=False, timeout is not critical - The async nature and wait=False handle the exit speed 2. Revert telemetry_enabled parameter back to True (line 734) - Reviewer noted this is redundant given the early return - If enable_telemetry=False, we return early (line 729) - Line 734 only executes when enable_telemetry=True - Therefore using the parameter here is unnecessary These changes address the reviewer's valid technical concerns while keeping the core fixes intact: - wait=False for non-blocking shutdown (critical for Issue #729) - Early return when enable_telemetry=False (critical for Issue #729) - All Issue #731 fixes (null-safety, __del__, documentation) Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Fix Black formatting violations Apply Black formatting to files modified in previous commits: - src/databricks/sql/common/unified_http_client.py - src/databricks/sql/telemetry/telemetry_client.py Changes are purely cosmetic (quote style consistency). Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Fix CI test failure: Prevent parallel execution of telemetry tests Add @pytest.mark.xdist_group to telemetry test classes to ensure they run sequentially on the same worker when using pytest-xdist (-n auto). Root cause: Tests marked @pytest.mark.serial were still being parallelized in CI because pytest-xdist doesn't respect custom markers by default. With host-level telemetry batching (PR #718), tests running in parallel would share the same TelemetryClient and interfere with each other's event counting, causing test_concurrent_queries_sends_telemetry to see 88 events instead of the expected 60. 
The xdist_group marker ensures all tests in the "serial_telemetry" group run on the same worker sequentially, preventing state interference.

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix telemetry test fixtures: Clean up state before AND after tests

Modified telemetry_setup_teardown fixtures to clean up TelemetryClientFactory state both BEFORE and AFTER each test, not just after. This prevents leftover state from previous tests (pending events, active executors) from interfering with the current test.

Root cause: In CI with sequential execution on the same worker, if a previous test left pending telemetry events in the executor, those events could be captured by the next test's mock, causing inflated event counts (88 instead of 60).

Now ensures complete isolation between tests by resetting all shared state before each test starts.

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix CI test failure: Clear _flush_event between tests

The _flush_event threading.Event was never cleared after stopping the flush thread, remaining in "set" state. This caused timing issues in subsequent tests where the Event was already signaled, triggering unexpected flush behavior and causing extra telemetry events to be captured (88 instead of 60).

Now explicitly clear the _flush_event flag in both setup (before test) and teardown (after test) to ensure clean state isolation between tests.

This explains why CI consistently got 88 events - the flush_event from previous tests triggered additional flushes during test execution.

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Add debug workflow and output to diagnose CI test failure

1. Created new workflow 'test-telemetry-only.yml' that runs only the failing telemetry test with -n auto, mimicking real CI but much faster
2. Added debug output to test showing:
   - Client-side captured events
   - Number of futures/batches
   - Number of server responses
   - Server-reported successful events

This will help identify why CI gets 88 events vs local 60 events.

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix workflow: Add krb5 system dependency

The workflow was failing during poetry install due to missing krb5 system libraries needed for kerberos dependencies.

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix xdist_group: Add --dist=loadgroup to pytest commands

The @pytest.mark.xdist_group markers were being ignored because pytest-xdist uses --dist=load by default, which doesn't respect groups. With --dist=loadgroup, tests in the same xdist_group run sequentially on the same worker, preventing telemetry state interference between tests.

This is the ROOT CAUSE of the 88 vs 60 events issue - tests were running in parallel across workers instead of sequentially on one worker as intended.

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Add aggressive flush before test to prevent event interference

CI shows 72 events instead of 60. Debug output reveals:
- Client captured: 60 events (correct)
- Server received: 72 events across 2 batches

The 12 extra events accumulate in the timing window between fixture cleanup and mock setup. Other tests (like circuit breaker tests not in our xdist_group) may be sending telemetry concurrently.

Solution: Add an explicit flush+shutdown RIGHT BEFORE setting up the mock to ensure a completely clean slate with zero buffered events.

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Split workflow: Isolate telemetry tests in separate job

To prevent interference from other e2e tests, split into two jobs:

Job 1 (run-non-telemetry-tests):
- Runs all e2e tests EXCEPT telemetry tests
- Uses -n auto for parallel execution

Job 2 (run-telemetry-tests):
- Runs ONLY telemetry tests
- Depends on Job 1 completing (needs: run-non-telemetry-tests)
- Fresh Python process = complete isolation
- No ambient telemetry from other tests

This eliminates the 68 vs 60 event discrepancy by ensuring telemetry tests run in a clean environment with zero interference.

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix workflows: Add krb5 deps and cleanup debug code

Changes across multiple workflows:

1. integration.yml:
   - Add krb5 system dependency to telemetry job
   - Fixes: krb5-config command not found error during poetry install
2. code-coverage.yml:
   - Add krb5 system dependency
   - Split telemetry tests into separate step for isolation
   - Maintains coverage accumulation with --cov-append
3. publish-test.yml:
   - Add krb5 system dependency for consistent builds
4. test_concurrent_telemetry.py:
   - Remove debug print statements
5. Delete test-telemetry-only.yml:
   - Remove temporary debug workflow

All workflows now have proper telemetry test isolation and required system dependencies for kerberos packages.

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix publish-test.yml: Update Python 3.9 -> 3.10

Poetry 2.3.2 installation fails with Python 3.9:

Installing Poetry (2.3.2): An error occurred.

Other workflows use Python 3.10 and work fine. Updating to match ensures consistency and avoids Poetry installation issues.

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix integration workflow: Remove --dist=loadgroup from non-telemetry tests
- Remove --dist=loadgroup from non-telemetry job (only needed for telemetry)
- Remove test_telemetry_e2e.py from telemetry job (was skipped before)
- This should fix test_uc_volume_life_cycle failure caused by changed test distribution

* Fix code-coverage workflow: Remove test_telemetry_e2e.py from coverage tests
- Only run test_concurrent_telemetry.py in isolated telemetry step
- test_telemetry_e2e.py was excluded in original workflow, keep it excluded

* Fix publish-test workflow: Remove cache conditional
- Always run poetry install (not just on cache miss)
- Ensures fresh install with system dependencies (krb5)
- Matches pattern used in integration.yml

* Fix publish-test.yml: Remove duplicate krb5 install, restore cache conditional
- Remove duplicate system dependencies step
- Restore cache conditional to match main branch
- Keep Python 3.10 (our change from 3.9)

* Fix code-coverage: Remove serial tests step
- All serial tests are telemetry tests (test_concurrent_telemetry.py and test_telemetry_e2e.py)
- They're already run in the isolated telemetry step
- Running -m serial with --ignore on both files results in 0 tests (exit code 5)

---------

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
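The `_flush_event` fix described in the commit message above rests on a property of `threading.Event`: once set, it stays set until `clear()` is called, so a later `wait()` returns immediately. The sketch below illustrates that behaviour with a hypothetical `FlushWorker` class and `reset_state` helper. These names are assumptions made for illustration and are not the driver's actual telemetry classes or fixtures.

```python
import threading


class FlushWorker:
    """Hypothetical stand-in for a component that owns a flush signal."""

    def __init__(self):
        self._flush_event = threading.Event()
        self.flush_count = 0

    def stop(self):
        # Setting the event wakes a waiting flush loop so it can exit.
        # Nothing here clears the event afterwards.
        self._flush_event.set()

    def wait_for_flush_signal(self, timeout):
        # wait() returns True right away if the event is already set,
        # which is how a stale signal can trigger an unexpected flush.
        if self._flush_event.wait(timeout):
            self.flush_count += 1
            return True
        return False


def reset_state(worker):
    """Hypothetical fixture helper: call before AND after each test."""
    worker._flush_event.clear()
    worker.flush_count = 0


worker = FlushWorker()
worker.stop()                                 # a previous test shuts down
stale = worker.wait_for_flush_signal(0.01)    # signal is still set
reset_state(worker)                           # what setup/teardown would do
clean = worker.wait_for_flush_signal(0.01)    # times out, no flush
```

Running the reset in both setup and teardown mirrors the approach in the commit message: teardown alone does not help if an earlier test exited without running it.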

Signed-off-by: Jayant Singh <jayant.singh@databricks.com>

Jesse and others added 30 commits June 7, 2023 14:02

Use urllib3 for thrift transport + reuse http connections (#131)

5a3f83e

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Default socket timeout to 15 min (#137)

9ef50e8

Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com>

Bump version to 2.6.0 (#139)

dfabbdd

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Fix: some thrift RPCs failed with BadStatusLine (#141)

3d359bc

--------- Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Bump version to 2.6.1 (#142)

5379803

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

[ES-706907] Retry GetOperationStatus for http errors (#145)

8698039

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Bump version to 2.6.2 (#147)

bbe539e

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Use a separate logger for unsafe thrift responses (#153)

7fcfa7b

--------- Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Improve e2e test development ergonomics (#155)

fecfa88

--------- Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Don't raise exception when closing a stale Thrift session (#159)

8d70f6c

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Bump to version 2.7.0 (#161)

c351b57

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Update changelog for cloudfetch (#172)

0e5c244

Signed-off-by: Matthew Kim <11141331+mattdeekay@users.noreply.github.com>

Improve sqlalchemy backward compatibility with 1.3.24 (#173)

f45280d

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

OAuth: don't override auth headers with contents of .netrc file (#122)

7382631

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Relax pandas dependency constraint to allow ^2.0.0 (#164)

d7f76e4

Signed-off-by: Daniel Segesdi <daniel.segesdi@turbine.ai> Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> Co-authored-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Use hex string version of operation ID instead of bytes (#170)

207dd7c

--------- Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

SQLAlchemy: fix has_table so it honours schema= argument (#174)

22e5aaa

--------- Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Fix socket timeout test (#144)

1eef432

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> Co-authored-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Disable non_native_boolean_check_constraint (#120)

ec58144

--------- Signed-off-by: Bogdan Kyryliuk <b.kyryliuk@gmail.com> Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> Co-authored-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Remove unused import for SQLAlchemy 2 compatibility (#128)

728d33a

Signed-off-by: William Gentry <william.barr.gentry@gmail.com> Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> Co-authored-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Bump version to 2.8.0 (#178)

6a1d3b5

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Fix typo in python README quick start example (#186)

b894605

--------- Co-authored-by: Jesse <jesse.whitehouse@databricks.com>

Configure autospec for mocked Client objects (#188)

00a3928

Resolves #187 Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

Use urllib3 for retries (#182)

019acd8

Behaviour is gated behind `enable_v3_retries` config. This will be removed and become the default behaviour in a subsequent release. Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

vikrantpuppala and others added 30 commits August 22, 2025 11:17

Ready for 4.1.2 release (#685)

d9d070c

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

Increased the limit for long running query (#686)

415fb53

[PECOBLR-201] add variant support (#560)

048fae1

Ready for 4.1.3 (#693)

07c541b

[PECOBLR-681] added new session conf to enable metric view metadata (#688)

54dd646

Add Token Federation Support for Databricks SQL Python Driver (#691)

f835aca

* token federation for python driver * address comment * address comments * lint * lint fix * nit * change import

Release version to 4.1.4 (#699)

30286ad

release-4.1.4

Bump to version 4.2.0 (#707)

cca421b

Bump to version 4.2.0 Signed-off-by: Jayant Singh <jayant.singh@databricks.com>

Ready for 4.2.1 release (#713)

ad227ca

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

Change default use_hybrid_disposition to False (#714)

b8494ff

This changes the default value of use_hybrid_disposition from True to False in the SEA backend, disabling hybrid disposition by default.

New minor release (#722)

ce55e7b

Fixed the exception handler close() on _TelemetryClientHolder (#723)

9b4e577

Fixed the exception handler, which called close() on _TelemetryClientHolder objects instead of accessing the client inside them.

New minor version release 4.2.4 (#725)

03eb369

New minor version release

Bump to version 4.2.5 (#737)

9fe7356

Signed-off-by: Jayant Singh <jayant.singh@databricks.com>


Test#1

evb123 wants to merge 242 commits into octoenergy:main from databricks:main

evb123 commented Jan 26, 2024


20 participants
