hiroyukinakazato-db commented Oct 6, 2025

Changes

This PR adds the llm-transpile command for LLM-powered SQL conversion using the Switch transpiler.

What does this PR do?

Adds the llm-transpile CLI command, which runs Switch transpiler jobs and supports passing parameters to them.

Relevant implementation details

CLI Integration:

  • Add llm-transpile command to Lakebridge CLI
  • Input source validation (workspace paths and local files)
  • Parameter passing to Switch job runs

Switch Runner Implementation:

  • SwitchConfig: manages Switch resources and job ID retrieval from InstallState
  • SwitchRunner: orchestrates Switch job execution with parameters
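
As a rough sketch of the split described above, assuming the job ID has already been resolved from install state, the runner only needs a jobs client that can trigger a run with parameters. The `JobsApi` protocol, the `FakeJobs` stand-in, `run_switch_job`, and the parameter names are all hypothetical illustrations, not the interfaces in this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Tuple


class JobsApi(Protocol):
    """Hypothetical slice of a jobs client: trigger a run, return its run ID."""

    def run_now(self, job_id: int, job_parameters: Dict[str, str]) -> int: ...


@dataclass
class SwitchRunResult:
    job_id: int
    run_id: int
    run_url: str


def run_switch_job(
    jobs: JobsApi, host: str, job_id: int, params: Dict[str, Optional[str]]
) -> SwitchRunResult:
    """Trigger the job with string-valued parameters and build the run URL."""
    # Drop unset values so the job can fall back to its own defaults.
    job_parameters = {k: str(v) for k, v in params.items() if v is not None}
    run_id = jobs.run_now(job_id, job_parameters)
    return SwitchRunResult(job_id, run_id, f"{host}/jobs/{job_id}/runs/{run_id}")


class FakeJobs:
    """In-memory stand-in used only to exercise the sketch."""

    def __init__(self) -> None:
        self.calls: List[Tuple[int, Dict[str, str]]] = []

    def run_now(self, job_id: int, job_parameters: Dict[str, str]) -> int:
        self.calls.append((job_id, job_parameters))
        return 1000 + len(self.calls)
```

Keeping the jobs client behind a narrow interface like this is one way to make the "parameter verification" unit tests straightforward: the test inspects what the fake recorded.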

Testing:

  • Unit tests for Switch runner with parameter verification
  • Integration tests for Switch installation lifecycle

Development Environment:

  • Add .env to .gitignore for local development credentials

Caveats/things to watch out for when reviewing:

  • Parameter design: Follows transpile and recon command patterns
  • Catalog/schema usage: Uses values configured during Switch installation (following recon pattern)
  • Output parameter naming: Uses --output-ws-folder (not --output-folder) to explicitly indicate workspace folder
  • Dependencies: Requires PR #2066, "Add Switch transpiler with --include-llm-transpiler flag" (Switch installation), to be merged first

~ ❯ databricks labs lakebridge llm-transpile --input-source $HOME/IdeaProjects/switch/examples/workflow/airflow/input --output-ws-folder /Workspace/Users/<>/transpiled --source-dialect airflow

17:23:54     INFO [d.labs.lakebridge] Please read and accept the following comments before proceeding:
        This Feature leverages a large language model (LLM) to analyse and convert your provided content, code and data.
        You consent to your content being transmitted to, processed by, and returned from the LLM hosted by Databricks foundational models or other external models you may configure during the runtime.
        The outputs of the LLM are generated automatically without human review, and may contain inaccuracies or errors.
        You are responsible for reviewing and validating all outputs before relying on them for any critical or production use.
        By running this feature you accept these conditions.

Enter catalog name (default: lakebridge): lakebridge
17:24:11     INFO [d.l.l.deployment.configurator] Found existing catalog `lakebridge`
Enter schema name (default: switch):
17:24:15     INFO [d.l.l.deployment.configurator] Found existing schema `switch` in catalog `lakebridge`
Enter volume name (default: switch_volume):
17:24:18     INFO [d.l.l.deployment.configurator] Found existing volume `switch_volume` in catalog `lakebridge` and schema `switch`
Select a Foundation Model serving endpoint:
[0] [Recommended] databricks-claude-sonnet-4-5
[1] databricks-bge-large-en
[2] databricks-claude-3-7-sonnet
[3] databricks-claude-opus-4
[4] databricks-claude-opus-4-1
[5] databricks-claude-sonnet-4
[6] databricks-gemini-2-5-flash
[7] databricks-gemini-2-5-pro
[8] databricks-gemma-3-12b
[9] databricks-gpt-5
[10] databricks-gpt-5-mini
[11] databricks-gpt-5-nano
[12] databricks-gpt-oss-120b
[13] databricks-gpt-oss-20b
[14] databricks-gte-large-en
[15] databricks-llama-4-maverick
[16] databricks-meta-llama-3-1-405b-instruct
[17] databricks-meta-llama-3-1-8b-instruct
[18] databricks-meta-llama-3-3-70b-instruct
[19] databricks-qwen3-next-80b-a3b-instruct
[20] databricks-shutterstock-imageai
Enter a number between 0 and 20: 0
17:24:32     INFO [d.l.l.transpiler.switch_runner] Uploading /Users/<>/IdeaProjects/switch/examples/workflow/airflow/input to /Volumes/lakebridge/switch/switch_volume/input_20251105115432_n2iz...
17:24:33     INFO [d.l.l.transpiler.switch_runner] Upload complete: /Volumes/lakebridge/switch/switch_volume/input_20251105115432_n2iz
17:24:33     INFO [d.l.l.transpiler.switch_runner] Triggering Switch job with job_id: job_id
17:24:34     INFO [d.l.l.transpiler.switch_runner] Switch LLM transpilation job started: https://<workspacename>/jobs/job_id/runs/run_id
[
  {
    "job_id": job_id,
    "run_id": run_id,
    "run_url": "https://<workspacename>/jobs/job_id/runs/run_id"
  }
]%
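
The upload destination in the log above ends in what looks like a timestamp followed by a short random suffix (`input_20251105115432_n2iz`). The following is a minimal sketch of how such a collision-resistant directory name could be generated; the function name, the UTC clock, and the four-character suffix alphabet are assumptions inferred from the log, not taken from the PR's code.

```python
import secrets
import string
from datetime import datetime, timezone
from typing import Optional


def unique_input_dir(
    catalog: str, schema: str, volume: str, now: Optional[datetime] = None
) -> str:
    """Build a volume path of the form .../input_<YYYYmmddHHMMSS>_<4 random chars>."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%S")
    alphabet = string.ascii_lowercase + string.digits
    # The random suffix keeps two uploads in the same second from colliding.
    suffix = "".join(secrets.choice(alphabet) for _ in range(4))
    return f"/Volumes/{catalog}/{schema}/{volume}/input_{stamp}_{suffix}"
```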

Linked issues

Resolves #2047

Functionality

  • added relevant user documentation
  • added new CLI command: databricks labs lakebridge llm-transpile
  • modified existing command

Tests

  • manually tested
  • added unit tests
  • added integration tests

hiroyukinakazato-db added the enhancement (New feature or request) and feat/cli (actions that are visible to the user) labels Oct 6, 2025
github-actions bot commented Oct 6, 2025

✅ 51/51 passed, 9 flaky, 3m31s total

Flaky tests:

  • 🤪 test_validate_mixed_checks (160ms)
  • 🤪 test_validate_invalid_schema_path (1ms)
  • 🤪 test_validate_successful_schema_check (184ms)
  • 🤪 test_validate_non_empty_tables (11ms)
  • 🤪 test_validate_invalid_schema_check (1ms)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (13.936s)
  • 🤪 test_transpiles_informatica_to_sparksql (15.902s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (17.204s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (4.268s)

Running from acceptance #2885

Implement llm-transpile command for LLM-based code transpilation:
- Add SwitchInstaller for Switch transpiler package management
  - Install Switch package and deploy to workspace
  - Create and manage Databricks jobs with job-level parameters
  - Configure Switch resources (catalog, schema, volume)
- Add SwitchRunner for executing Switch transpilation jobs
  - Upload source files to workspace volume
  - Execute transpilation via Databricks job
  - Download results and handle job lifecycle
- Add llm-transpile CLI command with Switch transpiler support
- Add comprehensive unit and integration tests

Move _get_switch_package_path() from WorkspaceInstallation to SwitchDeployment
as a protected method, following Single Responsibility Principle. SwitchDeployment
now resolves its own package path internally.

Changes:
- Add _get_switch_package_path() protected method to SwitchDeployment
- Update SwitchDeployment.install() signature to remove path parameter
- Remove duplicate _get_switch_package_path() from WorkspaceInstallation
- Remove unused sys and TranspilerRepository imports from installation.py
- Update tests to use new interface with mocked path resolution

Update test_installation.py to match the refactored SwitchDeployment.install()
interface that now takes only resources parameter (path resolution is internal).

Changes:
- Remove switch_repository fixture parameter from test methods
- Delete unused _StubTranspilerRepository stub class
- Remove unused imports (Path, TranspilerRepository)
- Update assertions to check only resources argument

The tests verify that:
1. Switch installation uses configured resources correctly
2. Missing resources log an appropriate error message

Sync with main branch to incorporate latest documentation updates

# Conflicts:
#	labs.yml

The wait_for_completion option is intended for local CLI execution only
and should not be included in Databricks job parameters. This change
filters it out when building job parameter definitions.

Changes:
- Add excluded_options set to filter local-only options
- Skip wait_for_completion when converting config.yml options
- Add test using FriendOfSwitchDeployment pattern to verify exclusion
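
The filtering described in this commit can be sketched roughly as follows. The option entries and their `flag`/`default` keys are a hypothetical stand-in for whatever shape the config.yml options actually have, and `build_job_parameters` is an illustrative name rather than the method in the PR.

```python
from typing import Any, Dict, List

# Options that only make sense for the local CLI (per the commit message).
EXCLUDED_OPTIONS = {"wait_for_completion"}


def build_job_parameters(options: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Convert option entries into job parameter definitions, skipping local-only ones."""
    params = []
    for option in options:
        name = option["flag"]
        if name in EXCLUDED_OPTIONS:
            continue  # local-only option: never forwarded to the job
        params.append({"name": name, "default": str(option.get("default", ""))})
    return params
```

Using a set rather than a single hard-coded name keeps the door open for further local-only options without touching the conversion loop.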
…ntrol

Add test_switch_install_with_transpile for full workflow testing including
job execution and output verification. Test automatically skips without
LAKEBRIDGE_SWITCH_E2E=true environment variable.
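
A minimal sketch of this kind of environment-gated test, using only the standard library's `unittest` so that it is self-contained (in a pytest suite, `pytest.mark.skipif` plays the same role). The helper name `switch_e2e_enabled` and the exact-match comparison against `"true"` are assumptions; only the variable name comes from the commit message.

```python
import os
import unittest
from typing import Mapping


def switch_e2e_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when the opt-in variable is set to exactly 'true'."""
    return environ.get("LAKEBRIDGE_SWITCH_E2E", "") == "true"


@unittest.skipUnless(switch_e2e_enabled(), "set LAKEBRIDGE_SWITCH_E2E=true to run")
class SwitchInstallWithTranspileTest(unittest.TestCase):
    def test_switch_install_with_transpile(self) -> None:
        # Placeholder body: the real test would install Switch, run a job,
        # and verify the transpiled output.
        self.assertTrue(switch_e2e_enabled())
```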

Refactor existing test to test_switch_install and extract helper functions
for DRY implementation. All changes in single file with no CI impact.

Merged latest changes from main branch including:
- Profiler skeleton and Synapse profiler scripts
- Transpiler product_name to transpiler_id rename
- Additional transpile command arguments support
- Test improvements for MSSQL and transpiler repository
sundarshankar89 changed the base branch from main to switch-integration October 27, 2025 06:31
sundarshankar89 added the stacked PR (Should be reviewed, but not merged) label Oct 27, 2025
sundarshankar89 changed the base branch from switch-integration to main October 27, 2025 06:32
Contributor

asnare left a comment

One thing we need to do is make sure we're consistently using the term foundation model, not foundational model.

Contributor

Is this file intentionally blank?

Collaborator

I intended to remove this, since the tests in test_cli_llm_transpile.py already cover switch_runner.py.

Comment on lines +963 to +964
except Exception as ex:
    raise RuntimeError(ex) from ex
Contributor

Hmm, I'm not sure what this is for?

Collaborator

In case there is an error while running the job!!
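
For context on the pattern under discussion: `raise RuntimeError(ex) from ex` changes the exception type that callers see while keeping the original error reachable through `__cause__`. A minimal sketch, where `run_job` is a hypothetical stand-in for the call that runs the job:

```python
def run_job() -> None:
    """Hypothetical stand-in for the call that runs the job."""
    raise ValueError("job failed")


def run_job_wrapped() -> None:
    try:
        run_job()
    except Exception as ex:
        # Re-raise as RuntimeError; "from ex" records the original as __cause__.
        raise RuntimeError(ex) from ex


def describe_failure() -> str:
    """Run the wrapped call and summarise the exception chain."""
    try:
        run_job_wrapped()
    except RuntimeError as err:
        return f"{type(err).__name__} caused by {type(err.__cause__).__name__}: {err}"
    return "no failure"
```

The trade-off is that every failure, whatever its original type, reaches the caller as a `RuntimeError`, so callers can no longer catch specific exception types without inspecting `__cause__`.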

Comment on lines +53 to +59
def upload_to_volume(
    self,
    local_path: Path,
    catalog: str,
    schema: str,
    volume: str,
) -> str:
Contributor

We should skip files and directories that start with a period (.).

Collaborator

Added a small logic for that
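
One way such a filter might look, as a sketch only: walk the input directory and skip anything whose path, relative to the upload root, contains a component starting with a period. The helper name `iter_uploadable_files` is hypothetical and is not the logic actually added in this PR.

```python
from pathlib import Path
from typing import Iterator


def iter_uploadable_files(root: Path) -> Iterator[Path]:
    """Yield files under root, skipping files and directories that start with '.'."""
    for path in sorted(root.rglob("*")):
        # Check every component below root, so files inside hidden
        # directories (e.g. .git/config) are skipped as well.
        relative_parts = path.relative_to(root).parts
        if any(part.startswith(".") for part in relative_parts):
            continue
        if path.is_file():
            yield path
```

Checking the path relative to `root` (rather than the absolute path) avoids accidentally skipping everything when the input directory itself lives under a hidden parent directory.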

Labels

enhancement (New feature or request), feat/cli (actions that are visible to the user), stacked PR (Should be reviewed, but not merged)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Add Switch transpiler CLI integration and testing

5 participants