Open
48 commits
a0612a8
feat: add LLM-based primary key detection with clean dependency injec…
jominjohny Sep 17, 2025
8725a75
Merge branch 'main' into llm_based_pk_identification
mwojtyczka Sep 18, 2025
c69c933
updated the dependency version and some fixes
jominjohny Sep 18, 2025
a5bae54
Merge branch 'llm_based_pk_identification' of github.com:databricksla…
jominjohny Sep 18, 2025
4b60ac1
refactor
mwojtyczka Sep 18, 2025
c149cd6
Merge remote-tracking branch 'origin/llm_based_pk_identification' int…
mwojtyczka Sep 18, 2025
5e181bb
refactor
mwojtyczka Sep 18, 2025
85f1522
refactor
mwojtyczka Sep 18, 2025
ced8ba5
refactor
mwojtyczka Sep 18, 2025
7809228
refactor
mwojtyczka Sep 18, 2025
477c9c1
fixes added
jominjohny Sep 18, 2025
5477222
table_name changed to table
jominjohny Sep 18, 2025
fcc31df
fixes added
jominjohny Sep 18, 2025
cf4f5c1
fmt fix added
jominjohny Sep 19, 2025
e156c34
Merge branch 'main' into llm_based_pk_identification
mwojtyczka Sep 19, 2025
8695c57
table_name to table
jominjohny Sep 22, 2025
0f5e394
fmt issues fixed
jominjohny Sep 22, 2025
3c07faf
Update demos/dqx_llm_demo.py
jominjohny Sep 23, 2025
2a4ee58
Update docs/dqx/docs/guide/data_profiling.mdx
jominjohny Sep 23, 2025
eaaf3a6
updated the review comments
jominjohny Oct 1, 2025
792cf8d
Merge branch 'main' into llm_based_pk_identification
jominjohny Oct 1, 2025
fb97038
Merge branch 'llm_based_pk_identification' of github.com:databricksla…
jominjohny Oct 1, 2025
f72ccf6
Merge branch 'main' into llm_based_pk_identification
mwojtyczka Oct 2, 2025
0b88dee
Merge branch 'main' into llm_based_pk_identification
mwojtyczka Oct 3, 2025
4789fa1
Merge branch 'main' into llm_based_pk_identification
jominjohny Oct 6, 2025
c6c096a
Merge branch 'main' into llm_based_pk_identification
mwojtyczka Oct 7, 2025
68603b0
fix added
jominjohny Oct 10, 2025
1773ee3
Merge branch 'main' into llm_based_pk_identification
jominjohny Oct 10, 2025
ece62da
Merge branch 'main' into llm_based_pk_identification
jominjohny Oct 16, 2025
4d325c6
Merge branch 'main' into llm_based_pk_identification
mwojtyczka Oct 16, 2025
47304bb
Merge branch 'main' into llm_based_pk_identification
mwojtyczka Oct 18, 2025
dbed293
compare dataset changes
jominjohny Oct 21, 2025
145ed70
Merge branch 'llm_based_pk_identification' of github.com:databricksla…
jominjohny Oct 21, 2025
6fdf927
Update demos/dqx_llm_demo.py
jominjohny Oct 21, 2025
dc542f3
Update demos/dqx_llm_demo.py
jominjohny Oct 21, 2025
2afc615
Update docs/dqx/docs/guide/data_profiling.mdx
jominjohny Oct 21, 2025
fd89fa7
Update demos/dqx_llm_demo.py
jominjohny Oct 21, 2025
1d958a9
Update docs/dqx/docs/guide/data_profiling.mdx
jominjohny Oct 21, 2025
9db5068
fixes added
jominjohny Oct 21, 2025
a415cb7
Merge branch 'llm_based_pk_identification' of github.com:databricksla…
jominjohny Oct 21, 2025
033cc3d
fix
jominjohny Oct 21, 2025
d44cad8
fix added
jominjohny Oct 21, 2025
c9d97d9
fix
jominjohny Oct 21, 2025
84bf2c9
Merge branch 'main' into llm_based_pk_identification
mwojtyczka Oct 21, 2025
2b04529
fix added
jominjohny Oct 22, 2025
511d203
integration test fixed
jominjohny Oct 27, 2025
fd54d69
fixes added
jominjohny Oct 31, 2025
ca82ba0
fixes added
jominjohny Oct 31, 2025
39 changes: 3 additions & 36 deletions .github/workflows/acceptance.yml
@@ -47,18 +47,12 @@ jobs:
run: make test

# Integration tests are run from within tests/integration folder.
# Create .coveragerc with correct relative path to source code.
# We need to make sure .coveragerc is there so that code coverage is generated for the right modules.
- name: Prepare code coverage configuration for integration tests
run: |
cat > tests/integration/.coveragerc << EOF
[run]
source = ../../src
relative_files = true
EOF
run: cp .coveragerc tests/integration

# Run tests from `tests/integration` as defined in .codegen.json
# and generate code coverage for modules defined in .coveragerc
# Run 10 tests in parallel: https://github.com/databrickslabs/sandbox/blob/main/acceptance/ecosystem/pytest_run.py
- name: Run integration tests and generate test coverage report
uses: databrickslabs/sandbox/acceptance@acceptance/v0.4.4
with:
@@ -69,14 +63,8 @@ jobs:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
COVERAGE_FILE: ${{ github.workspace }}/.coverage # make sure the coverage report is preserved

- name: Merge coverage reports and convert them to XML
run: |
hatch run combine_coverage

# Recursively search the entire workspace directory for all coverage reports.
# All uploaded test coverage reports will be used even if publish is done multiple time.
# collects all coverage reports: coverage.xml from integration tests, coverage-unit.xml from unit tests
- name: Publish test coverage
uses: codecov/codecov-action@v5
with:
@@ -107,16 +95,6 @@ jobs:
- name: Install hatch
run: pip install hatch==1.15.0

# Integration tests are run from within tests/integration folder.
# Create .coveragerc with correct relative path to source code.
- name: Prepare code coverage configuration for integration tests
run: |
cat > tests/integration/.coveragerc << EOF
[run]
source = ../../src
relative_files = true
EOF

- name: Run integration tests on serverless cluster
uses: databrickslabs/sandbox/acceptance@acceptance/v0.4.4
with:
@@ -128,17 +106,6 @@ jobs:
ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
DATABRICKS_SERVERLESS_COMPUTE_ID: ${{ env.DATABRICKS_SERVERLESS_COMPUTE_ID }}
COVERAGE_FILE: ${{ github.workspace }}/.coverage # make sure the coverage report is preserved

- name: Merge coverage reports and convert them to XML
run: |
hatch run combine_coverage

# collects all coverage reports
- name: Publish test coverage
uses: codecov/codecov-action@v5
with:
use_oidc: true

e2e:
if: github.event_name == 'pull_request' && !github.event.pull_request.draft && !github.event.pull_request.head.repo.fork
41 changes: 5 additions & 36 deletions .github/workflows/nightly.yml
@@ -37,15 +37,10 @@ jobs:
- name: Run unit tests and generate test coverage report
run: make test

# Integration tests are run from within tests/integration folder.
# Create .coveragerc with correct relative path to source code.
- name: Prepare code coverage configuration for integration tests
run: |
cat > tests/integration/.coveragerc << EOF
[run]
source = ../../src
relative_files = true
EOF
# Acceptance tests are run from within tests/integration folder.
# We need to make sure .coveragerc is there so that code coverage is generated for the right modules.
- name: Prepare .coveragerc for integration tests
run: cp .coveragerc tests/integration

# Run tests from `tests/integration` as defined in .codegen.json
# and generate code coverage for modules defined in .coveragerc
@@ -60,13 +55,8 @@ jobs:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
COVERAGE_FILE: ${{ github.workspace }}/.coverage # make sure the coverage report is preserved

- name: Merge coverage reports and convert them to XML
run: |
hatch run combine_coverage

# collects all coverage reports
# collects all coverage reports: coverage.xml from integration tests, coverage-unit.xml from unit tests
- name: Publish test coverage
uses: codecov/codecov-action@v5
with:
@@ -93,16 +83,6 @@ jobs:
- name: Install hatch
run: pip install hatch==1.15.0

# Integration tests are run from within tests/integration folder.
# Create .coveragerc with correct relative path to source code.
- name: Prepare code coverage configuration for integration tests
run: |
cat > tests/integration/.coveragerc << EOF
[run]
source = ../../src
relative_files = true
EOF

- name: Run integration tests on serverless cluster
uses: databrickslabs/sandbox/acceptance@acceptance/v0.4.4
with:
@@ -115,17 +95,6 @@ jobs:
ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
DATABRICKS_SERVERLESS_COMPUTE_ID: ${{ env.DATABRICKS_SERVERLESS_COMPUTE_ID }}
COVERAGE_FILE: ${{ github.workspace }}/.coverage # make sure the coverage report is preserved

- name: Merge coverage reports and convert them to XML
run: |
hatch run combine_coverage

# collects all coverage reports
- name: Publish test coverage
uses: codecov/codecov-action@v5
with:
use_oidc: true

e2e:
environment: tool
1 change: 0 additions & 1 deletion .gitignore
@@ -51,7 +51,6 @@ htmlcov/
nosetests.xml
coverage.xml
coverage-unit.xml
coverage-integration.xml
*.cover
*.py,cover
.hypothesis/
1 change: 1 addition & 0 deletions Makefile
@@ -7,6 +7,7 @@ clean: docs-clean
.venv/bin/python:
pip install hatch
hatch env create
hatch run pip install ".[llm,pii]"

dev: .venv/bin/python
@hatch run which python