Skip to content

Update github workflow setup #2728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 56 commits into from
Jun 11, 2025
Merged

Update github workflow setup #2728

merged 56 commits into from
Jun 11, 2025

Conversation

sh-rp
Copy link
Collaborator

@sh-rp sh-rp commented Jun 6, 2025

Description

We need to be more economical with our CI minutes. Proposed changes are:

  • Only run common tests tests if linter passes
  • Only run destination tests if common tests pass
  • Try to convert remote and possible local destinations into a matrix job so there is less margin of error

TODO before finalizing

  • * Remove -m "smoke" flag which was only added to speed up the test process
  • * Re-enable commented out tests in common
  • * Figure out correct concurrency settings.

Review checklist

  • Are all destinations tested and are the correct tests selected in the output?
  • Is the order of execution of the jobs correct? (first lint, then common tests, then all the other stuff)
  • Is the setup clear on first read? If not, explain what is confusing so I we can add comments

Copy link

netlify bot commented Jun 6, 2025

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit f0cffee
🔍 Latest deploy log https://app.netlify.com/projects/dlt-hub-docs/deploys/68496b2b6a0b850008ef36f5

@sh-rp sh-rp force-pushed the tests/change_workflow_targets branch from 71d42be to 5c1212c Compare June 6, 2025 14:01
@sh-rp sh-rp changed the title change trigger of workflow files to "pull_request" Update github workflow setup Jun 6, 2025
@sh-rp sh-rp force-pushed the tests/change_workflow_targets branch from 227c99c to 7f4c04e Compare June 6, 2025 15:58
@sh-rp sh-rp force-pushed the tests/change_workflow_targets branch from a7a3aea to 85b9880 Compare June 6, 2025 16:42
@sh-rp sh-rp force-pushed the tests/change_workflow_targets branch from 322fee4 to ca30439 Compare June 6, 2025 17:09
@sh-rp sh-rp force-pushed the tests/change_workflow_targets branch from 20e02e3 to 2e31b94 Compare June 6, 2025 17:15
extras: "postgres postgis parquet duckdb cli filesystem"
needs_postgres: true

# Clickhouse OSS (TODO: test with minio s3)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clickhouse oss is disabled for now, it would be good to make it work without any remote services, for example test it with s3 using minio

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmm why? it is only using local filesystem. I'm running tests locally on it with network disconnected. we have compose file for it:

services:
  clickhouse:
    image: clickhouse/clickhouse-server
    ports:
      - "9000:9000"
      - "8123:8123"
    environment:
      - CLICKHOUSE_DB=dlt_data
      - CLICKHOUSE_USER=loader
      - CLICKHOUSE_PASSWORD=loader
      - CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
    volumes:
      - clickhouse_data:/var/lib/clickhouse/
      - clickhouse_logs:/var/log/clickhouse-server/
    restart: unless-stopped
    healthcheck:
      test: [ "CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8123/ping" ]
      interval: 3s
      timeout: 5s
      retries: 5


volumes:
  clickhouse_data:
  clickhouse_logs:

Copy link
Collaborator Author

@sh-rp sh-rp Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it uses the local clickhouse but depends on s3, az etc. are you sure you can run the full test suite without network connection? I will uncomment it again and have another look tomorrow, but I think it will not work without secrets.

- name: postgres and redshift
# TODO: check wether duckdb and all filesystem drivers need to be here, if so, explain why
destinations: "[\"redshift\", \"postgres\", \"duckdb\", \"dummy\"]"
filesystem_drivers: "[\"memory\", \"file\", \"r2\", \"s3\", \"gs\", \"az\", \"abfss\", \"gdrive\"]" # excludes sftp
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need all these drivers here, I took this from test_destinations.yml, so it is a pretty ancient part of this setup and I was not sure if there are any combinations of the above destinations that need these bucket settings, but afaik filesystem_drivers only applies to filesystem tests, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those are left there after I extracted filesystem to a separate workflow. I think you can leave memory and file. let's see what happens

extras: "postgres redshift postgis s3 gs az parquet duckdb"
with: ",adbc"

# Qdrant (disabled, TODO: explain why)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this workflow currently is disabled in the repo. I'd like to add a note here why exactly (because I don't know)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we lost our test account. we only test local version now

@sh-rp sh-rp force-pushed the tests/change_workflow_targets branch from fead3e6 to 83978ba Compare June 10, 2025 14:12
@sh-rp sh-rp marked this pull request as ready for review June 10, 2025 15:19
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for doing this!

branches: [ master, devel ]

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this workflow will cancel all the workflows it started?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it will, it's pretty cool!

authorize_run_from_fork:
name: check if fork and if so wether secrets are available
# run when label is assigned OR when we are not a fork
if: ${{ github.event.label.name == 'ci from fork' || (github.event.pull_request.head.repo.full_name == github.repository && (github.event.action == 'opened' || github.event.action == 'synchronize'))}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now the pull_request_target is deactivated so this authorization is defunct (github.event.pull_request.head.repo.full_name == github.repository as always true)

we can enable PRs from forks in a separate ticket

# Destination and Sources local tests, do not provide secrets
# Other tests that do not require remote connections
#
test_destinations_local:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those run in parallel right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, all tests that have their conditions met start immediately, so you have a bunch of tests running in parallel which also include matrixes.

needs: test_common
uses: ./.github/workflows/test_sources_local.yml

test_plus:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe those 3 workflows should only depend on lint?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it

extras: "postgres postgis parquet duckdb cli filesystem"
needs_postgres: true

# Clickhouse OSS (TODO: test with minio s3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmm why? it is only using local filesystem. I'm running tests locally on it with network disconnected. we have compose file for it:

services:
  clickhouse:
    image: clickhouse/clickhouse-server
    ports:
      - "9000:9000"
      - "8123:8123"
    environment:
      - CLICKHOUSE_DB=dlt_data
      - CLICKHOUSE_USER=loader
      - CLICKHOUSE_PASSWORD=loader
      - CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
    volumes:
      - clickhouse_data:/var/lib/clickhouse/
      - clickhouse_logs:/var/log/clickhouse-server/
    restart: unless-stopped
    healthcheck:
      test: [ "CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8123/ping" ]
      interval: 3s
      timeout: 5s
      retries: 5


volumes:
  clickhouse_data:
  clickhouse_logs:

- name: postgres and redshift
# TODO: check wether duckdb and all filesystem drivers need to be here, if so, explain why
destinations: "[\"redshift\", \"postgres\", \"duckdb\", \"dummy\"]"
filesystem_drivers: "[\"memory\", \"file\", \"r2\", \"s3\", \"gs\", \"az\", \"abfss\", \"gdrive\"]" # excludes sftp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those are left there after I extracted filesystem to a separate workflow. I think you can leave memory and file. let's see what happens

post_install_commands: "poetry run pip install sqlalchemy==2.0.18" # minimum version required by `pyiceberg`
additional_tests: "poetry run pytest tests/cli"

- name: postgres, duckdb
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were also running dummy destination

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not in the local tests, but I moved it here now from the remote tests where it also was before (in test_destinations.yml)

filesystem_drivers: "[\"memory\"]"
extras: "motherduck s3 gs az parquet"

# MSSQL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mssql was always running a full suite. this one is fast. we should do the same for remote postgres

extras: "postgres redshift postgis s3 gs az parquet duckdb"
with: ",adbc"

# Qdrant (disabled, TODO: explain why)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we lost our test account. we only test local version now

poetry run pytest tests/load --ignore tests/load/sources
poetry run pytest tests/cli
env:
DESTINATION__POSTGRES__CREDENTIALS: postgresql://loader:loader@localhost:5432/dlt_data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did those go?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to this new file, see comment above. there is one step in the workflow that copies these back in.

@sh-rp sh-rp force-pushed the tests/change_workflow_targets branch from 4aa0388 to 1149593 Compare June 11, 2025 08:47
@sh-rp sh-rp force-pushed the tests/change_workflow_targets branch from e86fb4a to f0cffee Compare June 11, 2025 11:40
@sh-rp sh-rp merged commit 36ee706 into devel Jun 11, 2025
97 of 102 checks passed
@sh-rp sh-rp deleted the tests/change_workflow_targets branch June 11, 2025 13:09
zilto pushed a commit that referenced this pull request Jun 11, 2025
* use both pull request and pull request target on destination workflows

* remove additional triggers

* marks one test as smoke test and only runs this for the time being

* only run one test in common, needs to be reverted later

* run common tests only on linter success

* fix common workflow

* only start workflows on call (do not call them yet)

* test master workflow

* remove docs changes step from lint

* remove local destinations docs change

* rename master trigger workflows

* change concurrency key

* try other dependencies

* add destination tests with authorize step

* remove authorize and docs step from destination tests

* fix destination test

* rename main workflow

* test inherit secrets

* add more workflows to main file

* fix starting conditions for some workflows

* rename plus tests matrix job

* remove concurrency settings for now

* add first remote destinations workflow version

* move some more remote destinations

* remove pytest args

* try to fix extras string

* add more remote destination tests

* rename some workflows and add concurrency settings to main workflow

* move test_destinations

* fix link to called workflow

* add better main workflow labels
move clickhouse remote tests

* create local destinations test

* disabled some workflows

* disable clickhouse oss for now
split duckdb and postgres local tests into own matrix job

* copy ssh agent key

* move all local destination secrets into template secrets file

* small fixes

* enable all tests again

* fix local tests

* add missing openai dep

* try to fix qdrant creds

* fix qdrant server / local file differentiation

* fix cli test

* change workflow dependencies

* remove telemetry info and other small changes

* run dummy destination with the local tests

* remove duckdb from remote tests,
always run all mssql and postgres tests

* enable clickhouse oss

* fix condition for always running all tests

* move cli commands to postgres tests

* rename clickhouse-compose to be inline with other services

* fix clickhouse local credentials and disable tests which require staging destinations

* adapt postgres to postgres example to new fixture

* fix clickhouse excluded configs

* update essential test handling

* skip gcs compat test for local clickhouse tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants