feat(starrocks): add backend#12017

Open
geoHeil wants to merge 4 commits into
ibis-project:mainfrom
geoHeil:2026-06-06-starrocks-backend

@geoHeil

@geoHeil geoHeil commented Jun 6, 2026

Summary

  • add a StarRocks backend over the MySQL protocol, using SQLGlot's StarRocks dialect and StarRocks type mapper
  • add StarRocks test schema/data loading, backend-specific tests, docs, install metadata, labels, and lock metadata
  • add a backend CI entry that starts StarRocks from ascii-supply-networks/flaky-stars-on-the-rocks pinned at dc5391f572431e7151e2120898be112bb5ed29d0

Resolves #8423

Validation

  • uv run --extra starrocks --group tests pytest ibis/backends/starrocks/tests/test_compiler.py ibis/backends/starrocks/tests/test_datatypes.py -q
  • uv run --extra starrocks --group tests pytest -m starrocks --collect-only -q
  • uv lock --check && git diff --check && bash -n ci/start-starrocks.sh
  • uv run pre-commit run --files ... passed available hooks; actionlint-system could not run because actionlint is not installed locally
  • nix run nixpkgs#actionlint -- .github/workflows/ibis-backends.yml
  • local ci/start-starrocks.sh on macOS fails before FE readiness because the StarRocks wrapper requires Homebrew-provided GNU getopt on Darwin; the GitHub Actions path is Ubuntu and installs mariadb-client/util-linux getopt
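The getopt difference in the last bullet can be checked before the script runs. This is a minimal sketch, assuming the wrapper needs the util-linux ("enhanced") getopt: that implementation documents exit status 4 for `getopt --test`, which BSD getopt (the macOS default) does not return. The function names are hypothetical and not part of this PR.

```python
import shutil
import subprocess

# util-linux getopt documents exit status 4 for `getopt --test`.
ENHANCED_GETOPT_STATUS = 4


def is_enhanced_getopt(returncode: int) -> bool:
    """Interpret the exit status of `getopt --test`."""
    return returncode == ENHANCED_GETOPT_STATUS


def has_enhanced_getopt(executable: str = "getopt") -> bool:
    """Return True if `executable` on PATH behaves like util-linux getopt."""
    path = shutil.which(executable)
    if path is None:
        # Nothing by that name on PATH at all.
        return False
    result = subprocess.run(
        [path, "--test"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero status is the expected signal here
    )
    return is_enhanced_getopt(result.returncode)


if __name__ == "__main__":
    print("enhanced getopt available:", has_enhanced_getopt())
```

On macOS this would be expected to report False until a GNU getopt is placed ahead of the system one on PATH.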

@github-actions github-actions Bot added the docs, tests, ci, dependencies, and sql labels Jun 6, 2026
@geoHeil geoHeil mentioned this pull request Jun 6, 2026
1 task
@geoHeil geoHeil force-pushed the 2026-06-06-starrocks-backend branch from c962345 to c8ced01 Compare June 6, 2026 20:08
@geoHeil geoHeil marked this pull request as ready for review June 7, 2026 06:22
@geoHeil

geoHeil commented Jun 10, 2026

Author

CI triage update:

  • Pushed c5cac39da to fix the Docs PR test failure. The failed doctest was fetching ibis.examples.us_rent_income through pins/GCS before exercising pivot_wider; the example table fetch is now skipped in that doctest.
  • The ClickHouse 3.14 failure looks unrelated to this PR: just up clickhouse failed while pulling from Docker Hub with context deadline exceeded.
  • The DuckDB Windows 3.10 failure also looks unrelated to the StarRocks changes. It is the same pins/GCS TypeError: the JSON object must be str, bytes or bytearray, not NoneType failure while fetching example data.
  • The StarRocks 3.10 job is the PR-specific runner problem: it fails during ci/start-starrocks.sh while Nix is materializing StarRocks, before backend tests run, with No space left on device from the Actions runner log path. Could maintainers advise whether this job can run on a larger/self-hosted runner label with more disk? The standard public ubuntu-latest runner only has 14 GB SSD, and this StarRocks setup appears to exceed that during Nix materialization.
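To fail fast instead of running out of space partway through materialization, a start script could check free disk space first. This is a rough sketch using only the standard library; the helper names are hypothetical, and the required amount is left to the caller because the thread does not say how much space the StarRocks setup needs.

```python
import shutil

GIB = 1024**3


def free_gib(path: str = "/") -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def has_enough_disk(required_gib: float, path: str = "/") -> bool:
    """True if at least `required_gib` GiB are free at `path`.

    `required_gib` is an assumption supplied by the caller; it is not a
    measured requirement of the StarRocks setup.
    """
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    print(f"free space: {free_gib('.'):.1f} GiB")
```

Calling `has_enough_disk(...)` at the top of a CI step would turn a late "No space left on device" into an early, explicit failure.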

@deepyaman

Collaborator

CI triage update:

  • Pushed c5cac39da to fix the Docs PR test failure. The failed doctest was fetching ibis.examples.us_rent_income through pins/GCS before exercising pivot_wider; the example table fetch is now skipped in that doctest.
  • The ClickHouse 3.14 failure looks unrelated to this PR: just up clickhouse failed while pulling from Docker Hub with context deadline exceeded.
  • The DuckDB Windows 3.10 failure also looks unrelated to the StarRocks changes. It is the same pins/GCS TypeError: the JSON object must be str, bytes or bytearray, not NoneType failure while fetching example data.
  • The StarRocks 3.10 job is the PR-specific runner problem: it fails during ci/start-starrocks.sh while Nix is materializing StarRocks, before backend tests run, with No space left on device from the Actions runner log path. Could maintainers advise whether this job can run on a larger/self-hosted runner label with more disk? The standard public ubuntu-latest runner only has 14 GB SSD, and this StarRocks setup appears to exceed that during Nix materialization.

Yeah, feel free to ignore issues on other backends. I'll also try to find some proper time to review the backend in the next week or so. 🤞

@geoHeil

geoHeil commented Jun 11, 2026

Author

We should decide whether to make this Nix-based or Docker-based. Currently the official (non-community) build is on Docker, and it possibly would not need the additional CI runner capacity.

@deepyaman

Collaborator

We should decide whether to make this Nix-based or Docker-based. Currently the official (non-community) build is on Docker, and it possibly would not need the additional CI runner capacity.

Would be inclined to go Docker-based, following the pattern used by the majority of backends. I don't think getting a larger runner is reasonable, since it would require funding from somewhere (but I'm not an expert on this/wouldn't have the rights to approve that change anyway).

@geoHeil geoHeil force-pushed the 2026-06-06-starrocks-backend branch from 47e3cb0 to 36e4dd4 Compare June 14, 2026 11:38
@NickCrews

Contributor

I am curious what other maintainers feel about this (controversial) statement: I don't think ibis is in any position to be taking on new backends. We have 400 open issues and 93 open PRs and no real maintainership. I think we should focus our already overloaded maintenance efforts on keeping what we have working with new versions of existing backends, and fixing bugs, not adding new maintenance burden. I'm sorry @geoHeil, this is absolutely not what you want to hear. I would love to hear if you have any suggestions on how to reconcile this. What is preventing you from implementing this backend in a separate repo that we would not have to maintain, similar to gizmoSQL?

@geoHeil

geoHeil commented Jun 16, 2026

Author

I am curious what other maintainers feel about this (controversial) statement: I don't think ibis is in any position to be taking on new backends. We have 400 open issues and 93 open PRs and no real maintainership. I think we should focus our already overloaded maintenance efforts on keeping what we have working with new versions of existing backends, and fixing bugs, not adding new maintenance burden. I'm sorry @geoHeil, this is absolutely not what you want to hear. I would love to hear if you have any suggestions on how to reconcile this. What is preventing you from implementing this backend in a separate repo that we would not have to maintain, similar to gizmoSQL?

Honestly, nothing. I can build this as a separate package. But I was thinking that for end users it is much more convenient to have one place, and not X different ibis-X packages.

Unless ibis had been built around pluggability from the start and everything followed that standard ...

So should I close this and make some ibis-starrocks?

Further, what are your plans for standardizing all the components of ibis into such a plugin ecosystem?

@deepyaman

Collaborator

I am curious what other maintainers feel about this (controversial) statement: I don't think ibis is in any position to be taking on new backends. We have 400 open issues and 93 open PRs and no real maintainership. I think we should focus our already overloaded maintenance efforts on keeping what we have working with new versions of existing backends, and fixing bugs, not adding new maintenance burden. I'm sorry @geoHeil, this is absolutely not what you want to hear. I would love to hear if you have any suggestions on how to reconcile this. What is preventing you from implementing this backend in a separate repo that we would not have to maintain, similar to gizmoSQL?

FWIW I don't fully agree with this; I think we should 100% be deliberate about which backends to accept, always, but especially at this time because of the maintenance challenges. That said, I think there is a tangible difference between StarRocks (widely used, brings users and name recognition to Ibis) vs. GizmoSQL (new, exciting, but not as well-known or adopted).

@tokoko

tokoko commented Jun 17, 2026

Contributor

I agree, StarRocks has become too important to ignore.

Another note, though... and I truly feel guilty saying this 😆 Since this is reusing some of the MySQL backend, how would you feel about delaying until we merge #11958? I can ping the Columnar folks and try to get them to release the new MySQL driver soon, hopefully.

@geoHeil

geoHeil commented Jun 17, 2026

Author

Fine with me

@NickCrews

Contributor

OK, I can be overruled here. Long term I would like ibis to structure itself so that backends could better support themselves. But until then we can keep bringing them into core.

+1 to waiting for the ADBC PR to land, that will help ease the burden a lot for us here in ibis.

Out of curiosity, I made this script to check the PyPI popularity of the ibis backends. Curious if y'all have ideas as to better metrics to use to measure "popularity". Run with uv run https://gist.github.com/NickCrews/a10156b48d6031d7dcb33dd68efbdc24. It gives this plot as of June 17, 2026.
[Figure: ibis_backend_popularity]

This confirms that StarRocks is 3x more popular than our existing Flink backend, but it is still near the bottom of the pack. IDK if we want to take any more of an official stance on this, e.g. only support backends with more than 1M monthly PyPI downloads. Or perhaps 500k. Then no hard feelings.
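For illustration, a download-threshold rule like the one floated above could be sketched as follows. The function name is hypothetical, and the counts in the usage lines are placeholders chosen only to show the 1M and 500k cutoffs; they are not real PyPI numbers and do not come from the gist.

```python
from typing import Mapping


def backends_meeting_threshold(
    monthly_downloads: Mapping[str, int], threshold: int
) -> list[str]:
    """Names with at least `threshold` downloads, most downloaded first."""
    qualifying = [
        name for name, count in monthly_downloads.items() if count >= threshold
    ]
    return sorted(qualifying, key=lambda name: monthly_downloads[name], reverse=True)


# Placeholder counts for illustration only; NOT real PyPI download numbers.
example = {"backend_a": 2_000_000, "backend_b": 750_000, "backend_c": 100_000}

print(backends_meeting_threshold(example, 1_000_000))  # ['backend_a']
print(backends_meeting_threshold(example, 500_000))  # ['backend_a', 'backend_b']
```

Keeping the threshold as a parameter makes it easy to compare how many backends each candidate cutoff would exclude.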

@deepyaman

Collaborator

IDK if we want to make any more of an official stance on this, eg only support backends with more than 1M monthly pypi downloads. Or perhaps 500k. Then no hard feelings.

I wouldn't bother encoding it as a hard and fast rule; I think it's an interesting data point, but PyPI download counts aren't a perfect proxy (a package could just be pulled in as a dependency of a popular package), and should probably be combined with other aspects to make the final determination.

Labels

ci, dependencies, docs, sql, tests

Development

Successfully merging this pull request may close these issues.

StarRocks backend support

4 participants