From 6b48176dae2993e2940c0446188d54de5b521304 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Giedrius=20Statkevi=C4=8Dius?=
Date: Tue, 17 Feb 2026 14:04:29 +0200
Subject: [PATCH 1/2] docs: rm CIRCLECI_SETUP

It provides no value.
---
 docs/CIRCLECI_SETUP.md | 298 -----------------------------------------
 1 file changed, 298 deletions(-)
 delete mode 100644 docs/CIRCLECI_SETUP.md

diff --git a/docs/CIRCLECI_SETUP.md b/docs/CIRCLECI_SETUP.md
deleted file mode 100644
index 263b209..0000000
--- a/docs/CIRCLECI_SETUP.md
+++ /dev/null
@@ -1,298 +0,0 @@
-# CircleCI Setup for thanos-parquet-gateway
-
-This document provides a comprehensive overview of the CircleCI
-configuration set up for the thanos-parquet-gateway project, similar to the
-Thanos project's CI/CD pipeline.
-
-## Overview
-
-The CircleCI configuration provides:
-- **Automated testing** on every commit
-- **Docker image building and publishing** to Quay.io
-- **Multi-architecture support** (linux/amd64, linux/arm64)
-- **Release automation** with GitHub releases and artifacts
-
-## File Structure
-
-```
-.circleci/
-├── config.yml            # Main CircleCI configuration
-
-.github/workflows/        # Alternative GitHub Actions (optional)
-├── ci.yml                # GitHub Actions workflow
-
-scripts/
-├── build-and-test.sh     # Local development script
-
-Dockerfile                # Multi-stage Docker build
-.dockerignore             # Docker build optimization
-```
-
-## CircleCI Jobs
-
-### 1. `test`
-- Runs on every commit
-- Executes unit tests with `make test-norace`
-- Runs linting with `make lint`
-- Generates and stores test coverage
-
-### 2. `cross_build`
-- Runs on tagged releases only
-- Builds binaries for multiple OS/architecture combinations
-- Uses promu for cross-compilation
-- Stores build artifacts
-
-### 3. `publish_main`
-- Runs on `main` branch commits
-- Builds multi-architecture Docker images
-- Pushes to `quay.io/thanos-io/thanos-parquet-gateway`
-- Tags with `latest` and git commit SHA
-
-### 4. `publish_release`
-- Runs on version tags (v*.*.*)
-- Creates release tarballs
-- Builds and publishes Docker images with version tags
-- Stores release artifacts
-
-## Version Management
-
-### How Versioning Works
-
-The project uses build-time version injection, following the same approach
-as Thanos and other Prometheus ecosystem projects:
-
-1. **Version Variables**: Defined in `pkg/version/version.go`
-2. **Build-time Injection**: Go's `-ldflags` injects values at build time
-3. **Prometheus Compatibility**: Uses `github.com/prometheus/common/version` for consistent output
-
-### Version Sources
-
-- **Git Tags**: Primary source for releases (`git describe --tags`)
-- **Development Fallback**: Uses `v0.0.0-dev` when no tags exist
-- **VERSION File**: Optional local override (not committed, for development)
-- **Git Info**: Automatic fallback using `runtime/debug` build info
-
-### Version Components
-
-```bash
-make version  # Show current version information
-```
-
-Output includes:
-- **VERSION**: Git tag or development version
-- **REVISION**: Git commit SHA (short)
-- **BRANCH**: Current git branch
-- **BUILD_USER**: User@hostname who built the binary
-- **BUILD_DATE**: ISO 8601 timestamp
-
-### Local Development
-
-For local development without git tags, you can optionally create a VERSION
-file:
-```bash
-echo "v0.1.0-dev" > VERSION    # Optional: Set your development version
-make build                     # Build with version injection
-./parquet-gateway --version    # Verify version
-```
-
-**Note**: The VERSION file is ignored by git and is only for local
-development convenience.
-
-### Release Process
-
-1. **Tag Release**: `git tag v1.0.0`
-2. **Push Tag**: `git push origin v1.0.0`
-3. **CI/CD**: Automatically builds and publishes with proper version
-
-## Docker Configuration
-
-### Multi-stage Build
-The Dockerfile uses a multi-stage build pattern:
-1. **Builder stage**: Go 1.24 Alpine with build tools and version injection
-2. **Runtime stage**: Minimal Alpine Linux with CA certificates
-
-### Version Injection
-Version information is injected at build time using Go's `-ldflags`:
-- **Version**: Git tag or VERSION file content
-- **Revision**: Git commit SHA (short)
-- **Branch**: Git branch name
-- **Build User**: User@hostname who built the binary
-- **Build Date**: ISO 8601 timestamp of build
-
-This follows Thanos's approach using the Prometheus version package for
-consistent output format.
-
-### Security Features
-- Non-root user (`thanos:thanos`)
-- Minimal attack surface
-- CA certificates for HTTPS
-- Proper file permissions
-
-### Build Targets
-- `make docker-build-local`: Simple local build (single architecture)
-- `make docker-build`: Multi-architecture build for CI
-- `make docker-test`: Test Docker image functionality
-- `make docker-push`: Push to registry
-- `make docker-manifest`: Create multi-arch manifest
-
-## Environment Variables
-
-The following environment variables must be configured in CircleCI:
-
-| Variable | Description | Required |
-|----------|-------------|----------|
-| `QUAY_USERNAME` | Quay.io username or robot account | Yes |
-| `QUAY_PASSWORD` | Quay.io password or token | Yes |
-
-## Docker Registry
-
-Images are published to:
-- **Registry**: `quay.io`
-- **Repository**: `thanos-io/thanos-parquet-gateway`
-- **Tags**:
-  - `latest`: Latest main branch
-  - `<commit-sha>`: Specific commits
-  - `v*.*.*`: Release versions
-
-## Local Development
-
-### Prerequisites
-- Docker with BuildKit support
-- Go 1.24+
-- Make
-
-### Quick Start
-```bash
-# Build and test everything
-./scripts/build-and-test.sh
-
-# Or manually:
-make build               # Build binary
-make docker-build-local  # Build Docker image
-make docker-test         # Test Docker image
-```
-
-### Testing CircleCI Locally
-Install CircleCI CLI and run:
-```bash
-circleci config validate .circleci/config.yml
-circleci local execute --job test
-```
-
-## Makefile Targets
-
-### Core Targets
-- `make build`: Build the Go binary with version injection
-- `make test`: Run tests with race detection
-- `make test-norace`: Run tests without race detection
-- `make lint`: Run linting and formatting
-- `make version`: Display version information that will be injected
-
-### Version Targets
-- `make version`: Show current version variables
-- `make deps`: Ensure Go dependencies are up to date
-- `make clean`: Clean build artifacts
-
-### Docker Targets
-- `make docker-build-local`: Local Docker build
-- `make docker-build`: Multi-arch Docker build
-- `make docker-test`: Test Docker image
-- `make docker-push`: Push to registry
-- `make docker-manifest`: Create manifest
-
-### Release Targets
-- `make crossbuild`: Build for multiple platforms
-- `make tarballs-release`: Create release archives
-
-## GitHub Actions Alternative
-
-An alternative GitHub Actions workflow is provided in
-`.github/workflows/ci.yml` that offers:
-- Similar functionality to CircleCI
-- Better GitHub integration
-- Free for public repositories
-- Automatic release creation
-
-To use GitHub Actions instead of CircleCI:
-1. Set up the same environment variables as GitHub secrets
-2. Disable CircleCI
-3. The workflow will trigger automatically
-
-## Workflow Triggers
-
-### CircleCI
-- **All branches**: Run tests
-- **Main branch**: Run tests + publish Docker images
-- **Version tags**: Run tests + cross-build + publish release
-
-### GitHub Actions
-- **All PRs**: Run tests
-- **Main branch**: Run tests + publish Docker images
-- **Version tags**: Run tests + publish release + create GitHub release
-
-## Troubleshooting
-
-### Common Issues
-
-1. **Docker rate limiting**
-   - Use Docker Hub credentials in CI
-   - Consider using a Docker registry mirror
-
-2. **Build failures**
-   - Check Go version compatibility
-   - Verify all dependencies are available
-   - Review test failures in CI logs
-
-3. **Registry push failures**
-   - Verify Quay.io credentials
-   - Check repository permissions
-   - Ensure registry URLs are correct
-
-### Debug Commands
-```bash
-# Test local build
-make version   # Show version info
-make build && ./parquet-gateway --version
-
-# Test Docker build
-make docker-build-local
-make docker-test
-
-# Validate CircleCI config
-circleci config validate .circleci/config.yml
-```
-
-## Security Considerations
-
-1. **Container Security**
-   - Non-root user execution
-   - Minimal base image (Alpine)
-   - No unnecessary packages
-
-2. **CI/CD Security**
-   - Environment variables for secrets
-   - Limited scope tokens for registry access
-   - Signed Docker images (future enhancement)
-
-3. **Access Control**
-   - Proper repository permissions
-   - Robot accounts for CI access
-   - Regular credential rotation
-
-## Future Enhancements
-
-1. **Security**
-   - Docker image signing with Cosign
-   - SBOM generation
-   - Vulnerability scanning
-
-2. **Performance**
-   - Build caching optimization
-   - Parallel test execution
-   - Registry mirror usage
-
-3. **Monitoring**
-   - Build notifications
-   - Performance metrics
-   - Dependency update automation
-
-## Support
-
-For issues with the CI/CD setup:
-1. Check the CircleCI or GitHub Actions logs
-2. Verify environment variables are set correctly
-3. Test the build process locally
-4. Review this documentation for common solutions

From f220a65744d1ec69248fe0ae91616ae8bb885381 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Giedrius=20Statkevi=C4=8Dius?=
Date: Tue, 17 Feb 2026 17:24:31 +0200
Subject: [PATCH 2/2] rfcs: add scalable converter

Jot down some of my thoughts and share them with others.
---
 docs/rfcs/0001-scalable-converter.md | 126 +++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)
 create mode 100644 docs/rfcs/0001-scalable-converter.md

diff --git a/docs/rfcs/0001-scalable-converter.md b/docs/rfcs/0001-scalable-converter.md
new file mode 100644
index 0000000..eefe7e7
--- /dev/null
+++ b/docs/rfcs/0001-scalable-converter.md
@@ -0,0 +1,126 @@
+# Scalable converters
+
+Currently, the design of the converter is very simple - you get some files
+in and some files out. That totally makes sense since we are still in the
+infancy of this project. However, as it is getting adopted more and more,
+there is a need to do the conversion more quickly. For that, we should be
+able to spread the load across multiple servers.
+
+Indeed, it is possible right now to use external labels to separate
+converters and run a converter for each stream separately, but that involves
+either some automation that each user will need to build or, even worse,
+manual work.
+
+Let's say we would need to convert 800 TB of data with a 10 Gbps network
+card. According to the napkin math below, it would take about 177 hours
+just to download all of that data.
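+
+Spelled out, assuming the full 10 Gbps can actually be used for downloading:
+
+```raw
+800 TB = 8 × 10^14 B × 8 bit/B = 6.4 × 10^15 bit
+6.4 × 10^15 bit ÷ 10 × 10^9 bit/s = 6.4 × 10^5 s ≈ 177.8 h
+```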
+
+Another thing is that doing the planning in each service is wasteful - one
+has to load all meta files, filter out what is not needed (if using
+filtering based on external labels, for example), and do the planning step.
+For a long time we've been talking about having a client/server architecture
+inside of Thanos Compactor. Since we are still at the beginning, it seems
+like a good idea to separate these two concerns from the start.
+
+The current design also prevents making changes en masse. Let's say Parquet
+evolves and we want to enable some flashy new features on all the data. Or
+we want to experiment with setting a bigger page size on a subset of block
+streams. Having a client/server architecture in place from the get-go would
+allow us to easily roll out such modifications.
+
+## Separation
+
+Internally, let's separate the planner and the converter into two separate
+subsystems. Keep the same ability to run the planner and converter in one
+process if the user so requires. Hence, let's add new, optional command line
+flags to control the behaviour if separation is requested.
+
+This also begs the question: how do we pass the "work items" from the
+planner to the converter? For that, we will need to have some kind of queue
+from which the converters will consume.
+
+A very traditional way of doing so would be to use a "heavy-weight" system
+like Kafka or RabbitMQ. However, we would like to avoid adding such big
+dependencies, as per the development philosophy of Thanos - keep everything
+as simple as possible.
+
+Also, note that all conversions (if the parameters do not change) are
+idempotent, so we can do the same conversion many times and it does not
+impact the system negatively, except for extraneous remote object storage
+operations that could be avoided.
+
+## Implementation
+
+Inside of Cortex, blocks are converted and the data is put inside the same
+directory, so it is possible to reuse the same sharding mechanism, i.e. the
+one that the Compactor component has. Inside of this project, we decided to
+use the day of the data inside a block as the top-level key, so that
+mechanism doesn't work for us. Cortex also supports rings everywhere, so a
+user of Cortex already needs to have some extra system which maintains the
+state of the ring.
+
+There are two separate concerns that we need to take care of: how to pass
+down the units of work to converters, and how the planner component can know
+whether progress is being made and when to pass down the units of work
+again.
+
+- For passing down the units of work, this part can be abstracted and any
+  medium can be used - Redis, memcached, the same remote object storage,
+  etc.; in general, any key/value storage. All such systems are called
+  "storage" from here on. The "queue" can look like the following objects:
+
+```raw
+/converter/<name>/<ulid>_0.json
+/converter/<name>/<ulid>_1.json
+/converter/<name>/<ulid>_2.json
+...
+```
+
+We will need to identify each planner and converter with some string, and it
+will also be reflected in the object path. Having this name will allow users
+to run multiple converter/planner pairs on the same storage.
+
+On each planner loop, generate a ULID that will uniquely identify that work
+stream.
+
+At the moment, the only kind of work that can be done is "conversion", so a
+work item will contain data about which blocks to download and convert (a
+sketch follows below).
+
+After uploading all the work, the planner will ping the workers through gRPC
+to signal that new work is available and that they need to reset their
+internal history.
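+
+As an illustration, a work item could be a small JSON document like the one
+below. The field names are purely hypothetical and not part of this RFC's
+proposal:
+
+```json
+{
+  "type": "conversion",
+  "blocks": [
+    "01JM5T3YB0Q4H8ZVXW2K9D7N6A",
+    "01JM5T3YB0Q4H8ZVXW2K9D7N6B"
+  ],
+  "day": "2026-02-17"
+}
+```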
+
+After that, the workers will start downloading the files and doing the work.
+Deleting an object will "consume" it and act as a signal that the worker
+whose deletion succeeded can do the work.
+
+The workers must expose through gRPC what they are doing (or what has been
+done). This will act as a guardrail and it will also allow us to have nice
+logs inside of the planner.
+
+Periodically, the planner shall check whether the work has been completed
+and whether it can try to do the planning again.
+
+Let's use DNS to discover all workers.
+
+## States
+
+1. At the start, the worker has no history and does nothing.
+2. If it receives a ping through gRPC from the planner, it shall mark that
+   in its state and start trying to do work. In other words, it should list
+   objects and try to consume them.
+3. Periodically check the state to see if no work is queued anymore. If so,
+   the planning step can be done again.
+4. What if some file disappears while consuming is happening? It's not an
+   issue because we will just do the planning again. We can repeat all
+   operations because conversion is idempotent.
+5. If a worker dies in the middle, that's also not an issue - we can always
+   retry due to idempotency.
+
+The planner should maintain state about which jobs have been queued
+previously so that it can notice when some job is being redone over and over
+again and inform the operator about it.
+
+In practice, we could move the keeping of this state completely to object
+storage, but that would mean extra writes and reads; doing it synchronously
+through gRPC means that users are better protected against accidental
+errors.
+
+We should use `If-None-Match` (or an equivalent conditional write) as a
+defensive measure to ensure that objects are only ever written once (see the
+sketch below).
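+
+As a minimal sketch of the write-once guard, the Go snippet below uses plain
+HTTP conditional-request semantics. It assumes the storage accepts direct or
+pre-signed `PUT`s at the given URL; S3-compatible stores, for example,
+honour `If-None-Match: *` on `PUT` and reply with `412 Precondition Failed`
+when the key already exists:
+
+```go
+package main
+
+import (
+	"bytes"
+	"fmt"
+	"log"
+	"net/http"
+)
+
+// putOnce uploads body to url only if no object exists there yet.
+// "If-None-Match: *" asks the server to fail the PUT when the key is
+// already taken, so a second planner writing the same work item is a
+// harmless no-op instead of an overwrite.
+func putOnce(url string, body []byte) (created bool, err error) {
+	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
+	if err != nil {
+		return false, err
+	}
+	req.Header.Set("If-None-Match", "*")
+	resp, err := http.DefaultClient.Do(req)
+	if err != nil {
+		return false, err
+	}
+	defer resp.Body.Close()
+	switch resp.StatusCode {
+	case http.StatusOK, http.StatusCreated:
+		return true, nil
+	case http.StatusPreconditionFailed:
+		return false, nil // lost the race: the object already exists
+	default:
+		return false, fmt.Errorf("unexpected status: %s", resp.Status)
+	}
+}
+
+func main() {
+	// Hypothetical work-item object following the layout above.
+	created, err := putOnce(
+		"https://storage.example.com/converter/demo/01JM5T3YB0Q4H8ZVXW2K9D7N6A_0.json",
+		[]byte(`{"type":"conversion"}`),
+	)
+	if err != nil {
+		log.Fatal(err)
+	}
+	fmt.Println("created:", created)
+}
+```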