Description
Motivation
Today, Druid's integration tests are slow to run in Travis and painful for developers to use. We wish to fix both problems.
This proposal starts the discussion. A prototype of the redesign is in progress. However, the integration tests are complex: let's get more eyes on the issue to ensure we have a workable plan to modernize the tests.
Proposed changes
We propose restructuring the integration tests to allow faster Travis runs and far easier debugging. Key goals include:
- Speed up the Druid test image build by avoiding download of dependencies. (Instead, any such dependencies are managed by Maven and reside in the local build cache.)
- Use official images for dependencies to avoid the need to download, install, and manage those dependencies.
- Ensure that it is easy to manually build the image, launch a cluster, and run a test against the cluster.
- Convert tests to JUnit so that they will easily run in your favorite IDE, just like other Druid tests.
- Use the actual Druid build from `distribution` so we know what is tested.
- Leverage, don't fight, Maven.
- Run the integration tests easily on a typical development machine.
This project is mostly a matter of managing many details. Rather than spell out the gory details here, please see the project documentation in the prototype branch.
Rationale
The current integration tests are characterized by a number of quirks resulting from their evolution.
- The tests run in Maven before the `distribution` project, forcing the integration tests to do their own build. This means that what we test is not the software we "ship."
- The tests are in a single Maven project. That single project can start the cluster, run tests, and shut down the cluster. Since we have multiple test groups, this means that we need a separate Travis run for each test group.
- The test image contains Druid plus dependencies such as MySQL, ZooKeeper and Kafka. Each build pulls these dependencies down from the public repository, resulting in very slow image builds (and unkind load on the upstream repositories).
- The plumbing (scripts, Maven tasks, etc.) shows its age: it is very complex and quite difficult to understand and extend.
- Tests are based on the TestNG framework and are hard to run within an IDE.
- The directory mounted into the container is placed in the user's home directory, which puts it outside of the Maven build tree and makes it hard to manage. When run locally, tests share the same directory, leading to non-deterministic results.
Each of these issues contains the seed of its resolution:
- Move tests to run in Maven after `distribution` so that the tests run against the artifacts produced by the main Maven build. This eliminates the need for a second build within the tests.
- Split the tests into multiple Maven subprojects. Each can start a Docker cluster as that test needs, run the test, and shut down the cluster. Maven will step through the subprojects (former test groups) one by one in a single run.
- MySQL, ZK and Kafka all provide "official" images. We can use those to avoid pulling down the software on each build. We can use Maven to pull down things like the MySQL driver, MariaDB driver and Kafka protobuf provider. Both Docker and Maven provide caches, so that we hit the upstream repositories only when the dependencies change, not once per test run.
- By splitting tests into multiple projects, we can simplify the "plumbing": each test project performs the setup that it needs, without complex scripts and if-statements.
- TestNG usage is replaced by the JUnit library, which has built-in IDE support. A bit of shim code ensures that most test code can remain unchanged.
- The shared directory mounted into the containers now resides in each project's `target` folder, so that Maven will clean up the directory the same way it cleans all other build artifacts.
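To make the last point concrete, here is a minimal sketch of how a test project might locate its own shared directory under `target`. This is not code from the prototype branch: the class `SharedDirSketch`, its methods, the directory name `shared`, and the project names `it-group-a` and `it-group-b` are all hypothetical and exist only to illustrate the idea.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of Druid or the prototype. */
final class SharedDirSketch {
    // Assumed directory name. The proposal only says the directory lives
    // under each project's target folder.
    static final String SHARED_DIR_NAME = "shared";

    private SharedDirSketch() {}

    /** Returns projectBaseDir/target/shared without touching the file system. */
    public static Path sharedDirFor(Path projectBaseDir) {
        return projectBaseDir.resolve("target").resolve(SHARED_DIR_NAME);
    }

    /** Creates the directory (and any missing parents), then returns it. */
    public static Path ensureSharedDir(Path projectBaseDir) throws IOException {
        return Files.createDirectories(sharedDirFor(projectBaseDir));
    }

    public static void main(String[] args) {
        // Two hypothetical test subprojects resolve to two separate directories,
        // so tests run locally no longer share state through a single folder.
        System.out.println(sharedDirFor(Paths.get("it-group-a")));
        System.out.println(sharedDirFor(Paths.get("it-group-b")));
    }
}
```

Because each path sits under a project's own `target` folder, an ordinary `mvn clean` removes it along with the other build output, and no two test projects write to the same location.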
Migration
The Druid integration tests represent a large amount of code. One-shot conversion will not be possible. Instead, we must work step-by-step.
- The prototype branch has worked out how to convert a single test group using the new structure. (The current code is a work in progress.)
- We propose to check in a final version of that code as a new Druid project. The project would not yet be built in Travis, but can be used for local testing by developers.
- Once we feel the new code is stable, we can replace the existing Travis tests group-by-group. A single new Travis job will eventually include all of the current test groups, resulting in a faster build cycle.
- If we don't have capacity to convert all tests, then some groups may continue to run in the old system for the time being.
This approach does mean some short-term redundancy, but provides a safe roll-out path.
Future work
This proposal is an outline of an approach, along with a prototype to demonstrate the idea. Additional work includes:
- Complete the foundational work. Mostly a matter of final clean-ups, verifying that tests flow smoothly, etc.
- Conversion of the remaining tests.
- Ensure the tests work in the Kubernetes and Quickstart environments, which some tests seem to support.
- Support multiple environments: MariaDB vs. MySQL drivers, Hadoop vs. S3 vs. Azure vs. GCS, etc.