Thank you for your interest in contributing! This guide will help you get started.
Please note that this project is released with a Code of Conduct. By participating in this project you agree to abide by its terms.
- Java 17+ (required by Spark 3.5.x and 4.x)
- SBT 1.9+ (installation guide)
- Scala (managed by SBT, no separate installation needed)
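Before building, you can sanity-check that your JVM meets the Java 17+ requirement with a quick sketch like the one below (it assumes `java` is on your PATH; the `-version` output format varies slightly across JVM vendors):

```shell
# Extract the major version from `java -version` and compare against 17.
# Legacy JVMs report "1.8.0_xxx", so Java 8 parses as major version 1.
ver=$(java -version 2>&1 | sed -nE 's/.*version "([0-9]+).*/\1/p' | head -n1)
if [ "${ver:-0}" -ge 17 ]; then
  echo "Java $ver detected: OK"
else
  echo "Java 17+ required (detected: ${ver:-none})"
fi
```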
```bash
# Clone the repository
git clone https://github.com/dwsmith1983/spark-pipeline-framework.git
cd spark-pipeline-framework

# Compile all modules
sbt compile

# Run tests
sbt test

# Run tests with coverage
sbt coverage test coverageReport
```

IntelliJ IDEA (recommended):
- Install the Scala plugin
- Open the project directory
- Import as an SBT project
- Wait for indexing to complete
VS Code:
- Install the Metals extension
- Open the project directory
- Allow Metals to import the build
Set up pre-commit hooks to catch formatting issues before pushing:
```bash
# Option 1: Using pre-commit framework (recommended)
pip install pre-commit
pre-commit install

# Option 2: Manual git hook
cp scripts/pre-commit.sh .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
```

```
spark-pipeline-framework/
├── core/      # Config models, traits (no Spark dependency)
├── runtime/   # SparkSession management, DataFlow trait
├── runner/    # SimplePipelineRunner entry point
├── example/   # Canonical examples (BatchPipelineExample, StreamingPipelineExample)
└── project/   # SBT build configuration
```
- `main` - stable, release-ready code
- `feat/*` - new features
- `fix/*` - bug fixes
- `docs/*` - documentation updates
- `chore/*` - maintenance tasks
We use Conventional Commits for automatic versioning:
- `feat: add new feature` → bumps minor version (post-1.0.0)
- `fix: fix a bug` → bumps patch version
- `docs: update documentation` → no version bump
- `chore: maintenance task` → no version bump

Breaking changes:

- `feat!: breaking change` → bumps major version (post-1.0.0)
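You can sanity-check a commit message against this convention locally before pushing. The snippet below is a hypothetical helper, not part of the repository's tooling; it covers only the types listed above plus an optional scope and the breaking-change `!` marker:

```shell
# Check a commit message against the Conventional Commits pattern:
# type, optional (scope), optional "!", then ": " and a description.
msg="feat: add retry configuration support"
if echo "$msg" | grep -Eq '^(feat|fix|docs|chore)(\([a-z0-9-]+\))?!?: .+'; then
  echo "conventional"
else
  echo "not conventional"
fi
```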
- Formatting: Run `sbt scalafmtAll` before committing
- Style checks: Run `sbt scalastyle` to check for style issues
- Compiler warnings: All warnings are treated as errors (`-Xfatal-warnings`)
```bash
# Run all tests
sbt test

# Run tests for a specific module
sbt "project core" test
sbt "project runnerspark3" test

# Run tests with coverage
sbt coverage test coverageReport

# Run a specific test class
sbt "testOnly *SimplePipelineRunnerSpec"
```

Coverage minimum is 75%. Tests must pass before merging.
The project supports multiple Spark and Scala versions:
| Configuration | Spark | Scala |
|---|---|---|
| `*spark32_12` | 3.5.7 | 2.12 |
| `*spark3` | 3.5.7 | 2.13 |
| `*spark4` | 4.0.1 | 2.13 |
To test a specific configuration:
```bash
sbt "project runnerspark32_12" test
sbt "project runnerspark3" test
sbt "project runnerspark4" test
```

- Create a branch from `main`
- Make your changes with tests
- Run checks locally: `sbt scalafmtCheckAll scalastyle test`
- Push and create a PR against `main`
- Wait for CI - all checks must pass
- Address review feedback if any
- Merge - maintainers will merge when ready
Use conventional commit format for PR titles:
- `feat: add retry configuration support`
- `fix: handle empty pipeline components`
- `docs: improve getting started guide`
To add a new example component:
- Create the component in `example/src/main/scala/`:

```scala
package io.github.dwsmith1983.pipelines

import io.github.dwsmith1983.spark.pipeline.config.ConfigurableInstance
import io.github.dwsmith1983.spark.pipeline.runtime.DataFlow
import pureconfig._
import pureconfig.generic.auto._

object MyComponent extends ConfigurableInstance {
  case class Config(inputPath: String, outputPath: String)

  // Use the fully qualified Typesafe type for the parameter: an unqualified
  // `Config` here would be shadowed by the MyComponent.Config case class above
  override def createFromConfig(conf: com.typesafe.config.Config): MyComponent =
    new MyComponent(ConfigSource.fromConfig(conf).loadOrThrow[Config])
}

class MyComponent(conf: MyComponent.Config) extends DataFlow {
  override def run(): Unit = {
    logInfo(s"Processing ${conf.inputPath}")
    // Your Spark logic here
  }
}
```

- Add tests in `example/src/test/scala/`
- Update `example-pipeline.conf` if appropriate
When reporting bugs, please include:
- Spark version and Scala version
- Minimal configuration to reproduce
- Full stack trace if applicable
- Expected vs actual behavior
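A report following the checklist above might look like the sketch below; the version numbers, configuration keys, and failure described are purely illustrative:

```
Environment: Spark 3.5.7 / Scala 2.13

Minimal config to reproduce (keys are from my pipeline, trimmed down):
  pipeline {
    components = []
  }

Expected: runner exits cleanly (or warns about an empty component list)
Actual:   job fails at startup; full stack trace attached
```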
- Open a GitHub Discussion
- Check existing issues for similar questions
By contributing, you agree that your contributions will be licensed under the Apache 2.0 License.