
spark-pipeline-framework


A configuration-driven framework for building Spark pipelines with HOCON config files and PureConfig.

Python/PySpark users may also be interested in pyspark-pipeline-framework, the Python implementation of this framework using dataconf and HOCON. You can find it on GitHub and PyPI.

Features

  • Type-safe configuration via PureConfig with automatic case class binding
  • Dynamic component instantiation via reflection (no compile-time coupling)
  • Lifecycle hooks for monitoring, metrics, and custom error handling
  • Built-in hooks for structured logging, Micrometer metrics, and audit trails
  • Configuration validation for CI/CD pre-flight checks without Spark
  • Secrets management with pluggable providers (env, AWS, Vault)
  • Streaming support for Spark Structured Streaming pipelines
  • Cross-compilation for Spark 3.x/4.x and Scala 2.12/2.13
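The reflection-based, no-compile-time-coupling instantiation mentioned above can be sketched in plain Scala (a minimal stand-alone illustration using only the standard library; `Instance`, `Demo`, and `Reflect.lookup` are stand-in names, not the framework's actual API):

```scala
// Minimal sketch: look up a Scala `object` singleton by its fully
// qualified class name at runtime, with no compile-time reference.
// Names here are illustrative, not the framework's real classes.
trait Instance { def describe: String }

object Demo extends Instance { def describe = "demo component" }

object Reflect {
  // scalac compiles `object Demo` to a class `Demo$` whose static
  // MODULE$ field holds the singleton instance.
  def lookup(fqcn: String): Instance =
    Class.forName(fqcn + "$").getField("MODULE$").get(null).asInstanceOf[Instance]
}

object Main extends App {
  println(Reflect.lookup("Demo").describe) // prints "demo component"
}
```

Because the lookup is by string name, pipeline components can live in any jar on the classpath and be selected purely from configuration.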

📚 Full Documentation

Modules

| Module | Description |
| --- | --- |
| `spark-pipeline-core` | Traits, config models, instantiation (no Spark dependency) |
| `spark-pipeline-runtime` | `SparkSessionWrapper`, `DataFlow` trait |
| `spark-pipeline-runner` | `SimplePipelineRunner` entry point |

Quick Start

1. Add dependency

// build.sbt
libraryDependencies += "io.github.dwsmith1983" %% "spark-pipeline-runtime-spark3" % "<version>"

2. Create a component

import io.github.dwsmith1983.spark.pipeline.config.ConfigurableInstance
import io.github.dwsmith1983.spark.pipeline.runtime.DataFlow
import pureconfig._
import pureconfig.generic.auto._

object MyComponent extends ConfigurableInstance {
  case class Config(inputTable: String, outputPath: String)

  // Bind this component's HOCON block to the typed Config case class.
  override def createFromConfig(conf: com.typesafe.config.Config): MyComponent =
    new MyComponent(ConfigSource.fromConfig(conf).loadOrThrow[Config])
}

class MyComponent(conf: MyComponent.Config) extends DataFlow {
  // `spark` is supplied by the DataFlow trait.
  override def run(): Unit = {
    spark.table(conf.inputTable).write.parquet(conf.outputPath)
  }
}
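Conceptually, the runner instantiates each configured component and invokes `run()` in declaration order. That loop can be sketched as follows (illustrative names only, not the framework's real runner):

```scala
// Stand-in sketch of the runner loop: each component exposes run(),
// and the runner executes them sequentially in configured order.
trait Component { def run(): Unit }

// A toy component that records its effect in a shared buffer.
final class Append(buf: StringBuilder, s: String) extends Component {
  def run(): Unit = buf.append(s)
}

object SketchRunner {
  def runAll(components: Seq[Component]): Unit = components.foreach(_.run())
}

object Main extends App {
  val buf = new StringBuilder
  SketchRunner.runAll(Seq(new Append(buf, "a"), new Append(buf, "b")))
  println(buf.result()) // prints "ab"
}
```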

3. Create config file

# pipeline.conf
spark {
  app-name = "My Pipeline"
}

pipeline {
  pipeline-name = "My Data Pipeline"
  pipeline-components = [
    {
      instance-type = "com.mycompany.MyComponent"
      instance-name = "MyComponent(prod)"
      instance-config {
        input-table = "raw_data"
        output-path = "/data/processed"
      }
    }
  ]
}
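Because the config file is plain HOCON, standard Typesafe Config features such as optional environment-variable substitution work out of the box, which is useful for per-environment overrides. A sketch extending the file above (the `OUTPUT_PATH` variable name is illustrative):

```hocon
pipeline {
  pipeline-components = [
    {
      instance-type = "com.mycompany.MyComponent"
      instance-name = "MyComponent(prod)"
      instance-config {
        input-table = "raw_data"
        # default value, overridden only if OUTPUT_PATH is set in the environment
        output-path = "/data/processed"
        output-path = ${?OUTPUT_PATH}
      }
    }
  ]
}
```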

4. Run

spark-submit \
  --class io.github.dwsmith1983.spark.pipeline.runner.SimplePipelineRunner \
  --jars /path/to/my-pipeline.jar \
  /path/to/spark-pipeline-runner-spark3_2.12.jar \
  -Dconfig.file=/path/to/pipeline.conf

License

Apache 2.0
