Sustainability Smells in Microservice Architectures

This repository contains the experimental infrastructure, benchmark data and analysis scripts for my Master's thesis on sustainability smells in microservice architectures. The thesis defines sustainability smells as recurring architectural or deployment patterns that lead to measurable, unnecessary energy consumption, and validates them through controlled energy experiments.

The System Under Test (SUT) is the OpenTelemetry Demo v2.0.2 ("Astronomy Shop"), a microservice-based e-commerce application with 14 primary services in 11 programming languages.

Investigated Smells

ID	Smell	Injection	Validation
C1	Chatty Services	Replace batch gRPC call with 10 individual calls in the recommendation service	✅ Validated (system-level)
C3	Missing Caching	Remove TTL-based in-memory cache from the recommendation service	✅ Validated (system-level)
C5	Nobody Home	Deploy 4 additional idle services that receive no traffic	❌ Not validated
C6	Missing Resource Limits	Remove `resources.limits` from all 20 service containers	⚠️ Conditionally validated (service-level, low load only)
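
To make the C1 and C3 injections more concrete, the following is a minimal, self-contained sketch and not the code on the smell branches. All names (`FakeCatalog`, `fetch_batched`, `fetch_chatty`, `TTLCache`) are hypothetical; the sketch only assumes that a batch call costs one round trip regardless of the number of requested items, and that the cache expires entries after a fixed time-to-live.

```python
import time
from typing import Callable, Dict, Hashable, List, Tuple


class FakeCatalog:
    """Hypothetical stand-in for a remote catalog service; counts round trips."""

    def __init__(self) -> None:
        self.round_trips = 0

    def get_products(self, ids: List[str]) -> List[str]:
        # One simulated network round trip, however many IDs are requested.
        self.round_trips += 1
        return [f"product-{i}" for i in ids]


def fetch_batched(catalog: FakeCatalog, ids: List[str]) -> List[str]:
    """C1 baseline: a single batch call for all IDs."""
    return catalog.get_products(ids)


def fetch_chatty(catalog: FakeCatalog, ids: List[str]) -> List[str]:
    """C1 smell: one individual call per ID."""
    return [catalog.get_products([i])[0] for i in ids]


class TTLCache:
    """C3 baseline: tiny TTL-based in-memory cache with an injectable clock."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, List[str]]] = {}

    def get_or_load(self, key: Hashable,
                    loader: Callable[[], List[str]]) -> List[str]:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh entry: no backend call
        value = loader()     # missing or expired: call the backend
        self._entries[key] = (now, value)
        return value
```

With ten IDs, `fetch_batched` causes one round trip and `fetch_chatty` causes ten. Removing `TTLCache` (the C3 smell) means every request invokes the loader, even when the result would still be fresh.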

Each smell is implemented on a dedicated Git branch (smell/<name>) with a corresponding baseline branch (baseline/<name>).

Repository Structure

├── benchmark/
│   ├── scripts/              # Cluster lifecycle & benchmark automation
│   │   ├── createKindCluster.sh
│   │   ├── destroyKindCluster.sh
│   │   ├── installOpenTelemetry.sh
│   │   ├── execute_benchmark.sh     # Main benchmark runner (interleaved pairs)
│   │   ├── collect_metrics.sh       # Prometheus metric export
│   │   └── metrics_config.sh        # PromQL queries for all metrics
│   ├── locust/               # Load generator (TPC-W Shopping Mix)
│   │   ├── locustfile.py
│   │   ├── people.json               # Fixed checkout dataset
│   │   └── Dockerfile
│   ├── analysis/             # Statistical analysis pipeline
│   │   ├── 01_load_and_transform.py  # Load CSVs → compute 4-component energy model
│   │   ├── 02_statistical_tests.py   # Wilcoxon signed-rank + Cohen's d
│   │   ├── 03_visualize.py           # Generate thesis charts
│   │   ├── 04_cross_smell_summary.py # Cross-smell comparison table
│   │   └── requirements.txt
│   ├── results/              # Raw + analyzed experiment data
│   │   ├── c1-low-full/      # 10 interleaved pairs per smell × load
│   │   ├── c1-medium-full/
│   │   ├── ...
│   │   └── cross_smell/
│   └── ramp-up/              # Capacity test protocol
├── kubernetes/
│   ├── otel-demo-manifest.yaml   # Pre-rendered SUT manifest
│   └── kepler-values.yaml        # Kepler Helm values (fake-CPU-meter mode)
├── src/                      # OpenTelemetry Demo source (smell injections on branches)
└── thesis.txt                # Thesis draft (LaTeX source)

Branches

Branch	Description
`main`	Baseline OpenTelemetry Demo + all benchmark infrastructure
`baseline/chatty-services`	Baseline for C1 (original batch call)
`smell/chatty-services`	C1 injection (10× individual gRPC calls)
`baseline/missing-caching`	Baseline for C3 (cache enabled)
`smell/missing-caching`	C3 injection (cache removed)
`baseline/nobody-home`	Baseline for C5 (no extra services)
`smell/nobody-home`	C5 injection (4 idle services added)
`baseline/missing-resource-limits`	Baseline for C6 (limits set)
`smell/missing-resource-limits`	C6 injection (all limits removed)

Experimental Setup

Hardware: MacBook Pro 14" (Nov 2024), Apple M4, 32 GB RAM
Cluster: Single-node Kind (Kubernetes in Docker)
Energy: Kepler v0.9.0 in KEPLER_FAKE_CPU_METER mode (model-based estimation)
Monitoring: kube-prometheus stack (cAdvisor, node-exporter, Grafana) + OTel Demo Prometheus
Load: Locust (Docker container, outside cluster)
Runs: 10 interleaved baseline–smell pairs (20 runs) per experiment × 11 experiments = 220 runs in total

Running the Analysis

The analysis pipeline processes the raw experiment data and produces statistical results and charts.

cd benchmark/analysis
pip install -r requirements.txt

# Step 1: Load raw CSVs and compute the 4-component energy model
python 01_load_and_transform.py ../results/c1-low-full

# Step 2: Run Wilcoxon signed-rank tests + Cohen's d
python 02_statistical_tests.py ../results/c1-low-full

# Step 3: Generate charts for the thesis
python 03_visualize.py ../results/c1-low-full

# Step 4: Cross-smell comparison table
python 04_cross_smell_summary.py ../results

Each script reads from and writes to the analysis/ subfolder inside each result directory.

Running a Benchmark

Note: A single smell × load experiment takes approximately 8 hours (20 runs × ~25 min each). A full smell across all three load levels takes about 24 hours. The script handles the full lifecycle for each run: cluster creation, SUT deployment, warm-up, measurement, metric export and teardown. Locust must already be running as a Docker container (started once via createLocustBenchmark.sh).

cd benchmark/scripts

# Example: C1 (Chatty Services), low load, 10 interleaved pairs
./execute_benchmark.sh \
  --smell-branch "smell/chatty-services" \
  --baseline-branch "baseline/chatty-services" \
  --load-profile low \
  --pairs 10 \
  --output-dir "../results/c1-low-full"

The script alternates baseline and smell runs within each pair. For every run it creates a fresh Kind cluster, deploys the SUT from the specified branch, runs the measurement and tears down the cluster again.
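
The alternation described above can be sketched as a run schedule. This is an illustrative sketch and not the logic of `execute_benchmark.sh`: the function names are hypothetical, and it assumes that the baseline run comes first in every pair, which the text does not specify.

```python
from typing import List, Tuple


def interleaved_schedule(pairs: int, baseline_branch: str,
                         smell_branch: str) -> List[Tuple[int, str]]:
    """Return (pair_number, branch) tuples in execution order."""
    schedule: List[Tuple[int, str]] = []
    for pair in range(1, pairs + 1):
        # Assumption: baseline first, then smell, within each pair.
        schedule.append((pair, baseline_branch))
        schedule.append((pair, smell_branch))
    return schedule


def estimated_hours(runs: int, minutes_per_run: float = 25.0) -> float:
    """Rough wall-clock estimate from the ~25 min per run stated above."""
    return runs * minutes_per_run / 60
```

Calling `interleaved_schedule(10, "baseline/chatty-services", "smell/chatty-services")` yields 20 runs, and `estimated_hours(20)` gives roughly 8.3 hours, in line with the estimate of approximately 8 hours per experiment.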

Acknowledgments

This thesis builds on the OpenTelemetry Demo project and on the energy measurement methodology by Legler et al. The experimental infrastructure was adapted from prior work at the ISE research group at TU Berlin.

License

The OpenTelemetry Demo source code is licensed under the Apache License 2.0. The benchmark scripts, analysis code and thesis content in this repository are part of a Master's thesis at TU Berlin.
