ShardSeal - Open S3-compatible, self-healing object store written in Go.

(Work in progress)

Project Status & Goals

Current State

This is an experimental project in early development, primarily designed for:

  • Understanding distributed storage system internals
  • Testing novel approaches to erasure coding and data placement algorithms
  • Learning S3 protocol implementation details
  • Experimenting with self-healing storage architectures

This is NOT production-ready software.


  • Implemented
    • S3 basics: ListBuckets (/), CreateBucket (PUT /{bucket}), DeleteBucket (DELETE /{bucket})
    • Objects: Put (PUT /{bucket}/{key}), Get (GET), Head (HEAD), Delete (DELETE)
    • Range GET support (single range, requires seekable storage)
    • ListObjectsV2 (bucket object listing with prefix, delimiter, common prefixes, pagination)
    • Multipart uploads (initiate/upload-part/complete/abort)
    • Multipart: streaming completion with S3-compatible ETag (MD5 of part ETags + -N)
    • Config (YAML + env), structured logging, CI
    • Prometheus metrics (/metrics) and HTTP instrumentation
    • Tracing: OpenTelemetry scaffold (optional; OTLP gRPC/HTTP); spans include s3.error_code; optional s3.key_hash via config
    • Authentication: AWS Signature V4 (optional; header and presigned URL) with clock-skew enforcement and X-Amz-Expires validation
    • Local filesystem storage backend (dev/MVP), in-memory metadata store
    • Admin API (optional, separate port) with optional OIDC + RBAC: /admin/health, /admin/version; multipart GC endpoint (/admin/gc/multipart)
    • Repair pipeline (experimental): sealed integrity failures during GET/HEAD and scrubber scans enqueue repair items to an in-memory queue; a background repair worker runs as a no-op with admin controls
    • Repair worker (single-shard rewrite): validates payload hashes, regenerates sealed headers/footers, updates manifests, and exports success/failure metrics
    • Repair queue/worker can be enabled via config even when the Admin API is disabled (set repair.enabled: true / SHARDSEAL_REPAIR_ENABLED=true); storage and scrubber enqueues keep working and metrics are exported
    • Unit tests for buckets/objects/multipart
    • Robustness fixes: streaming multipart completion, safe range handling, improved error logging, manifest fsync after atomic writes
  • Not yet implemented / in progress
    • Self-healing (erasure coding and background rewriter): verification-only scrubber implemented; integrity failures are enqueued for repair, but the worker is currently a no-op (no healing yet). Sealed I/O and integrity verification are available behind feature flags.
    • Distributed metadata/placement

Roadmap / TODO (Summary)

  • High priority
    • Extend the repair worker to multi-shard/RS layouts (streaming rewrite + backoff)
    • Add repair orchestration controls (reason-aware scheduling, rate limiting, queue histograms surfaced to admin/UI)
    • Expand SigV4 coverage for chunked uploads and odd canonicalization cases (e.g., duplicate headers, session tokens)
  • Short term
    • S3 op metrics for API (get/put/head/delete/list/multipart)
    • Admin: scrubber pause/resume endpoints
    • Sealed range tests for payload section reads
    • Docs: capture repair queue configuration + admin host-port override tips, and document dashboard/alert wiring for queue depth metrics
  • Medium term
    • Real RS codec and multi-shard layout; reconstruct on read
    • Placement ring across dataDirs; prep for multi-node
    • Repair worker: reconstruct + rewrite with retry/backoff
  • See project.md for the full, prioritized list.

Quick start

Prerequisites

  • Go 1.22+ installed

Build and run

make build
# Run with sample config (will ensure ./data exists)
SHARDSEAL_CONFIG=configs/local.yaml make run
# Or
# go run ./cmd/shardseal

Default address: :8080 (override with env SHARDSEAL_ADDR).
Data dirs: ./data (override with env SHARDSEAL_DATA_DIRS as comma-separated list).

Using with curl (auth disabled by default; SigV4 optional)

Bucket naming: 3-63 chars; lowercase letters, digits, dots, hyphens; must start/end with letter or digit.
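The naming rule above is easy to check locally before calling the API. A minimal sketch (the regex is an illustration of the stated rule, not necessarily the server's exact validator):

```shell
# Sketch: bucket-name check per the rule above (3-63 chars; lowercase
# letters, digits, dots, hyphens; must start and end with a letter or digit)
valid_bucket() {
  [[ "$1" =~ ^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$ ]]
}

valid_bucket my-bucket && echo "ok"
valid_bucket My-Bucket || echo "rejected (uppercase)"
```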

# List all buckets
curl -v http://localhost:8080/

# Create a bucket
curl -v -X PUT http://localhost:8080/my-bucket

# Put an object (from stdin)
printf 'Hello, ShardSeal!\n' | curl -v -X PUT http://localhost:8080/my-bucket/hello.txt --data-binary @-

# Get an object
curl -v http://localhost:8080/my-bucket/hello.txt

# Range GET (first 10 bytes)
curl -v -H 'Range: bytes=0-9' http://localhost:8080/my-bucket/hello.txt

# Head object
curl -I http://localhost:8080/my-bucket/hello.txt

# List objects in bucket
curl -s "http://localhost:8080/my-bucket?list-type=2"

# List with prefix filter
curl -s "http://localhost:8080/my-bucket?list-type=2&prefix=folder/"

# Delete object
curl -X DELETE http://localhost:8080/my-bucket/hello.txt

# Delete bucket (must be empty; internal .multipart files are excluded from the emptiness check)
curl -X DELETE http://localhost:8080/my-bucket

Multipart upload example (ETag behavior)

  • Prerequisites: the Admin API is not required; the bucket must already exist. The example uses two parts.
  • After completion, the ETag is the MD5 of the concatenated binary part MD5s, with a "-N" suffix (N = number of parts).
bucket=my-bucket
object=big.bin

# 1) Initiate multipart upload
uploadId=$(curl -s -X POST "http://localhost:8080/$bucket/$object?uploads" \
  | sed -n 's:.*<UploadId>\(.*\)</UploadId>.*:\1:p')
echo "UploadId=$uploadId"

# 2) Upload two parts; capture each returned ETag from response headers
part1ETag=$(printf 'A%.0s' {1..6000000} | \
  curl -s -i -X PUT "http://localhost:8080/$bucket/$object?partNumber=1&uploadId=$uploadId" \
       --data-binary @- | tr -d '\r' | awk -F': ' '/^ETag:/ {gsub(/\"/,"",$2); print $2}')

part2ETag=$(printf 'B%.0s' {1..6000000} | \
  curl -s -i -X PUT "http://localhost:8080/$bucket/$object?partNumber=2&uploadId=$uploadId" \
       --data-binary @- | tr -d '\r' | awk -F': ' '/^ETag:/ {gsub(/\"/,"",$2); print $2}')

echo "Part1 ETag=$part1ETag" ; echo "Part2 ETag=$part2ETag"

# 3) Complete using the part list; server streams parts and returns multipart ETag
cat > complete.xml <<XML
<CompleteMultipartUpload>
  <Part><PartNumber>1</PartNumber><ETag>"$part1ETag"</ETag></Part>
  <Part><PartNumber>2</PartNumber><ETag>"$part2ETag"</ETag></Part>
</CompleteMultipartUpload>
XML

curl -s -X POST "http://localhost:8080/$bucket/$object?uploadId=$uploadId" \
     -H 'Content-Type: application/xml' --data-binary @complete.xml
# Response ETag => md5(md5(part1) || md5(part2)) with "-2" suffix

# 4) Verify object is retrievable
curl -I "http://localhost:8080/$bucket/$object"
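The multipart ETag can also be double-checked offline. A minimal sketch, assuming GNU coreutils (md5sum, xxd) and illustrative part file names:

```shell
# Sketch: compute the expected multipart ETag from local part files.
# AWS convention: md5 over the concatenated *binary* part digests, plus "-N".
multipart_etag() {
  local n=$#
  for p in "$@"; do
    md5sum "$p" | cut -d' ' -f1        # hex digest of each part
  done | xxd -r -p | md5sum | cut -d' ' -f1 | sed "s/\$/-$n/"
}

printf 'A%.0s' {1..100} > part1
printf 'B%.0s' {1..100} > part2
multipart_etag part1 part2             # 32 hex chars followed by "-2"
```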

Testing

go test ./...
# Verbose tests for just the S3 API package
go test ./pkg/api/s3 -v

Docker (dev)

Two options are provided: local Docker build and docker-compose. The image exposes:

  • 8080: S3 data-plane (configurable via SHARDSEAL_ADDR)
  • 9090: Admin API (when adminAddress is configured; docker-compose publishes this on host port ${SHARDSEAL_ADMIN_HOST_PORT:-19090} to avoid clashes with local Prometheus instances)

Build and run (Dockerfile)

# Build the image locally
docker build -t shardseal:dev .

# Run with a mounted data directory and config
# Ensure your config mounts to /home/app/config/config.yaml or set SHARDSEAL_CONFIG accordingly.
docker run --rm -p 8080:8080 -p 9090:9090 \
  -v "$(pwd)/data:/home/app/data" \
  -v "$(pwd)/configs:/home/app/config:ro" \
  -e SHARDSEAL_CONFIG=/home/app/config/local.yaml \
  --name shardseal shardseal:dev

Compose (docker-compose.yml)

# Up/Down
docker compose up --build
docker compose down

# Override env from your shell or edit docker-compose.yml as needed.
# Data is mounted at ./data, config at ./configs (read-only) by default.

Notes:

  • The container user is a non-root user (app). Data and config are mounted under /home/app.
  • To enable Admin API, configure adminAddress in the config or set SHARDSEAL_ADMIN_ADDR (see configs/local.yaml and cmd.shardseal.main).
  • By default the compose file publishes the admin listener on host port ${SHARDSEAL_ADMIN_HOST_PORT:-19090} (container still listens on :9090). Export SHARDSEAL_ADMIN_HOST_PORT=9090 before docker compose up if the default 9090 is free on your machine.
  • Repair queue priorities: read-time integrity failures run at highest priority, scrub detections at normal priority, and admin-enqueued tasks at low priority. Metrics are tagged by reason/result for dashboards/alerts.
  • Sealed mode can be enabled via:
    • YAML: sealed.enabled: true
    • Env: SHARDSEAL_SEALED_ENABLED=true
  • Integrity scrubber (experimental verification-only) can be enabled via:
    • Env: SHARDSEAL_SCRUBBER_ENABLED=true
    • Optional overrides:
      • SHARDSEAL_SCRUBBER_INTERVAL=1h
      • SHARDSEAL_SCRUBBER_CONCURRENCY=2
      • SHARDSEAL_SCRUBBER_VERIFY_PAYLOAD=true # overrides sealed.verifyOnRead inheritance
  • Admin scrub endpoints (experimental, sealed integrity verification):
    • GET /admin/scrub/stats (RBAC: admin.read)
    • POST /admin/scrub/runonce (RBAC: admin.scrub)
    • The scrubber verifies sealed headers/footers and compares footer content-hash to the manifest. Payload re-hash verification is enabled when sealed.verifyOnRead is true (or forced via SHARDSEAL_SCRUBBER_VERIFY_PAYLOAD). Protect these with OIDC/RBAC as needed (see security.oidc.rbac and cmd.shardseal.main).
  • Repair pipeline (experimental): when Admin API is enabled, an in-memory repair queue is created. The storage layer enqueues items on sealed integrity failures during GET/HEAD, and the scrubber enqueues detected failures. A background repair worker starts (currently a no-op) and can be inspected/controlled via admin endpoints.
  • The provided docker-compose.yml includes commented environment toggles for sealed mode, scrubber, tracing, admin OIDC, and GC; uncomment to enable as needed.

Admin repair examples

Enable the Admin API (e.g., SHARDSEAL_ADMIN_ADDR=:9090). If OIDC is enabled, include a valid Bearer token; otherwise these endpoints are unauthenticated. With the provided docker-compose file, the admin listener is published on host port ${SHARDSEAL_ADMIN_HOST_PORT:-19090} (default 19090), so run host-side health checks against http://localhost:19090/admin/health (or whichever host port you exported).

# Queue length
curl -s http://localhost:${SHARDSEAL_ADMIN_HOST_PORT:-19090}/admin/repair/stats

# Enqueue a repair item (e.g., detected externally)
curl -s -X POST http://localhost:${SHARDSEAL_ADMIN_HOST_PORT:-19090}/admin/repair/enqueue \
  -H 'Content-Type: application/json' \
  -d '{
    "bucket":"bkt",
    "key":"dir/obj.txt",
    "shardPath":"objects/bkt/dir/obj.txt/data.ss1",
    "reason":"admin"
  }'

# Scrubber controls
curl -s http://localhost:${SHARDSEAL_ADMIN_HOST_PORT:-19090}/admin/scrub/stats
curl -s -X POST http://localhost:${SHARDSEAL_ADMIN_HOST_PORT:-19090}/admin/scrub/runonce

# Repair worker controls
curl -s http://localhost:${SHARDSEAL_ADMIN_HOST_PORT:-19090}/admin/repair/worker/stats
curl -s -X POST http://localhost:${SHARDSEAL_ADMIN_HOST_PORT:-19090}/admin/repair/worker/pause
curl -s -X POST http://localhost:${SHARDSEAL_ADMIN_HOST_PORT:-19090}/admin/repair/worker/resume

Notes on authentication (OIDC)

  • Enable OIDC via config (oidc.*) or env (SHARDSEAL_OIDC_*). Set issuer (or jwksURL) and expected clientID/audience.
  • Obtain a JWT from your IdP (ID token or access token) whose aud matches the configured audience.
  • Pass the token in the Authorization header:
    • Example: curl -H "Authorization: Bearer $TOKEN" http://localhost:${SHARDSEAL_ADMIN_HOST_PORT:-19090}/admin/repair/stats
  • Health/version exemptions: if configured, /admin/health and /admin/version can be accessed without a token.
  • RBAC: endpoints require roles like admin.read, admin.scrub, admin.repair.* (see pkg/security/oidc/rbac.go).
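Before pointing a token at the Admin API, it can help to confirm its aud claim matches the configured audience. A minimal sketch, assuming GNU coreutils base64 (inspection only, no signature verification):

```shell
# Sketch: print a JWT's payload (claims) segment for inspection
jwt_payload() {
  local seg
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # restore base64 padding stripped by base64url encoding
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="$seg="; done
  printf '%s' "$seg" | base64 -d
}

# e.g. jwt_payload "$TOKEN" | grep '"aud"'
```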

Note: The repair queue/worker can be enabled without the Admin API via config (repair.enabled: true). In that case, the queue and worker run in the background, and metrics are exported; admin endpoints are simply unavailable.

Metrics

  • Exposes Prometheus metrics at /metrics on the same HTTP server.
  • Default counters and histograms include:
    • shardseal_http_requests_total{method,code}
    • shardseal_http_request_duration_seconds_bucket/sum/count{method,code}
    • shardseal_http_inflight_requests
    • shardseal_storage_bytes_total{op}
    • shardseal_storage_ops_total{op,result}
    • shardseal_storage_op_duration_seconds_bucket/sum/count{op}
    • shardseal_storage_sealed_ops_total{op,sealed,result,integrity_fail}
    • shardseal_storage_sealed_op_duration_seconds_bucket/sum/count{op,sealed,integrity_fail}
    • shardseal_storage_integrity_failures_total{op}
    • shardseal_scrubber_scanned_total
    • shardseal_scrubber_errors_total
    • shardseal_scrubber_last_run_timestamp_seconds
    • shardseal_scrubber_uptime_seconds
    • shardseal_repair_queue_depth
    • shardseal_repair_enqueued_total{reason}
    • shardseal_repair_completed_total{result}
    • shardseal_repair_duration_seconds_bucket/sum/count{result}
  • Example:
curl -s http://localhost:8080/metrics | head -n 20

Health endpoints

  • /livez: liveness probe (always OK when process is running)
  • /readyz: readiness probe gated on initialization completion
  • /metrics: Prometheus metrics endpoint

Monitoring (Prometheus + Grafana)

  • Prometheus sample config: configs/monitoring/prometheus/prometheus.yml
  • Example alert rules: configs/monitoring/prometheus/rules.yml
  • Grafana dashboard (import JSON): configs/monitoring/grafana/shardseal_overview.json
  • Includes sealed I/O metrics, scrubber metrics (scanned/errors/last_run/uptime), and repair metrics (queue_depth). The server polls scrubber stats and repair queue length every 10s and exports to the main registry.

Compose profile (optional monitoring stack):

# 1. Bring up shardseal as usual (uses service 'shardseal')
docker compose up --build -d

# 2. Bring up monitoring stack (Prometheus + Grafana) using the 'monitoring' profile
docker compose --profile monitoring up -d

# Access:
# - ShardSeal (S3 plane): http://localhost:8080
# - ShardSeal Admin (if enabled): http://localhost:${SHARDSEAL_ADMIN_HOST_PORT:-19090}/admin/health
# - Prometheus: http://localhost:9091
# - Grafana: http://localhost:3000  (default admin/admin)
#   Add Prometheus data source at http://prometheus:9090 and import the dashboard:
#   configs/monitoring/grafana/shardseal_overview.json

Troubleshooting: to clean up stale compose state and networks and re-create the containers, run:

# Stop and remove services/anonymous resources from previous runs
# One-liner to remove both the monitoring and base profiles:
docker compose --profile monitoring down --remove-orphans && docker compose down --remove-orphans

# Remove the base profile only
docker compose down --remove-orphans

# Remove dangling user-defined networks that may reference old IDs
docker network prune -f

# (Optional) If Prometheus data retention is not required, remove its anonymous volume too
# docker volume prune -f

# Rebuild and start the base service
docker compose up --build -d

# Start the monitoring profile (creates the explicit shardseal_net if missing)
docker compose --profile monitoring up -d

Validation

Notes:

  • Explicit Docker network: docker-compose.yml defines a bridge network "shardseal_net" and attaches shardseal, prometheus, and grafana to it. This avoids stale/implicit network IDs across runs.
  • Prometheus scrape target: configs/monitoring/prometheus/prometheus.yml uses "shardseal:8080" (service DNS on the Docker network), not "localhost:8080".

Also verify:

  • The Prometheus target inside the container is "shardseal:8080" per configs/monitoring/prometheus/prometheus.yml.
  • The Grafana Prometheus datasource URL is "http://prometheus:9090" (both services share the "shardseal_net" network defined in docker-compose.yml).

Tracing and S3 error headers
  • Server spans include: http.method, http.target, http.route, http.status_code, user_agent.original, net.peer.ip, http.server_duration_ms.
  • S3 attributes (low cardinality): s3.op, s3.bucket_present, s3.admin, s3.error; s3.error_code is recorded on failures, and s3.key_hash is optionally added when enabled.
  • Enable s3.key_hash via config (tracing.keyHashEnabled: true) or env (SHARDSEAL_TRACING_KEY_HASH=true). The key hash is sha256(key) truncated to 8 bytes (16 hex chars).
  • Error responses include the header X-S3-Error-Code mirroring the S3 error code for observability. This header is only set on error responses.
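The key-hash value described above is easy to reproduce from the shell; a minimal sketch (assumes sha256sum from GNU coreutils):

```shell
# Sketch: s3.key_hash = sha256(object key), truncated to 8 bytes (16 hex chars)
key_hash() {
  printf '%s' "$1" | sha256sum | cut -c1-16
}

key_hash 'my-bucket/hello.txt'
```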

Admin endpoints (optional; available when the admin server is enabled). If OIDC is enabled, these endpoints require a valid Bearer token. RBAC defaults are enforced:

  • admin.read for GET endpoints

  • admin.gc for POST /admin/gc/multipart

  • admin.scrub for POST /admin/scrub/runonce

  • admin.repair.read for GET /admin/repair/stats

  • admin.repair.enqueue for POST /admin/repair/enqueue

  • admin.repair.control for POST /admin/repair/worker/pause and /admin/repair/worker/resume

  • /admin/health: JSON status with ready/version/addresses

  • /admin/version: JSON version info

  • POST /admin/gc/multipart: run a single multipart GC pass (requires RBAC admin.gc; OIDC-protected if enabled)

  • /admin/scrub/stats: get current scrubber stats (requires RBAC admin.read)

  • POST /admin/scrub/runonce: trigger a single scrub pass (requires RBAC admin.scrub)

  • /admin/repair/stats: current repair queue length (requires RBAC admin.repair.read)

  • POST /admin/repair/enqueue: enqueue a repair item (requires RBAC admin.repair.enqueue). Body JSON accepts RepairItem fields {bucket, key, shardPath, reason, priority}; discovered timestamp is auto-populated when omitted. The queue is in-memory in this release.

  • /admin/repair/worker/stats: repair worker status and counters (requires RBAC admin.repair.read)

  • POST /admin/repair/worker/pause: pause the repair worker (requires RBAC admin.repair.control)

  • POST /admin/repair/worker/resume: resume the repair worker (requires RBAC admin.repair.control)

Configuration

Example at configs/local.yaml:

address: ":8080"
# Optional admin/control plane on a separate port (read-only endpoints)
# adminAddress: ":9090"

dataDirs:
  - "./data"

# Authentication (optional)
# authMode: "none"        # "none" or "sigv4"
# accessKeys:
#   - accessKey: "AKIAEXAMPLE"
#     secretKey: "secret"
#     user: "local"

# Tracing (optional - OpenTelemetry OTLP)
# tracing:
#   enabled: false
#   endpoint: "localhost:4317"  # grpc default; or "localhost:4318" for http
#   protocol: "grpc"            # "grpc" or "http"
#   sampleRatio: 0.0            # 0.0-1.0
#   serviceName: "shardseal"
#   keyHashEnabled: false      # emit s3.key_hash; or set SHARDSEAL_TRACING_KEY_HASH=true
#
# Sealed mode (experimental)
# sealed:
#   enabled: false
#   verifyOnRead: false
#
# Integrity Scrubber (experimental - verification only)
# Verifies sealed header/footer CRCs and compares footer content-hash with manifest.
# Payload re-hash verification follows sealed.verifyOnRead (enabled when true).
# scrubber:
#   enabled: false
#   interval: "1h"
#   concurrency: 1

# Repair pipeline (optional; can run without Admin API)
# repair:
#   enabled: false            # when true, create repair queue and wire storage/scrubber
#   workerEnabled: true       # start background repair worker (no-op in current milestone)
#   workerConcurrency: 1

Additional optional request size limits:

# Request size limits (optional)
limits:
  singlePutMaxBytes: 5368709120    # 5 GiB cap for single PUT
  minMultipartPartSize: 5242880    # 5 MiB minimum for non-final multipart parts
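The two byte values above decode as plain powers-of-two arithmetic:

```shell
# 5 GiB and 5 MiB in bytes, matching the limits above
echo $(( 5 * 1024 * 1024 * 1024 ))   # 5368709120
echo $(( 5 * 1024 * 1024 ))          # 5242880
```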
Environment overrides:
- SHARDSEAL_CONFIG                 // path to YAML config
- SHARDSEAL_ADDR                   // data-plane listen address (e.g., 0.0.0.0:8080)
- SHARDSEAL_ADMIN_ADDR             // admin-plane listen address (e.g., 0.0.0.0:9090) to enable admin endpoints
- SHARDSEAL_DATA_DIRS              // comma-separated data directories
- SHARDSEAL_AUTH_MODE              // "none" (default) or "sigv4"
- SHARDSEAL_ACCESS_KEYS            // comma-separated ACCESS_KEY:SECRET_KEY[:USER]
- SHARDSEAL_TRACING_ENABLED        // "true"/"false"
- SHARDSEAL_TRACING_ENDPOINT       // e.g., localhost:4317 (grpc) or localhost:4318 (http)
- SHARDSEAL_TRACING_PROTOCOL       // "grpc" or "http"
- SHARDSEAL_TRACING_SAMPLE         // 0.0 - 1.0
- SHARDSEAL_TRACING_SERVICE        // service.name override
- SHARDSEAL_TRACING_KEY_HASH       // "true"/"false"; when true, emit s3.key_hash (sha256 first 8 bytes hex of object key)
- SHARDSEAL_SEALED_ENABLED         // "true"/"false" to store objects using sealed format (experimental)
- SHARDSEAL_SEALED_VERIFY_ON_READ  // "true"/"false" to verify integrity on GET/HEAD
- SHARDSEAL_SCRUBBER_ENABLED       // "true"/"false" to enable background scrubber
- SHARDSEAL_SCRUBBER_INTERVAL      // e.g., "1h"
- SHARDSEAL_SCRUBBER_CONCURRENCY   // e.g., "2"
- SHARDSEAL_SCRUBBER_VERIFY_PAYLOAD // "true"/"false" to force payload re-hash verification (overrides sealed.verifyOnRead inheritance)
- SHARDSEAL_GC_ENABLED             // "true"/"false" to enable multipart GC
- SHARDSEAL_GC_INTERVAL            // e.g., "15m"
- SHARDSEAL_GC_OLDER_THAN          // e.g., "24h"
- SHARDSEAL_OIDC_ENABLED           // "true"/"false" to protect Admin API with OIDC
- SHARDSEAL_OIDC_ISSUER            // issuer URL for discovery (preferred)
- SHARDSEAL_OIDC_CLIENT_ID         // expected client_id (audience)
- SHARDSEAL_OIDC_AUDIENCE          // optional, overrides client_id
- SHARDSEAL_OIDC_JWKS_URL          // direct JWKS URL alternative to issuer
- SHARDSEAL_OIDC_ALLOW_UNAUTH_HEALTH   // "true"/"false" to allow unauthenticated /admin/health
- SHARDSEAL_OIDC_ALLOW_UNAUTH_VERSION  // "true"/"false" to allow unauthenticated /admin/version
- SHARDSEAL_LIMIT_SINGLE_PUT_MAX_BYTES     // e.g., 5368709120 (5 GiB)
- SHARDSEAL_LIMIT_MIN_MULTIPART_PART_SIZE  // e.g., 5242880 (5 MiB)
- SHARDSEAL_REPAIR_ENABLED                 // "true"/"false" to enable repair queue without Admin API
- SHARDSEAL_REPAIR_WORKER_ENABLED          // "true"/"false" to start repair worker
- SHARDSEAL_REPAIR_WORKER_CONCURRENCY      // integer >= 1

Sealed mode (experimental)

Summary

  • When enabled, objects are stored as sealed shard files with a header | payload | footer encoding and a JSON manifest persisted alongside per-object metadata. The S3 API remains unchanged (ETag is still MD5 of the payload; SigV4 works the same).
  • Range GETs are served by seeking past the header and reading a SectionReader over just the payload. See storage.localfs and storage.manifest.

On-disk layout

  • Object directory: ./data/objects/{bucket}/{key}/
  • Data file: data.ss1
    • Header (little-endian): magic "ShardSealv1" | version:u16 | headerSize:u16 | payloadLen:u64 | headerCRC32C:u32
    • Footer: contentHash[32] (sha256 of payload) | footerCRC32C:u32
    • Format primitives implemented in erasure.rs with unit tests in erasure.rs_test.
  • Manifest: object.meta (JSON, v1)
    • Records bucket, key, size, ETag (MD5), lastModified, RS params, and a Shards[] slice with path, content hash algo/hex, payload length, header/footer CRCs.
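For quick spot checks, the fixed-size header fields can be dumped with od. A rough sketch: the byte offsets below are inferred from the field order above (an 11-byte magic puts version at offset 11) and are an assumption, not a documented ABI:

```shell
# Sketch: dump sealed-header fields of a shard file. Offsets assume
# magic[11] | version:u16 | headerSize:u16 | payloadLen:u64, little-endian
# — inferred from the layout description, not a documented ABI.
peek_header() {
  local f="$1"
  head -c 11 "$f"; echo                  # magic: ShardSealv1
  od -An -tu2 -j 11 -N 2 "$f"            # version
  od -An -tu2 -j 13 -N 2 "$f"            # headerSize
  od -An -tu8 -j 15 -N 8 "$f"            # payloadLen
}

# e.g. peek_header ./data/objects/my-bucket/hello.txt/data.ss1
```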

Behavior

  • GET/HEAD prefer sealed objects when a manifest exists; otherwise fall back to plain files (mixing sealed and plain is supported).
  • Range GETs use io.SectionReader on the payload region (efficient partial reads).
  • DELETE removes the sealed shard and the manifest; LIST derives keys from the parent dir of data.ss1 and reads metadata from the manifest. Implementation details in storage.localfs.

Integrity verification (optional)

  • Set sealed.verifyOnRead: true to validate footer CRC and sha256(payload) against the manifest during GET/HEAD.
  • Integrity failures are surfaced as 500 InternalError at the S3 layer and annotated in tracing. S3 mapping handled in api.s3.

Configuration

  • YAML (see sample in configs/local.yaml):
    • sealed.enabled: false (default)
    • sealed.verifyOnRead: false (default)
  • Environment:
    • SHARDSEAL_SEALED_ENABLED=true|false
    • SHARDSEAL_SEALED_VERIFY_ON_READ=true|false
  • Sample config and env wiring in cmd.shardseal.main.

Observability

  • Tracing: storage.sealed=true for sealed ops; storage.integrity_fail=true when verification fails.
  • Prometheus (emitted by obs.metrics.storage):
    • shardseal_storage_bytes_total{op}
    • shardseal_storage_ops_total{op,result}
    • shardseal_storage_op_duration_seconds_bucket/sum/count{op}
    • shardseal_storage_sealed_ops_total{op,sealed,result,integrity_fail}
    • shardseal_storage_sealed_op_duration_seconds_bucket/sum/count{op,sealed,integrity_fail}
    • shardseal_storage_integrity_failures_total{op}

Migration and compatibility

  • Enabling sealed mode affects only newly written objects. Existing plain files remain readable; GET/HEAD fall back to plain when no manifest is present.
  • Disabling sealed mode does not delete existing sealed objects; they continue to be served via manifest. You can transition gradually and mix sealed/plain safely.
  • ETag policy: MD5 of the full object payload is preserved for S3 compatibility (even in sealed mode). For CompleteMultipartUpload, the ETag follows the AWS multipart convention: MD5 of the concatenated binary part MD5s with a "-N" suffix. This may become configurable in a future release.

Scrubber behavior

  • Performs sealed integrity verification: validates sealed headers/footers and footer content-hash against the manifest; optional payload re-hash when sealed.verifyOnRead is true.
  • See the Admin endpoints section above for routes and RBAC.

Authentication (optional SigV4)

  • Disabled by default. Enable verification and provide credentials either via config or environment:
export SHARDSEAL_AUTH_MODE=sigv4
export SHARDSEAL_ACCESS_KEYS='AKIAEXAMPLE:secret:local'
# Run server after setting env
SHARDSEAL_CONFIG=configs/local.yaml make run

When enabled, the server requires valid AWS Signature V4 on S3 requests (both Authorization header and presigned URLs are supported). Health endpoints (/livez, /readyz, /metrics) remain unauthenticated.

Notes & limitations (current MVP)

  • Authentication: optional. AWS SigV4 supported (header and presigned; disabled by default via config/env).
  • ETag is the MD5 of the full object for single-part PUTs; for multipart completes, the ETag follows the AWS multipart convention (MD5 of the concatenated binary part MD5s with a "-N" suffix).
  • Objects stored under ./data/objects/{bucket}/{key}
  • Multipart temporary parts stored in a separate staging bucket: .multipart/{bucket}/{key}/{uploadId}/part.N (excluded from user listings and bucket-empty checks; cleaned up on complete/abort)
  • Range requests require seekable storage (LocalFS supports this)
  • Single PUT size cap: 5 GiB (configurable via limits.singlePutMaxBytes or env SHARDSEAL_LIMIT_SINGLE_PUT_MAX_BYTES). Larger uploads must use Multipart Upload (responds with S3 error code EntityTooLarge).
  • Error detail: EntityTooLarge responses include MaxAllowedSize and a hint to use Multipart Upload.
  • Multipart part size: 5 MiB minimum for all parts except the final part (configurable via limits.minMultipartPartSize or env SHARDSEAL_LIMIT_MIN_MULTIPART_PART_SIZE). Intended for S3 compatibility; very small multi-part aggregates used in tests may bypass this check.
  • LocalFS writes are atomic via temp+rename on Put, reducing risk of partial files on error.

Recent Improvements (2025-10-29)

  • Implemented AWS SigV4 authentication verification (headers and presigned) with unit tests
  • Exposed Prometheus metrics at /metrics and added HTTP instrumentation middleware
  • Added liveness (/livez) and readiness (/readyz) endpoints; readiness gated after initialization
  • Fixed critical memory issues: streaming multipart completion; safe handling for non-seekable Range GET
  • Hid internal multipart files from listings and bucket-empty checks; normalized temp part layout

Recent Improvements (2025-10-30)

  • Tracing enrichment: error responses now set X-S3-Error-Code; tracing middleware records s3.error_code.
  • Optional s3.key_hash attribute on spans (sha256(key) truncated to 8 bytes hex), configurable via tracing.keyHashEnabled or env SHARDSEAL_TRACING_KEY_HASH=true.
  • README, sample config, and tests updated accordingly.

Recent Improvements (2025-10-31)

  • ShardSeal v1 sealed mode (experimental, feature-flagged):
    • LocalFS now writes sealed shard files (header | payload | footer) and persists a JSON manifest; Range GETs are served via a SectionReader. Delete/List are aware of sealed layout (see storage.localfs, storage.manifest).
    • Optional verifyOnRead validates footer CRC and sha256(payload) on GET/HEAD; integrity failures are mapped to 500 InternalError at the S3 layer (see api.s3).
    • Tests added for storage-level and S3-level sealed behavior including corruption detection (see storage.localfs_sealed_test, api.s3.server_sealed_test).
    • Observability: tracing annotates storage.sealed and storage.integrity_fail; Prometheus sealed I/O metrics added (see obs.metrics.storage).

Recent Improvements (2025-11-01)

  • Admin repair control surface:
    • Queue endpoints: GET /admin/repair/stats, POST /admin/repair/enqueue
    • Worker endpoints: GET /admin/repair/worker/stats, POST /admin/repair/worker/pause, POST /admin/repair/worker/resume
    • RBAC roles: admin.repair.read, admin.repair.enqueue, admin.repair.control
  • Observability:
    • shardseal_repair_queue_depth metric with periodic polling
    • Prometheus recording rules and alerts for repair queue depth
    • Grafana panels for repair queue depth (stat and timeseries)
  • Documentation: Admin endpoints, RBAC roles, and monitoring sections updated to reflect current state.

Recent Improvements (2025-11-02)

  • Fixed the sealed shard header encoder so the emitted byte length always matches the advertised header size, preventing payload shifts after repairs or rewrites.
  • Added regression testing around header-size invariants to catch future encoder regressions early.

Roadmap (short)

  1. ShardSeal v1 storage format + erasure coding
  2. Background scrubber and self-healing
  3. Admin API hardening (OIDC/RBAC), monitoring assets (dashboards/alerts)

One-liner reset (cleanup + rebuild + monitoring)

Use this single command to fully reset the stack, remove stale networks/containers from older runs, rebuild, and bring up monitoring:

bash -lc 'docker compose --profile monitoring down --remove-orphans; docker compose down --remove-orphans; docker ps -q --filter network=shardseal_net | xargs -r docker rm -f; docker network rm shardseal_net 2>/dev/null || true; docker network prune -f; docker compose up --build -d; docker compose --profile monitoring up -d'

Notes:

  • This works even if an older Compose run created a legacy fixed-name network "shardseal_net" with stray containers still attached.
  • For current versions of this repo, the Compose network is project-scoped (no fixed name) per docker-compose.yml; docker compose down --remove-orphans will remove it automatically unless unrelated containers attach to it.
  • Prometheus scrapes the service via Docker DNS at shardseal:8080 per configs/monitoring/prometheus/prometheus.yml.

If you prefer step-by-step commands, see the "Troubleshooting infos" and "Validation" sections above.

Running locally (without Docker) also works, but the Prometheus and Grafana paths need adjusting (Docker is easier πŸ˜‰):

Quick start:

# 1) Run shardseal (default :8080 exposes /metrics)
SHARDSEAL_CONFIG=configs/local.yaml make run

# 2) Start Prometheus (adjust path as needed)
prometheus --config.file=configs/monitoring/prometheus/prometheus.yml

# 3) Import Grafana dashboard JSON:
#    configs/monitoring/grafana/shardseal_overview.json
#    and set the Prometheus datasource accordingly.

License

AGPL-3.0-or-later

Contributing

Early-stage experimental project β€” contributions welcome, especially in areas of:

  • Erasure coding implementations
  • Distributed systems algorithms
  • Storage integrity verification techniques
  • Performance optimizations

Please keep code documented and tested. Note that the project structure and APIs may change significantly as the design evolves.
