| layout | default |
|---|---|
| title | Deployment |
| nav_order | 4 |
Version: 1.0 Last Updated: 2025-11-13
- Overview
- Local Deployment
- Docker Deployment
- Kubernetes Deployment
- CI/CD Integration
- Production Deployment
- Monitoring & Observability
- Deployment Improvements
- Best Practices
- Troubleshooting
This guide covers deployment strategies for kshark across different environments, from local development to production Kubernetes clusters.
| Environment | Deployment Method | Use Case |
|---|---|---|
| Local | Binary | Development, ad-hoc diagnostics |
| Docker | Container | Isolated execution, CI/CD |
| Kubernetes | CronJob/Job | Automated monitoring, scheduled checks |
| CI/CD | Pipeline integration | Pre-deployment validation |
| Lambda/Functions | Serverless | Event-driven diagnostics |
- Go 1.23+ (for building from source)
- Access to target Kafka cluster
- Configuration files prepared
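The prerequisites above can be checked before installing anything. The following is a minimal sketch, assuming bash and the coreutils `timeout` command; the function names `go_version_ok`, `props_file_ok`, and `broker_reachable` are illustrative and not part of kshark.

```shell
#!/usr/bin/env bash
# Preflight helpers (hypothetical names; not shipped with kshark).

# go_version_ok VERSION
# Succeeds when VERSION (for example "go1.23.4") is Go 1.23 or newer.
go_version_ok() {
  local v="${1#go}"               # strip the leading "go"
  local major="${v%%.*}"          # text before the first dot
  local rest="${v#*.}"            # text after the first dot
  local minor="${rest%%[!0-9]*}"  # leading digits of the minor version
  case "$major" in ''|*[!0-9]*) return 1 ;; esac
  [ -n "$minor" ] || return 1
  if [ "$major" -gt 1 ]; then return 0; fi
  [ "$minor" -ge 23 ]
}

# props_file_ok PATH
# Succeeds when PATH is readable and sets bootstrap.servers.
props_file_ok() {
  [ -r "$1" ] && grep -q '^bootstrap\.servers=' "$1"
}

# broker_reachable HOST PORT
# Opens a TCP connection through bash's /dev/tcp, giving up after 5 seconds.
broker_reachable() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

For example, `go_version_ok "$(go env GOVERSION)"` checks the local toolchain, and `broker_reachable broker.example.com 9092` checks basic TCP reachability only; it says nothing about TLS or SASL, which is what kshark itself diagnoses.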
Download:
# Linux (amd64)
wget https://github.com/scalytics/kshark-core/releases/latest/download/kshark-linux-amd64.tar.gz
tar -xzf kshark-linux-amd64.tar.gz
sudo mv kshark /usr/local/bin/
chmod +x /usr/local/bin/kshark
# macOS (arm64)
wget https://github.com/scalytics/kshark-core/releases/latest/download/kshark-darwin-arm64.tar.gz
tar -xzf kshark-darwin-arm64.tar.gz
sudo mv kshark /usr/local/bin/
chmod +x /usr/local/bin/kshark
# Windows (amd64)
wget https://github.com/scalytics/kshark-core/releases/latest/download/kshark-windows-amd64.zip
unzip kshark-windows-amd64.zip
# Add to PATH or move to C:\Windows\System32\
Verify Installation:
kshark --version
Clone and Build:
# Clone repository
git clone https://github.com/scalytics/kshark-core.git
cd kshark-core
# Download dependencies
go mod download
# Build
go build -o kshark ./cmd/kshark
# Install (optional)
sudo mv kshark /usr/local/bin/
Build with Version Information:
VERSION=$(git describe --tags --always --dirty)
COMMIT=$(git rev-parse --short HEAD)
DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ)
go build \
-ldflags="-s -w -X main.version=${VERSION} -X main.commit=${COMMIT} -X main.date=${DATE}" \
-o kshark ./cmd/kshark
Create Configuration Directory:
mkdir -p ~/.kshark
chmod 700 ~/.kshark
Create Properties File:
cat > ~/.kshark/client.properties <<EOF
bootstrap.servers=broker.example.com:9092
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-256
sasl.username=your-username
sasl.password=your-password
EOF
chmod 600 ~/.kshark/client.properties
Create AI Config (Optional):
cat > ~/.kshark/ai_config.json <<EOF
{
"provider": "openai",
"api_key": "sk-...",
"api_endpoint": "https://api.openai.com/v1/chat/completions",
"model": "gpt-4"
}
EOF
chmod 600 ~/.kshark/ai_config.json
# Basic check
kshark -props ~/.kshark/client.properties
# With topic test
kshark -props ~/.kshark/client.properties -topic test-topic
# Automated mode
kshark -props ~/.kshark/client.properties -y
Using Provided Dockerfile:
# Clone repository
git clone https://github.com/scalytics/kshark-core.git
cd kshark-core
# Build image
docker build -t kshark:latest .
# Tag for registry
docker tag kshark:latest your-registry.com/kshark:latest
# Push to registry
docker push your-registry.com/kshark:latest
Multi-platform Build:
# Enable buildx
docker buildx create --use
# Build for multiple platforms
docker buildx build \
--platform linux/amd64,linux/arm64 \
-t your-registry.com/kshark:latest \
--push \
.
Basic Run:
docker run --rm \
-v $(pwd)/client.properties:/config/client.properties:ro \
kshark:latest -props /config/client.properties
With Reports Output:
mkdir -p reports
docker run --rm \
-v $(pwd)/client.properties:/config/client.properties:ro \
-v $(pwd)/reports:/app/reports \
kshark:latest -props /config/client.properties -y
With AI Analysis:
docker run --rm \
-v $(pwd)/client.properties:/config/client.properties:ro \
-v $(pwd)/ai_config.json:/app/ai_config.json:ro \
-v $(pwd)/license.key:/app/license.key:ro \
-v $(pwd)/reports:/app/reports \
kshark:latest -props /config/client.properties --analyze -y
docker-compose.yml:
version: '3.8'
services:
  kshark:
    image: kshark:latest
    volumes:
      - ./config/client.properties:/config/client.properties:ro
      - ./config/ai_config.json:/app/ai_config.json:ro
      - ./secrets/license.key:/app/license.key:ro
      - ./reports:/app/reports
    command: ["-props", "/config/client.properties", "-y"]
    restart: "no"
  kshark-scheduler:
    image: kshark:latest
    volumes:
      - ./config/client.properties:/config/client.properties:ro
      - ./reports:/app/reports
    command: ["-props", "/config/client.properties", "-topic", "health-check", "-y"]
    restart: "no"
    # Use with cron or Kubernetes CronJob for scheduling
Run:
docker-compose up kshark
configmap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kshark-config
  namespace: monitoring
data:
  client.properties: |
    bootstrap.servers=kafka-broker.kafka.svc.cluster.local:9092
    security.protocol=SASL_SSL
    sasl.mechanism=SCRAM-SHA-256
    # Do not put credentials here - use Secret instead
secret.yaml:
apiVersion: v1
kind: Secret
metadata:
  name: kshark-credentials
  namespace: monitoring
type: Opaque
stringData:
  sasl.username: "your-api-key"
  sasl.password: "your-api-secret"
  client.properties: |
    bootstrap.servers=kafka-broker.kafka.svc.cluster.local:9092
    security.protocol=SASL_SSL
    sasl.mechanism=SCRAM-SHA-256
    sasl.username=your-username
    sasl.password=your-password
Create Secret:
kubectl create secret generic kshark-credentials \
--from-file=client.properties=./client.properties \
--from-file=ai_config.json=./ai_config.json \
--from-file=license.key=./license.key \
-n monitoring
job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: kshark-diagnostic
  namespace: monitoring
spec:
  ttlSecondsAfterFinished: 300
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: kshark
        image: your-registry.com/kshark:latest
        args:
        - "-props"
        - "/config/client.properties"
        - "-topic"
        - "diagnostic-test"
        - "-y"
        volumeMounts:
        - name: config
          mountPath: /config
          readOnly: true
        - name: reports
          mountPath: /app/reports
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "200m"
      volumes:
      - name: config
        secret:
          secretName: kshark-credentials
      - name: reports
        emptyDir: {}
Deploy:
kubectl apply -f job.yaml
View Logs:
kubectl logs -f job/kshark-diagnostic -n monitoring
cronjob.yaml:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kshark-health-check
  namespace: monitoring
spec:
  # Run every 15 minutes
  schedule: "*/15 * * * *"
  # Keep last 3 successful and 1 failed job
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      # Clean up completed jobs after 10 minutes
      ttlSecondsAfterFinished: 600
      template:
        metadata:
          labels:
            app: kshark
            component: health-check
        spec:
          restartPolicy: OnFailure
          containers:
          - name: kshark
            image: your-registry.com/kshark:latest
            imagePullPolicy: IfNotPresent
            args:
            - "-props"
            - "/config/client.properties"
            - "-topic"
            - "health-check"
            - "-y"
            - "-timeout"
            - "30s"
            volumeMounts:
            - name: config
              mountPath: /config
              readOnly: true
            - name: reports
              mountPath: /app/reports
            resources:
              requests:
                memory: "64Mi"
                cpu: "100m"
              limits:
                memory: "128Mi"
                cpu: "200m"
            securityContext:
              runAsNonRoot: true
              runAsUser: 1000
              allowPrivilegeEscalation: false
              capabilities:
                drop:
                - ALL
          volumes:
          - name: config
            secret:
              secretName: kshark-credentials
              defaultMode: 0400
          - name: reports
            persistentVolumeClaim:
              claimName: kshark-reports
Deploy:
kubectl apply -f cronjob.yaml
Trigger Manual Run:
kubectl create job kshark-manual-$(date +%s) \
--from=cronjob/kshark-health-check \
-n monitoring
pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kshark-reports
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
serviceaccount.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kshark
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kshark-role
  namespace: monitoring
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
  resourceNames: ["kshark-credentials"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kshark-rolebinding
  namespace: monitoring
subjects:
- kind: ServiceAccount
  name: kshark
  namespace: monitoring
roleRef:
  kind: Role
  name: kshark-role
  apiGroup: rbac.authorization.k8s.io
workflow.yaml:
name: Kafka Connectivity Check
on:
  push:
    branches: [main, staging, production]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 */6 * * *' # Every 6 hours
jobs:
  kafka-diagnostic:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Download kshark
        run: |
          wget https://github.com/scalytics/kshark-core/releases/latest/download/kshark-linux-amd64.tar.gz
          tar -xzf kshark-linux-amd64.tar.gz
          chmod +x kshark
      - name: Create configuration
        run: |
          cat > client.properties <<EOF
          bootstrap.servers=${{ secrets.KAFKA_BOOTSTRAP_SERVERS }}
          security.protocol=SASL_SSL
          sasl.mechanism=PLAIN
          sasl.username=${{ secrets.KAFKA_API_KEY }}
          sasl.password=${{ secrets.KAFKA_API_SECRET }}
          EOF
      - name: Run diagnostic
        run: |
          ./kshark -props client.properties -topic ci-test -y
      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: kshark-report
          path: reports/*.html
          retention-days: 30
.gitlab-ci.yml:
stages:
  - test
kafka_diagnostic:
  stage: test
  image: alpine:latest
  before_script:
    - apk add --no-cache wget tar
    - wget https://github.com/scalytics/kshark-core/releases/latest/download/kshark-linux-amd64.tar.gz
    - tar -xzf kshark-linux-amd64.tar.gz
    - chmod +x kshark
  script:
    - |
      cat > client.properties <<EOF
      bootstrap.servers=${KAFKA_BOOTSTRAP_SERVERS}
      security.protocol=SASL_SSL
      sasl.mechanism=PLAIN
      sasl.username=${KAFKA_API_KEY}
      sasl.password=${KAFKA_API_SECRET}
      EOF
    - ./kshark -props client.properties -topic ci-test -y
  artifacts:
    when: always
    paths:
      - reports/*.html
    expire_in: 30 days
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
    - if: $CI_PIPELINE_SOURCE == "schedule"
Jenkinsfile:
pipeline {
agent any
environment {
KSHARK_VERSION = 'latest'
}
stages {
stage('Download kshark') {
steps {
sh '''
wget https://github.com/scalytics/kshark-core/releases/latest/download/kshark-linux-amd64.tar.gz
tar -xzf kshark-linux-amd64.tar.gz
chmod +x kshark
'''
}
}
stage('Create Configuration') {
steps {
withCredentials([
string(credentialsId: 'kafka-bootstrap-servers', variable: 'KAFKA_SERVERS'),
usernamePassword(credentialsId: 'kafka-credentials',
usernameVariable: 'KAFKA_USER',
passwordVariable: 'KAFKA_PASS')
]) {
sh '''
cat > client.properties <<EOF
bootstrap.servers=${KAFKA_SERVERS}
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.username=${KAFKA_USER}
sasl.password=${KAFKA_PASS}
EOF
'''
}
}
}
stage('Run Diagnostic') {
steps {
sh './kshark -props client.properties -topic ci-test -y'
}
}
}
post {
always {
archiveArtifacts artifacts: 'reports/*.html', allowEmptyArchive: true
}
failure {
emailext (
subject: "Kafka Connectivity Check Failed",
body: "The Kafka connectivity diagnostic failed. Check the attached report.",
attachmentsPattern: 'reports/*.html',
to: 'devops@example.com'
)
}
}
}
Considerations:
- Run from multiple availability zones
- Use Kubernetes for orchestration
- Store reports in centralized storage (S3, GCS)
- Implement alerting on failures
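When checks run from several zones or against several clusters, reports written to one shared bucket can overwrite each other unless the destination encodes where they came from. The sketch below shows one possible layout; the `report_dest` and `upload_reports` names and the `cluster/zone/date` key scheme are assumptions for illustration, not something kshark defines. It relies only on `aws s3 cp` from the AWS CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for centralized report storage; names are illustrative.

# report_dest BUCKET CLUSTER ZONE REPORT_PATH [DAY]
# Prints an S3 URI that keeps reports from each cluster and zone apart.
report_dest() {
  local bucket="$1" cluster="$2" zone="$3" report="$4"
  local day="${5:-$(date -u +%Y-%m-%d)}"   # default: today's UTC date
  printf 's3://%s/kshark-reports/%s/%s/%s/%s\n' \
    "$bucket" "$cluster" "$zone" "$day" "$(basename "$report")"
}

# upload_reports BUCKET CLUSTER ZONE [DIR]
# Copies every HTML report in DIR (default: reports) to its destination.
upload_reports() {
  local bucket="$1" cluster="$2" zone="$3" dir="${4:-reports}" f
  for f in "$dir"/*.html; do
    [ -e "$f" ] || continue   # no reports: the glob stayed literal
    aws s3 cp "$f" "$(report_dest "$bucket" "$cluster" "$zone" "$f")" || return 1
  done
}
```

A run in one zone might call `upload_reports my-bucket cluster-1 eu-west-1a` after kshark exits; the bucket, cluster, and zone values here are placeholders.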
Multi-cluster Deployment:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kshark-cluster-1
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          # Job pods must set restartPolicy to Never or OnFailure
          restartPolicy: OnFailure
          containers:
          - name: kshark
            image: kshark:latest
            args: ["-props", "/config/cluster-1.properties", "-y"]
            volumeMounts:
            - name: config
              mountPath: /config
          volumes:
          - name: config
            secret:
              secretName: cluster-1-credentials
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kshark-cluster-2
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          # Job pods must set restartPolicy to Never or OnFailure
          restartPolicy: OnFailure
          containers:
          - name: kshark
            image: kshark:latest
            args: ["-props", "/config/cluster-2.properties", "-y"]
            volumeMounts:
            - name: config
              mountPath: /config
          volumes:
          - name: config
            secret:
              secretName: cluster-2-credentials
AWS S3 Integration:
# Run diagnostic and upload to S3
docker run --rm \
-v $(pwd)/client.properties:/config/client.properties:ro \
-v $(pwd)/reports:/app/reports \
-e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
-e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
kshark:latest -props /config/client.properties -y
# Upload reports
aws s3 sync reports/ s3://my-bucket/kshark-reports/ \
--exclude "*" \
--include "*.html" \
--include "*.json"
Post-run Script:
#!/bin/bash
# Run kshark
./kshark -props client.properties -y
# Pick the newest report (for example reports/analysis_report_<timestamp>.html);
# a timestamp computed in this script may not match the one in the file name
REPORT_FILE=$(ls -t reports/*.html 2>/dev/null | head -n 1)
if [ -z "$REPORT_FILE" ]; then
  echo "No report found in reports/" >&2
  exit 1
fi
# Upload to S3
aws s3 cp "$REPORT_FILE" "s3://my-bucket/kshark-reports/"
# Send to logging system
if grep -q "FAIL" "$REPORT_FILE"; then
  curl -X POST https://logs.example.com/api/alerts \
    -H "Content-Type: application/json" \
    -d "{\"severity\":\"error\",\"message\":\"Kafka connectivity check failed\",\"report\":\"$REPORT_FILE\"}"
fi
Proposed Metrics:
kshark_check_duration_seconds{layer="L3",status="OK"}
kshark_check_duration_seconds{layer="L4",status="OK"}
kshark_check_duration_seconds{layer="L5-6",status="OK"}
kshark_check_duration_seconds{layer="L7",status="OK"}
kshark_check_total{layer="L3",status="OK"} 1
kshark_check_total{layer="L3",status="FAIL"} 0
kshark_last_check_timestamp
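These metrics are proposed, so kshark is not assumed to emit them itself. Until it does, a wrapper script could write them in the Prometheus text format for the node_exporter textfile collector. The sketch below assumes the caller already knows each layer's status and duration (how those are extracted from a kshark run is left open); `emit_check`, `emit_timestamp`, and `write_metrics` are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper helpers that print the proposed metrics as text.

# emit_check LAYER STATUS DURATION_SECONDS
# Prints the duration sample and both status samples for one layer.
emit_check() {
  local layer="$1" status="$2" duration="$3" ok=0 fail=0
  if [ "$status" = "OK" ]; then ok=1; else fail=1; fi
  printf 'kshark_check_duration_seconds{layer="%s",status="%s"} %s\n' \
    "$layer" "$status" "$duration"
  printf 'kshark_check_total{layer="%s",status="OK"} %s\n' "$layer" "$ok"
  printf 'kshark_check_total{layer="%s",status="FAIL"} %s\n' "$layer" "$fail"
}

# emit_timestamp [EPOCH_SECONDS]
# Prints the last-check timestamp; defaults to the current time.
emit_timestamp() {
  printf 'kshark_last_check_timestamp %s\n' "${1:-$(date +%s)}"
}

# write_metrics FILE
# Reads metrics on stdin and replaces FILE with a rename, so a scraper
# never reads a half-written file.
write_metrics() {
  local tmp="$1.$$"
  cat > "$tmp" && mv "$tmp" "$1"
}
```

A wrapper could then run `{ emit_check L3 OK 0.012; emit_check L4 FAIL 5.0; emit_timestamp; } | write_metrics "$TEXTFILE_DIR/kshark.prom"`, where `TEXTFILE_DIR` is whatever directory node_exporter's `--collector.textfile.directory` points at. Because each run overwrites the file, `kshark_check_total` behaves like a per-run gauge here rather than a cumulative counter.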
Alert on Failures:
# Kubernetes Event
apiVersion: v1
kind: ConfigMap
metadata:
  name: kshark-alert-script
data:
  alert.sh: |
    #!/bin/sh
    if grep -q "FAIL" /app/reports/*.html; then
      curl -X POST https://alerts.example.com/webhook \
        -H "Content-Type: application/json" \
        -d '{"text":"Kafka connectivity check failed"}'
    fi
Slack Integration:
#!/bin/bash
WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
# Run diagnostic
./kshark -props client.properties -y > /tmp/kshark.log 2>&1
# Check for failures
if grep -q "FAIL" /tmp/kshark.log; then
FAILURES=$(grep "FAIL" /tmp/kshark.log)
curl -X POST $WEBHOOK_URL \
-H 'Content-Type: application/json' \
-d "{\"text\":\"⚠️ Kafka Connectivity Alert\n\`\`\`$FAILURES\`\`\`\"}"
fi
Current State:
- Base image: alpine:latest
- Final image size: ~50MB
Improvements:
# Use specific version tags (not latest)
FROM golang:1.23.2-alpine3.19 AS builder
# Use distroless for even smaller runtime
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /kshark /usr/local/bin/kshark
COPY --from=builder /app/web/templates/ /app/web/templates/
ENTRYPOINT ["kshark"]
# Result: ~20MB final image
Benefits:
- Smaller image size
- Fewer vulnerabilities
- Faster pulls
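The "specific version tags" rule is easy to enforce mechanically. The sketch below lists `FROM` lines whose image has no tag or uses `:latest`; the function name `unpinned_from_lines` is illustrative, and the check is a simple text scan with awk rather than a full Dockerfile parser (it does not resolve `ARG` substitutions, for instance).

```shell
#!/usr/bin/env bash
# Hypothetical lint helper; not part of kshark or Docker.

# unpinned_from_lines DOCKERFILE
# Prints each FROM line whose image is untagged or tagged :latest.
# Stage aliases, scratch, and digest-pinned images are not reported.
unpinned_from_lines() {
  awk '
    toupper($1) == "FROM" {
      i = 2
      if ($i ~ /^--platform=/) i++          # skip an optional --platform flag
      img = $i
      alias = (toupper($(i + 1)) == "AS") ? $(i + 2) : ""
      flagged = 1
      if (img == "scratch" || img in stages) flagged = 0   # no tag applies
      else if (img ~ /@sha256:/) flagged = 0               # pinned by digest
      else {
        n = split(img, parts, "/")
        # the tag, if any, follows a colon in the last path component
        if (parts[n] ~ /:/ && parts[n] !~ /:latest$/) flagged = 0
      }
      if (alias != "") stages[alias] = 1
      if (flagged) print
    }
  ' "$1"
}
```

In CI, a step could fail the build when `unpinned_from_lines Dockerfile` prints anything. Note that a tag such as `nonroot` passes this check even though it is a floating tag; pinning by digest is the stricter option.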
Add to CI/CD:
# .github/workflows/security-scan.yml
name: Security Scan
on:
  push:
    branches: [main]
  pull_request:
jobs:
  trivy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t kshark:${{ github.sha }} .
      - name: Run Trivy scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: kshark:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
      - name: Upload results
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'
Chart.yaml:
apiVersion: v2
name: kshark
description: Kafka connectivity diagnostic tool
type: application
version: 1.0.0
appVersion: "1.0.0"
values.yaml:
image:
  repository: your-registry.com/kshark
  tag: latest
  pullPolicy: IfNotPresent
schedule: "*/15 * * * *"
config:
  existingSecret: "kshark-credentials"
  topic: "health-check"
  timeout: "30s"
resources:
  requests:
    memory: "64Mi"
    cpu: "100m"
  limits:
    memory: "128Mi"
    cpu: "200m"
persistence:
  enabled: true
  storageClass: "standard"
  size: "10Gi"
templates/cronjob.yaml:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: {{ include "kshark.fullname" . }}
spec:
  schedule: {{ .Values.schedule | quote }}
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: kshark
            image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
            imagePullPolicy: {{ .Values.image.pullPolicy }}
            args:
            - "-props"
            - "/config/client.properties"
            - "-topic"
            - {{ .Values.config.topic | quote }}
            - "-timeout"
            - {{ .Values.config.timeout | quote }}
            - "-y"
            resources:
              {{- toYaml .Values.resources | nindent 14 }}
          # ... (volumeMounts, volumes)
Installation:
helm install kshark ./helm/kshark \
--namespace monitoring \
--create-namespace \
--set config.existingSecret=my-kafka-credentials
Custom Resource Definition:
apiVersion: diagnostics.kafka.io/v1alpha1
kind: KafkaHealthCheck
metadata:
  name: production-kafka-check
spec:
  schedule: "*/15 * * * *"
  target:
    bootstrapServers: "kafka.prod.svc.cluster.local:9092"
    credentialsSecret: "kafka-prod-credentials"
  checks:
  - type: connectivity
  - type: topic
    topicName: health-check
  - type: produce-consume
  ai:
    enabled: true
    provider: openai
  notifications:
  - type: slack
    webhook: https://hooks.slack.com/...
Enhanced .goreleaser.yaml:
builds:
  - id: kshark
    main: ./cmd/kshark/main.go
    binary: kshark
    env:
      - CGO_ENABLED=0
    goos:
      - linux
      - windows
      - darwin
    goarch:
      - amd64
      - arm64
      - arm
    goarm:
      - "7"
    # Ignore specific combinations
    ignore:
      - goos: windows
        goarch: arm
- Never commit secrets
  # Use .gitignore
  echo "*.properties" >> .gitignore
  echo "ai_config.json" >> .gitignore
  echo "license.key" >> .gitignore
- Use environment variables
  # In Kubernetes
  - name: KAFKA_PASSWORD
    valueFrom:
      secretKeyRef:
        name: kafka-credentials
        key: password
- Separate configs per environment
  config/
  ├── dev.properties
  ├── staging.properties
  └── prod.properties
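These three practices combine naturally: keep only non-secret settings in the per-environment files and add credentials from environment variables at run time. The sketch below assumes the `config/<env>.properties` layout shown above and two hypothetical variables, `KAFKA_USERNAME` and `KAFKA_PASSWORD`; whether kshark can read credentials from the environment directly is not assumed, so the script renders a temporary properties file instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper; variable and function names are illustrative.

# render_props ENV OUT_FILE
# Copies ${CONFIG_DIR:-config}/ENV.properties to OUT_FILE, dropping any
# sasl.username/sasl.password lines and appending values from the
# KAFKA_USERNAME and KAFKA_PASSWORD environment variables.
render_props() {
  local env_name="$1" out="$2"
  local base="${CONFIG_DIR:-config}/$env_name.properties"
  if [ ! -r "$base" ]; then
    echo "missing $base" >&2
    return 1
  fi
  if [ -z "${KAFKA_USERNAME:-}" ] || [ -z "${KAFKA_PASSWORD:-}" ]; then
    echo "KAFKA_USERNAME and KAFKA_PASSWORD must be set" >&2
    return 1
  fi
  (
    umask 077        # a newly created file is readable by its owner only
    rm -f "$out"     # make sure the file is created fresh with that mask
    {
      grep -Ev '^sasl\.(username|password)=' "$base"
      printf 'sasl.username=%s\n' "$KAFKA_USERNAME"
      printf 'sasl.password=%s\n' "$KAFKA_PASSWORD"
    } > "$out"
  )
}
```

A run against staging might look like `render_props staging /tmp/kshark.properties && kshark -props /tmp/kshark.properties -y`, followed by removing the rendered file.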
Recommended Kubernetes Resources:
resources:
  requests:
    memory: "64Mi"
    cpu: "100m"
  limits:
    memory: "128Mi"
    cpu: "200m"
For AI-enabled checks:
resources:
  requests:
    memory: "128Mi"
    cpu: "200m"
  limits:
    memory: "256Mi"
    cpu: "500m"
- Run as non-root
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
- Use read-only volumes
  volumeMounts:
    - name: config
      mountPath: /config
      readOnly: true
- Scan images regularly
  trivy image your-registry.com/kshark:latest
Check logs:
kubectl logs -l app=kshark -n monitoring
Common causes:
- Missing Secret/ConfigMap
- Incorrect volume mounts
- Network policies blocking Kafka access
- Resource limits too low
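Two of these causes, a missing Secret and restrictive network policies, can be checked quickly with kubectl. The following is a rough sketch; `check_job_prereqs` is an illustrative name, and it only reports what exists in the namespace without judging whether a given policy actually blocks Kafka traffic.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper built on standard kubectl commands.

# check_job_prereqs NAMESPACE SECRET
# Reports whether SECRET exists and lists network policies to review.
# Returns non-zero when the secret is missing.
check_job_prereqs() {
  local ns="$1" secret="$2" rc=0
  if kubectl get secret "$secret" -n "$ns" >/dev/null 2>&1; then
    echo "OK secret/$secret"
  else
    echo "MISSING secret/$secret"
    rc=1
  fi
  # Any policy listed here may restrict egress to the Kafka brokers.
  kubectl get networkpolicy -n "$ns" -o name 2>/dev/null | sed 's/^/REVIEW /'
  return "$rc"
}
```

For the manifests in this guide the call would be `check_job_prereqs monitoring kshark-credentials`.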
Solution:
- Ensure PVC is bound
- Check volume mount path
- Verify write permissions
kubectl get pvc -n monitoring
kubectl describe pvc kshark-reports -n monitoring
Check:
kubectl describe pod <pod-name> -n monitoring
kubectl logs <pod-name> -n monitoring --previous
Common causes:
- OOMKilled (increase memory limit)
- CrashLoopBackOff (check configuration)
- ImagePullBackOff (verify image exists)
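The mapping from failure reason to remedy can be scripted so that on-call output includes the suggested next step. This is a sketch under the assumption that the reasons are read from the pod's container statuses via kubectl's jsonpath output; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical triage helpers; not part of kshark.

# suggest_fix REASON
# Maps a container state reason to the remedy listed in this guide.
suggest_fix() {
  case "$1" in
    OOMKilled) echo "increase the memory limit" ;;
    CrashLoopBackOff) echo "check configuration and previous logs" ;;
    ImagePullBackOff|ErrImagePull) echo "verify the image exists and can be pulled" ;;
    *) echo "inspect with kubectl describe pod" ;;
  esac
}

# pod_reasons POD NAMESPACE
# Prints the waiting and last-terminated reasons of each container.
pod_reasons() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{range .status.containerStatuses[*]}{.state.waiting.reason}{"\n"}{.lastState.terminated.reason}{"\n"}{end}' \
    | grep -v '^$'
}

# triage_pod POD NAMESPACE
# Prints "reason: suggestion" for every reason found on the pod.
triage_pod() {
  pod_reasons "$1" "$2" | while read -r reason; do
    printf '%s: %s\n' "$reason" "$(suggest_fix "$reason")"
  done
}
```

Running `triage_pod <pod-name> monitoring` prints one line per reason; a pod with no waiting or terminated containers prints nothing.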
This guide covers running kshark as a local binary, as a Docker container, as a Kubernetes Job or CronJob, and as a CI/CD pipeline step. Choose the method that fits your use case and infrastructure.
Next Steps:
- Review SECURITY.md for security best practices
- Check FEATURES.md for complete feature documentation
- See ARCHITECTURE.md for system architecture details
Document Version: 1.0 Author: kshark Development Team Last Review: 2025-11-13 Next Review: 2025-12-13