If you discover a security vulnerability, please report it responsibly:
- Do NOT open a public GitHub issue
- Email the maintainer directly or use GitHub's private security advisory feature
- Include detailed steps to reproduce the vulnerability
- Allow reasonable time for a fix before public disclosure
NEVER commit:
- `.env` files with real credentials
- `config/spark/spark-defaults.conf` with real credentials
- Private keys, certificates, or tokens
- Database connection strings with embedded passwords
Always use:
- `.env.example` and `*.example` files for templates
- Environment variables for sensitive configuration
- GitHub Secrets for CI/CD credentials
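To illustrate the environment-variable approach, here is a minimal Python sketch that reads a secret at startup and fails fast when it is missing, rather than falling back to a hardcoded default. The helper `require_env` and the `MissingSecretError` class are hypothetical names, not part of this project.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, failing fast if unset or empty."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than falling back to a hardcoded default credential.
        # The message names the variable but never includes a value.
        raise MissingSecretError(f"required environment variable {name} is not set")
    return value
```

A service would call something like `require_env("POSTGRES_PASSWORD")` once at startup; that variable name is only an example and should match whatever your `.env.example` documents.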
Install pre-commit hooks to catch security issues before committing:
# Install pre-commit
pip install pre-commit
# Install hooks
pre-commit install
# Run manually
pre-commit run --all-files
The hooks include:
- detect-secrets: Scans for hardcoded secrets
- detect-private-key: Catches accidentally committed keys
- bandit: Python security linter
- shellcheck: Static analysis for shell scripts (flags unquoted variable expansions and similar bugs)
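As a rough idea of what a check like detect-private-key does, the following Python sketch flags files that contain a PEM-style private-key header. It is a simplified illustration, not the hook's actual implementation: the regex covers only PEM-style headers, and `contains_private_key` and `find_files_with_keys` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Matches PEM-style headers such as "-----BEGIN <TYPE> PRIVATE KEY-----".
# Illustrative only: real hooks also look for other key formats.
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def contains_private_key(text: str) -> bool:
    """Return True if `text` contains a PEM-style private-key header."""
    return PRIVATE_KEY_HEADER.search(text) is not None


def find_files_with_keys(paths: Iterable[str]) -> List[Path]:
    """Return the paths whose contents appear to include a private key."""
    flagged = []
    for path in map(Path, paths):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # missing or unreadable files are skipped, not flagged
        if contains_private_key(text):
            flagged.append(path)
    return flagged
```

If a real key is ever flagged after it was committed, removing the file is not enough; treat the key as compromised and rotate it.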
When writing shell scripts:
# BAD: Direct variable interpolation in SQL
psql -c "SELECT * FROM users WHERE name = '$user_input'"
# GOOD: Validate input against an allowlist before using it
if [[ ! "$user_input" =~ ^[a-zA-Z0-9_.-]+$ ]]; then
  echo "Invalid input" >&2
  exit 1
fi
# For free-form text that cannot be allowlisted, double single quotes
# before embedding it in a SQL string literal
sq="'"
safe_value="${value//$sq/$sq$sq}"
When writing Python:
- Use parameterized queries for SQL
- Validate file paths to prevent traversal
- Avoid `shell=True`: pass commands to `subprocess` as argument lists, and sanitize any user input that still reaches a shell
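The Python rules above can be sketched with the standard library alone. This is an illustrative example, assuming an SQLite connection for brevity (drivers differ in placeholder style: `sqlite3` uses `?`, psycopg uses `%s`); `find_user`, `safe_join`, `count_lines`, the `users` table, and the use of `wc` are hypothetical. `Path.is_relative_to` requires Python 3.9 or later.

```python
import sqlite3
import subprocess
from pathlib import Path
from typing import Optional, Tuple


def find_user(conn: sqlite3.Connection, name: str) -> Optional[Tuple]:
    """Look up a user with a parameterized query; `name` is bound, never spliced into SQL."""
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchone()


def safe_join(base_dir: str, user_path: str) -> Path:
    """Resolve `user_path` inside `base_dir`, rejecting paths that escape it."""
    base = Path(base_dir).resolve()
    # resolve() collapses `..` segments and follows symlinks before the check.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate


def count_lines(path: Path) -> str:
    """Run an external command without a shell.

    The arguments go straight to the program as a list, so quotes, `;`, or `$()`
    in `path` are not interpreted by any shell.
    """
    result = subprocess.run(
        ["wc", "-l", "--", str(path)], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Validating first (`safe_join`) and then passing the result as a single list element (`count_lines`) keeps both traversal and shell injection out of the picture.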
Avoid:
- `network_mode: host` in production
- Running containers as root
- Disabling authentication (e.g., Jupyter tokens)
- Using `privileged: true`
Prefer:
- Bridge networking with explicit port mappings
- Non-root container users
- Proper authentication for all services
- Minimal container capabilities
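One way to enforce these Docker rules in review is a small check over the parsed `docker-compose` file. The sketch below assumes the file has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`); the function names and warning texts are hypothetical, and it covers only host networking, `privileged`, and a missing or root `user`, not every item in the lists.

```python
from typing import Any, Dict, List


def audit_compose_service(name: str, service: Dict[str, Any]) -> List[str]:
    """Return warnings for risky settings in one compose service definition."""
    warnings = []
    if service.get("network_mode") == "host":
        warnings.append(f"{name}: network_mode: host (prefer bridge networking)")
    if service.get("privileged") is True:
        warnings.append(f"{name}: privileged: true")
    # No `user` key means the image's default user applies, which is often root.
    user = str(service.get("user") or "").strip()
    if user in ("", "root", "0") or user.startswith(("root:", "0:")):
        warnings.append(f"{name}: may run as root (set a non-root user)")
    return warnings


def audit_compose(compose: Dict[str, Any]) -> List[str]:
    """Audit every service under the top-level `services` key."""
    findings = []
    for name, service in (compose.get("services") or {}).items():
        findings.extend(audit_compose_service(name, service or {}))
    return findings
```

Working on a plain dict keeps the check independent of how the compose file is loaded.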
GitHub Actions:
- Pin actions to full commit SHAs, not tags
- Use minimal permissions (`contents: read`)
- Never pipe curl/wget output directly to shell
- Use GitHub Secrets for all credentials
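The SHA-pinning rule can be checked mechanically. This Python sketch scans workflow text for `uses:` references that are not pinned to a full 40-character commit SHA; it is a line-based approximation rather than a YAML parser, and `unpinned_actions` is a hypothetical helper. Local actions (`./...`) and `docker://` references are skipped.

```python
import re
from typing import List

# A reference pinned to a full 40-character lowercase hex commit SHA.
PINNED = re.compile(r"^[^@\s]+@[0-9a-f]{40}$")
# A `uses:` key, optionally as the first key of a list item.
USES_LINE = re.compile(r"^\s*-?\s*uses:\s*(\S+)")


def unpinned_actions(workflow_text: str) -> List[str]:
    """Return `uses:` references that are not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")  # tolerate quoted values
        if ref.startswith(("./", "docker://")):
            continue
        if not PINNED.match(ref):
            unpinned.append(ref)
    return unpinned
```

Trailing comments such as `# v4.2.2` are ignored, so the human-readable tag can stay next to the SHA.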
# BAD
- uses: actions/checkout@v4
# GOOD
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
Before approving PRs, verify:
- No hardcoded credentials or secrets
- Input validation for user-controlled data
- SQL queries use parameterization
- Shell commands don't use unsafe `eval`
- Docker configs don't expose unnecessary privileges
- CI changes don't introduce security regressions
- New dependencies don't have known vulnerabilities
Run security-focused tests:
# Run security tests only
poetry run pytest tests/test_security.py -v
# Run with security marker
poetry run pytest -m security -v
Check for vulnerable dependencies:
# Using pip-audit
pip install pip-audit
pip-audit
# Using safety
pip install safety
safety check
For production deployments:
- Use TLS everywhere
  - Enable SSL for PostgreSQL connections
  - Enable HTTPS for S3/SeaweedFS
  - Use TLS for Kafka (if exposed externally)
- Network segmentation
  - Place services in private subnets
  - Use security groups/firewalls
  - Expose only necessary ports
- Authentication
  - Enable authentication on all services
  - Use strong, unique passwords
  - Rotate credentials regularly
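For the password and TLS items, Python's standard library already provides the building blocks. This is a minimal sketch, assuming a Python client talking to a TLS-enabled service; `generate_password` and `make_client_tls_context` are hypothetical helper names, and `cafile` stands in for whatever internal CA bundle your deployment uses.

```python
import secrets
import ssl
from typing import Optional


def generate_password(nbytes: int = 32) -> str:
    """Return a random URL-safe password built from `nbytes` bytes of OS randomness."""
    # secrets uses a cryptographically secure source; never use `random` for credentials.
    return secrets.token_urlsafe(nbytes)


def make_client_tls_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context that verifies the server certificate and hostname.

    `cafile` is an optional path to a private CA bundle (hypothetical; point it at
    the CA that signed your internal service certificates).
    """
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context(cafile=cafile)
    # Explicitly refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

For PostgreSQL specifically, libpq-based clients such as psycopg configure this through connection parameters like `sslmode=verify-full` rather than an `ssl.SSLContext`.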
If you suspect a security breach:
- Rotate all credentials immediately
- Review access logs
- Check for unauthorized changes
- Document the incident
- Notify affected parties
- Weekly: Run `pre-commit run --all-files`
- Monthly: Review dependency vulnerabilities
- Quarterly: Full security audit of configurations
- Annually: Penetration testing (if applicable)