This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
s3utils is a collection of utilities and scripts for managing S3-compatible storage at scale, specifically designed for Scality S3 Connector (S3C) and Zenko deployments. The project provides operational tools for Cross-Region Replication (CRR), metadata verification, Raft consensus debugging, and data integrity checks.
```bash
# Run unit tests
yarn test:unit

# Run functional tests (requires MongoDB and workbench environment)
# With workbench installed, use: workbench up --env-dir ./workbench/env -d
yarn test:functional

# Run both test suites with coverage
yarn test:unit && yarn test:functional
```

```bash
# Run ESLint on tracked files
yarn lint

# Lint with zero warnings (CI mode)
yarn lint -- --max-warnings 0
```

Most scripts are executed directly with Node.js and require environment variables for configuration:
```bash
# Example: CRR existing objects
node crrExistingObjects.js bucket1,bucket2

# Example: Verify bucket sproxyd keys
node verifyBucketSproxydKeys.js

# Example: Cleanup noncurrent versions
node cleanupNoncurrentVersions.js bucket1,bucket2
```

Many scripts are also designed to run inside Docker containers with appropriate network and volume mounts (see README.md for specific examples).
Script Categories:
- CRR Management (`CRR/`, `crrExistingObjects.js`, `autoRetryFailedCRR.js`, `requeueFailedCRRCronJob.js`): Cross-Region Replication utilities for triggering and managing replication of existing objects
- Metadata Verification (`verifyBucketSproxydKeys.js`, `VerifyBucketSproxydKeys/`): Validates that sproxyd keys referenced in S3 metadata exist on the storage backend
- Raft Consensus Tools (`CompareRaftMembers/`): Compares metadata databases between the Raft leader and followers to detect and repair divergences
- Data Cleanup (`cleanupBuckets.js`, `cleanupNoncurrentVersions.js`, `removeDeleteMarkers.js`): Utilities for managing object versions and lifecycle
- Replication Verification (`VerifyReplication/`): Compares source and destination buckets to verify replication completeness
- Object Repair (`ObjectRepair/`, `repairDuplicateVersions*.js`): Repairs metadata inconsistencies and duplicate keys
- Monitoring & Reporting (`service-level-sidecar/`, `DataReport/`, `bucketVersionsStats.js`): UtapiV2 service-level metrics and bucket statistics
- Utilities (`Clients/`, `utils/`): Shared S3 and CloudServer client wrappers
Client Architecture:
- `Clients/s3Client.js`: AWS SDK v3 S3 client with a custom retry strategy (100 retries with exponential backoff)
- `Clients/CloudserverClient.js`: Internal CloudServer API client for direct metadata operations
- Both clients use configurable retry logic and keep-alive connections for high-throughput operations (see the sketch after this list)
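A minimal sketch of that client setup, assuming AWS SDK v3 with the `@smithy` handler and retry packages; the real configuration lives in `Clients/s3Client.js` and may differ in its backoff curve and agent options:

```js
// Minimal sketch of an AWS SDK v3 S3 client configured as described above;
// see Clients/s3Client.js for the actual setup.
const { S3Client } = require('@aws-sdk/client-s3');
const { NodeHttpHandler } = require('@smithy/node-http-handler');
const { ConfiguredRetryStrategy } = require('@smithy/util-retry');
const http = require('http');

const s3 = new S3Client({
    endpoint: process.env.ENDPOINT,
    region: 'us-east-1',
    forcePathStyle: true, // S3C/Zenko endpoints are typically path-style
    credentials: {
        accessKeyId: process.env.ACCESS_KEY,
        secretAccessKey: process.env.SECRET_KEY,
    },
    // 100 attempts with exponential backoff, capped at 30s between retries
    retryStrategy: new ConfiguredRetryStrategy(
        100,
        attempt => Math.min(100 * 2 ** attempt, 30000),
    ),
    // Keep-alive agent so high-throughput scripts reuse connections
    requestHandler: new NodeHttpHandler({
        httpAgent: new http.Agent({ keepAlive: true }),
    }),
});
```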
Replication Status Management:
- `CRR/ReplicationStatusUpdater.js`: Core class for updating object replication metadata
- Processes objects in batches with configurable worker concurrency
- Supports filtering by replication status: NEW, PENDING, COMPLETED, FAILED, REPLICA
- Can target specific prefixes and resume from markers for large-scale operations (the batching pattern is sketched below)
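A sketch of that batch/worker pattern, not the actual `ReplicationStatusUpdater` code; `listVersionBatch` and `updateReplicationStatus` are hypothetical stand-ins for the script's listing and metadata-update calls:

```js
// Hypothetical sketch of the batching pattern described above.
async function processBucket(bucket, {
    workers = 10,            // WORKERS: concurrent updates
    targetStatus = 'FAILED', // TARGET_REPLICATION_STATUS filter
    prefix = '',             // TARGET_PREFIX filter
    maxUpdates = Infinity,   // MAX_UPDATES safety limit
} = {}) {
    let keyMarker;
    let versionIdMarker;
    let updated = 0;

    do {
        // List one page of versions, resuming from the previous markers
        const page = await listVersionBatch(bucket, { prefix, keyMarker, versionIdMarker });
        const targets = page.versions
            .filter(v => v.replicationStatus === targetStatus);

        // Update the page with bounded concurrency
        for (let i = 0; i < targets.length && updated < maxUpdates; i += workers) {
            const slice = targets.slice(i, i + workers);
            await Promise.all(slice.map(v => updateReplicationStatus(bucket, v)));
            updated += slice.length;
        }

        // Persist/log the markers so an interrupted run can resume here
        ({ nextKeyMarker: keyMarker, nextVersionIdMarker: versionIdMarker } = page);
    } while ((keyMarker || versionIdMarker) && updated < maxUpdates);
}
```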
Metadata Database Tools:
- Uses LevelDB for local metadata storage comparisons
- `CompareRaftMembers/followerDiff.js`: Compares follower databases against the leader's view
- `CompareRaftMembers/compareFollowerDbs.js`: Compares two follower database sets
- `CompareRaftMembers/repairObjects.js`: Automatically repairs detected inconsistencies
- Implements block digests for efficient comparison of large datasets (sketched after this list)
- Supports filtering by Raft oplog cseq to eliminate false positives from concurrent updates
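The block-digest idea, as a minimal sketch assuming the `level` package: hash fixed-size runs of key/value pairs so two databases can be compared digest-by-digest, and only mismatching blocks need a key-level diff. The real implementation in `CompareRaftMembers/` is stream-based and more involved:

```js
// Sketch of the block-digest technique described above, assuming the
// `level` package; not the CompareRaftMembers implementation.
const { Level } = require('level');
const crypto = require('crypto');

// Walk a database in key order and emit one digest per block of entries.
// Two databases holding the same data produce identical digest lists, so
// only blocks whose digests differ need a key-by-key comparison.
async function blockDigests(dbPath, blockSize = 1000) {
    const db = new Level(dbPath);
    const digests = [];
    let hash = crypto.createHash('md5');
    let count = 0;
    let firstKey = null;

    for await (const [key, value] of db.iterator()) {
        if (count === 0) firstKey = key;
        hash.update(key).update(value);
        if (++count === blockSize) {
            digests.push({ firstKey, lastKey: key, digest: hash.digest('hex') });
            hash = crypto.createHash('md5');
            count = 0;
        }
    }
    if (count > 0) {
        // Trailing partial block
        digests.push({ firstKey, digest: hash.digest('hex') });
    }
    await db.close();
    return digests;
}
```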
Stream Processing:
- Extensive use of Node.js streams for memory-efficient processing of large datasets
- Custom transform streams for filtering, diffing, and aggregating metadata
- Examples: `DBListStream`, `DiffStream`, `BlockDigestsStream`, `RaftOplogStream` (an illustrative transform follows)
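An illustrative object-mode transform in the same spirit; the class below is hypothetical (the real streams are those listed above), and the `replicationInfo` field layout is an assumption about the metadata format:

```js
const { Transform } = require('stream');

// Hypothetical filtering transform: passes through only metadata entries
// whose replication status matches a target value.
class ReplicationStatusFilterStream extends Transform {
    constructor(targetStatus) {
        super({ objectMode: true }); // entries flow as { key, value } objects
        this.targetStatus = targetStatus;
    }

    _transform(entry, encoding, callback) {
        try {
            const md = JSON.parse(entry.value);
            if (md.replicationInfo && md.replicationInfo.status === this.targetStatus) {
                this.push(entry);
            }
            callback();
        } catch (err) {
            callback(err); // surface malformed metadata instead of silently dropping it
        }
    }
}
```

Such a stream would typically sit in a pipeline between a listing stream and an aggregator, e.g. `listStream.pipe(new ReplicationStatusFilterStream('FAILED'))`.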
Technology Stack:
- Runtime: Node.js 22+
- AWS SDK: v3 (client-s3, smithy packages)
- Scality Libraries:
  - `arsenal`: Core S3 data models and utilities
  - `bucketclient`: Bucket metadata client
  - `vaultclient`: Authentication service client
  - `werelogs`: Structured logging
- Storage: LevelDB for local metadata caching and comparison
- Kafka: node-rdkafka for replication queue management
- Testing: Jest with mongodb-memory-server
- Monitoring: Warp10 and Scuba for UtapiV2 metrics
Scripts follow consistent patterns for configuration (see the sketch after this list):
- Required: `ACCESS_KEY`, `SECRET_KEY`, `ENDPOINT` for S3 operations
- Optional: `DEBUG=1` for debug logging, `WORKERS` for concurrency control
- Limits: `MAX_UPDATES`, `MAX_SCANNED` for batch processing with resumption
- Markers: `KEY_MARKER`, `VERSION_ID_MARKER` for resuming interrupted operations
- Filters: `TARGET_PREFIX`, `TARGET_REPLICATION_STATUS` for selective processing
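A sketch of how a script might consume this contract; each script defines its own exact variable set and defaults, so treat anything beyond the names listed above as illustrative:

```js
// Sketch of the configuration pattern above; each script handles its
// own variables and defaults.
const {
    ACCESS_KEY,
    SECRET_KEY,
    ENDPOINT,
    WORKERS = '10',
    MAX_UPDATES,
    MAX_SCANNED,
    KEY_MARKER,
    VERSION_ID_MARKER,
    TARGET_PREFIX,
    TARGET_REPLICATION_STATUS,
    DEBUG,
} = process.env;

if (!ACCESS_KEY || !SECRET_KEY || !ENDPOINT) {
    throw new Error('ACCESS_KEY, SECRET_KEY and ENDPOINT are required');
}

const workers = Number.parseInt(WORKERS, 10);
const logLevel = DEBUG === '1' ? 'debug' : 'info';
```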
Testing:
- Unit Tests: `tests/unit/` - Mock-based testing of individual components
- Functional Tests: `tests/functional/` - Integration tests requiring MongoDB and workbench
- Workbench: `workbench/env/default/` - Local development environment with CloudServer and dependencies
- CI/CD: GitHub Actions with Docker-based MongoDB and service orchestration
- Test configuration in `jest.config.js` with separate coverage directories (a sketch follows)
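A hedged sketch of what such a Jest setup can look like; the repository's actual `jest.config.js` may differ, and the per-suite coverage directory is typically selected per run (e.g. via Jest's `--coverageDirectory` flag):

```js
// Illustrative jest.config.js; see the repository's real file for the
// authoritative configuration. COVERAGE_DIR is a hypothetical variable
// used here to separate unit and functional coverage output.
module.exports = {
    testEnvironment: 'node',
    collectCoverage: true,
    coverageDirectory: process.env.COVERAGE_DIR || 'coverage',
    testMatch: ['**/tests/**/*.js'],
};
```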
Raft Metadata System:
- Scripts interact with Raft-based distributed metadata storage
- Multiple raft sessions (typically 8) distribute metadata across nodes
- Leader-follower architecture requires special handling for consistency checks
- Oplog filtering prevents false positives from in-flight operations (sketched below)
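A sketch of that filtering idea, under the assumption that the comparison records the oplog cseq when the scan starts; `fetchOplogSince` is a hypothetical stand-in for the scripts' oplog reader:

```js
// Sketch: keys touched by Raft oplog entries after the scan started may
// have changed mid-scan, so they cannot be trusted as real divergences.
async function filterConcurrentUpdates(diffEntries, raftSession, scanStartCseq) {
    const touched = new Set();
    for (const record of await fetchOplogSince(raftSession, scanStartCseq)) {
        for (const entry of record.entries) {
            touched.add(`${record.db}/${entry.key}`);
        }
    }
    // Keep only differences on keys the oplog did not touch mid-scan
    return diffEntries.filter(d => !touched.has(`${d.bucket}/${d.key}`));
}
```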
Sproxyd Storage:
- Backend storage system for object data
- Scripts verify sproxyd key references in metadata match actual stored data
- Supports MPU (multipart upload) validation with multiple keys per object (see the sketch below)
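A minimal sketch of the existence check, assuming each entry in an object's `location` metadata array carries one sproxyd key (several for MPU parts); the sproxyd endpoint path is illustrative and deployment-specific:

```js
// Sketch of the verification described above. Uses Node 22's global fetch;
// the /proxy/chord path is an assumption about the sproxyd REST layout.
async function verifySproxydKeys(objectMd, sproxydBase = 'http://localhost:81/proxy/chord') {
    const missing = [];
    for (const part of objectMd.location || []) {
        const res = await fetch(`${sproxydBase}/${part.key}`, { method: 'HEAD' });
        if (res.status === 404) {
            missing.push(part.key); // referenced in metadata but absent from storage
        }
    }
    return missing;
}
```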
Replication Semantics:
- Supports one-to-many replication to AWS S3, Azure Blob Storage, GCP
- Replication status lifecycle: NEW → PENDING → COMPLETED/FAILED
- REPLICA status indicates objects created via replication (not originals); a sketch of the lifecycle follows
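One way to encode that lifecycle for reference; the FAILED → PENDING edge is an assumption about how the retry tooling requeues failed objects:

```js
// Illustrative encoding of the replication status lifecycle above.
const NEXT_STATUS = {
    NEW: ['PENDING'],                 // picked up by the replication queue
    PENDING: ['COMPLETED', 'FAILED'], // outcome of the replication attempt
    FAILED: ['PENDING'],              // assumption: requeued by retry tooling
    COMPLETED: [],                    // terminal
    REPLICA: [],                      // terminal: set on the destination copy
};
```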
Production Usage:
- Scripts are designed for production operations on live systems
- Many scripts support dry-run modes for safety
- Progress logging and resumption markers for long-running operations
- Operations may require stopping Raft followers to ensure consistency