AIStore 4.2 focuses on correctness and reliability, authentication and observability, API modernization, and operational fixes across backends and tooling.
The resilver subsystem was substantially rewritten, introducing explicit preemption on mountpath events and full support for chunked object relocation.
Security enhancements include Prometheus metrics for AuthN, persistent RSA key pairs, OIDC-compatible discovery and JWKS endpoints.
New APIs replace legacy polling with explicit condition-based waiting for batch jobs and introduce a chunk-aware HEAD(object), enabling more scalable job monitoring and efficient large-object access.
List-objects implementation was corrected for non-recursive walks on AIS buckets, while the Python SDK and CLI add faster, chunk-aware download paths with parallelism and progress reporting.
Additional improvements and fixes span: cloud bucket namespaces, multipart retry behavior, backend interoperability, and premature global-rebalance completion reporting.
In particular, 4.2 adds support for namespace-scoped cloud buckets (e.g., s3://#prod/data and s3://#dev/data). This enables multi-tenant scenarios where different users or accounts access same-named buckets in different cloud accounts via their respective profiles and/or endpoints.
AIStore 4.2 maintains full backward compatibility with v4.1 and earlier releases. Overall, this release improves the system's availability in the presence of disk faults, improves observability and correctness under load, and modernizes long-standing APIs.
Table of Contents
- Resilver
- Authentication and Observability
- New APIs
- 3.1 Xaction v2
- 3.2 Object HEAD v2
- List Objects: Non-Recursive Walks
- Multipart transfers (downloads, uploads, and backend interoperability)
- Global Rebalance
- Filesystem Health Checker (FSHC)
- ETag and Last-Modified Normalization
- Python SDK
- CLI
- Documentation
- Build and CI
- Miscellaneous fixes across subsystems
Resilver
Resilver is AIStore’s node-local counterpart to global rebalance: it redistributes objects across a target’s mountpaths to restore correct placement and redundancy after volume changes (attach, detach, enable, disable).
Version 4.2 introduces a major rewrite of the resilver xaction for correctness and reliability. The previous implementation relied on retry-based copying loops; the new implementation uses deterministic copy (object replica) selection and explicit lifecycle management.
Resilver is a single-target operation: cluster-wide execution is now disallowed to prevent cross-target interference; attempts to start resilver without a target ID are rejected by both the CLI and by AIStore itself (i.e., calling the API directly without a target ID will also fail).
Mountpath event-triggered resilvers remain internal and continue to register with IC automatically (the events are: enable, disable, attach, and detach).
Improvements include:
- Preemption on mountpath events - disable/detach/attach/enable operations now abort any running resilver and restart appropriately; handles back-to-back mountpath events without data loss
- Chunked object relocation - step-wise relocation with rollback and cleanup on error; validates chunk placement
- Mountpath jogger lifecycle - filesystem walks terminate when the parent xaction aborts
- Concurrent access fixes across shared resilver state
- Runtime progress visibility - live counters (visited objects, active workers) via 'ais show job' CLI
- Deterministic primary copy selection - to eliminate contention between concurrent goroutines
New documentation: docs/resilver.md
Commit Highlights
- 805f9ab93: Rewrite copy-recovering path; deterministic primary copy selection
- 48153ca6d: Preempt running xaction upon mountpath events; add tests
- 55c890fdd: Relocate chunked objects with rollback
- 140c6e432: Stop joggers; revise concurrent access to shared state
- d2a56a220: Add stress tests; consolidate resilver tests
- 1a53f10e9: Revise mountpath jogger; add walk-stopped sentinel
- 96213840f: Wire walks to parent xaction abort
Authentication and Observability
Building on v4.1 authentication improvements, this release adds OIDC-compatible discovery and JWKS endpoints, persistent RSA key pairs, and production-grade observability for AuthN.
AuthN now supports both HMAC-SHA256 and RSA (RS256) token signing. When no HMAC secret is provided via config or AIS_AUTH_SECRET_KEY, AuthN will initialize and persist an RSA keypair on disk and use it to issue RS256-signed JWTs.
From the operational perspective, the important changes include:
- OIDC discovery + JWKS endpoints: AuthN now serves /.well-known/openid-configuration and a public JWKS endpoint for RS256 verification; JWKS responses include cache control based on token expiry.
- Cluster validation handshake: either the HMAC secret checksum or the RSA public key, depending on the configured signing method.
- Persistent RSA key pairs for AuthN - RSA private key is loaded from disk if present; otherwise generated once and persisted (key rotation not yet implemented)
- Prometheus metrics for OIDC/JWKS: counters for invalid iss/kid, plus latency histograms for issuer discovery and key fetches.
- Improved logging and validation - clearer token and ACL failure diagnostics; stricter permission checks (including admin and bucket-admin)
And separately, CLI:
- ais authn command, to inspect OIDC configuration and display RSA public keys
- ais authn show oidc and ais authn show jwks - to display discovery and JWKS output (JSON or table)
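As a rough illustration of how a client could use the two endpoints for RS256 verification: read the discovery document, follow its jwks_uri, and pick the key whose kid matches the token header. The sketch below covers only the key-selection step on already-parsed JSON. The field names (jwks_uri, keys, kid, kty) are standard OIDC/JWKS names; the helper select_jwk and all sample values are made up for illustration and are not actual AuthN output.

```python
from typing import Optional

def select_jwk(jwks: dict, kid: str) -> Optional[dict]:
    """Return the RSA entry in a JWKS document whose key ID equals `kid`, or None."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid and key.get("kty") == "RSA":
            return key
    return None

# Hypothetical, hand-written documents (assumptions, not real AuthN responses).
discovery = {"issuer": "https://authn.example", "jwks_uri": "https://authn.example/jwks"}
jwks = {"keys": [{"kty": "RSA", "kid": "key-1", "alg": "RS256", "n": "...", "e": "AQAB"}]}

matched = select_jwk(jwks, "key-1")     # key found: usable for signature verification
missing = select_jwk(jwks, "unknown")   # no such key ID: verification must fail
print(matched is not None, missing)
```

A verifier would typically cache the JWKS response for as long as its cache-control headers allow and refetch on an unknown kid.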
Commit Highlights
- b930247cc: Add metrics for total counts and JWKS caching
- 49733a2d6: Refactor access check; add initial Prometheus metrics
- 4253d7bde: Persistent RSA key pair for AuthN
- 2fb333675: Add RSA signing and validation to AuthN service
- 26e4d5098: Improve logging on token/ACL failures
- 48b830470: CLI: view OIDC config and show RSA public key
- df8f9e764: Fix CheckPermissions to validate all permission types
New APIs
3.1 Xaction v2
Xaction (eXtended action) is AIStore’s abstraction for asynchronous batch jobs. All xactions expose a uniform API and CLI for starting, stopping, waiting, and reporting both generic and job-specific statistics.
Version 4.2 introduces explicit separation between IC-notifying and non-IC xactions, replacing legacy polling with explicit condition-based waiting and formalizing two observation modes:
- IC-based status observation for xactions that actively report progress via IC
- Snapshot-based observation for xactions that do not
For IC-notifying xactions, this avoids polling every target and makes waiting scale predictably even in large clusters. Snapshot-based xactions continue to use explicit snapshot inspection.
Background: IC (Information Center) runs on three AIS gateways (one primary and
two random). Targets asynchronously notify IC of xaction progress, eliminating
per-target polling.
Observation APIs
| Observation type | API | Semantics |
|---|---|---|
| Status-based (IC) | GetStatus | Return current status as reported by IC |
| | WaitForStatus | Block until a condition is satisfied using IC-reported status |
| Snapshot-based | GetSnaps | Fetch current xaction snapshots |
| | WaitForSnaps | Wait until a snapshot-based condition is satisfied |
| | WaitForSnapsStarted | Wait until at least one matching xaction is observed |
| | WaitForSnapsIdle | Wait until snapshots become empty (xaction quiescence) |
Built-in Conditions
Conditions are expressed as xact.ArgsMsg methods and apply to waiting APIs:
| Condition | Meaning |
|---|---|
| Finished() | Xaction reached a terminal state (finished or aborted) |
| NotRunning() | No matching xaction is currently running |
| Started() | At least one matching xaction has been observed |
| Idle() | Consecutive empty snapshots observed |
Conditions that require OnlyRunning=true set it automatically.
Note: Python SDK support for the same v2 job-wait semantics is planned for an upcoming release.
Commit Highlights
- 0309822081: Introduce xaction-v2 waiting/getting APIs
- 881a6c9564: Move condition implementations; define Finished semantics
- 9dec7a4c31: Move/revise WaitForXactionIC; remove legacy polling
- 56e11b2c66: Reorg and document xaction-v2 APIs
- 3aa6b770d: CLI: multi-object jobs wait on xkind (not xname)
3.2 Object HEAD v2
AIStore 4.2 introduces Object HEAD v2, a chunk-aware, opt-in extension of the existing HEAD(object) API that allows clients to request structured object metadata with lower overhead and clearer access semantics.
Unlike the legacy HEAD(object) path, v2 API can selectively emit metadata fields based on the requested properties, avoiding unnecessary overhead.
When requested, it exposes chunk layout information (count and maximum chunk size), enabling clients to efficiently plan parallel range reads without additional probing.
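For illustration, once a client knows the object size and the maximum chunk size, planning parallel range reads reduces to simple arithmetic. The sketch below is an assumption about how a client might do this, not AIS code; plan_ranges and range_header are hypothetical helpers, and the sizes are arbitrary.

```python
from typing import List, Tuple

def plan_ranges(object_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, object_size) into inclusive (start, end) byte ranges of at most chunk_size."""
    if object_size < 0 or chunk_size <= 0:
        raise ValueError("object_size must be >= 0 and chunk_size must be > 0")
    return [(start, min(start + chunk_size, object_size) - 1)
            for start in range(0, object_size, chunk_size)]

def range_header(start: int, end: int) -> str:
    """Format an HTTP Range header value for an inclusive byte range."""
    return f"bytes={start}-{end}"

ranges = plan_ranges(object_size=10, chunk_size=4)
headers = [range_header(s, e) for s, e in ranges]
print(ranges)   # [(0, 3), (4, 7), (8, 9)]
print(headers)  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Each range can then be fetched by a separate worker and written at its own offset, so results can arrive in any order.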
Metadata derivation was also made more consistent. Last-Modified and ETag handling was normalized across native AIS, S3 compatibility paths, list-objects fast paths, and multipart workflows, eliminating several long-standing edge cases.
Object HEAD v2 is strictly additive. The legacy HEAD(object) v1 API, request/response structures, and execution path are preserved unchanged and continue to be used by default. Existing clients follow the exact same code paths and observe identical behavior unless v2-specific properties are explicitly requested.
Deprecation notice: The legacy HEAD(object) v1 API is now considered deprecated.
It remains fully supported in 4.2, but new development should target Object HEAD v2.
The v1 path is planned for removal in a future major release.
Disclaimer: Object HEAD v2 is new and considered experimental in 4.2.
While the API is stable, we reserve the right to modify associated control structures during the 4.2 lifecycle.
Commit Highlights
- 4a1519fc6: Rewrite target ObjHeadV2; unify Last-Modified and ETag derivation
- ce25a7ec4: Introduce object HEAD v2 for chunk-aware access
- b12e0b3c0: Refactor object HEAD workflow and error handling
List Objects: Non-Recursive Walks
Corrected non-recursive listing semantics (ls --nr) for AIS buckets:
- Directory entries included via new IncludeDirs walk option
- Trailing / convention enforced for directory names
- Lexicographical ordering for continuation tokens in non-recursive mode
- Pagination fixes across pages with mixed directory/file content
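The intended shape of a non-recursive listing can be sketched on a flat list of object names. This is only a model of the semantics described above (one level deep, directories marked with a trailing /, plain lexicographic order); list_non_recursive is a hypothetical helper, and the actual server-side walk and CLI presentation order may differ.

```python
from typing import List

def list_non_recursive(names: List[str], prefix: str = "") -> List[str]:
    """One-level listing: direct children of `prefix`, directories marked with a trailing '/'."""
    entries = set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            continue
        # Everything up to the first '/' is the direct child; a '/' means it is a directory.
        head, sep, _ = rest.partition("/")
        entries.add(prefix + head + ("/" if sep else ""))
    return sorted(entries)

names = ["a/b/c.txt", "a/d.txt", "a/b/e.txt", "z.txt", "a.txt"]
top = list_non_recursive(names)
under_a = list_non_recursive(names, "a/")
print(top)      # ['a.txt', 'a/', 'z.txt']
print(under_a)  # ['a/b/', 'a/d.txt']
```

Because entries are totally ordered, the last name on a page can serve as the continuation token for the next page even when directories and files are mixed.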
Commit Highlights
- 48793d79d: Fix non-recursive operation on ais:// buckets
- 1a2a5eabb: Amend CheckDirNoRecurs for non-recursive list-objects
Multipart transfers (downloads, uploads, and backend interoperability)
This release significantly improves multipart data transfers across the entire stack, covering client-side downloads, server-side multipart uploads, and backend-specific interoperability.
Client-side multipart download is now available via ais get --mpd, enabling concurrent HTTP range reads with progress reporting. When chunk size or object size are not explicitly provided, the implementation leverages the new HeadObjectV2 API to discover object size and chunk layout, ensuring efficient and predictable parallelization for large objects.
Multipart uploads now respect backend retry behavior and avoid holding excess memory or open streams across retries. Part uploads to remote backends consistently use retriable and/or seekable readers, ensuring correctness for providers that rely on stream rewind during retries (notably Azure and remote AIS).
Under high memory pressure, AIS now applies explicit backpressure: the target waits briefly for pressure to subside and returns HTTP 429 ("Too Many Requests") when safe buffering remains impossible, replacing the previous best-effort streaming fallback.
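The backpressure policy described above (wait briefly, then reject) can be modeled in a few lines. This is a sketch under stated assumptions, not the target's implementation: admit, the pressure probe, and the timing values are all hypothetical.

```python
import time
from typing import Callable

HTTP_OK = 200
HTTP_TOO_MANY_REQUESTS = 429

def admit(under_pressure: Callable[[], bool],
          max_wait: float = 0.05,
          interval: float = 0.01) -> int:
    """Wait up to `max_wait` seconds for memory pressure to clear; 200 if it does, else 429."""
    deadline = time.monotonic() + max_wait
    while under_pressure():
        if time.monotonic() >= deadline:
            return HTTP_TOO_MANY_REQUESTS
        time.sleep(interval)
    return HTTP_OK

# Simulated probes (assumptions): pressure clears on the third check, or never clears.
checks = iter([True, True, False])
status_ok = admit(lambda: next(checks), max_wait=1.0, interval=0.001)
status_busy = admit(lambda: True, max_wait=0.01, interval=0.001)
print(status_ok, status_busy)
```

Clients that receive 429 would normally retry with backoff rather than treat it as a permanent failure.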
Backend-specific fixes further improve robustness and protocol compatibility. S3 multipart handling now normalizes ETag quoting and derivation, disables retries for non-seekable readers to avoid SDK rewind failures, and fixes edge cases in presigned-request error handling.
Manifest validation and part lookup paths were tightened to fail fast on missing or inconsistent state, preventing silent corruption and improving error diagnostics.
Together, these changes make multipart transfers more reliable and predictable across heterogeneous backends and large deployments.
Commit Highlights
- aa5a32bdd: Fix corrupted SGL offset on retry for remote AIS backend
- 21088dd26: S3 backend: disable retries for non-seekable readers
- 2400cf111: S3 backend: fix nil deref in presigned-request response path
- 3d8b2e67d: Fix copy-bucket sync not detecting remotely-deleted objects
- d1bdc5f52: Fix blob download abort timeout by making Abort() non-blocking
Global Rebalance
Global rebalance is a special cluster-wide xaction that handles cluster membership changes. It is orchestrated by the primary (gateway), is configurable, and can also be started manually by an admin (e.g., ais start rebalance).
Rebalance start and completion are now serialized under a single critical section that atomically binds the renewed xaction (job) to the singleton’s internal state (cluster map, stage, and job ID). This prevents rare races during rapid, back-to-back membership changes.
Additional improvements:
- Completion reporting - fix cases where rebalance could be reported as finished prematurely, causing query and wait APIs to return early during membership changes
- Atomic xaction binding - serialize initRenew() and fini() for symmetric state transitions
- Identity checks - guard against unlikely race conditions
- Persistent markers - trigger FSHC on I/O errors, surfacing repeated rebalance failures as mountpath health issues
- Post-renew Smap handling - validate same-targets (not same-count) and proceed safely on benign version bumps
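The combination of a single critical section and an identity check can be illustrated with a small model. This is not the AIStore implementation (which is in Go); RebalanceState and its fields are hypothetical, and only the init_renew/fini naming echoes the items above. The point is that a stale job finishing late cannot clear the binding of a newer job.

```python
import threading
from typing import Optional

class RebalanceState:
    """Singleton-like state whose job binding changes only under one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.job_id: Optional[str] = None
        self.smap_version = 0

    def init_renew(self, job_id: str, smap_version: int) -> None:
        """Bind a renewed job and its cluster-map version in one atomic step."""
        with self._lock:
            self.job_id = job_id
            self.smap_version = smap_version

    def fini(self, job_id: str) -> bool:
        """Clear the binding only if `job_id` is still the bound job (identity check)."""
        with self._lock:
            if self.job_id != job_id:
                return False
            self.job_id = None
            return True

state = RebalanceState()
state.init_renew("g1", smap_version=10)
state.init_renew("g2", smap_version=11)   # back-to-back membership change
stale = state.fini("g1")                   # late completion of the preempted job
current = state.fini("g2")
print(stale, current)
```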
Commit Highlights
- a74af7d097: Fix premature rebalance completion reporting during membership changes
- 90ab46e1e: Guard atomic state; reset published pointers
- 8c97b2e6a: Persistent markers to trigger FSHC
- 570af0e42: Clarify post-renew Smap change
Filesystem Health Checker (FSHC)
FSHC was improved to better distinguish fatal mountpath (disk) failures from transient I/O errors (noise), while preventing runaway re-check loops under repeated error conditions.
Major changes include:
- Failure classification: FAULTED vs DEGRADED
  - FAULTED: root-level failures (mount root stat, filesystem identity mismatch, cannot open mount root) disable the mountpath immediately.
  - DEGRADED: sampling read/write errors disable the mountpath only if the combined error count exceeds the configured threshold.
- Bounded sampling
  - FSHC runs two passes of sampling (read + write/fsync), stopping early when thresholds are exceeded.
- One-shot delayed retry
  - Root checks retry once with a short delay to avoid false positives from transient hiccups (common with network-attached storage).
- Per-mountpath scheduling
  - Checks are serialized per mountpath and rate-limited (minimum interval between runs), preventing repeated errors from triggering constant disk hammering.
- Avoid false positives
  - FSHC does not trigger on benign conditions such as ENOENT and other known non-disk failure paths.
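The classification rule can be summarized as a small decision function. This is one reading of the description above, not FSHC code: classify, its parameters, and the example threshold are hypothetical.

```python
from typing import Optional, Tuple

def classify(root_ok: bool, read_errors: int, write_errors: int,
             threshold: int) -> Tuple[Optional[str], bool]:
    """Return (classification, disable_mountpath) for one health check."""
    if not root_ok:
        # Root-level failure: disable immediately, regardless of sampling results.
        return "FAULTED", True
    combined = read_errors + write_errors
    if combined > 0:
        # Sampling errors: disable only when the combined count exceeds the threshold.
        return "DEGRADED", combined > threshold
    return None, False

print(classify(root_ok=False, read_errors=0, write_errors=0, threshold=5))
print(classify(root_ok=True, read_errors=2, write_errors=1, threshold=5))
print(classify(root_ok=True, read_errors=4, write_errors=2, threshold=5))
```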
New documentation: docs/fshc.md
Commit Highlights
- 6a1284ae1: FAULTED/DEGRADED classification; bounded sampling
- eefa2ae6a: Add single-delay retry; refactor unit tests
ETag and Last-Modified Normalization
Strict consistency for metadata handling across the codebase:
- Unquoted internally, quoted on the wire - applies to headers and S3 XML responses
- lom.ETag() - prefer custom cloud ETag; fall back to MD5 when available; otherwise generate a stable synthetic value
- lom.LastModified() / lom.LastModifiedLso() - avoid mtime syscalls in list-objects fast paths
- S3 list-objects - amend wanted metadata on a best-effort basis without extra syscalls
These changes unify metadata semantics across native AIS, S3 compatibility paths, and list-objects fast paths.
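The "unquoted internally, quoted on the wire" rule is easy to get wrong when a value may already carry quotes. A minimal sketch of idempotent helpers follows; quote_etag and unquote_etag are hypothetical names, not AIS functions, and weak validators (the W/ prefix) are deliberately out of scope.

```python
def unquote_etag(etag: str) -> str:
    """Strip one pair of surrounding double quotes, if present (internal form)."""
    if len(etag) >= 2 and etag[0] == '"' and etag[-1] == '"':
        return etag[1:-1]
    return etag

def quote_etag(etag: str) -> str:
    """Return the ETag wrapped in exactly one pair of double quotes (wire form)."""
    return '"' + unquote_etag(etag) + '"'

print(quote_etag('abc'))      # "abc"
print(quote_etag('"abc"'))    # "abc"  (no double quoting)
print(unquote_etag('"abc"'))  # abc
```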
Commit Highlights
- 33c987cd4: Normalize ETag and Last-Modified; generate both if needed
- 8fb294819: Quote ETag in XML responses (ListParts, MPU complete)
- 4d6873bb6: ETag quoting/unquoting rules: strict consistency
Python SDK
The Python SDK gained chunk-aware range reading built on the new, experimental HeadObjectV2 API. The API exposes object chunk layout (ObjectPropsV2.Chunks), allowing the SDK to issue chunk-aligned range requests and reassemble chunks in sequential order.
This enables optional parallel reads for large objects via Object.get_reader(num_workers=N), particularly beneficial for chunked objects.
HeadObjectV2 is introduced as experimental in this release; while the semantics are stable, the underlying control structures may evolve during the 4.2 iteration.
Fixed cluster-key (HMAC) authentication failures on redirects caused by Content-Length changes during request forwarding (HTTPS + urllib3).
Job-waiting behavior was unified across long-running operations. Job.wait(), wait_for_idle(), wait_single_node(), and Dsort.wait() now return a structured WaitResult (success flag, error, and completion time), improving timeout handling and diagnostics for aborted or preempted jobs.
Finally, Python access-control constants were synchronized with AIS semantics. This includes updated read/write scopes, corrected cluster-level access flags, and alignment with bucket-admin permissions to match the server-side authorization model.
Commit Highlights
- 036361cff: Add chunk-aware object reader with concurrent range-reads
- 744f5a0b6: Add multipart download API with concurrent range-reads
- 51047e075: Add progress report callback to MultipartDownload
- 34d772ea4: Fix HMAC mismatch and improve request handling
- f3a6504cb: Implement WaitResult dataclass for consistent job wait API
- 94f1e0ec1: Release Python SDK version 1.19.0
- 7662730c3: Add requests_list property to retrieve MossIn requests in Batch class
CLI
The CLI now provides safer handling of object names containing special symbols across more commands.
The existing --encode-objname option is recognized in additional code paths (including GET/PUT, rename, multipart, ETL object, and others). When the flag isn’t provided and the name appears to require escaping, the CLI emits a one-time warning suggesting --encode-objname.
Note: do not combine --encode-objname with already-escaped names to avoid double-encoding.
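Double-encoding is easy to demonstrate with standard percent-encoding. The snippet uses Python's urllib only to show the effect; it does not reproduce the CLI's own escaping logic, and the object name is an arbitrary example.

```python
from urllib.parse import quote, unquote

name = "dir/my file#1?.txt"

# Escaping once yields a name that decodes back to the original.
encoded_once = quote(name, safe="/")
# Escaping an already-escaped name also escapes the '%' signs.
encoded_twice = quote(encoded_once, safe="/")

print(encoded_once)   # dir/my%20file%231%3F.txt
print(encoded_twice)  # dir/my%2520file%25231%253F.txt
print(unquote(encoded_twice) == name)  # False: one decode no longer recovers the name
```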
For large downloads, ais get adds --mpd for client-side multipart (HTTP range) download of a single object, with a progress bar.
Related flags (--chunk-size, --num-workers) are now validated and require either --mpd or --blob-download.
Note: the ais get command-line options --mpd and --blob-download are mutually exclusive.
Authentication tooling is expanded with ais auth show oidc and ais auth show jwks, providing direct visibility into AuthN OIDC configuration and public JWKS.
Resilvering commands were tightened to require an explicit target ID and now detect and warn about an already running resilver before starting a new one. Several usage and help texts were improved, and ais ls --no-recursion now lists directories before objects.
In short, the release includes:
- Multipart download - ais get --mpd with configurable --chunk-size and --num-workers, including progress bar support
- Resilver shortcut - ais storage resilver alias; confirmation when a resilver is already running
- Special character encoding - --encode-objname across put/get/remove/rename; centralized warning logic
- Mountpath commands - expanded usage text for attach/detach/enable/disable
- Extended help - ais create --help now explicitly documents six distinct bucket-creation and attachment scenarios, including AIS buckets, remote AIS clusters, cloud buckets with custom credentials, pre-configured properties, namespace-qualified cloud buckets, and advanced --skip-lookup registration.
Commit Highlights
- 876264b9c: Add client-side multipart download to ais get
- 08eb7224a: Add ais storage resilver alias; confirm when already running
- 48d247d7c: Warn/encode special symbols in object names across operations
- bb44f2179: Add extended ais create --help
- e2837ed8b: Add extended usage for mountpath commands
- bde86af9c: Support remote bucket props (profile, endpoint) at creation time with --skip-lookup
- d413e3a2c: Refactor bucket creation; revise --skip-lookup path
Documentation
- New: docs/resilver.md - resilver architecture and operations
- New: docs/fshc.md - filesystem health checker
- Rewritten: docs/bucket.md - AIS Buckets: Design and Operations
- Revised: docs/http_api.md - updated cross-references
- Python SDK docs - environment variables and configuration precedence
Commit Highlights
- 63fb898e3: Add docs/resilver.md
- 7a57f2913: New docs/bucket.md - design and operations
- 759c7fc61: Clarify bucket creation vs identity
- 291f428a0: Python SDK: add env vars and config precedence
Build and CI
CI pipelines now run larger clusters with remote AIS enabled, improving test coverage for distributed scenarios. Infrastructure and dependency updates include:
- 50f00f8c4: GitHub CI: run larger cluster with remote AIS
- ece283da6: GitLab CI: enable remote AIS cluster (remais) in long tests
- 9be9f67d1: Unify host:port and URL construction; remove IPv4-only assumptions
- 97da81299: Parallelize Docker image builds
- d2aa6a1fc: Add pipeline to randomize bucket namespacing
- 0af6a913d: Upgrade OSS dependencies
Miscellaneous fixes across subsystems
Concurrency and ordering
- b5590a094: Fix OpcDone ordering race - ensure completion sentinel (indicating end of Tx) cannot arrive before Rx data
- d1bdc5f52: Non-blocking abort for blob-download xactions - prevent premature abort completion under concurrent shutdown
- aa5a32bdd: Multipart retry correctness against remote AIS backend
- 22b599760: Early request rejection - prevent "superfluous response" on invalid object rename
List-objects and metadata
- 48793d79d: Non-recursive listing semantics - correct list-objects operation for ais:// buckets
- 1a2a5eabb: Directory handling fix - amend CheckDirNoRecurs logic for edge cases
- 765953ebd: Cloud namespace metadata - fix BMD handling for cloud buckets with namespaces
- 45272ceef: Enforce 2KiB size limit for custom object metadata