AIStore: High-Performance, Scalable Storage for AI Workloads
AIStore (AIS) is a lightweight distributed storage stack tailored for AI applications. It's an elastic cluster that can grow and shrink at runtime and can be ad-hoc deployed, with or without Kubernetes, anywhere from a single Linux machine to a bare-metal cluster of any size. Built from scratch, AIS provides linear scale-out, consistent performance, and a flexible deployment model.
AIS consistently shows balanced I/O distribution and linear scalability across an arbitrary number of clustered nodes. The system supports fast data access, reliability, and rich customization for data transformation workloads.
- ✅ Multi-Cloud Access: Seamlessly access and manage content across multiple cloud backends (including AWS S3, GCS, Azure, OCI), with an additional benefit of fast-tier performance and configurable data redundancy.
- ✅ Deploy Anywhere: AIS runs on any Linux machine, virtual or physical. Deployment options range from a single Docker container and Google Colab to petascale Kubernetes clusters. There are no built-in limitations on deployment size or functionality.
- ✅ High Availability: Redundant control and data planes. Self-healing, end-to-end protection, n-way mirroring, and erasure coding. Arbitrary number of lightweight access points.
- ✅ HTTP-based API: A feature-rich, native API (with user-friendly SDKs for Go and Python), and a compliant Amazon S3 API for running unmodified S3 clients; see the Python sketch after this list.
- ✅ Unified Namespace: Attach AIS clusters together to provide fast, unified access to the entirety of hosted datasets, allowing users to reference shared buckets with cluster-specific identifiers.
- ✅ Turn-key Cache: In addition to robust data protection features, AIS offers a per-bucket configurable LRU-based cache with eviction thresholds and storage capacity watermarks.
- ✅ Custom ETL Offload: Execute I/O intensive data transformations close to the data, either inline (on-the-fly as part of each read request) or offline (batch processing, with the destination bucket populated with transformed results).
- ✅ Existing File Datasets: Ingest file datasets from any local or remote source, either on demand (ad-hoc) or via asynchronous batch operations.
- ✅ Data Consistency: Guaranteed consistency across all gateways, with write-through semantics in the presence of remote backends.
- ✅ Small File Optimization: AIS supports TAR, ZIP, TAR.GZ, and TAR.LZ4 serialization for batching and processing small files. Features include initial sharding, distributed shuffle (re-sharding), appending to existing shards, listing contained files, and more.
- ✅ Kubernetes: For production deployments, we developed the AIS/K8s Operator. A dedicated GitHub repository contains Ansible scripts, Helm charts, and deployment guidance.
- ✅ Authentication and Access Control: OAuth 2.0-compatible authentication server (AuthN).
- ✅ Batch Jobs: Start, monitor, and control cluster-wide batch operations.
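To make the HTTP-based API bullet concrete, here is a minimal sketch using the aistore Python SDK (`pip install aistore`). The endpoint, bucket name, and object key are placeholders; the method names reflect recent SDK releases and may differ in yours:

```python
# Minimal sketch, assuming a cluster reachable at the (placeholder) endpoint.
from aistore.sdk import Client

client = Client("http://localhost:8080")      # any AIS gateway will do

bucket = client.bucket("my-bucket").create()  # create an AIS bucket
obj = bucket.object("hello.txt")
obj.put_content(b"Hello, AIStore!")           # PUT

print(obj.get().read_all().decode())          # GET -> "Hello, AIStore!"
```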
The feature set is actively growing and also includes: adding/removing nodes at runtime, managing TLS certificates at runtime, listing, copying, prefetching, and transforming virtual directories, executing presigned S3 requests, adaptive rate limiting, and more.
For the original white paper and design philosophy, please see AIStore Overview, which also includes a high-level block diagram, terminology, APIs, CLI, and more. For our 2024 KubeCon presentation, please see AIStore: Enhancing petascale Deep Learning across Cloud backends.
AIS includes an integrated, scriptable CLI for managing clusters, buckets, and objects, running and monitoring batch jobs, viewing and downloading logs, generating performance reports, and more:
```console
$ ais <TAB-TAB>
advanced         config           get              prefetch         show
alias            cp               help             put              space-cleanup
archive          create           job              remote-cluster   start
auth             download         log              rmb              stop
blob-download    dsort            ls               rmo              storage
bucket           etl              object           scrub            tls
cluster          evict            performance      search           wait
```
AIS runs natively on Kubernetes and stores data in open format, which means the freedom to copy or move your data out of AIS at any time using the familiar Linux `tar(1)`, `scp(1)`, `rsync(1)`, and similar tools.
For developers and data scientists, there's also:
- Go API used in CLI and benchmarking tools
- Python SDK + Reference Guide
- PyTorch integration and usage examples
- Boto3 support (see the sketch after this list)
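On the Boto3 bullet above: since AIS exposes an S3-compatible API, stock S3 clients can be pointed at any gateway. A hedged sketch follows; the endpoint URL (including the `/s3` path) and the bucket name are assumptions for illustration:

```python
# Hedged sketch: stock boto3 against an AIS gateway's S3-compatible endpoint.
# Endpoint and bucket name are placeholders; credentials are ignored unless
# the AuthN server is enabled.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8080/s3",  # assumed AIS S3 endpoint
    aws_access_key_id="dummy",
    aws_secret_access_key="dummy",
)

print(s3.list_buckets()["Buckets"])
s3.put_object(Bucket="my-bucket", Key="hello.txt", Body=b"hello via S3 API")
```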
- Read the Getting Started Guide for a 5-minute local install, or
- Run a minimal AIS cluster consisting of a single gateway and a single storage node, or
- Clone the repo and run `make kill cli aisloader deploy`, followed by `ais show cluster` (a Python smoke test is sketched right after this list)
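If you prefer Python over the CLI for that final step, something along these lines should confirm the cluster is up; the endpoint is the local-playground default, and `cluster().get_info()` is our reading of the Python SDK, so treat both as assumptions:

```python
# Hedged smoke test: ask a gateway for the cluster map.
from aistore.sdk import Client

client = Client("http://localhost:8080")  # default local-playground gateway
smap = client.cluster().get_info()        # cluster map: proxies and targets
print(f"proxies: {len(smap.pmap)}, targets: {len(smap.tmap)}")
```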
AIS deployment options, as well as intended (development vs. production vs. first-time) usages, are all summarized here.
Since the prerequisites essentially boil down to having Linux with a disk, the deployment options range from an all-in-one container to a petascale bare-metal cluster of any size, and from a single VM to multiple racks of high-end servers. Practical use cases require, of course, further consideration.
Some of the most popular deployment options include:
| Option | Use Case |
| --- | --- |
| Local playground | AIS developers or first-time users, Linux or macOS. Run `make kill cli aisloader deploy <<< $'N\nM'`, where `N` is the number of targets and `M` the number of gateways |
| Minimal production-ready deployment | Uses a preinstalled Docker image; intended for first-time users or researchers who can immediately start training models on smaller datasets |
| Docker container | Quick testing and evaluation; single-node setup |
| GCP/GKE automated install | Developers, first-time users, AI researchers |
| Large-scale production deployment | Requires Kubernetes; provided via ais-k8s |
For performance tuning, see performance and AIS K8s Playbooks.
AIS supports multiple ingestion modes:
- ✅ On Demand: Transparent cloud access during workloads.
- ✅ PUT: Locally accessible files and directories (see the Python sketch after this list).
- ✅ Promote: Import local target directories and/or NFS/SMB shares mounted on AIS targets.
- ✅ Copy: Full buckets, virtual subdirectories (recursively or non-recursively), lists or ranges (via Bash expansion).
- ✅ Download: HTTP(S)-accessible datasets and objects.
- ✅ Prefetch: Remote buckets or selected objects (from remote buckets), including subdirectories, lists, and/or ranges.
- ✅ Archive: Group and store related small files from an original dataset.
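To illustrate two of these modes (PUT and On Demand) in code, here is a hedged Python SDK sketch; the bucket names, the local path, and the `provider` value are placeholders, and `put_files` reflects the SDK as we understand it:

```python
# Hedged sketch of two ingestion modes; all names and paths are placeholders.
from aistore.sdk import Client

client = Client("http://localhost:8080")

# PUT: upload files under a local directory into an AIS bucket
ais_bucket = client.bucket("my-dataset").create()
ais_bucket.put_files("/path/to/local/dir")

# On Demand: the first GET reads through from the cloud and caches in-cluster
cloud_bucket = client.bucket("my-cloud-bucket", provider="aws")
blob = cloud_bucket.object("train/shard-000.tar").get().read_all()
print(f"fetched {len(blob)} bytes")
```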
You can install the CLI and benchmarking tools using:
```console
$ ./scripts/install_from_binaries.sh --help
```
The script installs aisloader and CLI from the latest or previous GitHub release and enables CLI auto-completions.
PyTorch integration is a growing set of datasets (both iterable and map-style), samplers, and dataloaders (a hand-rolled sketch follows this list):
- Taxonomy of abstractions and API reference
- AIS plugin for PyTorch: usage examples
- Jupyter notebook examples
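Beyond the ready-made plugin classes linked above, the same idea can be hand-rolled directly on the SDK. A hedged sketch of a map-style dataset follows; `list_all_objects` plus the endpoint and bucket names are assumptions:

```python
# Hedged sketch: a map-style PyTorch dataset backed by the aistore SDK.
# The official AIS plugin ships ready-made dataset classes; this only
# illustrates the idea. Endpoint and bucket name are placeholders.
from aistore.sdk import Client
from torch.utils.data import DataLoader, Dataset

class AISBucketDataset(Dataset):
    def __init__(self, endpoint: str, bucket_name: str):
        self._bucket = Client(endpoint).bucket(bucket_name)
        # Materialize object names once, up front
        self._names = [e.name for e in self._bucket.list_all_objects()]

    def __len__(self) -> int:
        return len(self._names)

    def __getitem__(self, idx: int) -> bytes:
        return self._bucket.object(self._names[idx]).get().read_all()

ds = AISBucketDataset("http://localhost:8080", "my-dataset")
# Raw bytes vary in length, so keep each batch as a plain list
loader = DataLoader(ds, batch_size=16, collate_fn=lambda batch: batch)
```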
Let others know your project is powered by high-performance AI storage:
[AIStore](https://github.com/NVIDIA/aistore)
- Overview and Design
- Getting Started
- Buckets and Bucket Management
- Technical Blog
- S3 Compatibility
- Batch Jobs
- Performance (see also the CLI `performance` command)
- CLI Reference
- Authentication
- Prometheus & Metrics
- Production Deployment: Kubernetes Operator, Ansible Playbooks, Helm Charts, Monitoring
- See Extended Index
- Use the CLI `search` command, e.g.: `ais search copy`
- Clone the repository and run `git grep`, e.g.: `git grep -n out-of-band -- "*.md"`
License: MIT

Author: Alex Aizman (NVIDIA)