
Commit 0432d4e

Refactor Dockerfile to simplify file copying process and update README.md to reflect the transition of the AIRE Standards status from Draft to Live. Enhanced clarity in the Operational Excellence section and updated repository structure details. Added new sponsor image for branding consistency. (#6)
1 parent e04b0ef commit 0432d4e

4 files changed

Lines changed: 42 additions & 31 deletions


Dockerfile

Lines changed: 1 addition & 3 deletions
@@ -20,9 +20,7 @@ COPY pyproject.toml uv.lock ./
 RUN uv sync --frozen --no-dev
 
 # Copy project files
-RUN mkdir -p docs
-COPY mkdocs.yml ./
-COPY . ./docs/
+COPY . .
 
 # Build the MkDocs site
 RUN uv run mkdocs build --strict --site-dir /app/site

README.md

Lines changed: 31 additions & 19 deletions
@@ -1,7 +1,7 @@
 # The AI Reliability Engineering (AIRE) Standards
 
 [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
-[![Status: Draft](https://img.shields.io/badge/Status-Draft%20v0.1-orange)]()
+[![Status: Live](https://img.shields.io/badge/Status-Live%20v0.1-green)](https://github.com/exospherehost/ai-reliability-standards)
 
 > **An open implementation guide for building reliable AI Agents at scale. Defining the practices for AI Reliability Engineering (AIRE).**
 
@@ -134,16 +134,16 @@ Security for AI agents differs from traditional software-agents are autonomous d
 
 ### 5. Operational Excellence & Team Culture
 
-*Establishing SLAs, error budgets, team structures, and operational practices that enable reliable AI systems to scale.*
+*Establishing performance targets, quality budgets, team structures, and operational practices that enable reliable AI systems to scale.*
 
 Operational Excellence bridges the gap between technical architecture and organizational culture. While the first four pillars define *what* to build, this pillar defines *how* teams operate, measure, and continuously improve AI systems at scale:
 
-- **AI-Specific SLAs & Error Budgets** - Service Level Objectives for availability, latency, quality, safety, and efficiency; error budget policies for balancing reliability with innovation velocity
+- **AI-Specific Performance Targets & Quality Budgets** - Performance targets for cognitive accuracy, safety integrity, autonomy level, response performance, and cost efficiency; quality budget policies for balancing reliability with innovation velocity
 - **Team Structure & Shared Responsibility** - Product teams own agents end-to-end; embedded AI Reliability Engineers (AIREs) with 20% time allocation; central platform team provides infrastructure
 - **Progressive Autonomy Maturity Model** - Five levels of agent autonomy (L0: Human-Driven → L4: Autonomous), reducing HITL rate from 100% to <5% over time
-- **Reliability Reviews** - Weekly metric reviews, monthly postmortems, error budget tracking, SLO compliance monitoring
+- **Reliability Reviews** - Weekly metric reviews, monthly postmortems, quality budget tracking, performance target compliance monitoring
 
-**Key Metrics:** SLO Compliance >95%, Error Budget Remaining >25%, HITL Rate <10%, Autonomy Level L3+, Time to Autonomy <6 months
+**Key Metrics:** Performance Target Compliance >95%, Quality Budget Remaining >50%, HITL Rate <10%, Autonomy Level L3+, Time to Autonomy <6 months
 
 📖 **[Read the full Operational Excellence guide →](docs/pillars/operational-excellence.md)**
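
The thresholds in the updated Key Metrics line lend themselves to a mechanical check. The following is a minimal Python sketch and is not part of this commit: `ReliabilitySnapshot`, `quality_budget_remaining`, and `failed_checks` are hypothetical names, and the budget arithmetic assumes a quality budget is computed like a conventional SRE error budget (allowed failures = (1 - target) × total), which the README does not spell out. Time to Autonomy is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilitySnapshot:
    """Hypothetical per-agent metrics; field names are illustrative only."""

    target_compliance: float  # fraction of performance targets met, 0.0-1.0
    budget_remaining: float   # fraction of quality budget left, 0.0-1.0
    hitl_rate: float          # fraction of tasks escalated to a human, 0.0-1.0
    autonomy_level: int       # 0 (L0: Human-Driven) through 4 (L4: Autonomous)


def quality_budget_remaining(target: float, total: int, failures: int) -> float:
    """Return the fraction of the budget still unspent.

    Assumption: the budget works like an SRE error budget, so a target of
    0.95 over `total` interactions allows (1 - 0.95) * total failures.
    """
    allowed = (1.0 - target) * total
    if allowed <= 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 0.0
    return max(0.0, 1.0 - failures / allowed)


def failed_checks(snapshot: ReliabilitySnapshot) -> list[str]:
    """Return the README Key Metrics thresholds that the snapshot misses."""
    checks = {
        "Performance Target Compliance >95%": snapshot.target_compliance > 0.95,
        "Quality Budget Remaining >50%": snapshot.budget_remaining > 0.50,
        "HITL Rate <10%": snapshot.hitl_rate < 0.10,
        "Autonomy Level L3+": snapshot.autonomy_level >= 3,
    }
    return [name for name, ok in checks.items() if not ok]
```

Under these assumptions, a 0.95 target over 1000 interactions allows about 50 failures, so 20 observed failures leave roughly 60% of the budget, which clears the new >50% threshold.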

@@ -187,19 +187,31 @@ You get to shape the future of AI reliability engineering and get recognized for
 
 ## Repository Structure
 
-```
-docs/
-├── getting-started.md # Adoption roadmap for organizations
-├── pillars/
-│   ├── resilient-architecture.md # Pillar 1: Fault tolerance, scaling, recovery
-│   ├── cognitive-reliability.md # Pillar 2: Accuracy, consistency, drift detection
-│   ├── quality-lifecycle.md # Pillar 3: Testing, deployment, feedback loops
-│   ├── security.md # Pillar 4: JIT access, guardrails, audit logs
-│   └── operational-excellence.md # Pillar 5: SLAs, team structure, progressive autonomy
-└── appendix/
-    ├── principles.md # AIRE Principles (5 guiding tenets)
-    ├── metrics-framework.md # Three-tier metrics framework
-    └── glossary.md # Key terms and definitions
+This repository contains the source files for the AIRE Standards documentation and deployment infrastructure:
+
+```text
+.
+├── docs/ # MkDocs documentation source
+│   ├── index.md # Documentation homepage
+│   ├── getting-started.md # Adoption roadmap for organizations
+│   ├── principles.md # AIRE Principles (5 guiding tenets)
+│   ├── pillars/ # Core reliability pillars
+│   │   ├── resilient-architecture.md # Pillar 1: Fault tolerance, scaling, recovery
+│   │   ├── cognitive-reliability.md # Pillar 2: Accuracy, consistency, drift detection
+│   │   ├── quality-lifecycle.md # Pillar 3: Testing, deployment, feedback loops
+│   │   ├── security.md # Pillar 4: JIT access, guardrails, audit logs
+│   │   └── operational-excellence.md # Pillar 5: Performance targets, team structure, progressive autonomy
+│   └── appendix/
+│       ├── metrics-framework.md # Three-tier metrics framework
+│       └── glossary.md # Key terms and definitions
+├── assets/ # Static assets (sponsor logos, images)
+├── k8s/ # Kubernetes deployment manifests
+├── stylesheets/ # Custom CSS for documentation
+├── mkdocs.yml # MkDocs configuration
+├── Dockerfile # Container image for documentation site
+├── pyproject.toml # Python project dependencies
+├── README.md # GitHub repository homepage (this file)
+├── CONTRIBUTORS.md # Contributors registry
 ```
 
 ---
@@ -215,7 +227,7 @@ We welcome Pull Requests (PRs) from engineers who have solved specific reliabili
 
 ## Sponsors
 
-<a href="https://exosphere.host"><img src="./assets/sponsors/exosphere.png" alt="ExosphereHost Inc." width="75"></a>
+<a href="https://exosphere.host"><img src="./docs/assets/sponsors/exosphere.png" alt="ExosphereHost Inc." width="75"></a>
 
 Contact nikita@exosphere.host to sponsor this work.

docs/index.md

Lines changed: 10 additions & 9 deletions
@@ -1,7 +1,7 @@
 # The AI Reliability Engineering (AIRE) Standards
 
 [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
-[![Status: Draft](https://img.shields.io/badge/Status-Draft%20v0.1-orange)]()
+[![Status: Live](https://img.shields.io/badge/Status-Live%20v0.1-green)](https://github.com/exospherehost/ai-reliability-standards)
 
 > **An open implementation guide for building reliable AI Agents at scale. Defining the practices for AI Reliability Engineering (AIRE).**
 
@@ -109,7 +109,6 @@ Operational Excellence bridges the gap between technical architecture and organi
 
 ---
 
-
 ## AIRE Principles
 
 *Guiding tenets inspired by SRE:*
@@ -150,7 +149,6 @@ Design for autonomous operation. Human escalation is a safety net for edge cases
 
 ---
 
-
 ## Getting Started
 
 **New to AIRE?** Start with the **[Getting Started Guide →](getting-started.md)** for a step-by-step adoption roadmap:
@@ -189,17 +187,20 @@ You get to shape the future of AI reliability engineering and get recognized for
 
 ## Repository Structure
 
-```
-docs/
+This documentation is built from the [ai-reliability-standards repository](https://github.com/exospherehost/ai-reliability-standards). The repository structure includes:
+
+```text
+docs/ # Documentation source files
+├── index.md # This page (documentation homepage)
 ├── getting-started.md # Adoption roadmap for organizations
-├── pillars/
+├── principles.md # AIRE Principles (5 guiding tenets)
+├── pillars/ # Core reliability pillars
 │   ├── resilient-architecture.md # Pillar 1: Fault tolerance, scaling, recovery
 │   ├── cognitive-reliability.md # Pillar 2: Accuracy, consistency, drift detection
 │   ├── quality-lifecycle.md # Pillar 3: Testing, deployment, feedback loops
 │   ├── security.md # Pillar 4: JIT access, guardrails, audit logs
-│   └── operational-excellence.md # Pillar 5: SLAs, team structure, progressive autonomy
+│   └── operational-excellence.md # Pillar 5: Performance targets, team structure, progressive autonomy
 └── appendix/
-    ├── principles.md # AIRE Principles (5 guiding tenets)
     ├── metrics-framework.md # Three-tier metrics framework
     └── glossary.md # Key terms and definitions
 ```
@@ -219,7 +220,7 @@ We welcome Pull Requests (PRs) from engineers who have solved specific reliabili
 
 <a href="https://exosphere.host"><img src="./assets/sponsors/exosphere.png" alt="ExosphereHost Inc." width="75"></a>
 
-Contact nivedit@exosphere.host to sponsor this work.
+Contact nikita@exosphere.host to sponsor this work.
 
 ## License
