AI Infrastructure Architect - Reading List

Comprehensive reading list for developing enterprise-scale AI infrastructure architecture skills.

Essential Books

Enterprise Architecture

  1. TOGAF 9 Foundation Study Guide by The Open Group

    • Focus: Enterprise architecture framework and ADM
    • Level: Essential for architects
    • ISBN: 978-9087537203
    • Link: The Open Group
  2. Software Architecture in Practice (4th Edition) by Len Bass, Paul Clements, Rick Kazman

    • Focus: Architecture patterns, quality attributes, documentation
    • Level: Advanced
    • ISBN: 978-0136886099
  3. Fundamentals of Software Architecture by Mark Richards, Neal Ford

    • Focus: Modern architecture patterns and practices
    • Level: Intermediate to Advanced
    • ISBN: 978-1492043454
  4. Building Evolutionary Architectures by Neal Ford, Rebecca Parsons, Patrick Kua

    • Focus: Designing for change and evolution
    • Level: Advanced
    • ISBN: 978-1491986363

Cloud Architecture

  1. Cloud Architecture Patterns by Bill Wilder

    • Focus: Scalability, reliability, security in cloud
    • Level: Intermediate
    • ISBN: 978-1449319779
  2. Architecting the Cloud: Design Decisions for Cloud Computing Service Models by Michael J. Kavis

    • Focus: Cloud strategy and decision-making
    • Level: Advanced
    • ISBN: 978-1118617618

ML Infrastructure and MLOps

  1. Designing Machine Learning Systems by Chip Huyen

    • Focus: End-to-end ML systems design
    • Level: Advanced
    • ISBN: 978-1098107963
    • Essential: Highly recommended for ML architects
  2. Machine Learning Engineering by Andriy Burkov

    • Focus: Practical ML engineering and infrastructure
    • Level: Intermediate to Advanced
    • ISBN: 978-1999579579
  3. Reliable Machine Learning by Cathy Chen, Niall Murphy, et al.

    • Focus: SRE principles for ML systems
    • Level: Advanced
    • ISBN: 978-1098106225
  4. Building Machine Learning Powered Applications by Emmanuel Ameisen

    • Focus: Practical ML product development
    • Level: Intermediate
    • ISBN: 978-1492045113

Kubernetes and Cloud-Native

  1. Kubernetes in Action (2nd Edition) by Marko Lukša

    • Focus: Comprehensive Kubernetes guide
    • Level: Intermediate
    • ISBN: 978-1617297724
  2. Kubernetes Patterns by Bilgin Ibryam, Roland Huß

    • Focus: Design patterns for cloud-native apps
    • Level: Advanced
    • ISBN: 978-1492050285
  3. Cloud Native DevOps with Kubernetes by John Arundel, Justin Domingus

    • Focus: DevOps practices on Kubernetes
    • Level: Intermediate
    • ISBN: 978-1492040767

Data Architecture

  1. Data Mesh by Zhamak Dehghani

    • Focus: Decentralized data architecture
    • Level: Advanced
    • ISBN: 978-1492092391
    • Relevance: Modern data architecture thinking
  2. Designing Data-Intensive Applications by Martin Kleppmann

    • Focus: Foundations of data systems
    • Level: Advanced
    • ISBN: 978-1449373320
    • Essential: Must-read for data architects
  3. Streaming Systems by Tyler Akidau, Slava Chernyak, Reuven Lax

    • Focus: Real-time data processing
    • Level: Advanced
    • ISBN: 978-1491983874

Security and Compliance

  1. Zero Trust Networks by Evan Gilman, Doug Barth

    • Focus: Zero-trust security architecture
    • Level: Advanced
    • ISBN: 978-1491962190
  2. Practical Cloud Security by Chris Dotson

    • Focus: Cloud security best practices
    • Level: Intermediate to Advanced
    • ISBN: 978-1492037521

Cost Optimization and FinOps

  1. Cloud FinOps by J.R. Storment, Mike Fuller
    • Focus: Financial operations for cloud
    • Level: Intermediate
    • ISBN: 978-1492054627
    • Essential: For cost-conscious architects

Leadership and Communication

  1. The Software Architect Elevator by Gregor Hohpe

    • Focus: Navigating organizational levels as an architect
    • Level: Advanced
    • Essential: For architect-level communication
  2. Talking with Tech Leads by Patrick Kua

    • Focus: Technical leadership
    • Level: Intermediate to Advanced
    • ISBN: 978-1494745035

Research Papers and Whitepapers

ML Systems

  1. Hidden Technical Debt in Machine Learning Systems - Google (NIPS 2015)

    • Link
    • Essential understanding of ML systems complexity
  2. Machine Learning: The High-Interest Credit Card of Technical Debt - Google (2014)

    • Link
    • ML technical debt and maintenance
  3. Challenges in Deploying Machine Learning: A Survey of Case Studies - Cambridge (2020)

    • Link
    • Real-world ML deployment challenges

Distributed Systems

  1. The Google File System - Google (SOSP 2003)

    • Link
    • Foundational distributed storage
  2. MapReduce: Simplified Data Processing on Large Clusters - Google (OSDI 2004)

    • Link
    • Distributed data processing
  3. The Chubby Lock Service for Loosely-Coupled Distributed Systems - Google (OSDI 2006)

    • Link
    • Distributed coordination

LLM Infrastructure

  1. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022)

    • Link
    • LLM optimization techniques
  2. Efficient Memory Management for Large Language Model Serving with PagedAttention (vLLM paper, 2023)

    • Link
    • Modern LLM serving
  3. LoRA: Low-Rank Adaptation of Large Language Models (2021)

    • Link
    • Efficient LLM fine-tuning

Cloud Architecture

  1. AWS Well-Architected Framework - Amazon Web Services

    • Link
    • Cloud architecture best practices
  2. Google Cloud Architecture Framework

    • Link
    • GCP architecture principles
  3. Microsoft Azure Well-Architected Framework

    • Link
    • Azure architecture guidance

Articles and Blog Posts

ML Infrastructure

  1. Uber's Michelangelo: ML Platform at Uber

  2. Netflix's ML Infrastructure: Notebook to Production

  3. Airbnb's ML Infrastructure: Bighead

  4. Meta's ML Infrastructure: PyTorch at Scale

Architecture

  1. Martin Fowler's Architecture Blog

    • Link
    • Architecture patterns and practices
  2. High Scalability Blog

    • Link
    • Architecture of large-scale systems

FinOps and Cost

  1. FinOps Foundation Resources

    • Link
    • FinOps frameworks and practices
  2. The 10 Commandments of Cost Optimization - Corey Quinn

    • Practical cost optimization

Online Courses and Certifications

TOGAF Certification

  1. TOGAF 9 Foundation and Certified - The Open Group
    • Priority: Highest for architects
    • Duration: 40-80 hours of study
    • Link: The Open Group

Cloud Certifications

  1. AWS Solutions Architect – Professional

    • Provider: Amazon Web Services
    • Duration: 3-6 months preparation
    • Link: AWS Certification
  2. Google Cloud Professional Cloud Architect

  3. Microsoft Azure Solutions Architect Expert

Kubernetes

  1. Certified Kubernetes Administrator (CKA)

    • Provider: CNCF / Linux Foundation
    • Duration: 2-3 months preparation
    • Link: CNCF Certification
  2. Certified Kubernetes Security Specialist (CKS)

    • Provider: CNCF / Linux Foundation
    • Duration: 2-3 months preparation (requires CKA)
    • Link: CNCF Certification

Security

  1. CISSP (Certified Information Systems Security Professional)
    • Provider: ISC2
    • Duration: 6-12 months preparation
    • Link: ISC2 CISSP

FinOps

  1. FinOps Certified Practitioner
    • Provider: FinOps Foundation
    • Duration: 1-2 months preparation
    • Link: FinOps Foundation

Video Resources

Conference Talks

  1. KubeCon + CloudNativeCon - CNCF

    • Focus: Kubernetes and cloud-native
    • Link: YouTube
  2. AWS re:Invent - Amazon Web Services

    • Focus: AWS architecture and services
    • Link: YouTube
  3. Google Cloud Next - Google Cloud

    • Focus: GCP architecture and ML
    • Link: YouTube
  4. MLOps Community - MLOps Talks

    • Focus: ML operations and infrastructure
    • Link: YouTube

YouTube Channels

  1. TechWorld with Nana - DevOps and Kubernetes
  2. Tech Lead Journal - Architecture interviews
  3. InfoQ - Software architecture and engineering
  4. GOTO Conferences - Software architecture talks

Podcasts

  1. Software Engineering Radio

    • Focus: Software architecture and engineering
    • Frequency: Weekly
    • Link: SE Radio
  2. The Changelog

    • Focus: Open source and software development
    • Frequency: Weekly
  3. Google Cloud Podcast

    • Focus: Cloud architecture and GCP
  4. AWS Podcast

    • Focus: Cloud architecture and AWS
  5. Kubernetes Podcast from Google

    • Focus: Kubernetes and cloud-native

Community Resources

Forums and Communities

  1. TOGAF Community - Open Group Forums
  2. Cloud Architecture - AWS Forums, GCP Community, Azure Community
  3. Kubernetes - discuss.kubernetes.io
  4. MLOps Community - mlops.community
  5. FinOps Foundation - finops.org/community

Reddit Communities

  1. r/enterprisearchitecture - Enterprise architecture discussions
  2. r/cloudarchitecture - Cloud architecture patterns
  3. r/kubernetes - Kubernetes and cloud-native
  4. r/MachineLearning - ML research and engineering
  5. r/devops - DevOps and infrastructure

Standards and Frameworks

  1. TOGAF 9.2 - Enterprise Architecture Framework
  2. Zachman Framework - Enterprise Architecture Framework
  3. ITIL 4 - IT Service Management
  4. NIST Cybersecurity Framework - Security standards
  5. ISO/IEC 27001 - Information security management
  6. GDPR - Data protection regulation (EU)
  7. HIPAA - Healthcare data protection (US)
  8. SOC 2 - Attestation framework covering security, availability, and related trust services criteria

Recommended Reading Order

For New Architects (First 6 Months)

  1. TOGAF 9 Foundation Study Guide
  2. Designing Machine Learning Systems (Chip Huyen)
  3. Software Architecture in Practice
  4. Cloud FinOps
  5. Reliable Machine Learning
  6. The Software Architect Elevator

For Experienced Architects (Ongoing)

  1. Research papers on emerging technologies
  2. Cloud provider whitepapers and case studies
  3. Industry and company engineering blogs
  4. Conference presentations
  5. Standards and framework updates

Staying Current

Daily/Weekly

  • Follow tech blogs from major companies (Google, Meta, Netflix, Uber, Airbnb)
  • Subscribe to architecture newsletters
  • Monitor GitHub trending repositories

Monthly

  • Read selected research papers
  • Attend local meetups or webinars
  • Review cloud provider updates

Quarterly

  • Deep dive into one new technology area
  • Update certifications or complete an online course
  • Review and update personal architecture knowledge base

Annually

  • Attend a major conference (KubeCon, re:Invent, etc.)
  • Renew a certification or earn a new one
  • Read 3-5 books from this reading list
  • Publish or present on an architecture topic

Note: This reading list is continuously updated. Last updated: 2025-10-14

Suggestions: Have a book or resource to add? Open a pull request or issue!