This repo is for the LinkedIn Learning course: Operating AI Agents: Failure and Recovery

LinkedInLearning/operating-AI-agents-failure-and-recovery-8020004

Operating AI Agents: Failure and Recovery

This is the repository for the LinkedIn Learning course Operating AI Agents: Failure and Recovery. The full course is available from LinkedIn Learning.

Course Description

As AI agents shift from experimentation to production, operational failures can create serious business risks. This intermediate course explores practical techniques for monitoring agent behavior, tracing execution paths, and identifying failure modes across single‑ and multi‑agent systems. Through hands-on GitHub Codespaces exercises, you learn how to implement rollback mechanisms, build automated recovery workflows, and create reports that surface agent health and system status in real time. By the end of the course, you’ll have the skills to improve the safety and predictability of AI agents in production, and to respond quickly and effectively when failures occur.

Notes

Requirements

Setup

  1. Clone this repo (or download the files).
  2. Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate   # macOS/Linux
    venv\Scripts\activate      # Windows
  3. Install dependencies:
    pip install -r requirements.txt
  4. Set your OpenAI API key as an environment variable, or place it in a .env file:
    export OPENAI_API_KEY="your_api_key"      # macOS/Linux
    setx OPENAI_API_KEY "your_api_key"        # Windows (takes effect in new terminal sessions)
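
Before running the exercises, it can help to confirm that the key from step 4 is actually visible to Python. The following is a minimal sketch, not the course's own loader: `load_env_file` and `get_openai_api_key` are hypothetical helper names, and the parser assumes a simple .env file with one `KEY=VALUE` pair per line. It checks the environment variable first and falls back to the .env file.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Hypothetical helper: assumes one assignment per line, with optional
    surrounding quotes around the value.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_api_key(env_path=".env"):
    """Return the key from the environment, falling back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        key = load_env_file(env_path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set in the environment or in .env"
        )
    return key
```

Calling `get_openai_api_key()` from the repository root raises a `RuntimeError` if neither source provides a key, which surfaces a setup problem before any agent code runs. The course code may load the key differently, for example through a dependency listed in requirements.txt.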

Instructor

Kesha Williams

Award-Winning Tech Innovator and AI/ML Leader
