From Stateless to Self-Aware: Building Real-Time Context-Aware AI That Actually Adapts #416

@Aman123lug

Talk title

From Stateless to Self-Aware: Building Real-Time Context-Aware AI That Actually Adapts

Short talk description

Most AI agents forget everything the moment a conversation ends. Users repeat themselves. Corrections go nowhere. Trust erodes.

In this talk, Aman walks through a memory-first agent architecture he built and tested in production — a system where every user correction becomes a persistent, self-evolving skill that shapes every future response, automatically, within the same turn.

No fine-tuning. No manual prompt editing. Just an agent that genuinely gets better the more you use it.

Engineers building AI products, designing agent systems, or anyone frustrated by stateless LLMs will leave with a practical, domain-agnostic blueprint they can apply immediately.

Long talk description

Every major AI framework today is obsessed with the model — bigger context windows, smarter routing, better prompts. But ask one honest question: does your agent remember what a user told it last week? Yesterday? Five minutes ago in the same session?

Almost always, the answer is no. And that's not a model problem. It's an architecture problem.

This talk introduces a memory-first agent architecture: a system where the agent's ability to learn, retain, and self-correct is the primary design constraint, not an afterthought bolted on later. Aman walks through an 11-stage cognitive pipeline built in production, with stages that include sensing the input, updating working memory, classifying intent, running guardrails, registering a skill, clustering patterns, recalling history, generating a response, evaluating compliance, and logging every decision for replay.
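
As a rough illustration of the staged design described above, a pipeline can be modelled as an ordered list of functions that share one per-turn state object and record what each stage wrote. This is a minimal sketch, not the talk's implementation: `TurnState`, `run_pipeline`, and the three placeholder stages are hypothetical names, and only a few of the stages are shown.

```python
from dataclasses import dataclass, field


@dataclass
class TurnState:
    """Everything one turn accumulates as it moves through the pipeline."""
    user_input: str
    data: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)  # (stage name, keys it wrote)


def run_pipeline(state, stages):
    """Run each stage in order and log which keys it added, for later replay."""
    for name, stage in stages:
        before = set(state.data)
        stage(state)
        state.trace.append((name, sorted(set(state.data) - before)))
    return state


# Placeholder stages (assumptions for illustration only).
def sense(state):
    state.data["tokens"] = state.user_input.lower().split()


def classify_intent(state):
    is_correction = "instead" in state.data["tokens"]
    state.data["intent"] = "correction" if is_correction else "question"


def generate(state):
    state.data["response"] = f"(reply to a {state.data['intent']})"


STAGES = [
    ("sense", sense),
    ("classify_intent", classify_intent),
    ("generate", generate),
]

result = run_pipeline(TurnState("Use metric units instead"), STAGES)
for stage_name, keys_written in result.trace:
    print(stage_name, keys_written)
```

Keeping the trace as plain data is one way to make every stage's decision inspectable after the fact.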

Each stage maps to something real. Working memory decays exponentially so stale context doesn't crowd out what matters now. Episodic memory compresses long conversations into retrievable snapshots. A skills registry holds everything the agent has ever learned — persistent rules that hot-inject into every future response, update automatically when a user changes their mind, and deactivate instantly when contradicted. No retraining. No deployment. Same turn.
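
The skills registry behaviour described above (persistent rules that are injected into every prompt, updated when the user changes their mind, and deactivated when contradicted) might be sketched as follows. The class and method names (`SkillsRegistry`, `learn`, `deactivate`, `inject`) are assumptions made for this example, and an in-memory dict stands in for whatever persistent store the real system uses.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    key: str            # what the rule is about, e.g. "units"
    rule: str           # the instruction injected into the prompt
    active: bool = True
    version: int = 1


class SkillsRegistry:
    """In-memory stand-in for a persistent skills store (assumption)."""

    def __init__(self):
        self._skills = {}

    def learn(self, key, rule):
        """Add a new skill, or update the existing one for the same key."""
        existing = self._skills.get(key)
        if existing is None:
            self._skills[key] = Skill(key, rule)
        else:
            existing.rule = rule
            existing.active = True
            existing.version += 1
        return self._skills[key]

    def deactivate(self, key):
        """Stop injecting a skill without deleting its history."""
        if key in self._skills:
            self._skills[key].active = False

    def inject(self, base_prompt):
        """Append all active rules to the prompt for the current turn."""
        rules = [s.rule for s in self._skills.values() if s.active]
        if not rules:
            return base_prompt
        lines = "\n".join(f"- {r}" for r in rules)
        return f"{base_prompt}\n\nLearned rules:\n{lines}"


registry = SkillsRegistry()
registry.learn("units", "Always use metric units.")
registry.learn("units", "Always use imperial units.")  # user changed their mind
registry.learn("tone", "Keep answers brief.")
registry.deactivate("tone")                             # later contradicted
prompt = registry.inject("You are a helpful assistant.")
print(prompt)
```

Because `inject` reads the registry at prompt-build time, a rule learned earlier in a turn can affect the response generated later in that same turn, with no retraining or redeployment.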

Between learning and acting sits a guardrails engine: four checks that run on every piece of feedback before it can become a skill. The checks are confidence gating, schema validation, contradiction detection, and semantic deduplication. Without this layer, a single misclassification can silently corrupt the entire knowledge base.
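
The four checks could be arranged as a single gate in front of the skills store, roughly like this. It is a sketch under loud assumptions: the candidate schema, the thresholds, and the function name `check_candidate` are invented for illustration; string similarity from `difflib` stands in for real semantic (embedding-based) deduplication; and "same key, different rule" stands in for real contradiction detection.

```python
from difflib import SequenceMatcher

# Hypothetical schema for a candidate skill produced by a classifier.
REQUIRED_FIELDS = {"key": str, "rule": str, "confidence": float}


def check_candidate(candidate, existing, min_confidence=0.8, dup_threshold=0.9):
    """Return (verdict, reason) for a candidate skill.

    Schema validation runs first so the later checks can trust the fields.
    """
    # 1. Schema validation
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(candidate.get(name), expected_type):
            return "reject", f"schema: bad or missing '{name}'"

    # 2. Confidence gating
    if candidate["confidence"] < min_confidence:
        return "reject", "confidence below threshold"

    for skill in existing:
        # 3. Deduplication (string similarity as a stand-in for embeddings)
        similarity = SequenceMatcher(
            None, candidate["rule"].lower(), skill["rule"].lower()
        ).ratio()
        if similarity >= dup_threshold:
            return "duplicate", f"matches existing rule for '{skill['key']}'"
        # 4. Contradiction detection (same key, different rule, as a stand-in)
        if skill["key"] == candidate["key"]:
            return "contradiction", f"conflicts with existing '{skill['key']}' rule"

    return "accept", "passed all checks"


existing = [{"key": "units", "rule": "Always use metric units."}]
candidates = [
    {"key": "units", "rule": "Always use metric units", "confidence": 0.95},
    {"key": "units", "rule": "Prefer feet and pounds.", "confidence": 0.95},
    {"key": "tone", "rule": "Keep answers brief.", "confidence": 0.4},
    {"key": "tone", "rule": "Keep answers brief.", "confidence": 0.95},
]
for c in candidates:
    print(c["rule"], "->", check_candidate(c, existing))
```

Returning a verdict with a reason, rather than a bare boolean, is what makes a misclassification visible and recoverable later.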

On top of everything is an LLM-as-judge evaluation layer. After every response, a separate model checks: did the agent actually follow the rules it learned? It produces a compliance score and a trend line across sessions — measurable proof that the agent is improving, not just collecting data.
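
One way to turn judge verdicts into a compliance score and a trend is sketched below. The judge here is a trivial keyword check standing in for the separate model call described above, and the function names (`compliance_score`, `trend_slope`, `keyword_judge`) are assumptions for this example. The trend is the slope of an ordinary least-squares line through the per-session scores.

```python
from statistics import mean


def compliance_score(response, rules, judge):
    """Fraction of learned rules the response followed, according to the judge."""
    if not rules:
        return 1.0
    followed = sum(1 for rule in rules if judge(response, rule))
    return followed / len(rules)


def trend_slope(scores):
    """Least-squares slope of the scores over session index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        return 0.0
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(scores)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def keyword_judge(response, rule):
    """Stand-in for an LLM judge: the rule counts as followed if its keyword appears."""
    return rule.lower() in response.lower()


rules = ["km", "celsius"]  # hypothetical rules reduced to keywords
responses = [
    "It is 5 miles away at 68 fahrenheit.",
    "It is 8 km away at 68 fahrenheit.",
    "It is 8 km away at 20 celsius.",
]
scores = [compliance_score(r, rules, keyword_judge) for r in responses]
slope = trend_slope(scores)
print(scores, slope)
```

A positive slope means compliance is rising across sessions; a flat or negative slope is a signal that learned rules are not reaching the responses.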

Aman also shares the production failure that shaped the architecture: context bloat causing silent agent degradation around turn 15, with no errors, no crashes — just quietly worsening responses. The fix came from cognitive science: exponential memory decay, relevance scoring, a hard signal cap, and decision trace snapshots so you can replay exactly what the agent saw at any given turn.
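
The combination of exponential decay, relevance scoring, a hard cap, and a trace snapshot can be illustrated with a small context selector. This is a sketch only: the half-life, the cap, the signal fields, and the function names are assumptions chosen for the example, not values taken from the production system.

```python
import json


def decayed_weight(age_turns, half_life=5.0):
    """Exponential decay: the weight halves every `half_life` turns."""
    return 0.5 ** (age_turns / half_life)


def select_context(signals, current_turn, cap=2, half_life=5.0):
    """Score each signal by relevance x decay and keep at most `cap` of them."""
    scored = []
    for signal in signals:
        age = current_turn - signal["turn"]
        score = signal["relevance"] * decayed_weight(age, half_life)
        scored.append({**signal, "score": round(score, 4)})
    scored.sort(key=lambda s: s["score"], reverse=True)
    return scored[:cap]


def snapshot(turn, selected):
    """Serialize exactly what the agent saw at this turn, for later replay."""
    return json.dumps({"turn": turn, "context": selected}, sort_keys=True)


signals = [
    {"text": "prefers metric", "turn": 1, "relevance": 0.9},
    {"text": "asked about weather", "turn": 14, "relevance": 0.5},
    {"text": "greeting", "turn": 2, "relevance": 0.2},
    {"text": "planning a trip to Oslo", "turn": 15, "relevance": 0.8},
]
selected = select_context(signals, current_turn=15)
trace = snapshot(15, selected)
print([s["text"] for s in selected])
```

In this example the old "prefers metric" signal decays out of working memory by turn 15, which illustrates why a lasting preference of that kind belongs in a persistent store rather than in decaying context.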

The talk closes with a live demo where all 11 pipeline stages execute on screen in real time — every classification, every guardrail decision, every skill learned, every cluster formed, every compliance score — nothing hidden.

Who should attend: engineers building AI agents or LLM-powered products, technical leads evaluating agent frameworks, anyone who has shipped a chatbot and watched it fail silently after turn 10.

Key takeaways:

  • A concrete memory architecture (working, episodic, semantic, procedural) with specific eviction and decay policies
  • How to route feedback to where the fix actually belongs, preventing the most common corruption pattern in adaptive systems
  • A guardrails pattern that makes misclassification detectable and recoverable
  • An eval strategy that produces measurable improvement over time, not just vibes

What format do you have in mind?

Talk (20-25 minutes + Q&A)

Talk outline / Agenda

The Problem (4 mins) — AI agents are stateless by default. Users repeat themselves, corrections vanish, trust erodes silently. This isn't a model problem — it's an architecture problem.

The Failure Story (3 mins) — Context bloat causing silent agent degradation at turn 15. No errors, no alerts — just quietly worsening responses. Why it happens and why most teams miss it until users are already gone.

Memory-First Architecture — Four Layers (10 mins) — Working memory with exponential decay, episodic compression, semantic clustering, and a self-evolving procedural skills registry. Each layer explained with its eviction policy, storage strategy, and the specific problem it solves.

Guardrails — Learning Safely Is Harder Than Learning Fast (5 mins) — Confidence gating, contradiction detection, deduplication. How one misclassification without guardrails silently corrupts an agent's knowledge base across every future session — and how to prevent it.

Closing the Loop — Measuring Improvement (3 mins) — LLM-as-judge eval pattern. Per-turn compliance scoring and trend lines. Turning "we think it's better" into a number you can show stakeholders.

Q&A (5 mins)

Key takeaways

A concrete 4-layer memory architecture (working, episodic, semantic, procedural) with specific decay rates, eviction policies, and storage strategies — not theory, a working implementation you can replicate.

The "where does the fix live?" heuristic — a single routing question that prevents the most dangerous failure in adaptive systems: user corrections, bug reports, and ambiguous feedback being treated identically and corrupting the skills registry.

A guardrails pattern for LLM classification output — confidence gating, contradiction detection, and deduplication as a validation layer between any classifier and any knowledge store, applicable to any agent system.

How to measure whether your agent is actually improving — an LLM-as-judge eval pattern that produces a compliance trend line per session, turning "we think it's getting better" into a number you can show stakeholders.

The context bloat failure story and the fix — why agent degradation after turn 10-15 happens silently, how exponential memory decay and relevance scoring solve it, and how decision trace snapshots make any future failure debuggable and replayable.
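
The "where does the fix live?" heuristic listed among the takeaways above could be expressed as a small router that sends each kind of feedback to a different destination instead of writing everything to the skills registry. The destinations, the feedback labels, and the confidence threshold below are assumptions for illustration; the proposal does not specify them.

```python
from enum import Enum


class Destination(Enum):
    SKILLS_REGISTRY = "skills_registry"   # a preference the agent should learn
    BUG_QUEUE = "bug_queue"               # a defect for engineers to fix in code
    CLARIFY = "ask_user_to_clarify"       # not enough signal to act on yet


def route_feedback(kind, confidence, min_confidence=0.8):
    """Decide where a piece of classified feedback should go.

    `kind` is a hypothetical classifier label: "user_correction",
    "bug_report", or "ambiguous".
    """
    if confidence < min_confidence or kind == "ambiguous":
        return Destination.CLARIFY
    if kind == "user_correction":
        return Destination.SKILLS_REGISTRY
    if kind == "bug_report":
        return Destination.BUG_QUEUE
    # Unknown labels are never allowed to reach the skills registry.
    return Destination.CLARIFY


examples = [
    ("user_correction", 0.93),
    ("bug_report", 0.88),
    ("ambiguous", 0.95),
    ("user_correction", 0.41),
]
for kind, confidence in examples:
    print(kind, confidence, "->", route_feedback(kind, confidence).value)
```

Defaulting to clarification for low-confidence or unknown labels means a classifier mistake costs one extra question rather than a corrupted rule.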

What domain would you say your talk falls under?

Artificial Intelligence & Deep Learning

Duration (including Q&A)

30 min

Prerequisites and preparation

What to know beforehand:

  • Comfortable reading Python (the live demo shows real code, not pseudocode)
  • Basic understanding of how LLMs work (prompt → response); no ML background needed
  • Familiarity with REST APIs and async backends is helpful but not required

No preparation needed: the demo runs live on screen. Attendees don't need to install anything or clone any repo during the talk.

Who will get the most out of this:

  • Engineers currently building or maintaining an LLM-powered product or chatbot
  • Technical leads evaluating agent frameworks or memory strategies
  • Anyone who has shipped an AI feature and watched it behave inconsistently after a few turns

Resources and references

https://github.com/Aman123lug/agent-feedback-pipeline

Link to slides/demos (if available)

No response

Twitter/X handle (optional)

https://x.com/amanlug

LinkedIn profile (optional)

https://www.linkedin.com/in/aman-kumar-5bb609228/

Profile picture URL (optional)

No response

Speaker bio

Aman is an AI Engineer at Ghaia.ai, an AI/ML freelancer, a former GSoC mentee, and an MLflow Ambassador.

Availability

23 May 2026

Accessibility & special requirements

No response

Speaker checklist

  • I have read and understood the PyDelhi guidelines for submitting proposals and giving talks
  • I have read and acknowledged the PyDelhi accessibility guidelines and will ensure my presentation materials (slides, videos, demos) follow these recommendations
  • I will make my talk accessible to all attendees and will proactively ask for any accommodations or special requirements I might need
  • I agree to share slides, code snippets, and other materials used during the talk with the community
  • I will follow PyDelhi's Code of Conduct and maintain a welcoming, inclusive environment throughout my participation
  • I understand that PyDelhi meetups are community-centric events focused on learning, knowledge sharing, and networking, and I will respect this ethos by not using this platform for self-promotion or hiring pitches during my presentation, unless explicitly invited to do so by means of a sponsorship or similar arrangement
  • If the talk is recorded by the PyDelhi team, I grant permission to release the video on PyDelhi's YouTube channel under the CC-BY-4.0 license, or a different license of my choosing if I am specifying it in my proposal or with the materials I share

Additional comments

No response

Metadata

Labels

  • needs reviewer: Reviewers: this proposal is in need of reviews! Please remove this label if you decide to review it.
  • proposal: Wish to present at PyDelhi? This label gets added when the "Talk Proposal" option is chosen.
  • review in progress: This proposal is currently under review.
