From Stateless to Self-Aware: Building Real-Time Context-Aware AI That Actually Adapts
Short talk description
Most AI agents forget everything the moment a conversation ends. Users repeat themselves. Corrections go nowhere. Trust erodes.
In this talk, Aman walks through a memory-first agent architecture he built and tested in production: a system where every user correction becomes a persistent, self-evolving skill that takes effect within the same turn and automatically shapes every future response.
No fine-tuning. No manual prompt editing. Just an agent that genuinely gets better the more you use it.
Engineers building AI products or designing agent systems, and anyone frustrated by stateless LLMs, will leave with a practical, domain-agnostic blueprint they can apply immediately.
Long talk description
Every major AI framework today is obsessed with the model — bigger context windows, smarter routing, better prompts. But ask one honest question: does your agent remember what a user told it last week? Yesterday? Five minutes ago in the same session?
Almost always, the answer is no. And that's not a model problem. It's an architecture problem.
This talk introduces a memory-first agent architecture: a system where the agent's ability to learn, retain, and self-correct is the primary design constraint, not an afterthought bolted on later. Aman walks through an 11-stage cognitive pipeline built in production, whose stages include sensing the input, updating working memory, classifying intent, running guardrails, registering a skill, clustering patterns, recalling history, generating a response, evaluating compliance, and logging every decision for replay.
Each stage maps to something real. Working memory decays exponentially so stale context doesn't crowd out what matters now. Episodic memory compresses long conversations into retrievable snapshots. A skills registry holds everything the agent has ever learned — persistent rules that hot-inject into every future response, update automatically when a user changes their mind, and deactivate instantly when contradicted. No retraining. No deployment. Same turn.
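As a rough illustration of exponential decay in working memory, the sketch below halves an item's weight every few turns and drops items that fall under a threshold. The `MemoryItem` structure, the half-life of 5 turns, and the 0.1 threshold are assumptions chosen for the example, not values taken from the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """One entry in working memory (hypothetical structure)."""
    text: str
    created_turn: int
    base_weight: float = 1.0


def decayed_weight(item: MemoryItem, current_turn: int,
                   half_life_turns: float = 5.0) -> float:
    """Exponentially decay the weight: it halves every `half_life_turns` turns."""
    age = max(0, current_turn - item.created_turn)
    return item.base_weight * math.exp(-math.log(2) * age / half_life_turns)


def active_items(items: list[MemoryItem], current_turn: int,
                 threshold: float = 0.1) -> list[MemoryItem]:
    """Keep only items whose decayed weight is still at or above the threshold."""
    return [i for i in items if decayed_weight(i, current_turn) >= threshold]


memory = [
    MemoryItem("user prefers metric units", created_turn=0),
    MemoryItem("user is asking about shipping costs", created_turn=18),
]
# At turn 20 the first item is 20 turns old, the second only 2 turns old.
still_active = active_items(memory, current_turn=20)
print([item.text for item in still_active])
```

With these assumed settings, an item that is never refreshed falls below the threshold after roughly 17 turns, so durable preferences would need to live in a longer-term store such as the skills registry.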
Between learning and acting sits a guardrails engine: four checks that run on every piece of feedback before it can become a skill. They are confidence gating, schema validation, contradiction detection, and semantic deduplication. Without this layer, a single misclassification can silently corrupt the entire knowledge base.
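One way the four checks could be chained is sketched below. Everything here is an assumption made for illustration: the `Skill` fields, the thresholds, the word-overlap (Jaccard) similarity that stands in for real semantic deduplication, and the "same topic, different rule" test that stands in for real contradiction detection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    topic: str  # what the rule is about, e.g. "response_format" (assumed field)
    rule: str   # the instruction itself (assumed field)


# Assumed schema for a candidate skill produced by the classifier.
REQUIRED_FIELDS = {"topic": str, "rule": str, "confidence": float}


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a crude stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def check_candidate(candidate: dict, registry: list[Skill],
                    min_confidence: float = 0.8,
                    dup_threshold: float = 0.8) -> str:
    """Return 'reject', 'duplicate', 'contradiction', or 'accept'."""
    # Schema validation runs first so the later checks can trust the fields.
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(candidate.get(name), expected_type):
            return "reject"
    # Confidence gating: drop low-confidence classifications.
    if candidate["confidence"] < min_confidence:
        return "reject"
    same_topic = [s for s in registry if s.topic == candidate["topic"]]
    # Deduplication: a near-identical rule already exists for this topic.
    if any(jaccard(s.rule, candidate["rule"]) >= dup_threshold for s in same_topic):
        return "duplicate"
    # Contradiction detection: same topic, but a different rule.
    if same_topic:
        return "contradiction"
    return "accept"


registry = [Skill("response_format", "always answer in bullet points")]
verdict = check_candidate(
    {"topic": "response_format", "rule": "answer in plain paragraphs", "confidence": 0.9},
    registry,
)
print(verdict)
```

Returning a verdict instead of mutating the registry leaves the caller free to decide what a contradiction means, for example deactivating the older skill before storing the new one.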
On top of everything is an LLM-as-judge evaluation layer. After every response, a separate model checks: did the agent actually follow the rules it learned? It produces a compliance score and a trend line across sessions — measurable proof that the agent is improving, not just collecting data.
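The scoring and trend idea can be sketched without any model call. In the example below the judge is a plain callable, so a real LLM-backed judge could be swapped in; the `Judge` signature, the keyword-matching `fake_judge`, and the sample scores are all hypothetical. The trend is an ordinary least-squares slope over the sequence of scores.

```python
from typing import Callable

# Hypothetical judge interface: (response, rule) -> did the response follow the rule?
Judge = Callable[[str, str], bool]


def compliance_score(response: str, rules: list[str], judge: Judge) -> float:
    """Fraction of learned rules that the judge says the response followed."""
    if not rules:
        return 1.0
    return sum(judge(response, rule) for rule in rules) / len(rules)


def trend_slope(scores: list[float]) -> float:
    """Least-squares slope of score against index; positive means improving."""
    n = len(scores)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def fake_judge(response: str, rule: str) -> bool:
    """Toy stand-in for an LLM judge: the rule counts as followed if its text appears."""
    return rule in response


score = compliance_score("a bullet list", ["bullet", "formal"], fake_judge)
slope = trend_slope([0.5, 0.6, 0.7, 0.8])  # illustrative scores, not measured data
print(score, slope)
```

A single slope hides variance, so in practice it would likely be reported next to the raw per-turn scores.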
Aman also shares the production failure that shaped the architecture: context bloat causing silent agent degradation around turn 15, with no errors, no crashes — just quietly worsening responses. The fix came from cognitive science: exponential memory decay, relevance scoring, a hard signal cap, and decision trace snapshots so you can replay exactly what the agent saw at any given turn.
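A hard signal cap and a decision trace snapshot might look like the sketch below. It assumes each signal already carries a relevance score; the `Signal` class, the cap of 2, and the JSON layout of the snapshot are illustrative choices, not the format used in the talk.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Signal:
    text: str
    relevance: float  # assumed to be computed upstream by a relevance scorer


def select_signals(signals: list[Signal], cap: int = 2) -> list[Signal]:
    """Keep at most `cap` signals, most relevant first."""
    return sorted(signals, key=lambda s: s.relevance, reverse=True)[:cap]


def snapshot(turn: int, candidates: list[Signal], selected: list[Signal]) -> str:
    """Serialize what the agent considered and what it kept, for later replay."""
    return json.dumps(
        {
            "turn": turn,
            "candidates": [asdict(s) for s in candidates],
            "selected": [s.text for s in selected],
        },
        sort_keys=True,
    )


candidates = [
    Signal("user prefers metric units", 0.9),
    Signal("greeting from turn 1", 0.2),
    Signal("current question is about shipping", 0.7),
    Signal("user mentioned a discount code", 0.5),
]
kept = select_signals(candidates, cap=2)
trace = snapshot(15, candidates, kept)
print(trace)
```

Storing the rejected candidates alongside the selected ones is what makes a replay useful: it shows what the agent ignored as well as what it saw.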
The talk closes with a live demo where all 11 pipeline stages execute on screen in real time — every classification, every guardrail decision, every skill learned, every cluster formed, every compliance score — nothing hidden.
Who should attend: engineers building AI agents or LLM-powered products, technical leads evaluating agent frameworks, and anyone who has shipped a chatbot and watched it fail silently after turn 10.
Key takeaways:
A concrete memory architecture (working, episodic, semantic, procedural) with specific eviction and decay policies
How to route feedback to where the fix actually belongs — preventing the most common corruption pattern in adaptive systems
A guardrails pattern that makes misclassification detectable and recoverable
An eval strategy that produces measurable improvement over time, not just vibes
What format do you have in mind?
Talk (20-25 minutes + Q&A)
Talk outline / Agenda
The Problem (4 mins) — AI agents are stateless by default. Users repeat themselves, corrections vanish, trust erodes silently. This isn't a model problem — it's an architecture problem.
The Failure Story (3 mins) — Context bloat causing silent agent degradation at turn 15. No errors, no alerts — just quietly worsening responses. Why it happens and why most teams miss it until users are already gone.
Memory-First Architecture — Four Layers (10 mins) — Working memory with exponential decay, episodic compression, semantic clustering, and a self-evolving procedural skills registry. Each layer explained with its eviction policy, storage strategy, and the specific problem it solves.
Guardrails — Learning Safely Is Harder Than Learning Fast (5 mins) — Confidence gating, contradiction detection, deduplication. How one misclassification without guardrails silently corrupts an agent's knowledge base across every future session — and how to prevent it.
Closing the Loop — Measuring Improvement (3 mins) — LLM-as-judge eval pattern. Per-turn compliance scoring and trend lines. Turning "we think it's better" into a number you can show stakeholders.
Q&A (5 mins)
Key takeaways
A concrete 4-layer memory architecture (working, episodic, semantic, procedural) with specific decay rates, eviction policies, and storage strategies — not theory, a working implementation you can replicate.
The "where does the fix live?" heuristic — a single routing question that prevents the most dangerous failure in adaptive systems: user corrections, bug reports, and ambiguous feedback being treated identically and corrupting the skills registry.
A guardrails pattern for LLM classification output — confidence gating, contradiction detection, and deduplication as a validation layer between any classifier and any knowledge store, applicable to any agent system.
How to measure whether your agent is actually improving — an LLM-as-judge eval pattern that produces a compliance trend line per session, turning "we think it's getting better" into a number you can show stakeholders.
The context bloat failure story and the fix — why agent degradation after turn 10-15 happens silently, how exponential memory decay and relevance scoring solve it, and how decision trace snapshots make any future failure debuggable and replayable.
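The "where does the fix live?" question from the takeaways above can be read as a routing step. The sketch below is one possible interpretation: the three feedback kinds come from the proposal text, while the destination names, the in-memory lists, and the choice to store nothing for ambiguous feedback are assumptions.

```python
from enum import Enum


class FeedbackKind(Enum):
    USER_CORRECTION = "user_correction"  # a preference or rule the agent should learn
    BUG_REPORT = "bug_report"            # a defect in the product, not a preference
    AMBIGUOUS = "ambiguous"              # intent is unclear


def route_feedback(kind: FeedbackKind, text: str,
                   skills_registry: list[str],
                   bug_backlog: list[str]) -> str:
    """Send feedback to the place where the fix belongs and report the destination."""
    if kind is FeedbackKind.USER_CORRECTION:
        skills_registry.append(text)
        return "skills_registry"
    if kind is FeedbackKind.BUG_REPORT:
        bug_backlog.append(text)
        return "bug_backlog"
    # Ambiguous feedback is stored nowhere; the agent asks a follow-up question instead.
    return "clarify_with_user"


skills: list[str] = []
bugs: list[str] = []
print(route_feedback(FeedbackKind.USER_CORRECTION, "use bullet points", skills, bugs))
print(route_feedback(FeedbackKind.BUG_REPORT, "export button crashes", skills, bugs))
print(route_feedback(FeedbackKind.AMBIGUOUS, "hmm, not quite", skills, bugs))
```

Only the first branch writes to the skills registry, which is the point of the heuristic: bug reports and unclear remarks never become learned rules.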
What domain would you say your talk falls under?
Artificial Intelligence & Deep Learning
Duration (including Q&A)
30 min
Prerequisites and preparation
What to know beforehand:
Comfortable reading Python — the live demo shows real code, not pseudocode
Basic understanding of how LLMs work (prompt → response) — no ML background needed
Familiarity with REST APIs and async backends is helpful but not required
No preparation needed — the demo runs live on screen. Attendees don't need to install anything or clone any repo during the talk.
Who will get the most out of this:
Engineers currently building or maintaining an LLM-powered product or chatbot
Technical leads evaluating agent frameworks or memory strategies
Anyone who has shipped an AI feature and watched it behave inconsistently after a few turns
Speaker bio
Aman is an AI Engineer at Ghaia.ai, an AI/ML freelancer, a former GSoC mentee, and an MLflow Ambassador.
Availability
23 May 2026
Accessibility & special requirements
No response
Speaker checklist
I have read and understood the PyDelhi guidelines for submitting proposals and giving talks
I have read and acknowledged the PyDelhi accessibility guidelines and will ensure my presentation materials (slides, videos, demos) follow these recommendations
I will make my talk accessible to all attendees and will proactively ask for any accommodations or special requirements I might need
I agree to share slides, code snippets, and other materials used during the talk with the community
I will follow PyDelhi's Code of Conduct and maintain a welcoming, inclusive environment throughout my participation
I understand that PyDelhi meetups are community-centric events focused on learning, knowledge sharing, and networking, and I will respect this ethos by not using this platform for self-promotion or hiring pitches during my presentation, unless explicitly invited to do so by means of a sponsorship or similar arrangement
If the talk is recorded by the PyDelhi team, I grant permission to release the video on PyDelhi's YouTube channel under the CC-BY-4.0 license, or a different license of my choosing if I am specifying it in my proposal or with the materials I share
Resources and references
https://github.com/Aman123lug/agent-feedback-pipeline
Link to slides/demos (if available)
No response
Twitter/X handle (optional)
https://x.com/amanlug
LinkedIn profile (optional)
https://www.linkedin.com/in/aman-kumar-5bb609228/
Profile picture URL (optional)
No response
Additional comments
No response