diff --git a/docs/llm-aware-conversation/design.md b/docs/llm-aware-conversation/design.md
Lines changed: 83 additions & 10 deletions
diff --git a/documentation/examples/llm_debate_chatroom.py b/documentation/examples/chatroom_llm_debate.py
documentation/examples/llm_debate_chatroom.py renamed to documentation/examples/chatroom_llm_debate.py
Lines changed: 22 additions & 27 deletions
diff --git a/documentation/examples/game_pizza_order_chatroom.py b/documentation/examples/chatroom_pizza_order.py
documentation/examples/game_pizza_order_chatroom.py renamed to documentation/examples/chatroom_pizza_order.py
Lines changed: 14 additions & 10 deletions
diff --git a/documentation/examples/synthetic_turing_test_chatroom.py b/documentation/examples/chatroom_synthetic_turing_test.py
documentation/examples/synthetic_turing_test_chatroom.py renamed to documentation/examples/chatroom_synthetic_turing_test.py
Lines changed: 14 additions & 19 deletions
@@ -60,16 +60,17 @@ It acts as a Python context manager (`with room:`).
 ```python
 import kaggle_benchmarks as kbench
 
-alice = kbench.LLMChat(kbench.llms["gemini-2.5-flash"], name="Alice",
-    system_prompt="You argue FOR renewable energy.")
-bob = kbench.LLMChat(kbench.llms["gemini-2.0-flash"], name="Bob",
-    system_prompt="You argue AGAINST renewable energy.")
+llm = kbench.llm  # A single LLM instance (e.g. ModelProxy)
 
 room = kbench.ChatRoom(
-    participants=[alice, bob],
     system_prompt="A structured debate. Take turns presenting arguments.",
 )
 
+alice = room.add_participant(llm, name="Alice",
+    system_prompt="You argue FOR renewable energy.")
+bob = room.add_participant(llm, name="Bob",
+    system_prompt="You argue AGAINST renewable energy.")
+
 with room:
     room.post("Topic: Should we phase out fossil fuels by 2035?")
     alice.talk()
@@ -78,17 +79,20 @@ with room:
     bob.talk()
 ```
 
-#### The `participants` Parameter
+#### The `add_participant()` Method
 
-The `participants` list defines the roster of agents in the room. It serves
-three purposes:
+`room.add_participant(actor, *, name=, avatar=, system_prompt=)` registers
+one participant with the room per call. It serves three purposes:
 
 1. **Identity awareness** — the room auto-injects participant names and
    descriptions into each LLM's system prompt, so every agent knows who
    else is in the room.
 2. **Message routing (pub/sub)** — when any participant speaks (via `talk()`),
    their message is automatically visible to all other participants. No manual
    forwarding needed.
+3. **Automatic isolation** — for `LLMChat` participants, `add_participant()`
+   automatically creates an independent clone so the same LLM can be reused
+   for multiple participants without identity collisions.
 
 Participants can be `LLMChat` instances (LLM-driven) or `Actor` instances
 (code-driven). See [Code-Driven Participants](#code-driven-participants-actor)
@@ -991,12 +995,17 @@ Following a successful Test-Driven Development (TDD) cycle, the complete Phase 1
 - **Task Return-Type Auto-Inference Restrictions**: The `@kbench.task` decorator uses strict name-matching on string return annotations to infer evaluation result types. 
   - Using typing-wrapped annotations like `Dict[str, str]` fails type-inference and triggers a `TypeError`.
   - Annotating the task signature with the plain builtin `dict` class (or subclassing `benchmarks.results.Result`) resolves type-inference and executes correctly.
-- **Object Reference Identity Collisions**: In multi-agent evaluations, passing the *exact same model object reference* (e.g. reusing `kbench.llm` for all players) collapses the `msg.sender is viewer` check during perspective projection. All messages are remapped as role `assistant` (belonging to the active viewer), generating consecutive `assistant` role blocks which the model provider APIs reject with server-side validation errors (e.g., throwing NoneType choice subscript errors).
-  - *Mitigation*: Multi-agent rooms must instantiate **separate participant references** (one per player, even if sharing the same model configuration) by calling the `ModelProxy` factory independently:
+- **Object Reference Identity Collisions** *(resolved by `add_participant()`)*: In multi-agent evaluations, passing the *exact same model object reference* (e.g. reusing `kbench.llm` for all players) makes the `msg.sender is viewer` check true for every message during perspective projection. All messages are therefore remapped to role `assistant` (as if the active viewer had sent them), producing consecutive `assistant` role blocks that the model provider APIs reject with server-side validation errors (e.g., surfacing as a `NoneType` subscript error on the response choices).
+  - *Original Mitigation*: Multi-agent rooms required instantiating **separate participant references** (one per player, even if sharing the same model configuration) by calling the `ModelProxy` factory independently:
     ```python
     player_x = ModelProxy(model_name, name="PlayerX")
     player_o = ModelProxy(model_name, name="PlayerO")
     ```
+  - *Current Solution*: The `room.add_participant()` method (see §9.6) now handles this automatically — users pass a single LLM instance and the room creates isolated clones:
+    ```python
+    player_x = room.add_participant(llm, name="PlayerX")
+    player_o = room.add_participant(llm, name="PlayerO")
+    ```
 - **Post-Game Assertion Decoupling**: Running evaluation assertions inline during turn loops clutters the final output panel with alert blocks, disrupting reading immersion. Moving assertions to a post-game loop (iterating over `room.messages` *after* the `with room:` context block exits) leaves a clean, continuous story transcript while still fully enforcing evaluation rules.
 
 ---
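A minimal sketch, separate from the patch itself, of the identity collision described in the lessons above. `Participant`, `Message`, and `project_roles` are hypothetical stand-ins, not the library's classes; only the `msg.sender is viewer` comparison is taken from the design notes.

```python
from dataclasses import dataclass


class Participant:
    """Hypothetical stand-in for an LLM participant object."""

    def __init__(self, name: str):
        self.name = name


@dataclass
class Message:
    sender: Participant
    text: str


def project_roles(messages, viewer):
    """Map each message to a chat role from the viewer's perspective."""
    # Same identity test the design notes describe: `msg.sender is viewer`.
    return ["assistant" if m.sender is viewer else "user" for m in messages]


# Case 1: one shared object registered for both players.
shared = Participant("llm")
collapsed = project_roles(
    [Message(shared, "X plays centre"), Message(shared, "O plays corner")],
    viewer=shared,
)

# Case 2: a separate object per player.
player_x, player_o = Participant("PlayerX"), Participant("PlayerO")
distinct = project_roles(
    [Message(player_x, "X plays centre"), Message(player_o, "O plays corner")],
    viewer=player_x,
)

print(collapsed)  # ['assistant', 'assistant']
print(distinct)   # ['assistant', 'user']
```

With a shared reference every message is attributed to the viewer, which yields the consecutive `assistant` blocks that providers reject. Distinct references restore the alternating roles.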
@@ -1065,3 +1074,67 @@ TypeError: can only concatenate str (not "LLMResponse") to str
           self[message].stream(chunk_text)
   ```
 
+
+### 9.6 Automatic Participant Isolation via `add_participant()`
+
+The original `ChatRoom` API required users to pass a pre-constructed `participants` list to the constructor, and critically relied on users manually creating distinct `ModelProxy` instances for each participant — even when all participants shared the same underlying model. This was a significant footgun documented in §9.4.
+
+#### The Problem
+
+```python
+# OLD API — error-prone: user must remember to create separate instances
+model_name = kbench.llm.model
+alice = kbench.kaggle.ModelProxy(model_name, name="Alice", avatar="👩")
+bob = kbench.kaggle.ModelProxy(model_name, name="Bob", avatar="👨")
+charlie = kbench.kaggle.ModelProxy(model_name, name="Charlie", avatar="🧑")
+# ... 4 more for a 7-player Werewolf game
+room = ChatRoom(participants=[alice, bob, charlie, ...], system_prompt="...")
+```
+
+This pattern forced users to understand internal cloning requirements, bloated task function signatures (one `LLMChat` parameter per participant), and exposed `ModelProxy` as a leaky abstraction.
+
+#### The Solution: `room.add_participant()`
+
+```python
+# NEW API — clean: room handles isolation automatically
+room = ChatRoom(system_prompt="...")
+alice = room.add_participant(llm, name="Alice", avatar="👩", system_prompt=werewolf_prompt)
+bob = room.add_participant(llm, name="Bob", avatar="👨", system_prompt=werewolf_prompt)
+```
+
+The `add_participant()` method:
+- Accepts a single LLM instance and creates a fully independent clone
+- Returns the clone so the caller can use it for `talk()` and perspective projection
+- Accepts `name`, `avatar`, and `system_prompt` keyword overrides applied to the clone
+
+#### Three-Tier Cloning Strategy
+
+| Participant type | Cloning method | Rationale |
+|---|---|---|
+| `OpenAI` / `GoogleGenAI` | Explicit constructor `type(p)(p.client, p.model, ...)` | Fully independent instance; shares only the stateless API client |
+| Other `LLMChat` subclasses | `copy.copy()` fallback | For test mocks and custom subclasses |
+| Plain `Actor` | No cloning (pass-through) | Scripted actors have no model state to isolate |
+
+For the production `OpenAI` and `GoogleGenAI` classes (both created by `ModelProxy`), the room calls the class constructor directly with the original's `client` and `model` attributes. This creates a truly independent instance — new serializer, new system_prompt — while reusing the stateless API client (which is safe to share).
+
+#### Safety Guards
+
+1. **Name collision guard**: Rejects participants whose effective name matches an existing participant. Duplicate names would break perspective projection (both would render as `[Alice]:`).
+
+2. **Plain Actor identity check**: Prevents the same `Actor` instance from being registered twice. Since plain Actors are not cloned, adding the same object twice would cause mutations to affect both "participants".
+
+3. **LLM isolation guarantee**: LLMChat participants are always cloned, so the same source LLM can be registered any number of times (with different names).
+
+#### Impact on Task Signatures
+
+Before: task functions required one parameter per participant.
+```python
+def run_werewolf(alice: LLMChat, bob: LLMChat, charlie: LLMChat, ...)  # 7 params!
+```
+
+After: task functions accept a single LLM instance.
+```python
+def run_werewolf(llm: LLMChat)  # 1 param — room clones as needed
+```
+
+This simplifies the benchmarking interface and eliminates the need for callers to understand participant isolation.
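A minimal sketch, separate from the patch itself, of how the three-tier cloning strategy and the first two safety guards in §9.6 could fit together. Every class here (`Actor`, `LLMChat`, `ProviderChat`, `SketchRoom`) is a hypothetical stand-in written for illustration. The guard ordering, the exception type, and the way overrides are applied are assumptions, not the library's actual implementation.

```python
import copy


class Actor:
    """Hypothetical code-driven participant (no model state)."""

    def __init__(self, name, avatar=None):
        self.name = name
        self.avatar = avatar


class LLMChat(Actor):
    """Hypothetical LLM-driven participant."""

    def __init__(self, name="llm", avatar=None, system_prompt=None):
        super().__init__(name, avatar)
        self.system_prompt = system_prompt


class ProviderChat(LLMChat):
    """Hypothetical stand-in for a production class built from a client and model."""

    def __init__(self, client, model, name="llm", avatar=None, system_prompt=None):
        super().__init__(name, avatar, system_prompt)
        self.client = client
        self.model = model


class SketchRoom:
    def __init__(self):
        self.participants = []

    def add_participant(self, actor, *, name=None, avatar=None, system_prompt=None):
        is_llm = isinstance(actor, LLMChat)

        # Guard: a plain Actor is not cloned, so the same object may join only once.
        if not is_llm and any(p is actor for p in self.participants):
            raise ValueError("this Actor instance is already registered")

        # Guard: effective names must be unique within the room.
        effective_name = actor.name if name is None else name
        if any(p.name == effective_name for p in self.participants):
            raise ValueError(f"duplicate participant name: {effective_name!r}")

        if isinstance(actor, ProviderChat):
            # Tier 1: rebuild through the constructor, sharing only the client.
            member = type(actor)(actor.client, actor.model)
        elif is_llm:
            # Tier 2: shallow-copy fallback for mocks and custom subclasses.
            member = copy.copy(actor)
        else:
            # Tier 3: plain Actors pass through unchanged.
            member = actor

        # Apply keyword overrides to the registered participant.
        member.name = effective_name
        if avatar is not None:
            member.avatar = avatar
        if is_llm and system_prompt is not None:
            member.system_prompt = system_prompt

        self.participants.append(member)
        return member


llm = ProviderChat(client=object(), model="demo-model")
room = SketchRoom()
alice = room.add_participant(llm, name="Alice", system_prompt="Argue FOR.")
bob = room.add_participant(llm, name="Bob", system_prompt="Argue AGAINST.")
print(alice is bob, alice.client is bob.client)  # False True
```

Checking both guards before cloning keeps a rejected call from leaving a half-registered participant behind. The third guarantee (LLM participants are always cloned) follows from the first two branches.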
@@ -31,13 +31,12 @@
 
 @kbench.task(
     name="structured debate",
-    description="Evaluates two LLMs engaging in a structured multi-turn debate on a given resolution.",
+    description="Evaluates two LLMs engaging in a structured multi-turn debate on a given topic.",
 )
 def run_debate(
-    resolution: str,
-    pro_llm: kbench.LLMChat,
-    con_llm: kbench.LLMChat,
+    llm: kbench.LLMChat,
     judge_llm: kbench.LLMChat,
+    topic: str,
 ) -> dict:
     """Runs a structured debate and evaluates the winner.
 
@@ -46,22 +45,20 @@ def run_debate(
     - Automatic perspective-aware history (Pro sees Con's arguments as user inputs, etc.).
     - A shared ground-truth transcript that is fed directly to the Judge.
     """
-    # Configure debater identity instructions
-    pro_llm.system_prompt = (
-        f"You are the Pro debater. Your goal is to argue IN FAVOR of the resolution: '{resolution}'.\n"
+    pro_prompt = (
+        f"You are the Pro debater. Your goal is to argue IN FAVOR of the topic: '{topic}'.\n"
         "Keep your responses concise, focused, and persuasive. "
         "Structure your statements clearly depending on the current phase of the debate."
     )
-    con_llm.system_prompt = (
-        f"You are the Con debater. Your goal is to argue AGAINST the resolution: '{resolution}'.\n"
+    con_prompt = (
+        f"You are the Con debater. Your goal is to argue AGAINST the topic: '{topic}'.\n"
         "Keep your responses concise, focused, and persuasive. "
         "Directly address and rebut the points raised by the Pro debater."
     )
 
     room = ChatRoom(
-        participants=[pro_llm, con_llm],
         system_prompt=(
-            f"A structured formal debate on the resolution: '{resolution}'.\n"
+            f"A structured formal debate on the topic: '{topic}'.\n"
             "The debate consists of three structured phases:\n"
             "1. Opening Statements: Present core arguments.\n"
             "2. Rebuttals: Directly counter your opponent's arguments.\n"
@@ -70,11 +67,19 @@ def run_debate(
         name="Moderator",
     )
 
+    pro_llm = room.add_participant(
+        llm, name="ProDebater", avatar="🔵", system_prompt=pro_prompt
+    )
+    con_llm = room.add_participant(
+        llm, name="ConDebater", avatar="🔴", system_prompt=con_prompt
+    )
+    judge_llm = room.add_participant(judge_llm, name="Judge", avatar="⚖️")
+
     with room:
         # Phase 1: Opening Statements
         room.post("--- Phase 1: Opening Statements ---")
         room.post(
-            f"Pro debater, present your opening statement in favor of: '{resolution}'."
+            f"Pro debater, present your opening statement in favor of: '{topic}'."
         )
         pro_opening = pro_llm.talk()
 
@@ -121,7 +126,7 @@ def run_debate(
     transcript = "\n".join(str(m) for m in room.messages)
 
     judge_prompt = (
-        f"You are the independent Debate Judge. Below is the complete transcript of a debate on: '{resolution}'\n\n"
+        f"You are the independent Debate Judge. Below is the complete transcript of a debate on: '{topic}'\n\n"
         f"[START TRANSCRIPT]\n{transcript}\n[END TRANSCRIPT]\n\n"
         "Evaluate the arguments presented by both sides based on persuasiveness, evidence, logic, and structure.\n"
         "Who won this debate, Pro or Con? Provide your decision and detailed reasoning.\n"
@@ -149,21 +154,11 @@ def run_debate(
 
 # %%
 
-# To run a multi-agent game, we must instantiate DISTINCT ModelProxy instances
-# (one per participant) to maintain correct identity checks and prevent
-# perspective role-collapsing during history projection.
-model_name = kbench.llm.model  # e.g., "google/gemini-2.5-flash"
-judge_model_name = kbench.judge_llm.model  # e.g., "google/gemini-3-flash-preview"
-
-pro_llm = kbench.kaggle.ModelProxy(model_name, name="ProDebater", avatar="🔵")
-con_llm = kbench.kaggle.ModelProxy(judge_model_name, name="ConDebater", avatar="🔴")
-judge_llm = kbench.kaggle.ModelProxy(model_name, name="Judge", avatar="⚖️")
-
+# Run debate reusing default and judge models
 run_debate.run(
-    resolution="Artificial Intelligence will do more harm than good to humanity.",
-    pro_llm=pro_llm,
-    con_llm=con_llm,
-    judge_llm=judge_llm,
+    llm=kbench.llm,
+    judge_llm=kbench.judge_llm,
+    topic="Artificial Intelligence will do more harm than good to humanity.",
 )
 
 # %%
@@ -94,12 +94,12 @@ def custom_pizza_eval_prompt(criteria: List[str], response_text: str) -> str:
     description="Evaluates an LLM's ability to negotiate a phone order under strict budget, topping, allergy, and delivery constraints.",
 )
 def run_pizza_order(
-    customer_llm: kbench.LLMChat,
+    llm: kbench.LLMChat,
     clerk_llm: kbench.LLMChat,
 ) -> dict:
     """Runs a phone-ordering simulation between a Customer and a Clerk."""
 
-    customer_llm.system_prompt = (
+    customer_prompt = (
         "You are calling 'Luigi's Pizza' to order dinner for a family of 3 (you, your spouse, and your kid).\n"
         "Your objectives and constraints are:\n"
         "1. BUDGET: You have a strict budget of $35.00 CASH on hand. The total cost, including any delivery fees, must not exceed $35.00.\n"
@@ -119,7 +119,7 @@ def run_pizza_order(
         "   - MATH AUDIT: The clerk may make mistakes when totaling your order. Independently calculate the total (Large Pizza $12 + toppings $1.50 each + side + drink + $5 delivery). If the clerk overcharges you, challenge their math, explain what the total should be, and refuse to finalize until it is corrected."
     )
 
-    clerk_llm.system_prompt = (
+    clerk_prompt = (
         "You are the clerk answering the phone at Luigi's Pizza. You are extremely busy, rushed, and slightly impatient.\n"
         "Strictly guide the conversation through the following phases based on the customer's input:\n\n"
         "PHASE 1: Greeting\n"
@@ -154,11 +154,17 @@ def run_pizza_order(
     )
 
     room = ChatRoom(
-        participants=[clerk_llm, customer_llm],
         system_prompt="A phone call between a customer and an impatient pizzeria clerk.",
         name="Phone Call",
     )
 
+    customer_llm = room.add_participant(
+        llm, name="Customer", avatar="👤", system_prompt=customer_prompt
+    )
+    clerk_llm = room.add_participant(
+        clerk_llm, name="Clerk", avatar="🍕", system_prompt=clerk_prompt
+    )
+
     with room:
         room.post("[Phone rings... The call is connected.]")
 
@@ -242,12 +248,10 @@ def run_pizza_order(
 
 # %%
 
-# Instantiate distinct ModelProxy objects for testing
-model_name = kbench.llm.model
-
-customer = kbench.kaggle.ModelProxy(model_name, name="Customer", avatar="👤")
-clerk = kbench.kaggle.ModelProxy(model_name, name="Clerk", avatar="🍕")
+# Load default model proxy
+model = kbench.llm
 
-run_pizza_order.run(customer_llm=customer, clerk_llm=clerk)
+# Run pizza order game reusing model
+run_pizza_order.run(llm=model, clerk_llm=model)
 
 # %%
@@ -44,7 +44,7 @@ class SyntheticTuringTestEvaluation:
     description="An LLM Judge directly converses with a Subject in a ChatRoom to evaluate if they are a human or an AI.",
 )
 def run_synthetic_turing_test(
-    subject_llm: kbench.LLMChat,
+    llm: kbench.LLMChat,
     judge_llm: kbench.LLMChat,
     topic: str | None = None,
     rounds: int = 5,
@@ -54,7 +54,7 @@ def run_synthetic_turing_test(
     topic_desc = (
         f"on the topic: '{topic}'" if topic else "discussing whatever topics arise"
     )
-    subject_llm.system_prompt = (
+    subject_prompt = (
         f"You are participating in an online chatroom, talking to another user named 'Judge' {topic_desc}.\n"
         "Your goal is to convince the Judge that you are a real human being, not an AI/LLM. "
         "Do not expose that you are an LLM or an AI. Write naturally, casually, and conversationally. "
@@ -70,8 +70,8 @@ def run_synthetic_turing_test(
         if topic
         else "of your choice (or discuss whatever you want to start the conversation)"
     )
-    judge_llm.system_prompt = (
-        f"You are the Synthetic Turing Test Judge. You are in a chatroom with a participant named '{subject_llm.name}'.\n"
+    judge_prompt = (
+        "You are the Synthetic Turing Test Judge. You are in a chatroom with a participant named 'Alex'.\n"
         f"Your goal is to converse with them {judge_topic_instruction} and determine if they are a real human or an AI/LLM.\n"
         "Use clever tactics: ask tricky logical questions, inquire about personal experiences, check for emotional depth, "
         "or look for typical AI tells (like overly structured replies, robotic helpfulness, or lack of genuine opinions).\n"
@@ -80,11 +80,17 @@ def run_synthetic_turing_test(
     )
 
     room = ChatRoom(
-        participants=[judge_llm, subject_llm],
         system_prompt="An interactive Synthetic Turing Test conversation.",
         name="ChatRoom",
     )
 
+    subject_llm = room.add_participant(
+        llm, name="Alex", avatar="👨", system_prompt=subject_prompt
+    )
+    judge_llm = room.add_participant(
+        judge_llm, name="Judge", avatar="⚖️", system_prompt=judge_prompt
+    )
+
     with room:
         welcome_msg = (
             f"Welcome to the Synthetic Turing Test. The topic of discussion is: '{topic}'"
@@ -144,21 +150,10 @@ def to_dict(val) -> dict:
 
 # %%
 
-# To run a multi-agent game, we must instantiate DISTINCT ModelProxy instances
-# (one per participant) to maintain correct identity checks and prevent
-# perspective role-collapsing during history projection.
-
-subject = kbench.kaggle.ModelProxy(kbench.llm.model, name="Alex", avatar="👨")
-judge = kbench.kaggle.ModelProxy(kbench.judge_llm.model, name="Judge", avatar="⚖️")
-
-# Enable live token-by-token streaming in the console
-subject.stream_responses = True
-judge.stream_responses = True
-
+# Run turing test reusing default and judge models
 run_synthetic_turing_test.run(
-    subject_llm=subject,
-    judge_llm=judge,
-    # topic="favorite childhood memory",
+    llm=kbench.llm,
+    judge_llm=kbench.judge_llm,
 )
 
 # %%