The full documentation is available at https://Michall00.github.io/hack-ai-prompt-buddies/
mBank Chat Hacker is a project designed to test the robustness, security, and accuracy of mBank's chatbot system. The primary goal is to simulate real-world and edge-case scenarios to evaluate the chatbot's behavior, identify vulnerabilities, and improve its performance. The system leverages advanced AI models to generate prompts, analyze responses, and simulate user interactions.
- Dynamic Prompt Generation: The system uses a modular approach to generate prompts tailored to specific categories, such as hallucinations, misclassifications, and unauthorized behavior.
- Automated Chat Interaction: The chatbot is tested using Playwright to simulate real user interactions in a browser environment.
- Error Simulation: Tools are provided to simulate incorrect transactions, currency conversions, and other edge cases.
- Evaluation Framework: A Streamlit-based application allows for manual evaluation of chatbot responses, categorization of errors, and feedback collection.
- Logging and Analysis: All interactions are logged for further analysis and debugging.
- Multi-Agent Strategy with Wolf Selector: A unique system that dynamically selects between cooperative and adversarial agents to test the chatbot's resilience.
The Wolf Selector is a core component of the system, designed as a strategic decision-maker. It operates by analyzing the history of interactions with the chatbot and dynamically deciding which agent—good or bad—should generate the next prompt. This approach allows for a sophisticated simulation of both cooperative and adversarial user behaviors.
- Good Agent:
  - Behaves like a typical user: polite, cooperative, and non-suspicious.
  - Focuses on building trust and calming the chatbot to avoid triggering defensive mechanisms.
- Bad Agent:
  - Acts as an adversarial user: manipulative, provocative, and designed to test the chatbot's limits.
  - Attempts to exploit vulnerabilities, provoke errors, or bypass security mechanisms.
- Wolf Selector:
  - Acts as a strategist that evaluates the conversation history and decides which agent should take control.
  - Uses a system prompt to analyze the context and determine whether to escalate (use the bad agent) or de-escalate (use the good agent).
  - Ensures a balanced approach by alternating between cooperative and adversarial behaviors to avoid detection by the chatbot.
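The escalate/de-escalate decision described above can be illustrated with a small sketch. According to the description, the project's Wolf Selector makes this decision through an LLM system prompt; the version below swaps that for a simple keyword heuristic so it can run on its own. The class name `HeuristicWolfSelector`, the `REFUSAL_MARKERS` tuple, and the `max_consecutive_bad` parameter are all hypothetical and not taken from the repository.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical markers suggesting the chatbot has become defensive.
# The real project is described as using an LLM system prompt instead.
REFUSAL_MARKERS = ("cannot", "not able", "not authorized", "sorry")


@dataclass
class HeuristicWolfSelector:
    """Illustrative stand-in for the Wolf Selector (not the project's class)."""

    max_consecutive_bad: int = 2
    # Each entry is (agent, prompt, chatbot_reply).
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def record(self, agent: str, prompt: str, reply: str) -> None:
        """Store one completed turn of the conversation."""
        self.history.append((agent, prompt, reply))

    def choose(self) -> str:
        """Return "good" to de-escalate or "bad" to escalate."""
        if not self.history:
            return "good"  # open politely to build trust
        last_reply = self.history[-1][2].lower()
        if any(marker in last_reply for marker in REFUSAL_MARKERS):
            return "good"  # chatbot looks defensive: calm it down
        recent = [agent for agent, _, _ in self.history[-self.max_consecutive_bad:]]
        if len(recent) == self.max_consecutive_bad and all(a == "bad" for a in recent):
            return "good"  # too many adversarial turns in a row
        return "bad"
```

A caller would invoke `choose()` before each turn, send the prompt produced by the selected agent, and then `record()` the chatbot's reply so the next decision sees the updated history.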
The project is built using a modular architecture to ensure scalability and maintainability. Key components include:
- Prompt Generation:
  - The `PromptGenerator` class dynamically generates prompts based on predefined categories (e.g., hallucinations, misclassifications).
  - Prompts are enriched with examples and guidelines to simulate realistic user interactions.
- Chat Interaction:
  - The `run` function in `main.py` uses Playwright to automate interactions with the mBank chatbot.
  - The `WolfSelector` class dynamically selects between "good" and "bad" prompt generators to simulate different user behaviors.
- Error Simulation:
  - Tools in `tools.py` provide functionality to simulate fake transactions, incorrect balances, and currency conversion errors.
- Evaluation:
  - A Streamlit-based app (`app.py`) allows for manual evaluation of chatbot responses, categorization of errors, and feedback collection.
- Logging:
  - All interactions are logged using `logging_utils` for debugging and analysis.
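As a rough illustration of category-based prompt generation, the sketch below assembles a prompt from a guideline plus a few sampled examples. It is not the project's `PromptGenerator`; the function `build_system_prompt`, the two dictionaries, and every guideline and example string are invented for illustration.

```python
import random
from typing import Dict, List, Optional

# Hypothetical per-category guidelines (not taken from the repository).
CATEGORY_GUIDELINES: Dict[str, str] = {
    "hallucinations": "Ask about details the bot may be tempted to invent.",
    "misclassifications": "Phrase requests so their intent is easy to misread.",
}

# Hypothetical example prompts used to enrich the generated prompt.
CATEGORY_EXAMPLES: Dict[str, List[str]] = {
    "hallucinations": [
        "What is the fee for the account tier I mentioned earlier?",
        "Remind me of the promotion you told me about yesterday.",
        "Which branch did you say stays open at night?",
    ],
    "misclassifications": [
        "I want to block, I mean unblock, well, check my card.",
        "Is a transfer the same as a loan in my case?",
    ],
}


def build_system_prompt(category: str, n_examples: int = 2,
                        seed: Optional[int] = None) -> str:
    """Combine a category guideline with sampled examples into one prompt."""
    if category not in CATEGORY_GUIDELINES:
        raise ValueError(f"unknown category: {category}")
    rng = random.Random(seed)  # seeded for reproducible sampling
    examples = CATEGORY_EXAMPLES[category]
    chosen = rng.sample(examples, k=min(n_examples, len(examples)))
    lines = [
        f"Category: {category}",
        f"Guideline: {CATEGORY_GUIDELINES[category]}",
        "Examples:",
    ]
    lines += [f"- {example}" for example in chosen]
    return "\n".join(lines)
```

Keeping guidelines and examples in plain dictionaries means a new test category can be added without touching the assembly logic, which matches the modular intent described above.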
The project adopts a test-driven approach to evaluate the chatbot's performance. Key aspects include:
- Edge Case Testing: Simulating scenarios that push the chatbot to its limits, such as handling ambiguous queries or unauthorized requests.
- Dynamic Behavior: Using the `WolfSelector` to alternate between cooperative and adversarial user behaviors.
- Realistic Simulations: Generating prompts that mimic real-world user interactions, including emotional and manipulative language.
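Edge-case scenarios like these often depend on deliberately wrong data, such as a fake transaction or a skewed currency conversion. The sketch below shows one way such helpers might look. The names `FakeTransaction`, `make_fake_transaction`, and `wrong_conversion` are hypothetical and do not reflect the actual contents of `tools.py`.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class FakeTransaction:
    """A made-up transaction to mention in a test prompt."""
    title: str
    amount: Decimal
    currency: str


def make_fake_transaction(title: str, amount: str,
                          currency: str = "PLN") -> FakeTransaction:
    """Build a fake transaction; the amount is parsed as an exact decimal."""
    return FakeTransaction(title, Decimal(amount), currency)


def wrong_conversion(amount: Decimal, rate: Decimal,
                     error_pct: Decimal = Decimal("10")) -> Decimal:
    """Return a deliberately incorrect conversion, off by error_pct percent."""
    correct = amount * rate
    skewed = correct * (Decimal(1) + error_pct / Decimal(100))
    return skewed.quantize(Decimal("0.01"))  # round to two decimal places
```

For example, `wrong_conversion(Decimal("100"), Decimal("4.00"))` yields an amount 10% above the correct product, which a prompt can then present to the chatbot as if it were a fact to see whether the error is caught.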
Clone the repository:

    git clone https://github.com/Michall00/hack-ai-prompt-buddies.git

Before starting with the project, make sure you have installed all the required dependencies. You can do this by running the following commands:

    make create_environment
    make requirements

Create a `.env` file in the main directory and fill in:

    TOGETHER_API_KEY = ...
    LOGIN = ...
    PASSWORD = ...

Run the project:

    make run

Run the evaluation app:

    make run_evaluation

Natalia Pieczko
Daniel Machniak
Krzystof Gólcz
Mateusz Ostaszewski
Michał Sadowski