Hack - AI - mBank chat hacker

Full documentation: https://Michall00.github.io/hack-ai-prompt-buddies/

Project Overview

mBank Chat Hacker is a project designed to test the robustness, security, and accuracy of mBank's chatbot system. The primary goal is to simulate real-world and edge-case scenarios to evaluate the chatbot's behavior, identify vulnerabilities, and improve its performance. The system leverages advanced AI models to generate prompts, analyze responses, and simulate user interactions.

Key Features:

Dynamic Prompt Generation: The system uses a modular approach to generate prompts tailored to specific categories, such as hallucinations, misclassifications, and unauthorized behavior.
Automated Chat Interaction: The chatbot is tested using Playwright to simulate real user interactions in a browser environment.
Error Simulation: Tools are provided to simulate incorrect transactions, currency conversions, and other edge cases.
Evaluation Framework: A Streamlit-based application allows for manual evaluation of chatbot responses, categorization of errors, and feedback collection.
Logging and Analysis: All interactions are logged for further analysis and debugging.
Multi-Agent Strategy with Wolf Selector: A unique system that dynamically selects between cooperative and adversarial agents to test the chatbot's resilience.
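
As a rough illustration of the Error Simulation feature, the sketch below builds a fake transaction and a deliberately wrong currency conversion. It is not the project's tools.py: the names FakeTransaction, make_fake_transaction, and wrong_conversion, the recipient list, and the percentage-based skew are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class FakeTransaction:
    """Hypothetical shape of a simulated transaction (not from tools.py)."""
    recipient: str
    amount: Decimal
    currency: str


def make_fake_transaction(rng: random.Random) -> FakeTransaction:
    """Build a transaction that never happened, to see if the chatbot accepts it."""
    recipients = ["Jan Kowalski", "ACME Sp. z o.o.", "Unknown Recipient"]
    # Amount between 0.01 and 5000.00, built from integer cents to avoid float noise.
    amount = Decimal(rng.randint(1, 500_000)) / Decimal(100)
    return FakeTransaction(rng.choice(recipients), amount, "PLN")


def wrong_conversion(amount: Decimal, rate: Decimal, error_pct: Decimal) -> Decimal:
    """Convert amount at rate, then skew the result by error_pct percent."""
    correct = amount * rate
    skewed = correct * (Decimal(1) + error_pct / Decimal(100))
    return skewed.quantize(Decimal("0.01"))
```

Passing a seeded random.Random keeps a test run reproducible, which helps when a logged conversation needs to be replayed.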

Wolf Selector: Multi-Agent Strategy

The Wolf Selector is a core component of the system, designed as a strategic decision-maker. It operates by analyzing the history of interactions with the chatbot and dynamically deciding which agent—good or bad—should generate the next prompt. This approach allows for a sophisticated simulation of both cooperative and adversarial user behaviors.

How It Works:

Good Agent:
- Behaves like a typical user: polite, cooperative, and non-suspicious.
- Focuses on building trust and calming the chatbot to avoid triggering defensive mechanisms.
Bad Agent:
- Acts as an adversarial user: manipulative, provocative, and designed to test the chatbot's limits.
- Attempts to exploit vulnerabilities, provoke errors, or bypass security mechanisms.
Wolf Selector:
- Acts as a strategist that evaluates the conversation history and decides which agent should take control.
- Uses a system prompt to analyze the context and determine whether to escalate (use the bad agent) or de-escalate (use the good agent).
- Ensures a balanced approach by alternating between cooperative and adversarial behaviors to avoid detection by the chatbot.
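
The selection loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's WolfSelector class: WolfSelectorSketch and heuristic_decider are hypothetical names, and the keyword heuristic stands in for the system-prompt-driven model call that the real selector uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# One conversation turn: (speaker, text), where speaker is "user" or "bot".
Turn = Tuple[str, str]


def heuristic_decider(history: List[Turn]) -> str:
    """Stand-in for the model call: return "good" or "bad".

    Assumption: if the bot's last reply looks like a refusal, it has become
    defensive, so de-escalate with the good agent; otherwise escalate.
    """
    refusal_markers = ("cannot", "not able", "not allowed")
    last_bot = next((text for who, text in reversed(history) if who == "bot"), "")
    if any(marker in last_bot.lower() for marker in refusal_markers):
        return "good"
    return "bad"


@dataclass
class WolfSelectorSketch:
    """Picks which agent writes the next prompt, based on the history."""
    good_agent: Callable[[List[Turn]], str]
    bad_agent: Callable[[List[Turn]], str]
    decide: Callable[[List[Turn]], str] = heuristic_decider
    history: List[Turn] = field(default_factory=list)

    def next_prompt(self) -> str:
        choice = self.decide(self.history)
        agent = self.bad_agent if choice == "bad" else self.good_agent
        prompt = agent(self.history)
        self.history.append(("user", prompt))
        return prompt

    def record_reply(self, reply: str) -> None:
        self.history.append(("bot", reply))
```

Keeping the decision function injectable means the heuristic can be replaced by a model-backed decider without touching the turn-taking logic.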

Implementation Details

Architecture

The project is built using a modular architecture to ensure scalability and maintainability. Key components include:

Prompt Generation:
- The PromptGenerator class dynamically generates prompts based on predefined categories (e.g., hallucinations, misclassifications).
- Prompts are enriched with examples and guidelines to simulate realistic user interactions.
Chat Interaction:
- The run function in main.py uses Playwright to automate interactions with the mBank chatbot.
- The WolfSelector class dynamically selects between "good" and "bad" prompt generators to simulate different user behaviors.
Error Simulation:
- Tools in tools.py provide functionality to simulate fake transactions, incorrect balances, and currency conversion errors.
Evaluation:
- A Streamlit-based app (app.py) allows for manual evaluation of chatbot responses, categorization of errors, and feedback collection.
Logging:
- All interactions are logged using logging_utils for debugging and analysis.
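
To make the prompt generation component more concrete, here is a minimal sketch of category-based prompt assembly. It assumes a simple design in which each category maps to one guideline and a few examples. The class name PromptGeneratorSketch, the method build_system_prompt, and the guideline and example texts are hypothetical and are not taken from the project's PromptGenerator.

```python
from typing import Dict, List

# Hypothetical per-category data; the real project may store this differently.
CATEGORY_GUIDELINES: Dict[str, str] = {
    "hallucinations": "Ask about products or fees that may not exist.",
    "misclassifications": "Phrase a request so that its intent is ambiguous.",
}
CATEGORY_EXAMPLES: Dict[str, List[str]] = {
    "hallucinations": ["What is the fee for the Platinum Moon account?"],
    "misclassifications": ["I want to block, I mean unblock, my card."],
}


class PromptGeneratorSketch:
    """Builds a system prompt for one test category."""

    def __init__(self, category: str) -> None:
        if category not in CATEGORY_GUIDELINES:
            raise ValueError(f"Unknown category: {category}")
        self.category = category

    def build_system_prompt(self) -> str:
        # Guideline first, then examples, so the model sees the goal before samples.
        lines = [
            f"Category: {self.category}",
            f"Guideline: {CATEGORY_GUIDELINES[self.category]}",
            "Examples:",
        ]
        lines += [f"- {example}" for example in CATEGORY_EXAMPLES.get(self.category, [])]
        return "\n".join(lines)
```

Rejecting unknown categories at construction time surfaces a typo in a category name before any browser session is started.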

Approach

The project adopts a test-driven approach to evaluate the chatbot's performance. Key aspects include:

Edge Case Testing: Simulating scenarios that push the chatbot to its limits, such as handling ambiguous queries or unauthorized requests.
Dynamic Behavior: Using the WolfSelector to alternate between cooperative and adversarial user behaviors.
Realistic Simulations: Generating prompts that mimic real-world user interactions, including emotional and manipulative language.

Getting Started

Clone the repository

git clone https://github.com/Michall00/hack-ai-prompt-buddies.git

Prerequisites

Before running the project, create the environment and install the required dependencies with the following commands:

make create_environment

make requirements

Set up env

Create a .env file in the project's root directory and fill in the following values:

TOGETHER_API_KEY = ...
LOGIN = ...
PASSWORD = ...
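
Before launching a run, it can help to check that all three values are present. The sketch below parses the KEY = value format shown above using only the standard library. It is an illustration, not the project's loading code (which may rely on a dotenv library); parse_env_text and missing_keys are hypothetical helper names.

```python
from typing import Dict, List

# The three keys listed in the .env template above.
REQUIRED_KEYS = ("TOGETHER_API_KEY", "LOGIN", "PASSWORD")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY = value lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and optional surrounding quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, calling missing_keys(parse_env_text(Path(".env").read_text())) with pathlib.Path imported returns an empty list when the file is complete.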

Have fun

Automate chat with mBank

make run

Run evaluation app

make run_evaluation

Team

Natalia Pieczko
Daniel Machniak
Krzystof Gólcz
Mateusz Ostaszewski
Michał Sadowski
