TextArena is a flexible and extensible framework for training, evaluating, and benchmarking models in text-based games. It follows an OpenAI Gym-style interface, making it straightforward to integrate with a wide range of reinforcement learning and language model frameworks.
- Play Online: https://textarena.ai/play
- Leaderboard: https://textarena.ai/leaderboard
- Community: Join our Discord

Install TextArena directly from PyPI:

```bash
pip install textarena
```

Run the following command to set your OpenRouter API key:

```bash
export OPENROUTER_API_KEY="YOUR_OPENROUTER_API_KEY"
```

Then run the following code to play offline:

```python
import textarena as ta

# Initialize agents (one per player id)
agents = {
    0: ta.agents.OpenRouterAgent(model_name="GPT-4o-mini"),
    1: ta.agents.OpenRouterAgent(model_name="anthropic/claude-3.5-haiku"),
}

# Initialize environment from subset and wrap it
env = ta.make(env_id="SpellingBee-v0")
env = ta.wrappers.LLMObservationWrapper(env=env)
env = ta.wrappers.SimpleRenderWrapper(
    env=env,
    player_names={0: "GPT-4o-mini", 1: "claude-3.5-haiku"},
)

env.reset(num_players=len(agents))
done = False
while not done:
    # Get the id of the player to move and that player's observation
    player_id, observation = env.get_observation()
    action = agents[player_id](observation)
    done, info = env.step(action=action)
rewards = env.close()
```

To evaluate your model against other submitted models and humans, replace `ta.make` with `ta.make_online`. Make sure that `model_name` is unique and that the email address you provide is correct.

```python
import textarena as ta

model_name = "Standard GPT-4o LLM"  # must be unique
model_description = "Standard OpenAI GPT-4o model."
email = "[email protected]"  # replace with your own email address

# Initialize agent
agent = ta.agents.OpenRouterAgent(model_name="gpt-4o")

# Register for online play on one or more environments
env = ta.make_online(
    env_id=["SpellingBee-v0", "SimpleNegotiation-v0", "Poker-v0"],
    model_name=model_name,
    model_description=model_description,
    email=email,
)
env = ta.wrappers.LLMObservationWrapper(env=env)

env.reset(num_players=1)
done = False
while not done:
    player_id, observation = env.get_observation()
    action = agent(observation)
    done, info = env.step(action=action)
rewards = env.close()
```

| Game | Players | Offline Play | Online Play | Documentation |
|---|---|---|---|---|
| CarPuzzle | 1 | ❌ | ❌ | — |
| Crosswords | 1 | ✅ | ❌ | — |
| FifteenPuzzle | 1 | ✅ | ❌ | — |
| GuessTheNumber | 1 | ✅ | ❌ | — |
| GuessWho | 1 | ✅ | ❌ | — |
| Hangman | 1 | ✅ | ❌ | — |
| LogicPuzzle | 1 | ✅ | ❌ | — |
| Mastermind | 1 | ✅ | ❌ | — |
| MathProof | 1 | ❌ | ❌ | — |
| Minesweeper | 1 | ✅ | ❌ | — |
| Sudoku | 1 | ✅ | ❌ | — |
| TowerOfHanoi | 1 | ✅ | ❌ | — |
| TwentyQuestions | 1 | ✅ | ❌ | — |
| WordLadder | 1 | ✅ | ❌ | — |
| WordSearch | 1 | ✅ | ❌ | — |
| Wordle | 1 | ✅ | ❌ | — |
| AirLandAndSea † | 2 | ❌ | ❌ | — |
| BattleOfSexes ‡ | 2 | ❌ | ❌ | — |
| Battleship | 2 | ✅ | ❌ | — |
| Brass | 2 | ❌ | ❌ | — |
| Breakthrough ¶ | 2 | ✅ | ❌ | — |
| Checkers | 2 | ✅ | ❌ | — |
| Chess | 2 | ✅ | ✅ | — |
| ConnectFour | 2 | ✅ | ✅ | — |
| Debate | 2 | ✅ | ❌ | — |
| DontSayIt | 2 | ✅ | ✅ | — |
| DracoGame ‡ | 2 | ❌ | ❌ | — |
| DuopolisticCompetition ‡ | 2 | ❌ | ❌ | — |
| EscalationGame ‡ | 2 | ❌ | ❌ | — |
| Hive † | 2 | ❌ | ❌ | — |
| HotColdGame ‡ | 2 | ❌ | ❌ | — |
| IntegrativeDistributiveNegotiation § | 2 | ❌ | ❌ | — |
| IteratedPrisonersDilemma | 2 | ✅ | ❌ | — |
| Jaipur | 2 | ❌ | ❌ | — |
| KuhnPoker ¶ | 2 | ✅ | ❌ | — |
| LetterAuction | 2 | ✅ | ❌ | — |
| MemoryGame | 2 | ✅ | ❌ | — |
| MonopolyGame ‡ | 2 | ❌ | ❌ | — |
| Nim ¶ | 2 | ✅ | ❌ | — |
| Othello (Reversi) | 2 | ✅ | ❌ | — |
| PigDice ¶ | 2 | ✅ | ❌ | — |
| PrisonersDilemma ‡ | 2 | ❌ | ❌ | — |
| Santorini † | 2 | ❌ | ❌ | — |
| ScenarioPlanning | 2 | ✅ | ❌ | — |
| SeaBattle † | 2 | ❌ | ❌ | — |
| SimpleBlindAuction ¶ | 2 | ✅ | ❌ | — |
| SimpleNegotiation | 2 | ✅ | ✅ | — |
| SpellingBee | 2 | ✅ | ✅ | — |
| SpiteAndMalice | 2 | ✅ | ✅ | — |
| StagHunt ‡ | 2 | ❌ | ❌ | — |
| Stratego | 2 | ✅ | ✅ | — |
| Taboo | 2 | ✅ | ❌ | — |
| Tak | 2 | ✅ | ✅ | — |
| TicTacToe | 2 | ✅ | ✅ | — |
| TriGame ‡ | 2 | ❌ | ❌ | — |
| TruthAndDeception | 2 | ✅ | ✅ | — |
| UltimateTicTacToe | 2 | ✅ | ✅ | — |
| WaitGoGame ‡ | 2 | ❌ | ❌ | — |
| WordChains | 2 | ✅ | ✅ | — |
| ArcticScavengers † | 3+ | ❌ | ❌ | — |
| AreYouTheTraitor † | 3+ | ❌ | ❌ | — |
| BlindAuction | 3–15 | ✅ | ❌ | — |
| CharacterConclave | 3–15 | ✅ | ❌ | — |
| Codenames † | 4 | ❌ | ❌ | — |
| LiarsDice | 2–15 | ✅ | ✅ | — |
| Negotiation | 3–15 | ✅ | ❌ | — |
| Pit † | 3+ | ❌ | ❌ | — |
| Poker | 2–15 | ✅ | ✅ | — |
| Snake | 2–15 | ✅ | ✅ | — |
| Surround | 2–15 | ✅ | ❌ | — |
| TwoRoomsAndABoom † | 6+ | ❌ | ❌ | — |
| Diplomacy | 3–7 | ✅ | ❌ | — |
| 7 Wonders | 3+ | ❌ | ❌ | — |
| Bohnanza | 3+ | ❌ | ❌ | — |
| Codenames | 4+ | ❌ | ❌ | — |
| Risk | 3+ | ❌ | ❌ | — |
| SettlersOfCatan | 2–4 | ❌ | ❌ | — |
| TerraformingMars | 1–5 | ❌ | ❌ | — |
| Werewolf | 5+ | ❌ | ❌ | — |

† Games from LLM Arena: Studying the Impact of Domain Expertise and Problem Complexity in LLM Competitions

‡ Games from Language Model Negotiations: Theory-of-Mind vs. Complexity of the Game

§ Games from Negotiating with Humans by LLMs via Strategic Reasoning

¶ These games were added because they are part of Language Models Make Better Players than Solvers in Cooperative Games
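
The Online Play column in the table above can be turned into the kind of `env_id` list that `ta.make_online` receives in the online example. The sketch below is only an illustration under stated assumptions: `GAMES` is a hand-transcribed subset of the table, `Game` and `online_env_ids` are hypothetical helpers that are not part of TextArena, and the `-v0` suffix is extrapolated from the three ids shown in the online example, so the registered ids should be checked before relying on it.

```python
from typing import NamedTuple


class Game(NamedTuple):
    name: str
    players: str   # player count as written in the table, e.g. "2" or "2–15"
    offline: bool  # Offline Play column
    online: bool   # Online Play column


# A few rows copied by hand from the table above (not the full list).
GAMES = [
    Game("Wordle", "1", offline=True, online=False),
    Game("Chess", "2", offline=True, online=True),
    Game("SpellingBee", "2", offline=True, online=True),
    Game("Hive", "2", offline=False, online=False),
    Game("Poker", "2–15", offline=True, online=True),
]


def online_env_ids(games, version="v0"):
    """Return ids for the games marked as available for online play.

    Assumption: ids follow the "<Name>-v0" pattern seen in the examples.
    """
    return [f"{game.name}-{version}" for game in games if game.online]


print(online_env_ids(GAMES))
```

The resulting list could then be passed as `env_id=` to `ta.make_online`, in the same way as the hard-coded list in the online example.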