Coup

This environment is a PettingZoo implementation of the card game Coup.

Import	`from pettingzoo_coup import coup_v0`
Actions	Discrete
Parallel API	No
Manual Control	No
Agents	`agents=['agent_0', 'agent_1', ...]`
Agents	2+ (configurable)
Action Shape	varies, default is (31,)
Action Values	varies
Observation Shape	varies, default is (94,)
Observation Values	varies

Coup is a 2-6 player competitive hidden information game where the last player standing wins. Each player starts with 2 cards and 2 coins. On their turn, players can perform various actions to gain coins, steal from others, or force them to lose cards. Players can claim to have specific character cards to perform special actions, but opponents can challenge these claims. If caught bluffing, the claimant loses a card of their choice. If they have the card, the challenger loses a card instead. Some of the card actions are counter-actions that block and nullify certain actions. These block actions can also be challenged.

See examples/run.py for basic usage.

Environment arguments

The environment is configurable to allow different initial conditions of the game. In particular, it is possible to start in a late-game situation with cards already eliminated and a few players remaining. This way a curriculum-based learning is possible.

coup_v0.env(
    deck=None, dead_draw=False, num_players=6, num_players_alive=6, render_mode="ansi"
)

deck: Dictionary mapping card names to their count in the deck. The default deck is 3 of each card type (Ambassador, Assassin, Captain, Contessa, Duke).

dead_draw: Whether to draw cards for non-participating players at game start to maintain consistent deck size.

num_players: Total number of player slots (2-6). This influences the observation and action space sizes.

num_players_alive: Number of players actually participating in the game (must be <= num_players and >= 2).

render_mode: Rendering mode. Currently only supports "ansi" for text-based rendering.

Observation Space

The observation is a 1D numpy array with information about the game state and the current turn's history, from the relative perspective of the observing agent. The size and range of the observation space depends on initialization parameters (deck and num_players). For default settings (six players, default deck):

Index Range	Array Length	Array Type	Description	Value Range	Scope
0 - 5	# Players	Counts	Coins of each player	0 - 12	Public
6 - 10	# Card types	Counts	Unseen card totals	0 - 3	Private
11 - 15	# Card types	Counts	Player's cards by type	0 - 3	Private
16 - 21	# Players	Counts	Card totals per player	0 - 4	Public
22 - 28	# Start Actions	One-hot	Action	0 - 1	Public
29 - 34	# Players	One-hot	Actor	0 - 1	Public
35 - 40	# Players	One-hot	Action's target	0 - 1	Public
41 - 46	# Players	Indicator	Challenge passed	0 - 1	Public
47 - 52	# Players	One-hot	Challenger	0 - 1	Public
53 - 58	# Players	One-hot	Challenge loser	0 - 1	Public
59 - 63	# Block Actions	One-hot	Block action	0 - 1	Public
64 - 69	# Players	Indicator	Block passed	0 - 1	Public
70 - 75	# Players	One-hot	Blocker	0 - 1	Public
76 - 81	# Players	Indicator	Block challenge passed	0 - 1	Public
82 - 87	# Players	One-hot	Block challenger	0 - 1	Public
88 - 93	# Players	One-hot	Block challenge loser	0 - 1	Public

The subarrays of the observation array of size equal to the number of players are represented relative to the observing agent: The observer is always at the 0-bit of the subarray, followed by the other players in order of play after the observer.

Observation History

Past turn observations are made available as a list of observation arrays in info["observation_history"], from oldest to most recent.

The current observation array only contains information on the current game state and turn. Being a hidden information game, past turns contain important information that the agent might use to make better decisions. Additionally, not all agents have a chance to observe and act during a turn.

Player

Action Space

The action space is discrete and varies based on the number of players. Actions are mapped to integers starting from 0:

Bit	Action	Target
0	LOSE_AMBASSADOR	0
1	LOSE_ASSASSIN	0
2	LOSE_CAPTAIN	0
3	LOSE_CONTESSA	0
4	LOSE_DUKE	0
5	CHALLENGE_PASS	0
6	CHALLENGE_CALL	0
7	BLOCK_PASS	0
8	BLOCK_ASSASSINATE	0
9	BLOCK_FOREIGN_AID	0
10	BLOCK_STEAL_AMB	0
11	BLOCK_STEAL_CAP	0
12	EXCHANGE	0
13	FOREIGN_AID	0
14	INCOME	0
15	TAX	0
16	ASSASSINATE	1
17	ASSASSINATE	2
18	ASSASSINATE	3
19	ASSASSINATE	4
20	ASSASSINATE	5
21	COUP	1
22	COUP	2
23	COUP	3
24	COUP	4
25	COUP	5
26	STEAL	1
27	STEAL	2
28	STEAL	3
29	STEAL	4
30	STEAL	5

Here, target is the player position counting from the acting agent, so target=0 actions are self-targeted.

Self-targeted actions:

INCOME: Gain 1 coin (cannot be blocked or challenged)
FOREIGN_AID: Gain 2 coins (can be blocked by Duke)
EXCHANGE: Claim Ambassador, draw 2 cards and return 2 cards of your choice (can be challenged)
TAX: Claim Duke, gain 3 coins (can be challenged)

Targeted actions:

COUP: Pay 7 coins to force a player to lose a card (cannot be blocked or challenged)
ASSASSINATE: Claim Assassin, pay 3 coins to force player to lose a card (can be blocked by Contessa, can be challenged)
STEAL: Claim Captain, steal 2 coins from a player (can be blocked by Captain or Ambassador, can be challenged)

Response actions:

CHALLENGE_PASS: Do not challenge the action
CHALLENGE_CALL: Challenge the action (available when an opponent claims a card)
BLOCK_PASS: Don't block the action
BLOCK_*: Various blocking actions
LOSE_*: Choose which card to lose from your hand

Legal Actions Mask

The legal actions available to the current agent are provided in the action_mask element of the info dictionary returned by last(). The action_mask is a binary vector where each index represents whether the corresponding action is legal.

Taking an illegal action will raise a ValueError.

Rewards

Coup uses a winner-takes-all reward structure:

The last surviving player receives a reward of +1.0
All other players receive 0.0

The game ends when only one player has remaining cards.

Game Flow

A typical turn in Coup follows this pattern:

Start State: Active player chooses an action
Challenge Window: If action is challengeable, other players can challenge
Block Window: If action is blockable, target player (or all players) can block
Block Challenge Window: If blocked, other players can challenge the block
Resolution: Action resolves (or doesn't) based on challenges/blocks.
End Turn: Next player's turn begins

Players lose cards when:

They lose a challenge
They are successfully assassinated or coup'd

Implementation

The game is modeled as a directed decision graph and implemented using a scrict state space. Here is a diagram of the game model:

Version History

v0: Initial release

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
src/pettingzoo_coup		src/pettingzoo_coup
tests		tests
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Coup

Environment arguments

Observation Space

Observation History

Player

Action Space

Legal Actions Mask

Rewards

Game Flow

Implementation

Version History

About

Uh oh!

Languages

License

ivomac/pettingzoo-coup

Folders and files

Latest commit

History

Repository files navigation

Coup

Environment arguments

Observation Space

Observation History

Player

Action Space

Legal Actions Mask

Rewards

Game Flow

Implementation

Version History

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages