
Commit d48a2d4

Author: izhigal
Commit message: formatting applied
1 parent: 7fc56ed

143 files changed: +1554 / -1656 lines


README.md

Lines changed: 1 addition & 1 deletion
@@ -245,7 +245,7 @@ You can use the following interface to make an environment. You may optionally
 * `allow_step_back`: Default `False`. `True` if allowing the `step_back` function to traverse backward in the tree.
 * Game-specific configurations: These fields start with `game_`. Currently, we only support `game_num_players` in Blackjack.
 
-Once the environemnt is made, we can access some information of the game.
+Once the environment is made, we can access some information of the game.
 * **env.num_actions**: The number of actions.
 * **env.num_players**: The number of players.
 * **env.state_shape**: The shape of the state space of the observations.
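Taken together, a minimal sketch of this interface (the `blackjack` id and the config values here are illustrative choices, not requirements):

    import rlcard

    # Make an environment; game-specific fields start with `game_`.
    env = rlcard.make(
        'blackjack',
        config={
            'seed': 42,
            'allow_step_back': False,
            'game_num_players': 2,
        },
    )

    # Once the environment is made, we can access some information of the game.
    print(env.num_actions)  # the number of actions
    print(env.num_players)  # the number of players
    print(env.state_shape)  # the shape of the state space of the observations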

docs/games.md

Lines changed: 2 additions & 2 deletions
@@ -90,7 +90,7 @@ At each decision point of the game, the corresponding player will be able to obs
 | ------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
 | seen\_cards   | Three face-down cards distributed to the landlord after bidding. Then these cards will be made public to all players.                                          | TQA |
 | landlord      | An integer of the landlord's id                                                                                                                                | 0 |
-| self          | An integer of the current player's id                                                                                                                          | 2 |
+| cls           | An integer of the current player's id                                                                                                                          | 2 |
 | trace         | A list of tuples which records every action in one game. The first entry of the tuple is the player's id; the second is the corresponding player's action.    | \[(0, '8222'), (1, 'pass'), (2, 'pass'), (0, '6KKK'), (1, 'pass'), (2, 'pass'), (0, '8'), (1, 'Q')\] |
 | played\_cards | As the game progresses, the cards which have been played by the three players, sorted from low to high.                                                       | \['6', '8', '8', 'Q', 'K', 'K', 'K', '2', '2', '2'\] |
 | others\_hand  | The union of the other two players' current hands                                                                                                              | 333444555678899TTTJJJQQAA2R |
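For orientation, a short sketch of how these fields might be inspected at play time. This is a sketch under assumptions: `doudizhu` is the real environment id, but the `raw_obs` key layout is inferred from RLCard's usual state dict, not stated in this table:

    import rlcard

    env = rlcard.make('doudizhu')
    state, player_id = env.reset()

    # The raw observation is assumed to be a dict keyed by the names in the table above.
    raw = state['raw_obs']
    for key in ('seen_cards', 'landlord', 'trace', 'played_cards', 'others_hand'):
        print(key, '->', raw.get(key))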
@@ -134,7 +134,7 @@ If the landlord first gets rid of all the cards in his hand, he will win and rece
 ## Mahjong
 Mahjong is a tile-based game developed in China, and it has spread throughout the world since the 20th century. It is commonly played
 by 4 players. The game is played with a set of 136 tiles. In turn, players draw and discard tiles until
-The goal of the game is to complete the leagal hand using the 14th drawn tile to form 4 sets and a pair.
+The goal of the game is to complete the legal hand using the 14th drawn tile to form 4 sets and a pair.
 We revised the game into a simple version in which all of the winning sets are equal, and a player wins as long as she completes
 4 sets and a pair. Please refer to the details on [Wikipedia](https://en.wikipedia.org/wiki/Mahjong) or [Baike](https://baike.baidu.com/item/麻将/215).

docs/high-level-design.md

Lines changed: 1 addition & 1 deletion
@@ -25,4 +25,4 @@ Card games usually have similar structures. We abstract some concepts in card ga
 To summarize, in one `Game`, a `Dealer` deals the cards for each `Player`. In each `Round` of the game, a `Judger` makes major decisions about the next round and the payoffs at the end of the game.
 
 ## Agents
-We provide examples of several representative algorithms and wrap them as `Agent` to show how a learning algorithm can be connected to the toolkit. The first example is DQN, a representative of the Reinforcement Learning (RL) category. The second example is NFSP, a representative of RL with self-play. We also provide CFR (chance sampling) and DeepCFR, which belong to the Conterfactual Regret Minimization (CFR) category. Other algorithms from these three categories can be connected in similar ways.
+We provide examples of several representative algorithms and wrap them as `Agent` to show how a learning algorithm can be connected to the toolkit. The first example is DQN, a representative of the Reinforcement Learning (RL) category. The second example is NFSP, a representative of RL with self-play. We also provide CFR (chance sampling) and DeepCFR, which belong to the Counterfactual Regret Minimization (CFR) category. Other algorithms from these three categories can be connected in similar ways.
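As a concrete illustration of this wrapping, a minimal sketch of seating an `Agent` in an environment (`RandomAgent` stands in here for brevity; a DQN, NFSP, or CFR agent plugs into `set_agents` the same way, and the `leduc-holdem` id is an arbitrary choice):

    import rlcard
    from rlcard.agents import RandomAgent

    env = rlcard.make('leduc-holdem')

    # Anything exposing the Agent interface (step / eval_step) can be seated.
    agents = [RandomAgent(num_actions=env.num_actions) for _ in range(env.num_players)]
    env.set_agents(agents)

    trajectories, payoffs = env.run(is_training=False)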

docs/toy-examples.md

Lines changed: 1 addition & 1 deletion
@@ -339,7 +339,7 @@ def train(args):
     # Seed numpy, torch, random
     set_seed(args.seed)
 
-    # Initilize CFR Agent
+    # Initialize CFR Agent
     agent = CFRAgent(
         env,
         os.path.join(
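For context, a hedged sketch of the setup that surrounds this snippet (the environment id and result directory are assumptions, not the exact example script; CFR does need `allow_step_back=True` to traverse the game tree):

    import os

    import rlcard
    from rlcard.agents import CFRAgent
    from rlcard.utils import set_seed

    # CFR traverses the tree, so step_back must be enabled.
    env = rlcard.make('leduc-holdem', config={'seed': 0, 'allow_step_back': True})

    # Seed numpy, torch, random
    set_seed(0)

    # Initialize CFR Agent
    agent = CFRAgent(
        env,
        os.path.join('experiments/leduc_holdem_cfr_result', 'cfr_model'),
    )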

examples/evaluate.py

Lines changed: 7 additions & 10 deletions
@@ -1,19 +1,16 @@
-''' An example of evluating the trained models in RLCard
-'''
+"""An example of evaluating the trained models in RLCard"""
 import os
 import argparse
 
 import rlcard
-from rlcard.agents import (
-    DQNAgent,
-    RandomAgent,
-)
+
 from rlcard.utils import (
     get_device,
     set_seed,
     tournament,
 )
 
+
 def load_model(model_path, env=None, position=None, device=None):
     if os.path.isfile(model_path):  # Torch model
         import torch
@@ -29,14 +26,14 @@ def load_model(model_path, env=None, position=None, device=None):
     else:  # A model in the model zoo
         from rlcard import models
         agent = models.load(model_path).agents[position]
-
+
     return agent
 
-def evaluate(args):
 
+def evaluate(args):
     # Check whether gpu is available
     device = get_device()
-
+
     # Seed numpy, torch, random
     set_seed(args.seed)
 
@@ -54,6 +51,7 @@ def evaluate(args):
     for position, reward in enumerate(rewards):
         print(position, args.models[position], reward)
 
+
 if __name__ == '__main__':
     parser = argparse.ArgumentParser("Evaluation example in RLCard")
     parser.add_argument(
@@ -99,4 +97,3 @@ def evaluate(args):
 
     os.environ["CUDA_VISIBLE_DEVICES"] = args.cuda
     evaluate(args)
-
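Putting the pieces together, a rough sketch of what `evaluate(args)` reduces to, reusing the `load_model` helper shown above (the model-zoo ids and the game count are illustrative assumptions):

    import rlcard
    from rlcard.utils import get_device, set_seed, tournament

    device = get_device()
    set_seed(42)

    env = rlcard.make('leduc-holdem', config={'seed': 42})
    model_ids = ['leduc-holdem-cfr', 'leduc-holdem-rule-v1']
    env.set_agents([load_model(m, env, i, device) for i, m in enumerate(model_ids)])

    # tournament() plays the requested number of games and averages payoffs per seat.
    rewards = tournament(env, 10000)
    for position, reward in enumerate(rewards):
        print(position, model_ids[position], reward)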

examples/human/blackjack_human.py

Lines changed: 2 additions & 3 deletions
@@ -1,5 +1,4 @@
-''' A toy example of self playing for Blackjack
-'''
+"""A toy example of self playing for Blackjack"""
 
 import rlcard
 from rlcard.agents import RandomAgent as RandomAgent
@@ -23,7 +22,7 @@
 
 print(">> Blackjack human agent")
 
-while (True):
+while True:
     print(">> Start a new game")
 
     trajectories, payoffs = env.run(is_training=False)
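The human-play examples below all share this skeleton. A hedged reconstruction of the setup the hunks elide (the single-seat `RandomAgent` and the quit prompt are assumptions, not the exact file contents):

    import rlcard
    from rlcard.agents import RandomAgent

    env = rlcard.make('blackjack')
    env.set_agents([RandomAgent(num_actions=env.num_actions)])  # assumed single seat

    print(">> Blackjack human agent")

    while True:
        print(">> Start a new game")
        trajectories, payoffs = env.run(is_training=False)
        print(">> Payoff:", payoffs[0])
        if input(">> Press q to quit, any other key to continue: ") == 'q':
            break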

examples/human/gin_rummy_human.py

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
-'''
+"""
 Project: Gui Gin Rummy
 File name: gin_rummy_human.py
 Author: William Hale
 Date created: 3/14/2020
-'''
+"""
 
 # You need to install tkinter if it is not already installed.
 # Tkinter is Python's de facto standard GUI (Graphical User Interface) package.

examples/human/leduc_holdem_human.py

Lines changed: 2 additions & 3 deletions
@@ -1,5 +1,4 @@
-''' A toy example of playing against pretrianed AI on Leduc Hold'em
-'''
+"""A toy example of playing against pretrained AI on Leduc Hold'em"""
 
 import rlcard
 from rlcard import models
@@ -17,7 +16,7 @@
 
 print(">> Leduc Hold'em pre-trained model")
 
-while (True):
+while True:
     print(">> Start a new game")
 
     trajectories, payoffs = env.run(is_training=False)

examples/human/limit_holdem_human.py

Lines changed: 2 additions & 3 deletions
@@ -1,5 +1,4 @@
-''' A toy example of playing against a random agent on Limit Hold'em
-'''
+"""A toy example of playing against a random agent on Limit Hold'em"""
 
 import rlcard
 from rlcard.agents import LimitholdemHumanAgent as HumanAgent
@@ -17,7 +16,7 @@
 
 print(">> Limit Hold'em random agent")
 
-while (True):
+while True:
     print(">> Start a new game")
 
     trajectories, payoffs = env.run(is_training=False)

examples/human/nolimit_holdem_human.py

Lines changed: 2 additions & 3 deletions
@@ -1,5 +1,4 @@
-''' A toy example of playing against pretrianed AI on Leduc Hold'em
-'''
+"""A toy example of playing against pretrained AI on No-Limit Hold'em"""
 from rlcard.agents import RandomAgent
 
 import rlcard
@@ -17,7 +16,7 @@
 env.set_agents([human_agent, human_agent2])
 
 
-while (True):
+while True:
     print(">> Start a new game")
 
     trajectories, payoffs = env.run(is_training=False)
