About
pygame
└── policy-transfer
    ├── environments
    │   ├── agents
    │   │   ├── agent.py
    │   │   └── move.py
    │   ├── gameEnvironment
    │   │   ├── field
    │   │   │   ├── field.py
    │   │   │   ├── fieldReward.py
    │   │   │   └── fieldSort.py
    │   │   ├── gameMap.py
    │   │   ├── score.py
    │   │   └── scoreHistory.py
    │   ├── icons
    │   │   └── ...
    │   └── game.py
    ├── experiments
    │   └── template
    │       ├── agent_data
    │       ├── map_data
    │       │   ├── field_rewards_5x5.json  # defines the rewards on the map
    │       │   ├── field_sort_5x5.json     # defines the sort of each field
    │       │   └── dynamics.json           # defines the dynamics of each field
    │       └── results
    │           └── ...
    ├── requirements.txt
    ├── config.py  # metadata configuration
    └── main.py    # start script
This project implements a grid-world environment whose dynamics can easily be exchanged. There are two modes:
- Interactive mode:
  The purpose of interactive mode is to validate and visualise the behaviour of a single agent under different environment dynamics.
- Experiment mode:
  In experiment mode we dynamically train, save and combine agents in various ways and evaluate the results.
To run the project you need Python and the following libraries:
- pygame
- numpy
- matplotlib
All requirements are listed in requirements.txt. If you have Python installed, navigate to the folder containing requirements.txt and run:
$ pip install -r requirements.txt
1. Change BASE_PATH in config.py so that it points to the policy_transfer folder on your machine (a sketch of this entry follows step 2)
2. Navigate to main.py and run:
$ python main.py
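A minimal sketch of the BASE_PATH entry mentioned in step 1 (the path below is only a placeholder for your local checkout):
# config.py (excerpt, illustrative)
BASE_PATH = "/path/to/policy_transfer"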
The program will start with the default map settings, which live in the experiments/template/map_data folder. After reading the metadata the program runs the learning script, and once learning has finished the pygame window should appear.
You can interact with the agent by pressing the following keys:
- up/down : increase/decrease the speed of the agent (0 means the maximum possible speed)
- m : turn the agent's physical play on/off
- h : show the score plot
- j : show the average score plot
- b : pause/resume the game
- p : toggle between using the value function and the Q-function
- s : save the agent's Q-values to a dictionary
If you want to change the positions of walls and rewards, you can do so by editing the metadata files in the folder experiments/template/map_data:
- field_rewards_5x5.json describes the positions of the rewards
- field_sort_5x5.json describes the positions of the walls
IMPORTANT: Do not change the positions [0,0] and [4,4]; those are the start and end positions of the game and must stay the same.
The default field sort is BAHN (main direction 0.94). You can change this behaviour by running the command below; an example call follows the list of possible values.
$ python main.py -s <FIELD_SORT>
<FIELD_SORT> can be:
- BAHN (main direction 0.96)
- GRASS (main direction 0.7)
- SUMPF (main direction 0.6)
- ICE (main direction 0.5)
- RANDOM (a mix of the four sorts above)
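For example, to run the environment with grass fields:
$ python main.py -s GRASS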
If you want to change the environment dynamics further, you can do so in the file ../environments/gameEnvironment/field/fieldSort.py
You can also describe the dynamics of each individual field. To do so, change USE_DYNAMICS in the config.py file; the per-field dynamics are described in the dynamics.json file.
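For illustration, this is just a flag in config.py (excerpt only; exactly how the flag is consumed is shown by get_field_dynamic further below):
# config.py (excerpt, illustrative)
USE_DYNAMICS = True  # take the per-field dynamics from dynamics.json instead of the field sort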
Starting the program triggers the learning of the agent on the underlying map, which means that all of the settings described above affect the behaviour of the agent. We can then save this agent (by pressing the key s in the console) and load it in some other environment.
We can do so by typing the following command:
$ python main.py -t <AGENT_NAME>
The program will look for an <AGENT_NAME>.json file in the folder /experiments/template/agent_data. This file should contain the agent's Q-values.
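For illustration only (the exact layout of the saved file is an assumption; it is whatever the s key wrote out), loading such an agent boils down to a plain JSON read:
import json, os

# Illustrative sketch: read a saved agent's Q-values from agent_data.
def load_agent(agent_name, base_path="experiments/template"):
    path = os.path.join(base_path, "agent_data", agent_name + ".json")
    with open(path) as f:
        return json.load(f)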
Experiment mode is the part of the program where we dynamically train, save and then combine agents in various ways. In the file agent.py we wrote 3 different functions; every function is a complete experiment.
To start our experiments you can run the command:
$ python main.py -e
This command will start all 3 experiments and, after they have finished, write the results to /experiments/template/results
If you want to write your own tests, there are some configuration options that may be of interest. First, you can copy the template experiment and rename it. If you do so, you should also change the name of the experiment in the config.py file.
The main function that actually returns the dynamics of an environment is in the gameMap.py file:
def get_field_dynamic(self, field_position):
    if self.USE_DYNAMIC:
        if self.TRAINING:
            if self.RANDOM_DYNAMIC:
                return self.current_field_dynamics[field_position]  # dynamically generated dynamics
            else:
                return self.HOMOGEN_DYNAMIC  # fixed value set programmatically
        else:
            return self.field_dynamics[field_position]  # dynamics from the dynamics.json file
    else:
        return self.field_dict[field_position].field_sort.value  # dynamics from fieldSort.py
By setting these control variables in our tests we can choose which dynamics an agent is trained on. One example would be to train two agents on main directions 0.6 and 0.8:
def evaluate_KL(self, game_map):
    AGENTS = [0.6, 0.8]
    self.transfer_agents = {}
    for main_direction in AGENTS:
        half = round((1 - main_direction) / 2, 2)
        new_dynamic = (main_direction, half, half)
        game_map.USE_DYNAMIC = True
        game_map.TRAINING = True
        game_map.RANDOM_DYNAMIC = False
        game_map.HOMOGEN_DYNAMIC = new_dynamic
        self.learn_environment(game_map)
        # save agent
        self.transfer_agents[main_direction] = copy.deepcopy(self.q_field_dict)
    ...
First we prepare the environment for the agent. Then we call the function learn_environment, which computes the value function with the value iteration algorithm. At the end of learn_environment the information from the value function is transferred to the Q-function. These Q-values are stored in self.q_field_dict, which we then save into the transfer_agents dictionary.
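For reference, the standard conversion from a value function to Q-values looks roughly like the sketch below; whether learn_environment does exactly this is an assumption. Here P, R and gamma stand for the transition dynamics, the field rewards and a discount factor.
# Illustrative only: converting a value function V into Q-values, assuming
# dictionaries of transition probabilities P[s][a][s_next] and rewards R[s][a][s_next].
def values_to_q(P, R, V, gamma=0.9):
    q = {}
    for s in P:
        q[s] = {}
        for a in P[s]:
            q[s][a] = sum(P[s][a][s_next] * (R[s][a][s_next] + gamma * V[s_next])
                          for s_next in P[s][a])
    return q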
At the end, the transfer_agents dictionary holds the Q-values of the two agents, which we can then combine and evaluate.
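One simple way to combine them would be a weighted average of the per-action values. This is only an illustrative sketch, not necessarily the combination used by the experiments in agent.py, and it assumes both agents share the same field positions and actions.
# Hypothetical sketch: combine two agents' Q-values by weighted averaging.
def combine_q_values(q_a, q_b, weight=0.5):
    combined = {}
    for position, actions in q_a.items():
        combined[position] = {
            action: weight * value + (1 - weight) * q_b[position][action]
            for action, value in actions.items()
        }
    return combined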
