About
pygame
└── policy-transfer
    ├── environments
    │   ├── agents
    │   │   ├── agent.py
    │   │   └── move.py
    │   ├── gameEnvironment
    │   │   ├── field
    │   │   │   ├── field.py
    │   │   │   ├── fieldReward.py
    │   │   │   └── fieldSort.py
    │   │   ├── gameMap.py
    │   │   ├── score.py
    │   │   └── scoreHistory.py
    │   ├── icons
    │   │   └── ...
    │   └── game.py
    ├── experiments
    │   └── template
    │       ├── agent_data
    │       ├── map_data
    │       │   ├── field_rewards_5x5.json  # defines the rewards on the map
    │       │   ├── field_sort_5x5.json     # defines the sort of each field
    │       │   └── dynamics.json           # defines the dynamics of each field
    │       └── results
    │           └── ...
    ├── requirements.txt
    ├── config.py  # metadata configuration
    └── main.py    # start script
This project implements a grid-world environment whose dynamics can easily be exchanged. There are two modes:
- Interactive mode:
  The purpose of interactive mode is to validate and visualise the behaviour of a single agent under different environment dynamics.
- Experiment mode:
  In experiment mode we dynamically train, save and combine agents in various ways and evaluate the results.
To run the project you need Python and the following libraries:
- pygame
- numpy
- matplotlib
All requirements are listed in requirements.txt. If you have Python installed, navigate to the folder containing requirements.txt and run:
$ pip install -r requirements.txt
1. Change BASE_PATH in config.py so that it points to the policy_transfer folder on your machine (a sketch of this entry follows step 2)
2. Navigate to main.py and run:
$ python main.py
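A minimal sketch of the BASE_PATH entry mentioned in step 1 (the path below is only a placeholder for your local checkout):
# config.py (excerpt, illustrative)
BASE_PATH = "/path/to/policy_transfer"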
The program will start with the default map settings, which live in the experiments/template/map_data folder. After reading the metadata the program runs the learning script, and once learning has finished the pygame window should appear.
You can interact with the agent by pressing the following keys:
- up/down : increase/decrease the speed of the agent (0 means the maximum possible speed)
- m : turn the agent's physical play on/off
- h : show the score plot
- j : show the average score plot
- b : pause/resume the game
- p : toggle between using the value function and the Q-function
- s : save the agent's Q-values to a dictionary
If you want to change the positions of walls and rewards, you can do so by editing the metadata files in the folder experiments/template/map_data:
- field_rewards_5x5.json describes the positions of the rewards
- field_sort_5x5.json describes the positions of the walls
IMPORTANT: Do not change the positions [0,0] and [4,4]; those are the start and end positions of the game and must stay the same.
The default field sort is BAHN (main direction 0.94). You can change this behaviour by running the command below; an example call follows the list of possible values.
$ python main.py -s <FIELD_SORT>
<FIELD_SORT> can be:
- BAHN (main direction 0.96)
- GRASS (main direction 0.7)
- SUMPF (main direction 0.6)
- ICE (main direction 0.5)
- RANDOM (a mix of the four sorts above)
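For example, to run the environment with grass fields:
$ python main.py -s GRASS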
If you want to change the environment dynamics further, you can do so in the file ../environments/gameEnvironment/field/fieldSort.py
You can also describe the dynamics of each individual field. To do so, change USE_DYNAMICS in the config.py file; the per-field dynamics are described in the dynamics.json file.
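For illustration, this is just a flag in config.py (excerpt only; exactly how the flag is consumed is shown by get_field_dynamic further below):
# config.py (excerpt, illustrative)
USE_DYNAMICS = True  # take the per-field dynamics from dynamics.json instead of the field sort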
Starting the program triggers the learning of the agent on the underlying map, which means that all of the settings described above affect the behaviour of the agent. We can then save this agent (by pressing the key s in the console) and load it in some other environment.
We can do so by typing the following command:
$ python main.py -t <AGENT_NAME>
The program will look for an <AGENT_NAME>.json file in the folder /experiments/template/agent_data. This file should contain the agent's Q-values.
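For illustration only (the exact layout of the saved file is an assumption; it is whatever the s key wrote out), loading such an agent boils down to a plain JSON read:
import json, os

# Illustrative sketch: read a saved agent's Q-values from agent_data.
def load_agent(agent_name, base_path="experiments/template"):
    path = os.path.join(base_path, "agent_data", agent_name + ".json")
    with open(path) as f:
        return json.load(f)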
Experiment mode is the part of the program where we dynamically train, save and then combine agents in various ways. In the file agent.py we wrote 3 different functions; every function is a complete experiment.
To start our experiments you can run the command:
$ python main.py -e
This command will start all 3 experiments and, after they have finished, write the results to /experiments/template/results
If you want to write your own tests, there are some configuration options that may be of interest. First, you can copy the template experiment and rename it. If you do so, you should also change the name of the experiment in the config.py file.
The main function that actually returns the dynamics of an environment is in the gameMap.py file:
def get_field_dynamic(self, field_position):
    if self.USE_DYNAMIC:
        if self.TRAINING:
            if self.RANDOM_DYNAMIC:
                return self.current_field_dynamics[field_position]  # dynamically generated dynamics
            else:
                return self.HOMOGEN_DYNAMIC  # fixed value set programmatically
        else:
            return self.field_dynamics[field_position]  # dynamics from the dynamics.json file
    else:
        return self.field_dict[field_position].field_sort.value  # dynamics from fieldSort.py
By setting these control variables in our tests we can choose which dynamics an agent is trained on. One example would be to train two agents on main directions 0.6 and 0.8:
def evaluate_KL(self, game_map):
    AGENTS = [0.6, 0.8]
    self.transfer_agents = {}
    for main_direction in AGENTS:
        half = round((1 - main_direction) / 2, 2)
        new_dynamic = (main_direction, half, half)
        game_map.USE_DYNAMIC = True
        game_map.TRAINING = True
        game_map.RANDOM_DYNAMIC = False
        game_map.HOMOGEN_DYNAMIC = new_dynamic
        self.learn_environment(game_map)
        # save agent
        self.transfer_agents[main_direction] = copy.deepcopy(self.q_field_dict)
    ...
First we prepare the environment for the agent. Then we call the function learn_environment, which computes the value function with the value iteration algorithm. At the end of learn_environment the information from the value function is transferred to the Q-function. These Q-values are stored in self.q_field_dict, which we then save into the transfer_agents dictionary.
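For reference, the standard conversion from a value function to Q-values looks roughly like the sketch below; whether learn_environment does exactly this is an assumption. Here P, R and gamma stand for the transition dynamics, the field rewards and a discount factor.
# Illustrative only: converting a value function V into Q-values, assuming
# dictionaries of transition probabilities P[s][a][s_next] and rewards R[s][a][s_next].
def values_to_q(P, R, V, gamma=0.9):
    q = {}
    for s in P:
        q[s] = {}
        for a in P[s]:
            q[s][a] = sum(P[s][a][s_next] * (R[s][a][s_next] + gamma * V[s_next])
                          for s_next in P[s][a])
    return q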
At the end, the transfer_agents dictionary holds the Q-values of the two agents, which we can then combine and evaluate.
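One simple way to combine them would be a weighted average of the per-action values. This is only an illustrative sketch, not necessarily the combination used by the experiments in agent.py, and it assumes both agents share the same field positions and actions.
# Hypothetical sketch: combine two agents' Q-values by weighted averaging.
def combine_q_values(q_a, q_b, weight=0.5):
    combined = {}
    for position, actions in q_a.items():
        combined[position] = {
            action: weight * value + (1 - weight) * q_b[position][action]
            for action, value in actions.items()
        }
    return combined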
