An advanced reinforcement learning project that trains AI agents to play Sekiro: Shadows Die Twice using Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) algorithms.
This project is inspired by the following repositories:
- https://github.com/analoganddigital/DQN_play_sekiro/tree/main
- https://github.com/ChenWendi2001/alpha-sekiro/tree/main
This project implements reinforcement learning agents that can learn to play Sekiro: Shadows Die Twice by:
- Capturing and processing game screens in real-time
- Detecting game states (health, boss health, rigidity meters)
- Executing actions using simulated keyboard inputs
- Learning optimal strategies through DQN and PPO algorithms
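Put together, each episode runs a standard perceive-act-learn loop over these pieces. Below is a minimal sketch of that loop, assuming a gym-style wrapper; the `env`/`agent` method names are illustrative, not the project's actual API:

```python
def run_episode(env, agent):
    """One training episode; env/agent method names are illustrative only."""
    obs = env.reset()                      # capture and preprocess the first frame
    done, total_reward = False, 0.0
    while not done:
        action = agent.act(obs)            # choose one of the discrete actions
        next_obs, reward, done = env.step(action)   # simulated key press, new frame, reward
        agent.observe(obs, action, reward, next_obs, done)  # store transition / learn
        obs, total_reward = next_obs, total_reward + reward
    return total_reward
```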
- Multiple RL Algorithms: Support for both DQN and PPO training
- Real-time Game Integration: Direct screen capture and keyboard control
- Advanced State Detection: Health bars, rigidity meters, and game state recognition
- Configurable Training: Extensive hyperparameter configuration
- Model Management: Save, load, and manage trained models
- TensorBoard Logging: Comprehensive training visualization
- Training/Testing Modes: Separate optimized configurations
- Clone the repository:

```bash
git clone https://github.com/tkgaolol/RL_sekiro.git
cd RL_sekiro
```

- Install dependencies:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install opencv-python numpy pywin32 tensorboard
```

- Game Setup:
- Install Sekiro: Shadows Die Twice
- Set game to windowed mode
- Set the in-game resolution to 1024x576
- Set the following key bindings:
  - Y for camera reset / target lock-on
  - J for attack
  - M for defense
- Install Fling Trainer: https://flingtrainer.com/trainer/sekiro-shadows-die-twice-trainer/
- Enable infinite resurrections and no resurrection cooldown
- Run env_example.py to verify the game setup works
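A note on why these key bindings matter: Sekiro, like most DirectX games, ignores high-level synthetic key events, so input has to be injected as hardware scan codes through the Win32 SendInput API. The project's implementation lives in src/env/directkeys.py; the sketch below is a generic version of that technique, using the DirectInput scan code for 'J' as an example:

```python
import ctypes
import time

SendInput = ctypes.windll.user32.SendInput
PUL = ctypes.POINTER(ctypes.c_ulong)
KEYEVENTF_SCANCODE = 0x0008
KEYEVENTF_KEYUP = 0x0002

class KeyBdInput(ctypes.Structure):
    _fields_ = [("wVk", ctypes.c_ushort), ("wScan", ctypes.c_ushort),
                ("dwFlags", ctypes.c_ulong), ("time", ctypes.c_ulong),
                ("dwExtraInfo", PUL)]

class HardwareInput(ctypes.Structure):
    _fields_ = [("uMsg", ctypes.c_ulong), ("wParamL", ctypes.c_short),
                ("wParamH", ctypes.c_ushort)]

class MouseInput(ctypes.Structure):
    _fields_ = [("dx", ctypes.c_long), ("dy", ctypes.c_long),
                ("mouseData", ctypes.c_ulong), ("dwFlags", ctypes.c_ulong),
                ("time", ctypes.c_ulong), ("dwExtraInfo", PUL)]

class Input_I(ctypes.Union):
    _fields_ = [("ki", KeyBdInput), ("mi", MouseInput), ("hi", HardwareInput)]

class Input(ctypes.Structure):
    _fields_ = [("type", ctypes.c_ulong), ("ii", Input_I)]

def press_key(scan_code: int, hold: float = 0.05) -> None:
    """Tap a key by DirectInput scan code, e.g. 0x24 for 'J'."""
    extra = ctypes.c_ulong(0)
    for flags in (KEYEVENTF_SCANCODE,                     # key down
                  KEYEVENTF_SCANCODE | KEYEVENTF_KEYUP):  # key up
        ii = Input_I()
        ii.ki = KeyBdInput(0, scan_code, flags, 0, ctypes.pointer(extra))
        SendInput(1, ctypes.pointer(Input(1, ii)), ctypes.sizeof(Input))
        if not flags & KEYEVENTF_KEYUP:
            time.sleep(hold)
```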
Start DQN training from scratch:

```bash
cd src/algorithm/dqn
python training.py
```

Train for a specific number of episodes:

```bash
python training.py --episodes 1000
```

Continue training from the latest model:

```bash
python training.py --latest
```

Load a specific model:

```bash
python training.py --model model_episode_1000.pth
```

List available models:

```bash
python training.py --list-models
```

Test with the latest trained model:

```bash
cd src/algorithm/dqn
python testing.py --latest
```

Test a specific model:

```bash
python testing.py --model model_episode_1000.pth
```

Test with pure exploitation (no exploration):

```bash
python testing.py --latest --no-exploration
```
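For orientation, the DQN agent in src/algorithm/dqn/dqn.py learns by regressing Q-values toward a bootstrapped Bellman target. The following is a generic sketch of one update step on a replay batch, not the repo's exact code; `dones` is assumed to be a float tensor of 0/1 episode-end flags:

```python
import torch
import torch.nn.functional as F

def dqn_update(policy_net, target_net, optimizer, batch, gamma=0.99):
    """One DQN gradient step on a replay batch (generic sketch)."""
    obs, actions, rewards, next_obs, dones = batch
    # Q(s, a) for the actions actually taken
    q = policy_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bellman target: r + gamma * max_a' Q_target(s', a'), zeroed at episode end
    with torch.no_grad():
        target = rewards + gamma * target_net(next_obs).max(1).values * (1 - dones)
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```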
Start PPO training:

```bash
cd src/algorithm/ppo
python training.py
```

PPO training with options:

```bash
python training.py --episodes 2000 --latest
```

Test a PPO model:

```bash
cd src/algorithm/ppo
python testing.py --model path/to/model.pth --episodes 5
```
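Likewise, the heart of PPO (implemented in src/algorithm/ppo/ppo.py) is the clipped surrogate objective. A generic sketch, not the repo's exact code; `policy` is assumed to return a `torch.distributions.Categorical` over the discrete actions:

```python
import torch

def ppo_clip_loss(policy, obs, actions, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO policy loss (generic sketch)."""
    dist = policy(obs)                              # assumed Categorical distribution
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)    # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()    # pessimistic (clipped) bound
```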
src/config.py: Main configuration file containing:
- Image processing parameters
- Training hyperparameters
- Reward system settings
- Game state detection thresholds
```python
# Training Resolution (faster training)
TRAIN_WIDTH = 120
TRAIN_HEIGHT = 125

# Testing Resolution (better quality)
TEST_WIDTH = 480
TEST_HEIGHT = 500

# Action Space
ACTION_SIZE = 6  # [nothing, attack, jump, defense, dodge, forward]

# Training Episodes
EPISODES = 3000

# Reward System
DEATH_PENALTY = -30
BOSS_KILL_REWARD = 50
BOSS_DAMAGE_REWARD = 15
```

The AI can perform the following actions:
- Movement: Forward movement
- Combat: Attack, Defense, Dodge
- Navigation: Jump
- Idle: Nothing (no action)
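Internally, the discrete action index chosen by the agent is dispatched to a key routine. The table below is a hypothetical mapping consistent with `ACTION_SIZE = 6` and the key bindings above, reusing the `press_key` helper from the SendInput sketch; the project's real mapping lives in src/env/actions.py, and the dodge/forward keys shown are the game's defaults, not confirmed by the source:

```python
import time

# Hypothetical index -> action table; scan codes are standard DirectInput codes.
def do_nothing(): time.sleep(0.1)
def attack():     press_key(0x24)   # J
def jump():       press_key(0x39)   # Space
def defense():    press_key(0x32)   # M
def dodge():      press_key(0x2A)   # Left Shift (game default, illustrative)
def forward():    press_key(0x11)   # W

ACTIONS = [do_nothing, attack, jump, defense, dodge, forward]

def execute(action_index: int) -> None:
    ACTIONS[action_index]()
```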
The environment layer (src/env/) provides:
- Real-time screen capture
- Image preprocessing and normalization
- Game state detection (health, rigidity)
- Keyboard input simulation
- Action space definition
- Action execution timing
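As an illustration of the capture and preprocessing steps, here is a minimal sketch using pywin32 and OpenCV, assuming the 1024x576 windowed setup above and a window title of "Sekiro" (an assumption); the project's actual implementation is src/env/observation.py:

```python
import cv2
import numpy as np
import win32con, win32gui, win32ui

def grab_frame(window_title: str = "Sekiro") -> np.ndarray:
    """BitBlt the game window into a BGRA numpy array (generic sketch)."""
    hwnd = win32gui.FindWindow(None, window_title)
    left, top, right, bottom = win32gui.GetWindowRect(hwnd)
    w, h = right - left, bottom - top
    hdc = win32gui.GetWindowDC(hwnd)
    src_dc = win32ui.CreateDCFromHandle(hdc)
    mem_dc = src_dc.CreateCompatibleDC()
    bmp = win32ui.CreateBitmap()
    bmp.CreateCompatibleBitmap(src_dc, w, h)
    mem_dc.SelectObject(bmp)
    mem_dc.BitBlt((0, 0), (w, h), src_dc, (0, 0), win32con.SRCCOPY)
    frame = np.frombuffer(bmp.GetBitmapBits(True), dtype=np.uint8).reshape(h, w, 4)
    # Release GDI resources before returning
    mem_dc.DeleteDC(); src_dc.DeleteDC()
    win32gui.ReleaseDC(hwnd, hdc)
    win32gui.DeleteObject(bmp.GetHandle())
    return frame

def preprocess(frame: np.ndarray, width: int = 120, height: int = 125) -> np.ndarray:
    """Grayscale, resize to TRAIN_WIDTH x TRAIN_HEIGHT, scale to [0, 1]."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGRA2GRAY)
    return cv2.resize(gray, (width, height)).astype(np.float32) / 255.0
```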
The reward and game-control utilities handle:
- Reward calculation based on game state changes
- Health/damage tracking
- Win/loss detection
- Game restart functionality
- Emergency break system
- Training pause/resume controls
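A simplified sketch of the reward shaping, using the constants from src/config.py shown earlier; the real computation in src/env/rewards.py also accounts for rigidity meters and edge cases, and the damage-taken penalty below is illustrative, not a documented value:

```python
DEATH_PENALTY = -30
BOSS_KILL_REWARD = 50
BOSS_DAMAGE_REWARD = 15

def compute_reward(prev: dict, curr: dict) -> tuple[float, bool]:
    """prev/curr hold 'player_hp' and 'boss_hp' read off the screen; returns (reward, done)."""
    if curr["player_hp"] <= 0:
        return DEATH_PENALTY, True          # episode ends on death
    if curr["boss_hp"] <= 0:
        return BOSS_KILL_REWARD, True       # episode ends on boss kill
    reward = 0.0
    if curr["boss_hp"] < prev["boss_hp"]:
        reward += BOSS_DAMAGE_REWARD        # landed a hit on the boss
    if curr["player_hp"] < prev["player_hp"]:
        reward -= 10                        # illustrative penalty for taking damage
    return reward, False
```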
View training progress:

```bash
tensorboard --logdir dqn_logs
# or
tensorboard --logdir ppo_logs
```

- Models saved in dqn_model/ and ppo_model/
- Automatic timestamped sessions
- Episode-based checkpoints
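A sketch of what this convention implies, assuming the model_episode_N.pth naming that the CLI flags above expect; the project's real logic lives in src/model_manager.py:

```python
import os
import time
import torch
from torch.utils.tensorboard import SummaryWriter

# One timestamped session directory per run, with episode-numbered checkpoints.
session = time.strftime("%Y%m%d_%H%M%S")
writer = SummaryWriter(os.path.join("dqn_logs", session))
model_dir = os.path.join("dqn_model", session)
os.makedirs(model_dir, exist_ok=True)

def end_of_episode(episode: int, net, total_reward: float, save_every: int = 100):
    writer.add_scalar("episode/reward", total_reward, episode)
    if episode % save_every == 0:
        torch.save(net.state_dict(),
                   os.path.join(model_dir, f"model_episode_{episode}.pth"))
```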
Training control hotkeys:
- Press 'T': Start/resume training
- Press 'P': Pause training
- Press 'ESC': Emergency stop
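On Windows, hotkeys like these are typically implemented by polling key state between environment steps. A minimal sketch with win32api (the repo presumably handles this in src/env/game_control.py):

```python
import win32api

VK_ESCAPE = 0x1B

def poll_controls(paused: bool) -> tuple[bool, bool]:
    """Returns (paused, stop); GetAsyncKeyState's high bit is set while a key is held."""
    if win32api.GetAsyncKeyState(ord("T")) & 0x8000:
        paused = False          # start / resume training
    if win32api.GetAsyncKeyState(ord("P")) & 0x8000:
        paused = True           # pause training
    stop = bool(win32api.GetAsyncKeyState(VK_ESCAPE) & 0x8000)  # emergency stop
    return paused, stop
```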
```
DQN_play_sekiro/
├── src/
│   ├── config.py            # Main configuration
│   ├── model_manager.py     # Model management utilities
│   ├── env/                 # Environment components
│   │   ├── env_core.py      # Main environment class
│   │   ├── observation.py   # Screen capture & processing
│   │   ├── actions.py       # Action management
│   │   ├── rewards.py       # Reward calculation
│   │   ├── game_control.py  # Game control utilities
│   │   └── directkeys.py    # Keyboard input simulation
│   └── algorithm/
│       ├── dqn/             # DQN implementation
│       │   ├── dqn.py       # DQN network & agent
│       │   ├── training.py  # Training script
│       │   └── testing.py   # Testing script
│       └── ppo/             # PPO implementation
│           ├── ppo.py       # PPO networks & agent
│           ├── training.py  # Training script
│           └── testing.py   # Testing script
├── dqn_model/               # DQN model storage
├── dqn_logs/                # DQN training logs
├── ppo_model/               # PPO model storage
└── ppo_logs/                # PPO training logs
```
This project is for educational and research purposes only. Use responsibly and in accordance with the game's terms of service.