Skip to content

mozturan/Human-Like_AutonomousDrive3D_DRL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

232 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Human-Like Autonomous Driving via Deep Reinforcement Learning with External Action Filtering

outputx.1.1.1.mp4

Paper


Suppressing Control Oscillations in Reinforcement Learning via External Action Filtering for Autonomous Driving
Mehmet Yaşar Osman Özturan · Ahmet Emir Dirik
Computer Engineering Department, Bursa Uludağ University, Türkiye
ELECO 2025


Overview

Deep Reinforcement Learning (DRL) agents operating in continuous action spaces often produce high-frequency, non-human-like control oscillations — a critical barrier to passenger comfort and social acceptance of autonomous vehicles.

This repository presents a modular DRL architecture that cleanly separates the primary driving task from the secondary objective of action smoothing. Rather than baking smoothness into a complex multi-objective reward function, we apply a dedicated external action filtering module downstream of the policy network.

Key result: A Weighted Moving Average (WMA) filter reduces action volatility by over 92% compared to an unfiltered SAC baseline, while maintaining strong lane-keeping and obstacle avoidance performance.


Method

System Architecture

System Architecture

The system is built on three decoupled stages:

  1. Perception — Raw sensor data (camera + LiDAR) is compressed into compact 16-dimensional feature vectors using two dedicated autoencoders.
  2. Decision & Control — A Soft Actor-Critic (SAC) agent learns the driving policy over a temporally-enriched state vector.
  3. Action Filtering — Raw policy outputs are smoothed by an external filter before actuation, enforcing human-like continuity without modifying the reward signal.

State Representation

The state vector $s_t$ concatenates:

  • Encoded camera features at $t$ and $t{-1}$
  • Encoded LiDAR features at $t$ and $t{-1}$
  • Vehicle kinematics (speed, acceleration) at $t$ and $t{-1}$
  • Actions taken at $t{-1}$ and $t{-2}$

This temporal stacking provides the Markovian dynamic context needed for robust control.

Soft Actor-Critic (SAC)

SAC optimizes a maximum entropy objective, balancing expected cumulative reward with policy entropy:

$$J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ r(s_t, a_t) + \alpha \mathcal{H}(\pi(\cdot \mid s_t)) \right]$$

The actor and two critic networks each use two hidden layers of 1024 units. Actions are sampled via the reparameterization trick:

$$a_t = \mu_\phi(s_t) + \sigma_\phi(s_t) \cdot \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0,1)$$

SAC Network Structure

Action Smoothing Filters

Two low-pass filtering strategies are evaluated as the external smoothing module:

Exponential Moving Average (EMA) — blends the current raw action with the previously smoothed action:

$$\tilde{a}_t = w \cdot a_t + (1 - w) \cdot \tilde{a}_{t-1}, \quad w = 0.5$$

Weighted Moving Average (WMA) — computes a linearly weighted average over the last $n$ actions, prioritizing the most recent:

$$\tilde{a}_t = \frac{n \cdot a_t + (n-1) \cdot \tilde{a}_{t-1} + \cdots + 1 \cdot \tilde{a}_{t-n+1}}{n + (n-1) + \cdots + 1}, \quad n = 5$$

Reward Function

The composite reward encourages lane centering, speed regulation, and mild action smoothness:

$$R(t) = \frac{R_{cc}(t) \cdot R_{spd}(t)}{R_{act}(t) + 1}$$

Component Description
$R_{cc}(t)$ Lane centering — penalizes cross-track error
$R_{spd}(t)$ Speed regulation — Gaussian centered on target speed
$R_{act}(t)$ Smoothness regularizer — minor penalty on large action changes

Experiments

Setup

System Architecture

  • Simulator: Donkey Car 3D simulator, mountain-track
  • Task: Simultaneous lane keeping + static obstacle avoidance
  • Action space: Continuous steering and throttle/brake in $[-1, 1]$
  • Episode termination: Lane deviation, collision, or 300-step limit

Models Evaluated

Model Description
SAC Unmodified baseline
SAC+Noise SAC with Ornstein–Uhlenbeck exploration noise
SAC+Noise+EMA OU noise + EMA action filter
SAC+Noise+WMA OU noise + WMA action filter
SAC-CLF+Noise OU noise + smoothness objective in the actor loss (coupled baseline)

Results

Training Performance

Episode Length

Mean Action Change - Training Mean Error - Training

Training summary:

Model Mean Action Change (%) Mean Error (%)
SAC 36.61 21.01
SAC+Noise 37.37 18.29
SAC+Noise+EMA 21.62 18.36
SAC+Noise+WMA 12.75 23.23
SAC-CLF+Noise 20.66 25.03

Testing Performance

Performance-Smoothness Trade-off

Test summary (500-step evaluation, no exploration noise):

Model Mean Action Change (%) Mean Error (%)
SAC 72.02 8.11
SAC+Noise 73.90 4.26
SAC+Noise+EMA 38.36 2.09
SAC+Noise+WMA 5.64 6.40
SAC-CLF+Noise 26.33 8.30

The WMA-filtered agent achieves a 92.4% reduction in action volatility versus the unfiltered SAC+Noise baseline — producing smooth, human-like control with a modest, acceptable increase in task error.


Repository Structure

├── camera/               # Convolutional and FC autoencoders for camera images
├── lidar /               # Convolutional and FC autoencoders for lidar point clouds
├── agents/               # SAC agent implementation (actor, critics, value nets, buffer)
├── filters/              # EMA and WMA action smoothing modules
├── wrappers/             # External wrappers (non-gym) for action, reward and observations
├── train.py              # Training entry point
├── train_.py             # Training entry point (different kind)
├── test.py               # Testing entry point
├── evaluate.py           # Evaluation and test drive script
├── configs/              # Environment and hyperparameter configs
├── assets/               # Images used in this README and more
├── utils/                # helper scripts
└── requirements.txt

Citation

If you find this work useful, please cite:

@INPROCEEDINGS{11329257,
  author={Osman Özturan, Mehmet Yaşar and Emir Dirik, Ahmet},
  booktitle={2025 16th International Conference on Electrical and Electronics Engineering (ELECO)}, 
  title={Suppressing Control Oscillations in Reinforcement Learning via External Action Filtering for Autonomous Driving}, 
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Training;Smoothing methods;Systematics;Navigation;Filtering;Deep reinforcement learning;Driver behavior;Autonomous vehicles;Oscillators;Tuning},
  doi={10.1109/ELECO69582.2025.11329257}}

License

This project is licensed under the MIT License. See LICENSE for details.


Bursa Uludağ University · Computer Engineering Department

About

Human-Like Autonomous Driving W/ Deep Reinforcement Learning in 3D environment: Suppressing Control Oscillations in Reinforcement Learning via External Action Filtering for Autonomous Driving

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages