Suppressing Control Oscillations in Reinforcement Learning via External Action Filtering for Autonomous Driving
Mehmet Yaşar Osman Özturan · Ahmet Emir Dirik
Computer Engineering Department, Bursa Uludağ University, Türkiye
ELECO 2025
Deep Reinforcement Learning (DRL) agents operating in continuous action spaces often produce high-frequency, non-human-like control oscillations — a critical barrier to passenger comfort and social acceptance of autonomous vehicles.
This repository presents a modular DRL architecture that cleanly separates the primary driving task from the secondary objective of action smoothing. Rather than baking smoothness into a complex multi-objective reward function, we apply a dedicated external action filtering module downstream of the policy network.
Key result: A Weighted Moving Average (WMA) filter reduces action volatility by over 92% compared to an unfiltered SAC baseline, while maintaining strong lane-keeping and obstacle avoidance performance.
The system is built on three decoupled stages:
- Perception — Raw sensor data (camera + LiDAR) is compressed into compact 16-dimensional feature vectors using two dedicated autoencoders.
- Decision & Control — A Soft Actor-Critic (SAC) agent learns the driving policy over a temporally enriched state vector.
- Action Filtering — Raw policy outputs are smoothed by an external filter before actuation, enforcing human-like continuity without modifying the reward signal.
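The decoupling described above can be pictured as three swappable callables chained in one control step. The sketch below is illustrative only: `control_step`, `perceive`, `policy`, and `smooth` are hypothetical names, not the repository's API, and the final clipping to $[-1, 1]$ is an assumption based on the stated action range.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def control_step(
    observation: object,
    perceive: Callable[[object], Vector],  # stage 1: raw sensors -> compact features
    policy: Callable[[Vector], Vector],    # stage 2: features -> raw action
    smooth: Callable[[Vector], Vector],    # stage 3: raw action -> filtered action
) -> list:
    """Run one perception -> decision -> filtering step (hypothetical helper)."""
    features = perceive(observation)
    raw_action = policy(features)
    filtered = smooth(raw_action)
    # Assumed safeguard: keep the command inside the [-1, 1] action range.
    return [max(-1.0, min(1.0, a)) for a in filtered]


# Toy stand-ins, only to show that each stage can be replaced independently.
action = control_step(
    observation=[0.2, -0.4],
    perceive=lambda obs: list(obs),
    policy=lambda feats: [2.0 * feats[0], feats[1]],
    smooth=lambda raw: [0.5 * a for a in raw],
)
```

Because the filter is just the last callable in the chain, it can be swapped or removed without retraining the perception stage or changing the reward.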
The state vector includes:
- Encoded camera features at $t$ and $t-1$
- Encoded LiDAR features at $t$ and $t-1$
- Vehicle kinematics (speed, acceleration) at $t$ and $t-1$
- Actions taken at $t-1$ and $t-2$
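Stacking two consecutive time steps lets the agent infer rates of change (for example, how quickly the lane offset is growing) that a single frame cannot convey, which brings the state closer to satisfying the Markov property needed for robust control.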
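A minimal sketch of how such a stacked state could be assembled is shown below. It assumes each autoencoder emits a 16-value latent and that kinematics and actions have two components each, which gives 72 values in total; the `StateStacker` class, its method names, and the zero-padding at episode start are hypothetical choices for illustration, not the repository's code.

```python
from collections import deque

FEATURE_DIM = 16  # latent size per sensor (assumed to apply to each encoder)


def _zero_history(width):
    """Two zero vectors, used as padding before any real data arrives (assumption)."""
    return deque([[0.0] * width, [0.0] * width], maxlen=2)


class StateStacker:
    """Keeps the two most recent entries of every state component."""

    def __init__(self):
        self.camera = _zero_history(FEATURE_DIM)
        self.lidar = _zero_history(FEATURE_DIM)
        self.kinematics = _zero_history(2)  # (speed, acceleration)
        self.actions = _zero_history(2)     # holds a_{t-2}, a_{t-1}

    def observe(self, camera_z, lidar_z, speed, accel):
        """Push the features for time t and return the stacked state."""
        self.camera.append(list(camera_z))
        self.lidar.append(list(lidar_z))
        self.kinematics.append([speed, accel])
        return self.state()

    def record_action(self, action):
        """Call after acting at time t so the action is available at t+1."""
        self.actions.append(list(action))

    def state(self):
        parts = []
        for history in (self.camera, self.lidar, self.kinematics, self.actions):
            for entry in reversed(history):  # newest entry first
                parts.extend(entry)
        return parts


stacker = StateStacker()
s_t = stacker.observe([1.0] * 16, [2.0] * 16, speed=0.5, accel=0.1)
print(len(s_t))  # 72
```

Note that the action history lags the sensor history by one step: when the state for time $t$ is built, the action for $t$ has not been chosen yet.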
SAC optimizes a maximum entropy objective, balancing expected cumulative reward with policy entropy:
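In the standard SAC formulation the objective is usually written as follows. The notation is the common textbook one and is assumed here rather than taken from this repository: $\rho_\pi$ is the state-action distribution induced by policy $\pi$, $r$ is the reward, $\mathcal{H}$ is the entropy, and $\alpha$ is the temperature.

$$
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\left(\pi(\cdot \mid s_t)\right) \right]
$$

A larger $\alpha$ favors exploration through higher-entropy policies, while $\alpha \to 0$ recovers the conventional expected-return objective.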
The actor and two critic networks each use two hidden layers of 1024 units. Actions are sampled via the reparameterization trick:
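As a rough illustration, reparameterized sampling expresses the action as a deterministic function of the actor outputs plus independent noise, $a = \tanh(\mu + \sigma \odot \epsilon)$ with $\epsilon \sim \mathcal{N}(0, I)$. The sketch below uses plain Python rather than a tensor library; the `sample_action` name, the `mean` and `log_std` arguments, and the tanh squashing (the usual SAC choice, consistent with a $[-1, 1]$ action range) are assumptions.

```python
import math
import random


def sample_action(mean, log_std, rng=random):
    """Draw a tanh-squashed Gaussian action: a = tanh(mu + sigma * eps)."""
    action = []
    for mu, ls in zip(mean, log_std):
        eps = rng.gauss(0.0, 1.0)           # noise that does not depend on the policy
        pre_tanh = mu + math.exp(ls) * eps  # smooth function of mu and log_std
        action.append(math.tanh(pre_tanh))  # squash into the [-1, 1] range
    return action


# Example: steering and throttle sampled from hypothetical actor outputs.
sampled = sample_action(mean=[0.0, 0.5], log_std=[-1.0, -1.0], rng=random.Random(0))
```

Because the randomness enters only through `eps`, gradients can flow from the critic through the action back into the mean and log standard deviation when the same computation is done in an autodiff framework.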
Two low-pass filtering strategies are evaluated as the external smoothing module:
Exponential Moving Average (EMA) — blends the current raw action with the previously smoothed action:
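A common form of this update is $\tilde{a}_t = \beta\, a_t + (1 - \beta)\, \tilde{a}_{t-1}$, where $\beta \in (0, 1]$ is the smoothing factor. The symbol $\beta$, the `EMAFilter` class, and the default value below are assumptions for illustration; the filter actually shipped in `filters/` may differ.

```python
class EMAFilter:
    """Exponential moving average over action vectors (illustrative sketch)."""

    def __init__(self, beta=0.3):  # 0.3 is a placeholder, not a value from the paper
        if not 0.0 < beta <= 1.0:
            raise ValueError("beta must be in (0, 1]")
        self.beta = beta
        self.smoothed = None

    def reset(self):
        """Forget the history, e.g. at the start of a new episode."""
        self.smoothed = None

    def __call__(self, raw_action):
        if self.smoothed is None:
            # First step: nothing to blend with, so pass the action through.
            self.smoothed = list(raw_action)
        else:
            self.smoothed = [
                self.beta * a + (1.0 - self.beta) * s
                for a, s in zip(raw_action, self.smoothed)
            ]
        return list(self.smoothed)


ema = EMAFilter(beta=0.5)
first = ema([1.0, 0.0])   # [1.0, 0.0]
second = ema([0.0, 1.0])  # [0.5, 0.5]
```

Smaller values of `beta` smooth more aggressively but make the vehicle respond more slowly to a sudden change in the raw policy output.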
Weighted Moving Average (WMA) — computes a linearly weighted average over the most recent raw actions in a fixed-length window.
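A minimal WMA sketch follows, using the conventional linear weights $1, 2, \dots, k$ so that the newest action counts most. The `WMAFilter` class and the window length are hypothetical; the window size used in the experiments is not stated here.

```python
from collections import deque


class WMAFilter:
    """Linearly weighted moving average over recent actions (illustrative sketch)."""

    def __init__(self, window=5):  # 5 is a placeholder window length
        if window < 1:
            raise ValueError("window must be at least 1")
        self.history = deque(maxlen=window)

    def reset(self):
        self.history.clear()

    def __call__(self, raw_action):
        self.history.append(list(raw_action))
        # Weights 1..k, oldest to newest; k grows until the window is full.
        weights = range(1, len(self.history) + 1)
        total = sum(weights)
        return [
            sum(w * past[d] for w, past in zip(weights, self.history)) / total
            for d in range(len(raw_action))
        ]


wma = WMAFilter(window=3)
out1 = wma([3.0])  # [3.0]
out2 = wma([0.0])  # (1*3 + 2*0) / 3 = [1.0]
out3 = wma([6.0])  # (1*3 + 2*0 + 3*6) / 6 = [3.5]
```

Unlike the EMA, the WMA has a finite memory: an action stops influencing the output once it leaves the window.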
The composite reward encourages lane centering, speed regulation, and mild action smoothness:
| Component | Description |
|---|---|
| Lane centering | Penalizes cross-track error |
| Speed regulation | Gaussian centered on target speed |
| Smoothness regularizer | Minor penalty on large action changes |
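One way the three components could be combined is sketched below. Only the qualitative shapes come from the table; the additive combination, the linear lane term, and every constant (`max_cte`, `target_speed`, `speed_sigma`, `smooth_weight`) are placeholders chosen for illustration and should not be read as the values used in the paper.

```python
import math


def composite_reward(cross_track_error, speed, action, prev_action,
                     max_cte=1.0, target_speed=1.0, speed_sigma=0.5,
                     smooth_weight=0.05):
    """Illustrative composite reward; all default constants are placeholders."""
    # Lane centering: 1 at the lane center, falling linearly to 0 at max_cte.
    lane = max(0.0, 1.0 - abs(cross_track_error) / max_cte)
    # Speed regulation: Gaussian bump centered on the target speed.
    speed_term = math.exp(-((speed - target_speed) ** 2) / (2.0 * speed_sigma ** 2))
    # Smoothness regularizer: small penalty on the change between consecutive actions.
    change = sum(abs(a - p) for a, p in zip(action, prev_action))
    return lane + speed_term - smooth_weight * change


centered = composite_reward(0.0, 1.0, [0.0, 0.0], [0.0, 0.0])  # best case: 2.0
jerky = composite_reward(0.0, 1.0, [1.0, 0.0], [0.0, 0.0])     # same pose, abrupt steering
```

Keeping `smooth_weight` small reflects the design choice stated earlier: smoothness is handled mainly by the external filter, so the reward only nudges the policy rather than competing with the driving objective.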
- Simulator: Donkey Car 3D simulator, mountain-track
- Task: Simultaneous lane keeping + static obstacle avoidance
- Action space: Continuous steering and throttle/brake in $[-1, 1]$
- Episode termination: Lane deviation, collision, or 300-step limit
| Model | Description |
|---|---|
| SAC | Unmodified baseline |
| SAC+Noise | SAC with Ornstein–Uhlenbeck (OU) exploration noise |
| SAC+Noise+EMA | OU noise + EMA action filter |
| SAC+Noise+WMA | OU noise + WMA action filter |
| SAC-CLF+Noise | OU noise + smoothness objective in the actor loss (coupled baseline) |
Training summary:
| Model | Mean Action Change (%) | Mean Error (%) |
|---|---|---|
| SAC | 36.61 | 21.01 |
| SAC+Noise | 37.37 | 18.29 |
| SAC+Noise+EMA | 21.62 | 18.36 |
| SAC+Noise+WMA | 12.75 | 23.23 |
| SAC-CLF+Noise | 20.66 | 25.03 |
Test summary (500-step evaluation, no exploration noise):
| Model | Mean Action Change (%) | Mean Error (%) |
|---|---|---|
| SAC | 72.02 | 8.11 |
| SAC+Noise | 73.90 | 4.26 |
| SAC+Noise+EMA | 38.36 | 2.09 |
| SAC+Noise+WMA | 5.64 | 6.40 |
| SAC-CLF+Noise | 26.33 | 8.30 |
The WMA-filtered agent achieves a 92.4% reduction in action volatility versus the unfiltered SAC+Noise baseline (5.64% vs. 73.90% mean action change at test time), producing smooth, human-like control at the cost of a modest increase in task error (6.40% vs. 4.26%).
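The headline figure can be reproduced directly from the test table. The short check below copies the "Mean Action Change (%)" column and computes the relative reduction against a chosen baseline; the helper name `reduction_vs` is just for illustration.

```python
# Test-time "Mean Action Change (%)" values copied from the test summary table.
test_action_change = {
    "SAC": 72.02,
    "SAC+Noise": 73.90,
    "SAC+Noise+EMA": 38.36,
    "SAC+Noise+WMA": 5.64,
    "SAC-CLF+Noise": 26.33,
}


def reduction_vs(baseline, model, table=test_action_change):
    """Relative reduction in action change, in percent."""
    return 100.0 * (table[baseline] - table[model]) / table[baseline]


wma_reduction = reduction_vs("SAC+Noise", "SAC+Noise+WMA")
print(f"{wma_reduction:.1f}%")  # 92.4%
```

By the same calculation, the EMA filter and the coupled SAC-CLF baseline reduce action change by roughly 48.1% and 64.4% relative to SAC+Noise, and the WMA reduction measured against plain SAC is about 92.2%.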
├── camera/ # Convolutional and FC autoencoders for camera images
├── lidar/ # Convolutional and FC autoencoders for LiDAR point clouds
├── agents/ # SAC agent implementation (actor, critics, value nets, buffer)
├── filters/ # EMA and WMA action smoothing modules
├── wrappers/ # External wrappers (non-gym) for action, reward and observations
├── train.py # Training entry point
├── train_.py # Training entry point (different kind)
├── test.py # Testing entry point
├── evaluate.py # Evaluation and test drive script
├── configs/ # Environment and hyperparameter configs
├── assets/ # Images used in this README and more
├── utils/ # Helper scripts
└── requirements.txt
If you find this work useful, please cite:
@INPROCEEDINGS{11329257,
author={Osman Özturan, Mehmet Yaşar and Emir Dirik, Ahmet},
booktitle={2025 16th International Conference on Electrical and Electronics Engineering (ELECO)},
title={Suppressing Control Oscillations in Reinforcement Learning via External Action Filtering for Autonomous Driving},
year={2025},
volume={},
number={},
pages={1-5},
keywords={Training;Smoothing methods;Systematics;Navigation;Filtering;Deep reinforcement learning;Driver behavior;Autonomous vehicles;Oscillators;Tuning},
doi={10.1109/ELECO69582.2025.11329257}}
This project is licensed under the MIT License. See LICENSE for details.
Bursa Uludağ University · Computer Engineering Department



