An implementation of a deepfake detection model that uses gradient regularization to improve robustness against adversarial attacks. This approach perturbs the mean and standard deviation of shallow layers in an EfficientNetB0 backbone to enhance generalization and defend against attacks like FGSM and PGD.
Traditional deepfake detectors are vulnerable to adversarial attacks that can fool them with imperceptible perturbations. This project implements a gradient regularization technique that trains the model on both original and adversarially perturbed features, making it more robust while maintaining high accuracy on clean data.
- Gradient Regularization: Combines gradients from original and perturbed data for robust training
- Feature-Level Perturbations: Targets shallow layer statistics (mean and standard deviation) rather than input pixels
- EfficientNetB0 Backbone: Leverages pre-trained EfficientNet for feature extraction
- Mixed Precision Training: Uses automatic mixed precision for efficient training
- Adversarial Robustness: Designed to defend against attacks like FGSM, and PGD.
The system consists of three main components:
- DeepfakeDetector: EfficientNetB0-based model with separate shallow and deep feature extractors
- PerturbationInjectionModule (PIM): Applies calculated perturbations to shallow features
- Gradient Regularization Training Loop: Implements the dual-gradient optimization strategy
The training process follows a sophisticated 5-step pipeline:
- Clear existing gradients from the optimizer for each batch
- Forward pass with original, unperturbed input data
- Calculate standard empirical loss L_original
- Backward pass to compute:
- Gradients w.r.t. model parameters (stored as g₁)
- Gradients w.r.t. shallow feature statistics (μₛ, σₛ)
- Use feature statistics gradients to calculate perturbation vector
- Apply small step of size
rin gradient direction:δₗ = r × ∇L/||∇L||
- Apply perturbations to shallow features using PIM
- Forward pass with perturbed features
- Calculate perturbed loss L_perturbed
- Backward pass to compute gradients (stored as g₂)
- Combine gradients using balance coefficient α:
g_final = (1 - α) × g₁ + α × g₂ - Update model parameters with combined gradients
This project uses a combination of two major deepfake detection datasets:
- DFFD (Diverse Fake Face Dataset): A comprehensive dataset containing various types of synthetic face images
- FaceForensics++: Extracted frames from video sequences containing both real and manipulated faces using different generation methods (Deepfakes, Face2Face, FaceSwap, NeuralTextures)
The combined dataset provides diverse training examples across different deepfake generation techniques, improving the model's generalization capabilities.
- Download DFFD dataset from the official source
- Download FaceForensics++ dataset and extract frames from videos
- Combine and organize the datasets with proper train/validation splits
- Apply data augmentation with random transformations (e.g., rotations, flips, color jitter) to increase diversity and mitigate class imbalance
| Parameter | Default | Description |
|---|---|---|
R_SCALAR |
0.05 | Perturbation magnitude controlling adversarial strength |
ALPHA_COEFF |
1.0 | Balance between original (g₁) and perturbed (g₂) gradients |
LEARNING_RATE |
0.001 | SGD learning rate |
EPOCHS |
10 | Number of training epochs |
The training process automatically saves checkpoints:
- Every 2 epochs:
grad_reg/model_epoch_{epoch}.pth - Final model:
grad_reg/final-full.pth
Feature Normalization (Equation 6):
Perturbation Calculation (Equation 9):
Combined Loss (Equation 12):
This approach provides enhanced robustness against:
- FGSM (Fast Gradient Sign Method): Single-step gradient-based attacks
- PGD (Projected Gradient Descent): Multi-step iterative attacks
The deepfake detection model, enhanced by gradient regularization, achieved significant robustness against adversarial attacks, with accuracy improvements to 33.23% at ε=0.001 FGSM (vs. 25.21% baseline) and 10.27% at ε=0.020 FGSM (vs. 5.53% baseline).
An ablation study optimized the selection of perturbation layers and fine-tuned hyperparameters (balance coefficient α and approximation scalar r), yielding a high-performing configuration.
dataset_combined/
├── train/
│ ├── fake/
│ │ ├── 000_face_1.png
│ │ ├── 001_face_1.png
│ │ └── ...
│ └── real/
│ ├── 000_face_1.png
│ ├── 001_face_1.png
│ └── ...
├── validation/
│ ├── fake/
│ └── real/
└── test/
├── fake/
└── real/
-
W. Guan, W. Wang, J. Dong and B. Peng, (2024). Improving Generalization of Deepfake Detectors by Imposing Gradient Regularization, In IEEE Transactions on Information Forensics and Security, vol. 19, pp. 5345-5356.
-
M. Tan and Q. Le, (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Proc. Int. Conf. Mach. Learn., pp. 6105–6114.
-
H. Dang, F. Liu, J. Stehouwer, X. Liu, A. Jain, (2020). On the Detection of Digital Face Manipulation. In Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, Jun. 2020.
-
M. Abbasi, P. Váz, J. Silva and P. Martins, (2025). Comprehensive Evaluation of Deepfake Detection Models: Accuracy, Generalization, and Resilience to Adversarial Attacks. Applied Sciences, 15(3), 1225.