focused fork of gallery-dl that uses OAuth2 for secure Reddit API access and MD5-based content deduplication to avoid re-downloading. Duplicate media files are detected by content hash and removed automatically, so only unique files are stored. Built for speed (concurrent workers plus polite rate limiting) and efficiency, with organized logs and output.
- Features
- Installation
- Quickstart
- Configuration
- CLI Reference
- Examples
- Output Structure
- Troubleshooting
- Contributing
- License
## Features

- OAuth2 Authentication - Secure Reddit API access using script app credentials
- MD5-Based Deduplication - Automatically detects and deletes duplicate files by content hash
- Simple & Reliable - Persistent SQLite database tracks seen content across all runs
- Gallery Support - Automatic expansion and host-specific URL normalization
- Flexible Output - Organized downloads with customizable directory structure
- Comprehensive Logging - Detailed audit trails for all download activities
- High Performance - Parallel, rate-limited downloads for maximum speed
## Installation

### Requirements

- Python 3.8 or higher
- `requests` library (installed automatically)
For local development with editable installation:
```bash
git clone https://github.com/qasimbilalstack/reddit-dl.git
cd reddit-dl
python -m pip install -e .
```

Install the latest version directly:

```bash
python -m pip install "git+https://github.com/qasimbilalstack/reddit-dl.git"
```

Recommended approach to avoid dependency conflicts:
```bash
# Create and activate virtual environment
python -m venv reddit-dl-env
source reddit-dl-env/bin/activate  # On Windows: reddit-dl-env\Scripts\activate

# Install reddit-dl
python -m pip install -e .
```

Install as an isolated command-line tool:
```bash
python -m pip install --user pipx
python -m pipx ensurepath
pipx install git+https://github.com/qasimbilalstack/reddit-dl.git
```

Execute directly as a Python module:

```bash
python -m reddit_dl.extractor --config config.json <urls>
```

If you installed using the development method (`git clone` + `pip install -e .`):
```bash
cd reddit-dl
git pull origin main
python -m pip install -e . --upgrade
```

If you installed directly from GitHub:

```bash
python -m pip install --upgrade "git+https://github.com/qasimbilalstack/reddit-dl.git"
```

If you installed with pipx:

```bash
pipx upgrade reddit-dl
```

Or reinstall to ensure the latest version:
pipx uninstall reddit-dl
pipx install git+https://github.com/qasimbilalstack/reddit-dl.git- Visit Reddit App Preferences
- Click "Create App" or "Create Another App"
- Select "script" as the application type
- Note your client ID (under the app name) and client secret
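Under the hood, script-app credentials like these are exchanged for a bearer token via Reddit's OAuth2 password grant. A minimal sketch of that exchange, using only the Python standard library for illustration (reddit-dl itself depends on `requests`); the helper names are illustrative, not the tool's actual internals:

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://www.reddit.com/api/v1/access_token"

def build_token_request(client_id, client_secret, username, password,
                        user_agent="reddit-dl/0.1"):
    """Assemble headers and body for Reddit's password-grant token request."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({
        "grant_type": "password",
        "username": username,
        "password": password,
    }).encode()
    headers = {
        "Authorization": f"Basic {basic}",  # app credentials via HTTP Basic auth
        "User-Agent": user_agent,           # Reddit requires a descriptive UA
    }
    return headers, body

def fetch_token(client_id, client_secret, username, password):
    """POST the grant and return the short-lived bearer token."""
    headers, body = build_token_request(client_id, client_secret,
                                        username, password)
    req = urllib.request.Request(TOKEN_URL, data=body, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["access_token"]
```

The returned token is then sent as `Authorization: bearer <token>` on requests to `oauth.reddit.com` endpoints.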
Copy the example configuration and add your credentials:
```bash
cp config.example.json config.json
```

Edit `config.json` with your Reddit app credentials:
Download media from a Reddit user:
```bash
reddit-dl --config config.json "https://www.reddit.com/user/SomeUser/"
```

Files will be saved to `downloads/` with organized subfolders and detailed logs in `downloads/logs.txt`.
## Configuration

| Parameter | Description | Default |
|---|---|---|
| `client_id` | Reddit app client ID | Required |
| `client_secret` | Reddit app client secret | Required |
| `username` | Reddit username | Required |
| `password` | Reddit password | Required |
| `user_agent` | Custom user agent string | `reddit-dl/0.1` |
| `output_dir` | Download directory | `downloads` |
| `token_cache` | Path to OAuth token cache file | `~/.reddit_dl_tokens.json` |
| `max_posts` | Default maximum posts per source | Unlimited |
| `default_max_posts` | Default max posts when neither `--max-posts` nor `--all` is given | 1000 |
| `md5_save_interval` | MD5 database checkpoint frequency (saves after N downloads) | 10 |
| `parallel_downloads` | Number of parallel downloads | 4 |
| `requests_per_second` | Rate limit for download requests (per second) | 4.0 |
Recommended conservative presets (choose one based on your environment):

- Gentle (very low load): `parallel_downloads: 1`, `requests_per_second: 1.0`
- Conservative (recommended): `parallel_downloads: 2`, `requests_per_second: 1.0`
- Balanced (default): `parallel_downloads: 4`, `requests_per_second: 4.0`
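These two knobs can be thought of as a pool of download workers that all share one rate limiter. A minimal token-bucket-style sketch (class and method names are illustrative, not reddit-dl's actual internals):

```python
import threading
import time

class RateLimiter:
    """Rate limiter shared by all parallel download workers."""

    def __init__(self, requests_per_second: float):
        self.interval = 1.0 / requests_per_second
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def acquire(self):
        """Block until the caller may issue the next request."""
        with self._lock:
            now = time.monotonic()
            wait = self._next_slot - now
            # Reserve the next slot before releasing the lock, so
            # concurrent workers get distinct, evenly spaced slots.
            self._next_slot = max(now, self._next_slot) + self.interval
        if wait > 0:
            time.sleep(wait)
```

With `requests_per_second: 4.0`, four workers calling `acquire()` before each HTTP request will collectively never exceed four requests per second, regardless of how many finish early.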
Configuration example with all available options:
```json
{
  "extractor": {
    "reddit": {
      "oauth": {
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
        "username": "YOUR_REDDIT_USERNAME",
        "password": "YOUR_REDDIT_PASSWORD"
      },
      "user_agent": "reddit-dl/0.1 by YOUR_USERNAME",
      "output_dir": "downloads",
      "md5_save_interval": 10,
      "token_cache": "~/.reddit_dl_tokens.json",
      "default_max_posts": 1000,
      "parallel_downloads": 2,
      "requests_per_second": 1.0
    }
  }
}
```

## CLI Reference

```
reddit-dl [OPTIONS] URLS...
```

`urls` - One or more Reddit URLs to process
Supported URL formats:

- User pages: `https://www.reddit.com/user/USERNAME/`
- Subreddits: `https://www.reddit.com/r/SUBREDDIT/`
- Individual posts: `https://www.reddit.com/r/SUBREDDIT/comments/POST_ID/`
- Shortened URLs: `https://redd.it/POST_ID`
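The accepted URL shapes above could be recognized with a handful of patterns. A hypothetical classifier along these lines (the function name, pattern list, and return labels are illustrative, not reddit-dl's actual code):

```python
import re

# Ordered: the post pattern must be tried before the subreddit pattern,
# since a post URL also contains "/r/<subreddit>/".
PATTERNS = [
    ("post",      re.compile(r"reddit\.com/r/([^/]+)/comments/([A-Za-z0-9]+)")),
    ("subreddit", re.compile(r"reddit\.com/r/([^/]+)/?$")),
    ("user",      re.compile(r"reddit\.com/user/([^/]+)")),
    ("short",     re.compile(r"redd\.it/([A-Za-z0-9]+)")),
]

def classify(url: str):
    """Return (kind, captured groups) for a supported Reddit URL, else None."""
    for kind, pattern in PATTERNS:
        match = pattern.search(url)
        if match:
            return kind, match.groups()
    return None
```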
| Option | Description |
|---|---|
| `-h, --help` | Show help message and exit |
| `-c, --config CONFIG` | Path to configuration JSON file |
| `--debug` | Enable debug logging output (also bypasses MD5 deduplication to see all downloads) |
| Option | Description |
|---|---|
| `-u, --user USER` | Reddit username(s) to fetch (comma-separated or repeat flag) |
| `-r, --subreddit SUBREDDIT` | Subreddit name(s) to fetch (comma-separated or repeat flag) |
| `-p, --postid POSTID` | Post ID(s) to fetch (comma-separated or repeat flag) |
| Option | Description |
|---|---|
| `-o, --output OUTPUT_DIR` | Output directory for downloads (overrides config file setting) |
| `--max-posts MAX_POSTS` | Maximum number of posts to fetch |
| `--all` | Fetch all available posts (follow pagination) |
| `--per-page N` | Number of posts to request per page when paginating (default: 100, max: 100) |
| `--sort {hot,new,top,rising,best}` | Listing sort order to request from Reddit (default: new) |
| `--force` | Retry previously failed downloads (note: MD5 deduplication always runs) |
| `--retry-failed` | Retry previously failed downloads |
| `--clear-failed` | Clear the failed URLs tracking database |
| `--prefer-mp4` | Prefer MP4 video format when available (adds `?format=mp4` to compatible URLs) |
| Option | Description |
|---|---|
| `--save-interval N` | Save MD5 database every N downloads (default: 10) |
| Option | Description |
|---|---|
| `--save-json` | Save per-post metadata JSON files (disabled by default for faster downloads) |
| `--save-meta-only` | Only save per-post metadata JSON files; do not download media files |
| `--comments` | Fetch comments in addition to submissions (disabled by default). Without this flag only submissions are fetched (uses `/submitted/` URLs) |
| Option | Description |
|---|---|
| `--save-bio` | Fetch user profile bio(s) and save compact JSON into `<outdir>/user_bio` (for `--user`) |
| `--only-verified` | When specified with `--user` or `--subreddit`, only process users/posts whose profile has `verified: true` |
## Examples

Download recent posts from a user:
```bash
reddit-dl --config config.json "https://www.reddit.com/user/SomeUser/"

# Or using the --user flag:
reddit-dl --config config.json --user SomeUser
```

Download to a specific directory:
```bash
# Use short form (-o)
reddit-dl --config config.json --user SomeUser -o /path/to/downloads

# Use long form (--output)
reddit-dl --config config.json --user SomeUser --output ./my_reddit_content
```

Download from a subreddit:
```bash
reddit-dl --config config.json "https://www.reddit.com/r/earthporn/"

# Or using the --subreddit flag:
reddit-dl --config config.json --subreddit earthporn

# Download top posts from a subreddit:
reddit-dl --config config.json --sort top --subreddit earthporn
```

Download a specific post:
```bash
reddit-dl --config config.json "https://www.reddit.com/r/pics/comments/abc123/..."

# Or using the --postid flag:
reddit-dl --config config.json --postid abc123
```

Download all available posts from multiple sources:
```bash
reddit-dl --config config.json --all \
  "https://www.reddit.com/user/User1/" \
  "https://www.reddit.com/r/subreddit1/" \
  "https://www.reddit.com/r/subreddit2/"

# Or using flags (can mix and match):
reddit-dl --config config.json --all \
  --user User1,User2 \
  --subreddit subreddit1,subreddit2
```

Download from multiple users and subreddits:
```bash
# Using comma-separated lists (recommended):
reddit-dl --config config.json \
  --user User1,User2,User3 \
  --subreddit pics,funny,aww \
  --postid abc123,def456

# Or using repeated flags:
reddit-dl --config config.json \
  --user User1 --user User2 \
  --subreddit pics --subreddit funny \
  --postid abc123 --postid def456
```

Limit downloads and enable debug logging:
```bash
reddit-dl --config config.json --max-posts 50 --debug \
  "https://www.reddit.com/user/SomeUser/"

# Or with flags:
reddit-dl --config config.json --max-posts 50 --debug --user SomeUser
```

Force re-download with custom save interval:

```bash
reddit-dl --config config.json --force --save-interval 1 \
  "https://www.reddit.com/user/SomeUser/"
```

Retry failed downloads from previous sessions:
```bash
reddit-dl --config config.json --retry-failed
```

Download with custom sort order and pagination:
```bash
# Download top posts with custom page size
reddit-dl --config config.json --sort top --per-page 50 \
  "https://www.reddit.com/r/earthporn/"

# Download hot posts without metadata JSON files (faster)
reddit-dl --config config.json --sort hot \
  "https://www.reddit.com/user/SomeUser/"

# Download only submissions (comments disabled by default) from multiple users
reddit-dl --config config.json \
  --user User1,User2,User3
```

Process multiple URLs from a file:
```bash
# Create URL list
cat > urls.txt << EOF
https://www.reddit.com/user/User1/
https://www.reddit.com/user/User2/
https://www.reddit.com/r/subreddit1/
EOF

# Process all URLs
xargs -I {} reddit-dl --config config.json {} < urls.txt
```

Process multiple sources using flags:
```bash
# Download from multiple users and subreddits in one command (recommended):
reddit-dl --config config.json \
  --user User1,User2,User3 \
  --subreddit pics,funny,aww

# Or using repeated flags:
reddit-dl --config config.json \
  --user User1 --user User2 --user User3 \
  --subreddit pics --subreddit funny --subreddit aww

# Mix URLs and flags:
reddit-dl --config config.json \
  "https://www.reddit.com/user/SpecialUser/" \
  --subreddit earthporn,wallpapers \
  --postid abc123,def456
```

## Output Structure

Downloaded files are organized as follows:
```
downloads/
├── .md5_index.sqlite      # MD5 deduplication database
├── logs.txt               # Comprehensive download logs
├── u_USERNAME/            # User downloads
│   ├── POST_ID.jpg        # Media files
│   ├── POST_ID.json       # Metadata
│   └── POST_ID_1.jpg      # Additional media from galleries
└── r_SUBREDDIT/           # Subreddit downloads
    ├── POST_ID.mp4
    ├── POST_ID.json
    └── ...
```
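A sketch of how such paths might be derived; the function and its parameters are hypothetical, for illustration only:

```python
import os

def media_path(output_dir, source_kind, source_name, post_id, ext, index=0):
    """Build downloads/u_USER/POST_ID.jpg style paths.

    Gallery items after the first get an _1, _2, ... suffix.
    """
    prefix = "u_" if source_kind == "user" else "r_"
    stem = post_id if index == 0 else f"{post_id}_{index}"
    return os.path.join(output_dir, prefix + source_name, stem + ext)
```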
reddit-dl uses content-based deduplication to ensure you never store duplicate media files:
- Download - File is downloaded to disk
- Calculate MD5 - Content hash is computed for the file
- Check Database - MD5 is looked up in `.md5_index.sqlite`
- Decision:
  - If MD5 exists → file is deleted immediately (duplicate detected)
  - If MD5 is new → file is kept and its MD5 added to the database
✅ Content-Based - Detects duplicates even if filenames differ
✅ Persistent - Database survives across all runs
✅ Automatic - No configuration needed, always active
✅ Efficient - Only unique content stored on disk
✅ Cross-Post Detection - Same image posted to multiple subreddits = stored once
First Run:

```bash
reddit-dl --config config.json --all --user SomeUser
# Result: 102 items → 30 unique files kept, 72 duplicates deleted
# Files on disk: 30 (all unique)
# Database: 30 MD5 hashes
```

Second Run (Same Command):

```bash
reddit-dl --config config.json --all --user SomeUser
# Result: 102 items downloaded, all detected as duplicates, all deleted
# Files on disk: 30 (unchanged)
# Note: Files are downloaded then immediately deleted if duplicate
```

The MD5 index is stored at `downloads/.md5_index.sqlite` by default. This file:

- Tracks all MD5 hashes of downloaded content
- Persists across runs and system restarts
- Can be safely deleted to reset deduplication tracking
- Is automatically checkpointed based on the `--save-interval` setting
`--force` flag:
- Bypasses the failed URL check (retries previously failed downloads)
- MD5 deduplication still runs normally
- Useful for recovering from incomplete downloads
`--debug` flag:
- Enables verbose logging output
- Also bypasses MD5 deduplication (keeps all files even if duplicates)
- Useful for testing, debugging, and verifying content differences
- Files are downloaded and kept on disk without duplicate deletion
- MD5 hashes are still recorded in database for future runs
Important: In normal operation (without --debug), MD5 deduplication always runs to ensure:
- You never store duplicate content
- Storage remains efficient
- Only unique files are kept
## Troubleshooting

### Common Issues

#### Command Not Found

```bash
reddit-dl: command not found
```

Solutions:
- Ensure the virtual environment is activated: `source venv/bin/activate`
- Verify installation: `pip list | grep reddit-dl`
- Check PATH configuration for pipx installations
- Use module execution: `python -m reddit_dl.extractor`
#### HTTP 403: Forbidden

Solutions:

- Verify Reddit app credentials in `config.json`
- Ensure the app type is set to "script" in Reddit preferences
- Check that the username and password are correct
- Confirm the client ID and secret are accurate
#### Files appear to re-download unnecessarily

Solutions:

- Check `downloads/logs.txt` for detailed information
- Verify MD5 database integrity
- Use `--debug` for verbose output
#### Downloads are slow or timing out

Solutions:

- Reduce `--max-posts` for testing
- Omit `--save-json` to skip metadata writing (faster)
- Use `--per-page` with smaller values (e.g., 25) for better rate limiting
- Check network connectivity
- Monitor Reddit API rate limits
Enable comprehensive logging:
```bash
reddit-dl --config config.json --debug "https://www.reddit.com/user/SomeUser/"
```

This provides detailed information about:
- Authentication status
- URL processing
- File deduplication decisions
- Download progress
- Error conditions
Check `downloads/logs.txt` for audit trails:
```bash
# View recent activity
tail -f downloads/logs.txt

# Search for errors
grep -i error downloads/logs.txt

# Check specific user downloads
grep "u_SomeUser" downloads/logs.txt
```

If you encounter issues:
- Enable debug mode and check logs
- Verify configuration against `config.example.json`
- Test with a small `--max-posts` value
- Check Reddit app settings and permissions
- Review GitHub issues for similar problems
- Create a new issue with debug output
## Contributing

We welcome contributions! Please follow these guidelines:
- Fork the repository
- Clone your fork locally
- Create a virtual environment
- Install in development mode
```bash
git clone https://github.com/YOUR_USERNAME/reddit-dl.git
cd reddit-dl
python -m venv venv
source venv/bin/activate
pip install -e .
```

- Follow PEP 8 style guidelines
- Add docstrings for public functions
- Include type hints where appropriate
- Write descriptive commit messages
- Test changes with various Reddit URL types
- Verify OAuth authentication works
- Check deduplication functionality
- Test edge cases and error conditions
- Create a feature branch from `main`
- Make focused, atomic commits
- Include tests for new functionality
- Update documentation as needed
- Submit pull request with clear description
When reporting bugs, include:
- Python version and operating system
- Full command used and configuration
- Complete error output with `--debug`
- Steps to reproduce the issue
## License

This project is derived from gallery-dl and maintains compatibility with its licensing. The original gallery-dl project is licensed under the GNU General Public License v2.0.
- This project: GPL-2.0 License (following gallery-dl)
- Dependencies: Various licenses (see requirements)
- Reddit API: Subject to Reddit's Terms of Service
Special thanks to the gallery-dl project and its contributors for providing the foundation for this focused Reddit extractor.
For complete license text, see the LICENSE file in the repository.