This project consists of a Reddit crawler and visualizer that collects subreddit relationships and visualizes them as interactive network graphs. The tool helps analyze connections between subreddits by crawling Reddit data and creating dynamic network visualizations.
- Systematic Reddit Crawling: Searches Reddit for subreddits matching generated search terms
- Relationship Analysis: Identifies connections between subreddits by analyzing descriptions
- Real-time Visualization: Creates interactive network graphs showing subreddit relationships
- Clustering: Groups related subreddits into clusters by search term
- High-Resolution Export: Exports visualizations in multiple resolutions (up to 16K)
- Data Persistence: Saves crawled data in JSON format for later analysis
- Optional Redis Integration: Supports real-time updates via Redis
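The README does not say how the search terms are generated. One plausible reading of the `SEARCH_TERM_LENGTH` setting is that terms are built from every character combination of that length. The sketch below works under that assumption only; the function name `generate_search_terms` and the lowercase-letter alphabet are hypothetical and not taken from the project.

```python
import itertools
import string


def generate_search_terms(length, alphabet=string.ascii_lowercase):
    """Yield every string of `length` characters drawn from `alphabet`.

    Assumption: the crawler enumerates short strings like this; the real
    project may build its terms differently.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    for combo in itertools.product(alphabet, repeat=length):
        yield "".join(combo)


if __name__ == "__main__":
    terms = list(generate_search_terms(1))
    print(len(terms), terms[:3])
```

Under this assumption the number of terms grows as 26 to the power of the length, so larger values would multiply the number of searches quickly.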
The crawler component searches Reddit for subreddits and analyzes their relationships.
The visualization component creates interactive network graphs from the collected data.
A utility script to manage the Redis server for data communication between components.
The crawler collects the following data:
- Subreddit names: Names of subreddits matching search terms
- Related subreddits: Mentions of other subreddits in descriptions
- NSFW status: Whether subreddits are marked as NSFW
- Search term associations: Which search terms led to discovering each subreddit
All data is publicly available information from Reddit. No private user data is collected.
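As a rough illustration of how mentions of other subreddits might be pulled out of a description, the sketch below scans the text for `r/name` patterns and packs the collected fields into one record. The regular expression, the function names, and the record keys (`name`, `related`, `nsfw`, `search_terms`) are assumptions made for this example, not the project's actual schema.

```python
import re

# Matches "r/name" or "/r/name" where the name is 2 to 21 word characters
# and is not embedded in a longer word (assumed naming rules).
SUBREDDIT_MENTION = re.compile(
    r"(?<![A-Za-z0-9_])r/([A-Za-z0-9][A-Za-z0-9_]{1,20})(?![A-Za-z0-9_])"
)


def find_related_subreddits(name, description):
    """Return lowercase subreddit names mentioned in `description`.

    The subreddit's own name and repeated mentions are skipped.
    """
    related = []
    for mention in SUBREDDIT_MENTION.findall(description or ""):
        lowered = mention.lower()
        if lowered != name.lower() and lowered not in related:
            related.append(lowered)
    return related


def build_record(name, description, over18, search_term):
    """Bundle the collected fields into one JSON-serializable record."""
    return {
        "name": name,
        "related": find_related_subreddits(name, description),
        "nsfw": bool(over18),
        "search_terms": [search_term],
    }


if __name__ == "__main__":
    text = "See r/learnpython and /r/Python, also r/django."
    print(build_record("python", text, False, "p"))
```

Lowercasing the names is a design choice for this sketch so that `r/Python` and `r/python` count as the same node.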
The visualization component represents:
- Nodes: Subreddits as network nodes
- Edges: Relationships between subreddits
- (buggy) Node size: Relevance to search terms
- (buggy) Node color: NSFW status (red for NSFW, green for SFW)
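The mapping above can be sketched as a small conversion from crawl records to node and edge lists. This is an illustration under stated assumptions: the record keys (`name`, `related`, `nsfw`, `search_terms`) are hypothetical, and "relevance" is approximated here by the number of search terms that found the subreddit, which may not match the project's own measure.

```python
BASE_SIZE = 10  # assumed size of a node found by no search term
SIZE_STEP = 5   # assumed growth per matching search term


def to_graph(records):
    """Convert crawl records into node and edge lists.

    Edges are undirected and deduplicated, and only link subreddits that
    both appear in `records`.
    """
    known = {record["name"] for record in records}
    nodes = [
        {
            "id": record["name"],
            "color": "red" if record["nsfw"] else "green",
            "size": BASE_SIZE + SIZE_STEP * len(record["search_terms"]),
        }
        for record in records
    ]
    edge_set = set()
    for record in records:
        for other in record["related"]:
            if other in known and other != record["name"]:
                edge_set.add(tuple(sorted((record["name"], other))))
    return nodes, sorted(edge_set)


if __name__ == "__main__":
    sample = [
        {"name": "a", "related": ["b", "zzz"], "nsfw": False, "search_terms": ["x"]},
        {"name": "b", "related": ["a"], "nsfw": True, "search_terms": ["x", "y"]},
    ]
    print(to_graph(sample))
```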
The tool provides:
- Cluster views: Groups subreddits by search terms
- Combined views: Merges multiple clusters
- Interactive exploration: Pan, zoom, and select nodes
- High-resolution exports: Generate images for publication
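Cluster and combined views can be pictured as simple set operations. The sketch below groups subreddits by the search terms that found them and merges selected clusters into one set. The record keys (`name`, `search_terms`) and both function names are assumptions for illustration, not the project's API.

```python
from collections import defaultdict


def cluster_by_search_term(records):
    """Map each search term to the set of subreddit names it discovered."""
    clusters = defaultdict(set)
    for record in records:
        for term in record["search_terms"]:
            clusters[term].add(record["name"])
    return dict(clusters)


def combine_clusters(clusters, terms):
    """Merge the clusters for `terms` into one set (a combined view)."""
    combined = set()
    for term in terms:
        combined |= clusters.get(term, set())
    return combined


if __name__ == "__main__":
    sample = [
        {"name": "python", "search_terms": ["p"]},
        {"name": "pics", "search_terms": ["p"]},
        {"name": "art", "search_terms": ["a"]},
    ]
    clusters = cluster_by_search_term(sample)
    print(clusters)
    print(combine_clusters(clusters, ["p", "a"]))
```

A subreddit found by several search terms appears in each of those clusters but only once in a combined view, because the merge is a set union.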
- Python 3.10+
- Redis server (optional, for real-time updates)
- Reddit API credentials
- Clone the repository:
git clone https://github.com/EinfachNurBaum/Subreddit-Visualization.git
# or
gh repo clone EinfachNurBaum/Subreddit-Visualization
cd Subreddit-Visualization
- Install dependencies:
pip install -r requirements.txt
Create a .env file with your Reddit API credentials:
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=your_user_agent_name
SAVE_DATA_PATH=reddit_crawl_data.json
SEARCH_TERM_LENGTH=1
SEARCH_LIMIT=10
Run the crawler:
python3 main.py
Add the '--server' flag if you want to use Redis for real-time visualization.
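For illustration, the .env settings listed above could be read and validated along these lines. This is a sketch, not the project's actual loader: the `CrawlerSettings` class and `load_settings` function are hypothetical, and treating the example values as defaults for the three optional settings is an assumption. It reads from the process environment, so the .env file would first need to be loaded into it (for example with python-dotenv's `load_dotenv()`, if the project uses that package).

```python
import os
from dataclasses import dataclass

REQUIRED_KEYS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


@dataclass(frozen=True)
class CrawlerSettings:
    client_id: str
    client_secret: str
    user_agent: str
    save_data_path: str
    search_term_length: int
    search_limit: int


def load_settings(env=None):
    """Build settings from a mapping of environment variables.

    Raises RuntimeError when a required credential is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return CrawlerSettings(
        client_id=env["REDDIT_CLIENT_ID"],
        client_secret=env["REDDIT_CLIENT_SECRET"],
        user_agent=env["REDDIT_USER_AGENT"],
        # Assumed defaults, copied from the example values in this README.
        save_data_path=env.get("SAVE_DATA_PATH", "reddit_crawl_data.json"),
        search_term_length=int(env.get("SEARCH_TERM_LENGTH", "1")),
        search_limit=int(env.get("SEARCH_LIMIT", "10")),
    )
```

Failing early on missing credentials gives a clearer error than a rejected API request later in the crawl.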
Run the visualization:
python visualization.py [--server] [--json path/to/data.json]
Manage the Redis server (optional):
./popOS_redis_manager.sh start
./popOS_redis_manager.sh status
./popOS_redis_manager.sh stop
This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
This means that:
- You can use, modify, and distribute the code freely
- If you distribute modified versions, you must:
  - Make your source code available
  - License your code under GPL-3.0
  - Document your changes
Crawler implementation details:
- Multi-threaded search for maximum efficiency
- Queue-based processing of related subreddits
- Thread-safe data structures with proper locking
- TkInter GUI for monitoring and control
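The combination of worker threads, a work queue, and locked shared state listed above might look roughly like the following. This is a minimal sketch using only the standard library, not the project's code: `crawl_related` and its `fetch_related` callback are hypothetical names, and the callback stands in for whatever Reddit lookup the crawler performs. It is assumed not to raise.

```python
import queue
import threading


def crawl_related(seeds, fetch_related, worker_count=4):
    """Visit subreddits breadth-first from `seeds` using worker threads.

    `fetch_related(name)` must return a list of related subreddit names.
    Returns a dict mapping each visited name to its related names.
    """
    pending = queue.Queue()
    seen = set(seeds)      # names already queued, guarded by `lock`
    graph = {}             # crawl results, guarded by `lock`
    lock = threading.Lock()
    for seed in seeds:
        pending.put(seed)

    def worker():
        while True:
            name = pending.get()
            try:
                if name is None:  # sentinel: no more work
                    return
                related = fetch_related(name)
                new_names = []
                with lock:
                    graph[name] = list(related)
                    for other in related:
                        if other not in seen:
                            seen.add(other)
                            new_names.append(other)
                # Queue follow-up work before marking this task as done,
                # so the queue never looks finished too early.
                for other in new_names:
                    pending.put(other)
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    pending.join()              # wait until every queued name is processed
    for _ in threads:
        pending.put(None)       # tell each worker to stop
    for thread in threads:
        thread.join()
    return graph
```

The `seen` set is checked and updated under the same lock, so two workers that discover the same subreddit at once cannot both queue it.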
Visualization implementation details:
- Force-directed graph layout
- Customizable physics settings
- Cluster-based organization
- Browser-based interface
- High-resolution export capability
- Adaptive rendering for different data sizes
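To give a feel for what a force-directed layout with adjustable physics settings does, here is a small pure-Python sketch. It is not the project's renderer, which runs in the browser and may rely on a graph library. The function name, the parameters (`repulsion`, `attraction`, `max_step`), and their default values are all assumptions chosen for this example.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, repulsion=0.05,
                          attraction=0.1, max_step=0.05, seed=0):
    """Return {node: (x, y)} after a simple spring-and-repulsion simulation.

    All node pairs repel with strength repulsion / distance**2, and each
    edge pulls its endpoints together with strength attraction * distance.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-6
                push = repulsion / (dist * dist)
                fx, fy = push * dx / dist, push * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        # Spring attraction along each edge.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * dx
            force[a][1] += attraction * dy
            force[b][0] -= attraction * dx
            force[b][1] -= attraction * dy
        # Move each node, capping the step length to keep the layout stable.
        for n in nodes:
            fx, fy = force[n]
            magnitude = math.hypot(fx, fy)
            if magnitude > max_step:
                fx, fy = fx / magnitude * max_step, fy / magnitude * max_step
            pos[n][0] += fx
            pos[n][1] += fy
    return {n: (p[0], p[1]) for n, p in pos.items()}
```

In this model two connected nodes settle where the forces balance, at a distance of `(repulsion / attraction) ** (1 / 3)`, so raising `repulsion` spreads the graph out and raising `attraction` tightens it.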
This software has only been tested on Ubuntu Linux. It may work on other Linux distributions or operating systems, but compatibility is not guaranteed. This project was developed with the help of AI.
Contributions are welcome! Please feel free to submit a Pull Request.
This tool is intended for educational and research purposes only. Please respect Reddit's terms of service and API usage guidelines.