A powerful desktop application for efficiently managing Discogs data dumps. Download, extract, and convert Discogs datasets to CSV format with an intuitive user interface.
- 🚀 Multi-threaded Downloads: Utilizes 8 parallel threads for faster downloads
- 📦 Smart Extraction: Automatic .gz file extraction with progress tracking
- 🔄 Efficient Conversion: Streams XML to CSV with memory-efficient processing
- 📊 Real-time Progress: Live tracking of all operations with speed and time estimates
- 🎨 Modern UI: Clean, dark-themed interface with intuitive controls
- 📝 Detailed Logging: Comprehensive logging system with color-coded messages
- 💾 Flexible Storage: Customizable download location and organized file structure
- Python 3.7 or higher
- Required Python packages:
ttkbootstrap
pandas
requests
- Clone the repository:
git clone https://github.com/ofurkancoban/discogs-data-processor.git
cd discogs-data-processor
- Install required packages:
pip install -r requirements.txt
- Run the application:
python main.py
- Launch the application
- Click the Settings button to set your preferred download folder
- Default:
~/Downloads/Discogs
- A Discogs folder will be automatically created
- Data is automatically fetched on startup
- Use the "Fetch Data" button for manual updates
- View available Discogs datasets in the main table
- Select desired files using checkboxes
- Click "Download" to start the multi-threaded download
- Monitor progress with real-time speed and time estimates
- Select downloaded files (.gz)
- Click "Extract" to decompress them to XML
- Progress bar shows extraction status
- Select extracted files (.xml)
- Click "Convert" for CSV conversion
- Uses streaming for memory efficiency
- Delete unwanted files with the "Delete" button
- View file status with ✔/✖ indicators
- Track total downloaded size
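The Extract step above can be sketched with the standard library alone. This is a minimal illustration, not the application's actual code: `extract_gz`, `on_progress`, and `chunk_size` are hypothetical names, and progress is approximated from how much of the compressed file has been read so far.

```python
import gzip
import os


def extract_gz(gz_path, out_path, on_progress=None, chunk_size=1024 * 1024):
    """Decompress gz_path to out_path in chunks and report progress as 0.0-1.0.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    total = os.path.getsize(gz_path)  # compressed size, used for the progress ratio
    written = 0
    with open(gz_path, "rb") as raw, \
            gzip.GzipFile(fileobj=raw) as src, \
            open(out_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
            if on_progress and total:
                # raw.tell() is the compressed offset; it runs slightly ahead
                # because of read buffering, so clamp it to 1.0.
                on_progress(min(raw.tell() / total, 1.0))
    return written
```

Reading in fixed-size chunks keeps memory use flat regardless of dump size, and a callback such as `on_progress` is one way a progress bar could be driven.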
Discogs/
├── Datasets/
│   ├── YYYY-MM/
│   │   ├── discogs_YYYY-MM-DD_type.xml.gz
│   │   ├── discogs_YYYY-MM-DD_type.xml
│   │   └── discogs_YYYY-MM-DD_type.csv
│   └── ...
├── Cover Arts/
├── discogs_data.csv
└── discogs_data.log
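As a rough illustration of this layout, the sketch below maps a dump file name to its three locations under `Datasets/YYYY-MM/`. The helper name `dataset_paths` and the regular expression are assumptions made for the example; they follow the naming pattern shown in the tree and are not taken from the application's source.

```python
import re
from pathlib import Path

# Assumed file-name pattern, mirroring discogs_YYYY-MM-DD_type.xml.gz from the tree.
_DUMP_NAME = re.compile(r"^discogs_(\d{4})-(\d{2})-(\d{2})_(\w+)\.xml\.gz$")


def dataset_paths(root, filename):
    """Return the .gz, .xml and .csv paths for one dump file (hypothetical helper)."""
    match = _DUMP_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected dump file name: {filename}")
    year, month = match.group(1), match.group(2)
    folder = Path(root) / "Datasets" / f"{year}-{month}"
    stem = filename[: -len(".xml.gz")]
    return {
        "gz": folder / filename,
        "xml": folder / f"{stem}.xml",
        "csv": folder / f"{stem}.csv",
    }
```

For example, `dataset_paths("Discogs", "discogs_2024-01-01_releases.xml.gz")["csv"]` points at `Discogs/Datasets/2024-01/discogs_2024-01-01_releases.csv`.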
- Multi-threaded downloading (8 threads)
- Automatic fallback to a single-threaded download
- Built-in retry mechanism
- Real-time progress tracking
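The download behaviour listed above (parallel parts, single-thread fallback, retries) could look roughly like the sketch below. It makes several assumptions: the server is taken to support HTTP `Range` requests, and `split_ranges`, `with_retries`, `download`, and the caller-supplied `fetch_range(start, end)` callback are hypothetical names rather than the application's API. In practice `fetch_range` might wrap a `requests.get` call with a `Range: bytes=start-end` header.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, parts=8):
    """Split total_size bytes into up to `parts` inclusive (start, end) ranges."""
    if total_size <= 0:
        return []
    parts = max(1, min(parts, total_size))
    base, extra = divmod(total_size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def with_retries(fn, attempts=3, delay=0.5):
    """Call fn() until it succeeds; re-raise the last error after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)


def download(fetch_range, total_size, supports_ranges=True, threads=8):
    """Fetch all bytes via fetch_range(start, end), in parallel when ranges are supported.

    For brevity the result is returned as bytes; a real downloader for
    multi-gigabyte dumps would write each part to its offset on disk instead.
    """
    # Fallback: without range support, request the whole body as a single part.
    ranges = split_ranges(total_size, threads if supports_ranges else 1)
    if not ranges:
        return b""

    def fetch(rng):
        return with_retries(lambda: fetch_range(*rng))

    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        chunks = list(pool.map(fetch, ranges))  # map() preserves range order
    return b"".join(chunks)
```

Keeping the range planning separate from the network call makes the splitting logic easy to check on its own.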
- Memory-efficient streaming parser
- Two-pass conversion:
  - Column discovery
  - Data extraction
- Chunking for large files
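A minimal sketch of the two-pass idea follows, under simplifying assumptions: each record is an element named `record_tag`, and its direct children with text content become columns. Real Discogs dumps are more deeply nested, so this is an illustration rather than the application's converter. It uses the standard `xml.etree.ElementTree.iterparse` and `csv` modules instead of pandas, and the function names are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET


def _records(xml_path, record_tag):
    """Yield one {child_tag: text} dict per record, freeing parsed elements as we go."""
    context = ET.iterparse(xml_path, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield {child.tag: (child.text or "").strip() for child in elem}
            root.clear()  # drop finished records so memory stays bounded


def xml_to_csv(xml_path, csv_path, record_tag):
    """Convert records to CSV in two streaming passes; returns the row count."""
    # Pass 1: column discovery, keeping first-seen order.
    columns = {}
    for row in _records(xml_path, record_tag):
        for key in row:
            columns.setdefault(key, None)

    # Pass 2: data extraction, one row at a time.
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(columns), restval="")
        writer.writeheader()
        for row in _records(xml_path, record_tag):
            writer.writerow(row)
            count += 1
    return count
```

The first pass is what lets the header include columns that only appear in later records; the cost is reading the file twice.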
- Timestamp-based logging
- Color-coded messages
- Both UI and file logging
- Detailed operation tracking
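One way to get timestamped messages in both a log file and the UI is a pair of `logging` handlers, sketched below. The colour palette, the `UICallbackHandler` class, and `build_logger` are assumptions made for this example; the callback stands in for whatever widget the interface would actually write to.

```python
import logging

# Hypothetical level-to-colour mapping for the UI log view.
LEVEL_COLORS = {"INFO": "white", "WARNING": "orange", "ERROR": "red"}


class UICallbackHandler(logging.Handler):
    """Forward each formatted record and its colour to a UI callback."""

    def __init__(self, callback):
        super().__init__()
        self.callback = callback

    def emit(self, record):
        color = LEVEL_COLORS.get(record.levelname, "white")
        self.callback(self.format(record), color)


def build_logger(log_path, ui_callback, name="discogs_sketch"):
    """Create a logger that writes to log_path and to ui_callback(message, color)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep messages out of the root logger
    formatter = logging.Formatter(
        "%(asctime)s [%(levelname)s] %(message)s", "%Y-%m-%d %H:%M:%S"
    )
    handlers = (
        logging.FileHandler(log_path, encoding="utf-8"),
        UICallbackHandler(ui_callback),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

In a Tk-based interface the callback would typically hand the message to the main thread (for example through a queue) rather than touching widgets from a worker thread.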
- Python 3.7+
- ttkbootstrap for UI
- pandas for data processing
- requests for downloads
Contributions are welcome! Please feel free to submit a Pull Request.
Furkan Coban
- LinkedIn: ofurkancoban
- GitHub: ofurkancoban
- Kaggle: ofurkancoban
- Discogs for providing data dumps
- ttkbootstrap for UI components
- Icons8 for application icons
If you encounter any issues or have questions:
- Check the detailed logs in the application
- Open an issue on GitHub
- Contact through LinkedIn
Made with ❤️ by ofurkancoban