A powerful Python application for searching text within files across various file types and directories. The application provides both a modern graphical user interface (GUI) and a command-line interface (CLI) for flexible text searching.
Supported file types include:
- Microsoft Office: Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx), Access, Outlook, Publisher, Visio, OneNote
- Text & Code: Python, Java, C++, JavaScript, HTML, CSS, XML, JSON, YAML, Markdown, and many more
- Documents: PDF, EPUB, MOBI, RTF, OpenDocument formats, Pages, Numbers, Keynote
- Images: PNG, JPG, GIF, BMP, TIFF, WebP, SVG, AI, EPS, RAW formats
- Archives: ZIP, RAR, 7Z, TAR, GZ, BZ2, and more
- Media: MP3, WAV, MP4, AVI, MKV, MOV, and other audio/video formats
- Databases: SQLite, MDB, SQL, DBF, and backup files
- Virtual Machines: VMDK, VDI, VHD, OVA, and other VM formats
Interface:
- Dark theme with professional styling
- Real-time progress tracking with progress bar
- File counter showing the number of scanned files
- File extension selection organized by category
- Manual extension input for custom file types
- Search results displayed in real time
- Start/Stop search functionality
Search capabilities:
- Case-insensitive text searching
- Recursive directory traversal
- Custom file extension filtering
- Optional searching of files with non-predefined extensions
- Directory name searching
- Progress callbacks for GUI integration
- Comprehensive error handling
Logging:
- Detailed logging with timestamps
- Log files stored in the logs/ directory
- Error tracking and debugging information
- Search operation monitoring
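The search capabilities above (recursive traversal, case-insensitive matching, extension filtering, directory name matching and progress callbacks) can be sketched with the standard library alone. This is a minimal illustration, not the code in Word_Crawler.py. The function name `search_files`, its parameters and the `(count, path)` callback signature are hypothetical, and files are read as plain UTF-8 text only (no Word or PDF extraction).

```python
import os
from typing import Callable, Iterable, List, Optional


# Hypothetical sketch; not the actual Word_Crawler.py implementation.
def search_files(
    root: str,
    keyword: str,
    extensions: Optional[Iterable[str]] = None,
    progress: Optional[Callable[[int, str], None]] = None,
) -> List[str]:
    """Return files whose text contains keyword, plus directories whose name contains it."""
    needle = keyword.lower()  # case-insensitive comparison
    allowed = {e.lower() for e in extensions} if extensions else None
    matches: List[str] = []
    scanned = 0
    for dirpath, dirnames, filenames in os.walk(root):  # recursive traversal
        # Directory name search: report subdirectories whose name matches.
        for d in dirnames:
            if needle in d.lower():
                matches.append(os.path.join(dirpath, d))
        for name in filenames:
            if allowed is not None and os.path.splitext(name)[1].lower() not in allowed:
                continue  # extension filter: skip files outside the selection
            path = os.path.join(dirpath, name)
            scanned += 1
            if progress is not None:
                progress(scanned, path)  # e.g. update a GUI file counter
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if needle in f.read().lower():
                        matches.append(path)
            except OSError:
                continue  # unreadable file: skip it and keep searching
    return matches
```

Passing `extensions=None` disables the filter. The callback receives the running count of scanned files, which is the kind of value a GUI file counter would display.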
Requirements:
- Python 3.7 or higher
- The Python packages listed below
Install them with pip:
pip install PyQt6 docx2txt alive-progress
- PyQt6: GUI framework
- docx2txt: Microsoft Word document text extraction
- alive-progress: progress bar functionality
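A missing package is a common reason for the GUI failing to start, so a quick pre-flight check can help. The sketch below is a hypothetical helper and not part of the project. It assumes the import names `PyQt6`, `docx2txt` and `alive_progress` (the module that the alive-progress package installs), and it checks against the Python 3.7 minimum stated above.

```python
import importlib.util
import sys
from typing import List, Sequence

# Hypothetical helper; import names of the packages installed by the pip command above.
REQUIRED_MODULES = ["PyQt6", "docx2txt", "alive_progress"]
MIN_PYTHON = (3, 7)


def missing_modules(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found by the current interpreter."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def check_environment() -> bool:
    """Print any problems found and return True only if everything is in place."""
    ok = True
    if sys.version_info < MIN_PYTHON:
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {sys.version.split()[0]}")
        ok = False
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
        ok = False
    return ok
```

Calling `check_environment()` before importing PyQt6 turns an ImportError traceback into a short message that lists what still needs to be installed.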
Run the graphical interface:
python Word_Crawler_GUI.py
GUI Features:
- Directory Selection: Browse and select the directory to search
- Keyword Input: Enter the text you want to search for
- File Extensions: Choose specific file types to search
  - Use category checkboxes to select all extensions in a category
  - Add custom extensions manually (semicolon-separated)
  - Enable searching of files with non-predefined extensions
- Search Control: Start or stop the search operation
- Results: View real-time search results and progress
Run the command-line interface:
python Word_Crawler.py
CLI Features:
- Interactive prompts for directory and keyword input
- Automatic file type detection and searching
- Progress indication during search
- Summary of search results
Project structure:
Word_Searcher/
├── Word_Crawler_GUI.py # Main GUI application
├── Word_Crawler.py # Command-line application
├── logs/ # Log files directory
│ └── word_searcher_*.log # Timestamped log files
└── README.md # This documentation
The application organizes file extensions into logical categories:
- Microsoft Office: All Office suite file formats
- Text and Code: Programming and markup languages
- Documents: PDF, e-book, and other document formats
- Images: Raster and vector image formats
- Archives: Compression and archive formats
- Media: Audio and video file formats
- Databases: Database and backup files
- Virtual Machines: VM disk and configuration files
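Category-based selection like this could be modeled as a mapping from category name to a set of extensions. Checking a category box then adds that whole set to the search filter. The mapping below is a hypothetical excerpt built from a few of the extensions listed at the top of this README, not the application's full table. The names `EXTENSION_CATEGORIES`, `extensions_for` and `category_of` are illustrative.

```python
from typing import Dict, Iterable, Set

# Hypothetical excerpt; the application itself defines many more extensions.
EXTENSION_CATEGORIES: Dict[str, Set[str]] = {
    "Microsoft Office": {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"},
    "Text and Code": {".py", ".java", ".cpp", ".js", ".html", ".css", ".xml", ".json", ".yaml", ".md"},
    "Documents": {".pdf", ".epub", ".mobi", ".rtf"},
    "Archives": {".zip", ".rar", ".7z", ".tar", ".gz", ".bz2"},
}


def extensions_for(selected_categories: Iterable[str]) -> Set[str]:
    """Return the union of the extensions in every checked category."""
    result: Set[str] = set()
    for name in selected_categories:
        result |= EXTENSION_CATEGORIES.get(name, set())  # unknown category names are ignored
    return result


def category_of(extension: str) -> str:
    """Return the category an extension belongs to, or 'Other' if it is not predefined."""
    ext = extension.lower()
    for name, exts in EXTENSION_CATEGORIES.items():
        if ext in exts:
            return name
    return "Other"
```

Using sets makes combining several checked categories a simple union, and duplicate extensions disappear automatically.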
You can add custom file extensions by:
- Using the manual extension input field in the GUI
- Entering extensions separated by semicolons (e.g., .custom;.ext)
- Enabling the "Search files with non-predefined extensions" option
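Semicolon-separated input such as `.custom;.ext` has to be split and normalized before it can be used as a filter. The helper below is a hypothetical sketch. Its normalization rules are assumptions, not documented behavior of the application: trimming whitespace, lower-casing, adding a missing leading dot, and dropping empty entries and duplicates.

```python
from typing import List


def parse_extensions(text: str) -> List[str]:
    """Turn user input such as '.custom;.ext' into a list of normalized extensions.

    Hypothetical helper: trims whitespace, lower-cases, adds a missing leading
    dot, and drops empty entries and duplicates while keeping input order.
    """
    seen = set()
    result: List[str] = []
    for part in text.split(";"):
        ext = part.strip().lower()
        if not ext:
            continue  # tolerate trailing or doubled semicolons
        if not ext.startswith("."):
            ext = "." + ext  # accept "txt" as well as ".txt"
        if ext not in seen:
            seen.add(ext)
            result.append(ext)
    return result
```

For example, `parse_extensions(" .LOG ; txt;;.log")` yields `[".log", ".txt"]`.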
The application includes comprehensive error handling for:
- File permission issues
- Corrupted or unreadable files
- Unicode decoding errors
- Network drive access problems
- Invalid file paths
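The error cases above map naturally onto Python's exception hierarchy. The sketch below is a hypothetical illustration, not the application's code. It catches `PermissionError` before the broader `OSError` (which also covers I/O failures such as an unavailable network drive), reports invalid paths separately, and falls back from UTF-8 to Latin-1 when decoding fails. The logger name `word_searcher` is an assumption.

```python
import logging
from typing import Optional

logger = logging.getLogger("word_searcher")  # hypothetical logger name


def read_text_safely(path: str) -> Optional[str]:
    """Return the file's text, or None if it cannot be read.

    Tries UTF-8 first and falls back to Latin-1, which can decode any byte
    sequence, when the bytes are not valid UTF-8.
    """
    try:
        with open(path, "rb") as f:
            data = f.read()
    except PermissionError:  # must come before OSError, its base class
        logger.warning("Permission denied: %s", path)
        return None
    except (FileNotFoundError, IsADirectoryError, NotADirectoryError):
        logger.warning("Invalid path: %s", path)
        return None
    except OSError as exc:  # other I/O failures, e.g. a dropped network drive
        logger.error("I/O error reading %s: %s", path, exc)
        return None
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        logger.debug("Not valid UTF-8, falling back to Latin-1: %s", path)
        return data.decode("latin-1")
```

Returning `None` instead of raising lets a search loop skip the file and continue, while the log keeps a record of why it was skipped.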
All operations are logged with detailed information:
- Search progress and results
- Error messages and stack traces
- File processing statistics
- User actions and system events
Log files are created automatically in the logs/ directory with timestamped filenames.
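Timestamped log files like the `word_searcher_*.log` entries in the project structure can be produced with the standard `logging` module. This is a minimal sketch under assumptions. The function name `setup_logging`, the `%Y%m%d_%H%M%S` timestamp format and the message format are illustrative, not taken from the application.

```python
import logging
import os
from datetime import datetime


def setup_logging(log_dir: str = "logs") -> str:
    """Create log_dir if needed, attach a new timestamped log file, and return its path.

    Hypothetical sketch: the word_searcher_<timestamp>.log naming mirrors the
    project structure; the exact timestamp format is an assumption.
    """
    os.makedirs(log_dir, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = os.path.join(log_dir, f"word_searcher_{stamp}.log")
    logger = logging.getLogger("word_searcher")
    logger.setLevel(logging.DEBUG)  # record everything, including debug details
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return log_path
```

Call it once at startup. Each call attaches another handler, so calling it repeatedly would write every message to several files.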
Performance:
- Efficient File Processing: Optimized file reading and text extraction
- Progress Tracking: Real-time progress updates for long-running searches
- Memory Management: Stream-based file reading to handle large files
- Multi-threading: GUI operations run in separate threads to prevent freezing
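Stream-based reading keeps memory use bounded because the file is never loaded into memory all at once. One way to do this for a keyword search is to read fixed-size chunks and carry the last `len(keyword) - 1` characters into the next chunk, so a match split across a chunk boundary is still found. The function `stream_contains` and its default chunk size are hypothetical. The application's actual reading strategy is not documented here.

```python
from typing import TextIO


def stream_contains(f: TextIO, keyword: str, chunk_size: int = 64 * 1024) -> bool:
    """Case-insensitively check whether an open text stream contains keyword.

    Reads fixed-size chunks so memory use stays bounded, and carries the last
    len(keyword) - 1 characters forward so a match that straddles two chunks
    is still found.
    """
    needle = keyword.lower()
    if not needle:
        return True  # the empty string is trivially present
    keep = len(needle) - 1
    tail = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return False  # end of stream, no match
        window = tail + chunk.lower()
        if needle in window:
            return True
        tail = window[-keep:] if keep else ""
```

Typical use would be `with open(path, encoding="utf-8", errors="ignore") as f: found = stream_contains(f, "invoice")`. It also stops reading as soon as the first match is found.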
Troubleshooting:
- Permission Errors: Ensure you have read access to the target directory
- Missing Dependencies: Install required packages using pip
- Large File Searches: Use specific file extensions to limit search scope
- GUI Not Starting: Check PyQt6 installation and Python version
Check the log files in the logs/ directory for detailed error and debugging information.
This is a complete, self-contained application. The codebase includes:
- Object-oriented design with clear separation of concerns
- Comprehensive error handling and logging
- Modern GUI with PyQt6
- Extensible file type support
- Clean, documented code structure
This project is provided as-is for educational and personal use.
- Current Version: Complete implementation with GUI and CLI
- Features: Comprehensive file type support, modern GUI, logging, error handling
- Status: Production-ready application
Note: This application is designed for searching text content within files. It never modifies the original files; it only reads and searches their content.