
If you're looking for a Python program that can find any word in any type of file (literally), you've come to the right repository. I wrote the heart of this script, then used some vibe coding to spin up the GUI. Enjoy!


dalin-sourcecode/Word_Crawler


Word Searcher

A Python application for searching text within files across many file types and directories. It provides both a modern GUI and a command-line interface.

Features

🔍 Comprehensive File Type Support

  • Microsoft Office: Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx), Access, Outlook, Publisher, Visio, OneNote
  • Text & Code: Python, Java, C++, JavaScript, HTML, CSS, XML, JSON, YAML, Markdown, and many more
  • Documents: PDF, EPUB, MOBI, RTF, OpenDocument formats, Pages, Numbers, Keynote
  • Images: PNG, JPG, GIF, BMP, TIFF, WebP, SVG, AI, EPS, RAW formats
  • Archives: ZIP, RAR, 7Z, TAR, GZ, BZ2, and more
  • Media: MP3, WAV, MP4, AVI, MKV, MOV, and other audio/video formats
  • Databases: SQLite, MDB, SQL, DBF, and backup files
  • Virtual Machines: VMDK, VDI, VHD, OVA, and other VM formats

🎨 Modern GUI Interface

  • Dark theme with professional styling
  • Real-time progress tracking with progress bar
  • File counter showing scanned files
  • Organized file extension selection by categories
  • Manual extension input for custom file types
  • Search results displayed in real-time
  • Start/Stop search functionality

Advanced Search Capabilities

  • Case-insensitive text searching
  • Recursive directory traversal
  • Custom file extension filtering
  • Non-predefined extension searching
  • Directory name searching
  • Progress callbacks for GUI integration
  • Comprehensive error handling
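
The capabilities listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `search_directory`, its parameters, and the shape of the progress callback are all assumptions, and it treats every file as plain text instead of using format-specific extractors.

```python
import os
from typing import Callable, Iterable, Iterator, Optional


def search_directory(
    root: str,
    keyword: str,
    extensions: Optional[Iterable[str]] = None,
    progress: Optional[Callable[[int, str], None]] = None,
) -> Iterator[str]:
    """Yield paths of files under root whose text contains keyword.

    Hypothetical helper: names and signature are illustrative only.
    """
    needle = keyword.lower()  # case-insensitive match
    allowed = {e.lower() for e in extensions} if extensions else None
    scanned = 0

    # os.walk descends into every subdirectory (recursive traversal).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if allowed is not None and ext not in allowed:
                continue  # custom extension filter
            path = os.path.join(dirpath, name)
            scanned += 1
            if progress is not None:
                progress(scanned, path)  # e.g. update a GUI file counter
            try:
                # Read line by line so large files are never fully loaded.
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    if any(needle in line.lower() for line in fh):
                        yield path
            except OSError:
                continue  # unreadable file: skip and keep going
```

A caller could collect the matches with `list(search_directory("some/dir", "invoice", [".txt", ".md"]))`. Passing `extensions=None` searches every file, which loosely corresponds to the non-predefined extension option.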

📊 Logging & Monitoring

  • Detailed logging with timestamps
  • Log files stored in logs/ directory
  • Error tracking and debugging information
  • Search operation monitoring

Installation

Prerequisites

  • Python 3.7 or higher
  • Required Python packages (see requirements below)

Dependencies

pip install PyQt6 docx2txt alive-progress

Required Packages

  • PyQt6 - GUI framework
  • docx2txt - Microsoft Word document text extraction
  • alive-progress - Progress bar functionality

Usage

GUI Mode (Recommended)

Run the graphical interface:

python Word_Crawler_GUI.py

GUI Features:

  1. Directory Selection: Browse and select the directory to search
  2. Keyword Input: Enter the text you want to search for
  3. File Extensions: Choose specific file types to search
    • Use category checkboxes to select all extensions in a category
    • Add custom extensions manually (semicolon-separated)
    • Enable searching non-predefined extensions
  4. Search Control: Start or stop the search operation
  5. Results: View real-time search results and progress

Command Line Mode

Run the command-line interface:

python Word_Crawler.py

CLI Features:

  • Interactive prompts for directory and keyword input
  • Automatic file type detection and searching
  • Progress indication during search
  • Summary of search results

File Structure

Word_Searcher/
├── Word_Crawler_GUI.py          # Main GUI application
├── Word_Crawler.py              # Command-line application
├── logs/                        # Log files directory
│   └── word_searcher_*.log     # Timestamped log files
└── README.md                    # This documentation

Configuration

File Extension Categories

The application organizes file extensions into logical categories:

  • Microsoft Office: All Office suite file formats
  • Text and Code: Programming and markup languages
  • Documents: PDF and e-book formats
  • Images: Raster and vector image formats
  • Archives: Compression and archive formats
  • Media: Audio and video file formats
  • Databases: Database and backup files
  • Virtual Machines: VM disk and configuration files
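
One plausible way to represent these categories is a dictionary mapping each category name to its extensions. The sketch below is an assumption about structure, not the application's real data: `EXTENSION_CATEGORIES` and `extensions_for` are hypothetical names, and only a few of the formats named in this README are included.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical, abbreviated category table; the real application
# covers more categories and many more extensions.
EXTENSION_CATEGORIES: Dict[str, List[str]] = {
    "Microsoft Office": [".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"],
    "Text and Code": [".py", ".java", ".cpp", ".js", ".html", ".css",
                      ".xml", ".json", ".yaml", ".md"],
    "Documents": [".pdf", ".epub", ".mobi", ".rtf"],
    "Archives": [".zip", ".rar", ".7z", ".tar", ".gz", ".bz2"],
}


def extensions_for(selected: Iterable[str]) -> Set[str]:
    """Return the union of extensions for the selected category names."""
    result: Set[str] = set()
    for category in selected:
        # Unknown category names contribute nothing.
        result.update(EXTENSION_CATEGORIES.get(category, []))
    return result
```

Ticking a category checkbox in the GUI would then amount to adding that category's name to the `selected` list.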

Custom Extensions

You can add custom file extensions by:

  1. Using the manual extension input field in the GUI
  2. Entering extensions separated by semicolons (e.g., .custom;.ext)
  3. Enabling "Search files with non-predefined extensions" option
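
Parsing the semicolon-separated input might look like the sketch below. The function name `parse_extensions` is hypothetical, and the normalization rules (trimming whitespace, lowercasing, adding a missing leading dot, dropping duplicates) are assumptions about reasonable behavior, not documented behavior of the GUI.

```python
from typing import List


def parse_extensions(raw: str) -> List[str]:
    """Turn '.custom;.ext' into ['.custom', '.ext'] (hypothetical helper)."""
    result: List[str] = []
    for part in raw.split(";"):
        ext = part.strip().lower()
        if not ext:
            continue  # ignore empty segments such as ';;'
        if not ext.startswith("."):
            ext = "." + ext  # accept 'log' as well as '.log'
        if ext not in result:
            result.append(ext)  # keep first occurrence, preserve order
    return result
```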

Error Handling

The application includes comprehensive error handling for:

  • File permission issues
  • Corrupted or unreadable files
  • Unicode decoding errors
  • Network drive access problems
  • Invalid file paths
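
As an illustration of how these failure modes can be handled in one place, here is a minimal sketch of a defensive file reader. The helper name `read_text_safely`, the logger name, and the Latin-1 fallback are assumptions; the application itself may handle these cases differently.

```python
import logging
from typing import Optional

logger = logging.getLogger("word_searcher")  # logger name is an assumption


def read_text_safely(path: str) -> Optional[str]:
    """Return the file's text, or None if it cannot be read."""
    try:
        with open(path, "rb") as fh:
            data = fh.read()
    except PermissionError:
        logger.warning("Permission denied: %s", path)
        return None
    except FileNotFoundError:
        logger.warning("Invalid or missing path: %s", path)
        return None
    except OSError as exc:
        # Covers other I/O failures, e.g. an unreachable network drive.
        logger.error("Could not read %s: %s", path, exc)
        return None

    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises.
        return data.decode("latin-1")
```

Returning `None` instead of raising lets a long-running search skip one bad file and continue with the rest.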

Logging

All operations are logged with detailed information:

  • Search progress and results
  • Error messages and stack traces
  • File processing statistics
  • User actions and system events

Log files are automatically created in the logs/ directory with timestamps.
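
A timestamped log file in `logs/` can be set up with the standard `logging` module along these lines. This is a sketch under assumptions: the function name `setup_logging`, the timestamp format, and the message format are illustrative, and only the `word_searcher_*.log` naming pattern comes from the file structure shown in this README.

```python
import logging
import os
from datetime import datetime
from typing import Tuple


def setup_logging(log_dir: str = "logs") -> Tuple[logging.Logger, str]:
    """Create log_dir if needed and attach a timestamped file handler."""
    os.makedirs(log_dir, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # assumed format
    log_path = os.path.join(log_dir, f"word_searcher_{stamp}.log")

    logger = logging.getLogger("word_searcher")
    logger.setLevel(logging.DEBUG)

    handler = logging.FileHandler(log_path, encoding="utf-8")
    # asctime gives each record its own timestamp inside the file.
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
    )
    logger.addHandler(handler)
    return logger, log_path
```

Inside an `except` block, `logger.exception("Search failed")` records the message together with the stack trace.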

Performance

  • Efficient File Processing: Optimized file reading and text extraction
  • Progress Tracking: Real-time progress updates for long-running searches
  • Memory Management: Stream-based file reading to handle large files
  • Multi-threading: GUI operations run in separate threads to prevent freezing
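
The streaming and threading points above can be illustrated with the standard library alone. The sketch below uses `threading` and `queue` as stand-ins for PyQt6's own threading classes, so it is not how the GUI is implemented; `file_contains` and `search_worker` are hypothetical names.

```python
import queue
import threading
from typing import Iterable


def file_contains(path: str, needle: str, stop: threading.Event) -> bool:
    """Stream the file line by line; needle must already be lowercase."""
    with open(path, "r", encoding="utf-8", errors="ignore") as fh:
        for line in fh:
            if stop.is_set():
                return False  # user pressed Stop mid-file
            if needle in line.lower():
                return True
    return False


def search_worker(
    paths: Iterable[str],
    keyword: str,
    results: "queue.Queue",
    stop: threading.Event,
) -> None:
    """Run in a background thread; push matching paths onto results."""
    needle = keyword.lower()
    for path in paths:
        if stop.is_set():
            break
        try:
            if file_contains(path, needle, stop):
                results.put(path)
        except OSError:
            continue  # unreadable file: skip it
    results.put(None)  # sentinel: tells the consumer the search ended
```

The worker would be started with `threading.Thread(target=search_worker, args=(paths, keyword, results, stop), daemon=True).start()`. The interface thread then polls `results` for new matches and calls `stop.set()` when the user stops the search.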

Troubleshooting

Common Issues

  1. Permission Errors: Ensure you have read access to the target directory
  2. Missing Dependencies: Install required packages using pip
  3. Large File Searches: Use specific file extensions to limit search scope
  4. GUI Not Starting: Check PyQt6 installation and Python version

Log Analysis

Check the log files in the logs/ directory for detailed error information and debugging details.

Contributing

This is a complete, self-contained application. The codebase includes:

  • Object-oriented design with clear separation of concerns
  • Comprehensive error handling and logging
  • Modern GUI with PyQt6
  • Extensible file type support
  • Clean, documented code structure

License

This project is provided as-is for educational and personal use.

Version History

  • Current Version: Complete implementation with both a GUI and a CLI
  • Features: Comprehensive file type support, modern GUI, logging, error handling
  • Status: Production-ready application

Note: This application is designed for searching text content within files. It does not modify or alter the original files in any way - it only reads and searches through their content.
