josego85/pdf-content-search

PDF Content Search

A Symfony application to search content within PDF files using Elasticsearch and Vue.js.

Features

  • 📄 Page-Level PDF Search
  • 🔍 Real-time Search Results
  • 🎯 Content Highlighting (Exact matches only)
  • 📊 Relevance Scoring
  • 📱 Responsive Design
  • 🚀 Fast Elasticsearch Backend
  • 🔄 Automatic PDF Processing
  • 📋 Page Context Display
  • 🔗 Direct PDF Page Links
  • 📈 Search Analytics via Kibana

Description

This application allows users to search for content within PDF files using Elasticsearch for efficient text searching and indexing, with a modern Vue.js frontend.

Technologies

  • PHP 8.4.11
  • Symfony 7.3.2
  • Elasticsearch 8.17.1
  • Kibana 8.17.1
  • Vue.js 3.5.x
  • Tailwind CSS 3.4.x
  • Docker 27.5.1 & Docker Compose
  • Node.js 22.x
  • PostgreSQL 16
  • Apache 2.4

Requirements

  • Docker 27.5.1 and Docker Compose
  • PHP 8.4.11
  • Composer 2.x
  • Node.js 22.x and npm
  • pdftotext utility (poppler-utils)
  • At least 4GB RAM (for Elasticsearch)
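
Before installing, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of the project: the function names `missing_tools` and `check_requirements` are hypothetical, and the check only tests that each command exists, not that its version matches.

```shell
# Print each given command name that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools named in the Requirements list.
# Docker Compose is a docker subcommand, so it is not checked separately here.
check_requirements() {
    missing=$(missing_tools docker php composer node npm pdftotext)
    if [ -n "$missing" ]; then
        printf 'Missing tools:\n%s\n' "$missing" >&2
        return 1
    fi
    echo "All required tools found."
}
```

Sourcing the snippet only defines the functions; run `check_requirements` to get the report.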

Installation

  1. Clone the repository:
git clone git@github.com:yourusername/pdf-content-search.git
cd pdf-content-search
  2. Install dependencies:
composer install
npm install
  3. Install pdftotext utility:
sudo apt-get install poppler-utils
  4. Build frontend assets:
npm run dev

Docker Setup

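  1. Build and start the containers:
docker compose up -d --build
  2. Verify containers are running:
docker compose ps
  3. Access services:
    • Application: http://localhost
    • Elasticsearch: http://localhost:9200
    • Kibana: http://localhost:5601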

Configuration

Environment Variables

# PostgreSQL
POSTGRES_DB=app
POSTGRES_PASSWORD=!ChangeMe!
POSTGRES_USER=app
POSTGRES_VERSION=16

# Elasticsearch
ELASTICSEARCH_HOST=http://elasticsearch:9200

Docker Services

  • apache: HTTP Server (2.4)
  • php: PHP-FPM 8.4.11
  • elasticsearch: Search Engine (8.17.1)
  • kibana: Analytics Dashboard (8.17.1)
  • database: PostgreSQL 16

PDF Management

  1. Create the PDF directory:
mkdir -p public/pdfs
  2. Place your PDFs in public/pdfs/
  3. Index the PDFs:
docker compose exec php bin/console app:index-pdfs
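
To illustrate what page-level extraction with pdftotext can look like, here is a rough sketch using the poppler-utils tools directly. It is an assumption that the `app:index-pdfs` command works along these lines; the function names and the `page-N.txt` naming are hypothetical and chosen only for the example.

```shell
# Read `pdfinfo` output on stdin and print the page count.
page_count() {
    awk '/^Pages:/ { print $2 }'
}

# Write one text file per page of a PDF into an output directory.
# Usage: extract_pages FILE.pdf OUT_DIR
extract_pages() {
    pdf=$1
    out_dir=$2
    total=$(pdfinfo "$pdf" | page_count)
    mkdir -p "$out_dir"
    page=1
    while [ "$page" -le "$total" ]; do
        # -f and -l select the first and last page to convert.
        pdftotext -f "$page" -l "$page" -layout "$pdf" "$out_dir/page-$page.txt"
        page=$((page + 1))
    done
}
```

For example, `extract_pages public/pdfs/report.pdf /tmp/report-pages` would produce one text file per page, which is a quick way to see what text is available to search for a given page when a result looks wrong.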

Usage

  1. Access the application at http://localhost
  2. Use the search bar to find content in PDFs
  3. Results will show:
    • PDF filename
    • Page number
    • Content context
    • Highlighted matches
    • Direct link to PDF page
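
A direct link to a PDF page is typically built with the `#page=N` fragment, which common browser PDF viewers recognise. The sketch below assumes the files in public/pdfs/ are served under `/pdfs/`; the function name `pdf_page_url` is hypothetical, and the application may construct its links differently.

```shell
# Build a link that opens a PDF at a given page.
# Usage: pdf_page_url BASE_URL FILENAME PAGE
# Note: filenames with spaces or special characters are not URL-encoded here.
pdf_page_url() {
    printf '%s/pdfs/%s#page=%s\n' "${1%/}" "$2" "$3"
}
```

Calling `pdf_page_url http://localhost report.pdf 3` prints `http://localhost/pdfs/report.pdf#page=3`.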

Development

  1. Start development environment:
docker compose up -d
npm run watch
  2. Run tests:
docker compose exec php bin/phpunit
  3. Check code style:
# Check for violations without fixing
docker compose exec php vendor/bin/php-cs-fixer fix --dry-run

# Check with detailed diff output
docker compose exec php vendor/bin/php-cs-fixer fix --dry-run --diff

# Fix code style violations
docker compose exec php vendor/bin/php-cs-fixer fix
  4. Frontend development:
    • Components in assets/components/
    • Styles in assets/css/
    • Build: npm run build
    • Watch: npm run watch

Elasticsearch

  1. Check cluster health:
curl http://localhost:9200/_cluster/health
  2. View indices:
curl http://localhost:9200/_cat/indices
  3. Monitor with Kibana:
    • Access Kibana at http://localhost:5601
    • View index management
    • Monitor cluster health
    • Analyze search performance
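
The cluster health endpoint returns JSON with a `status` field of `green`, `yellow`, or `red`. As a small sketch for scripting that check without extra tools, the hypothetical helpers below pull the status out with sed; a JSON parser such as jq would be more robust if you have one installed.

```shell
# Read a _cluster/health JSON response on stdin and print its "status" value.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Read the same JSON on stdin; exit 0 for green or yellow, 1 otherwise.
cluster_usable() {
    status=$(health_status)
    case "$status" in
        green|yellow) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `curl -s http://localhost:9200/_cluster/health | health_status`. A single-node setup commonly reports `yellow` when an index has replica shards that cannot be assigned, which is why the sketch treats `yellow` as usable.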

Maintenance

  1. Clear caches:
docker compose exec php bin/console cache:clear
  2. Update dependencies:
docker compose exec php composer update
docker compose exec php npm update
  3. Rebuild containers:
docker compose down
docker compose build --no-cache
docker compose up -d

Troubleshooting

  1. Elasticsearch Issues:
# Check health
docker compose exec elasticsearch curl -X GET "localhost:9200/_cluster/health"
# View logs
docker compose logs elasticsearch
  2. Frontend Issues:
# Clear cache
npm cache clean --force
# Rebuild
npm run build
  3. PDF Indexing Issues:
# Check directory
ls public/pdfs/
# Verbose indexing
docker compose exec php bin/console app:index-pdfs -vv

Security

  • Change default PostgreSQL credentials
  • Enable Elasticsearch security in production
  • Configure HTTPS for production
  • Set proper file permissions
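
As a small aid for the first point, the sketch below checks whether an environment file still contains the placeholder password shown under Environment Variables. The function name is hypothetical, and the file name `.env` in the usage line is an assumption about where those variables live.

```shell
# Succeed (exit 0) if the given env file still sets the placeholder password.
# Usage: uses_default_password PATH_TO_ENV_FILE
uses_default_password() {
    grep -q '^POSTGRES_PASSWORD=!ChangeMe!$' "$1"
}
```

Usage: `uses_default_password .env && echo "Change POSTGRES_PASSWORD before deploying"`.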

Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add AmazingFeature')
  4. Push branch (git push origin feature/AmazingFeature)
  5. Open Pull Request

License

Licensed under the GNU General Public License v3.0; see the LICENSE file.
