mibrahim0499/candidate-scrapper

πŸš€ JobToday Candidate Scraper

A powerful, user-friendly browser extension and API solution for automating candidate data collection from JobToday with AI-powered chat summaries and seamless integrations.

🎯 Overview

JobToday Candidate Scraper is a comprehensive solution that automates the collection of candidate information from JobToday.com. It consists of:

  • πŸ”Œ Browser Extension: A sleek, user-friendly Chrome extension with a modern UI that requires zero technical knowledge
  • βš™οΈ Backend API: A robust Flask API with Playwright-based web scraping
  • πŸ€– AI Integration: Automatic chat history summarization using OpenAI GPT-4o-mini
  • πŸ“Š Data Management: Seamless integration with Airtable and n8n for automated workflows

Perfect for recruiters and HR professionals who want to streamline their candidate data collection process without writing a single line of code.

✨ Features

🎨 Browser Extension

  • Zero-Config Setup: Guided onboarding flow that walks you through setup in minutes
  • Beautiful UI: Modern, sleek interface with gradient designs and smooth animations
  • Real-Time Dashboard: Live stats showing total candidates, new today, and candidates with chat
  • Candidate Cards: Visual cards displaying key information at a glance
  • Chat Visualization: Beautiful chat-bubble interface for viewing conversation history
  • Search & Filter: Quickly find candidates by name, phone, location, or filter by chat availability
  • Detailed Profiles: Comprehensive candidate views with tabs for Overview, Chat, and Experience

⚑ Backend Scraper

  • Automated Data Collection: Extracts comprehensive candidate information including:
    • Personal details (name, phone, email, location)
    • Professional experience and work history
    • Certificates and qualifications
    • Languages spoken
    • Complete chat conversation history
  • AI-Powered Summaries: Automatically generates concise chat summaries using OpenAI
  • Smart Duplicate Prevention: Checks Airtable to avoid duplicate entries
  • Session Management: Persistent login sessions reduce authentication overhead
  • Error Handling: Robust retry logic for failed operations
  • Progress Tracking: Real-time progress updates during scraping

πŸ”— Integrations

  • Airtable: Automatic syncing of new candidates to your database
  • n8n: Webhook integration for custom automation workflows
  • Local Export: JSON and CSV exports for backup and analysis

πŸ“Έ Features in Action

Browser Extension Dashboard

The extension features a modern, gradient-based dashboard displaying:

  • Real-time Statistics: Total candidates, candidates with chat history, and new candidates today
  • Recent Candidates Preview: Quick access to the 5 most recent candidates with visual cards
  • Progress Tracking: Live progress updates during scraping operations
  • Quick Actions: One-click scraping and settings access

Candidate List View

  • Search Functionality: Instantly find candidates by name, phone number, or location
  • Smart Filters: Filter by "All", "With Chat", or "New" candidates
  • Visual Cards: Each candidate card displays key information with chat indicators
  • Smooth Scrolling: Efficient navigation through large candidate lists

Candidate Detail View

Comprehensive candidate profiles organized into three intuitive tabs:

  1. Overview Tab

    • Contact information (phone, email, location)
    • Personal "About" section
    • Languages spoken
    • Certificates and qualifications
    • Application date and job role
  2. Chat Tab

    • Beautiful chat-bubble interface with color-coded messages
    • Candidate messages (left-aligned, white bubbles)
    • Recruiter messages (right-aligned, gradient bubbles)
    • System messages (centered, highlighted)
    • Timestamps for each message
    • AI-generated chat summary at the top
    • Date separators for conversation organization
  3. Experience Tab

    • Formatted work experience
    • Company names and roles
    • Employment dates and durations
    • Detailed job descriptions

Onboarding Flow

A guided, step-by-step setup process:

  • Welcome Screen: Introduction to the extension
  • Credentials Setup: Secure JobToday login configuration
  • Job ID Configuration: Easy-to-follow instructions for finding your Job ID
  • Optional Integrations: Airtable, n8n, and OpenAI setup (all optional)
  • Validation: Real-time testing of credentials and connections
  • Summary: Overview of your configuration before completion

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Browser Extension (Chrome)                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚   Dashboard  β”‚  β”‚ Candidate    β”‚  β”‚   Settings   β”‚ β”‚
β”‚  β”‚    View      β”‚  β”‚ Detail View  β”‚  β”‚ & Onboarding β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        ↕️ HTTP REST API
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Flask Backend API                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  API Routes  β”‚  β”‚   Scraper    β”‚  β”‚   Config     β”‚ β”‚
β”‚  β”‚   /api/*     β”‚  β”‚   Engine     β”‚  β”‚   Storage    β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        ↕️ Playwright
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              JobToday.com                               β”‚
β”‚         (Web Scraping Target)                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        ↕️ Integrations
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Airtable   β”‚  β”‚     n8n      β”‚  β”‚    OpenAI    β”‚
β”‚   Database   β”‚  β”‚   Webhooks   β”‚  β”‚     API      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Installation

Prerequisites

  • Python 3.8+ - Download Python
  • pip - Usually included with Python
  • Chrome Browser - Required for the extension
  • Git - For cloning the repository (optional)

Backend Setup

  1. Clone the repository

    git clone https://github.com/yourusername/jobtoday-scraper.git
    cd jobtoday-scraper
  2. Create a virtual environment

    Windows:

    python -m venv venv
    .\venv\Scripts\activate

    macOS/Linux:

    python3 -m venv venv
    source venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Install Playwright browsers

    playwright install chromium
  5. Configure environment variables

    Create a .env file in the project root:

    # JobToday Credentials
    JOBTODAY_EMAIL=your-email@example.com
    JOBTODAY_PASSWORD=your-password
    
    # Job ID (optional, can be set via extension)
    JOB_ID=p3j9ox
    
    # Airtable Configuration (optional)
    AIRTABLE_PAT=your-airtable-token
    AIRTABLE_BASE_ID=your-base-id
    AIRTABLE_TABLE_NAME=Candidates
    
    # n8n Webhook URL (optional)
    N8N_WEBHOOK_URL=https://your-n8n-webhook-url
    
    # OpenAI API Key (optional, for chat summaries)
    OPENAI_API_KEY=sk-your-openai-key

Browser Extension Setup

  1. Load the extension in Chrome

    # Navigate to Chrome Extensions page
    # chrome://extensions/
    
    # Steps:
    # 1. Enable "Developer mode" (toggle in top-right corner)
    # 2. Click "Load unpacked" button
    # 3. Select the 'extension' folder from this project
  2. Configure the extension

    • Click the JobToday Scraper icon in your Chrome toolbar
    • Follow the guided 5-step onboarding flow:
      1. Welcome - Get introduced to the extension
      2. Credentials - Enter your JobToday email and password
      3. Job ID - Enter your job posting ID (found in the URL)
      4. Integrations - Optionally set up Airtable, n8n, and OpenAI
      5. Complete - Review your configuration and finish setup

    Note: The backend URL is automatically configured. If running locally, it defaults to http://localhost:5001.

🎬 Quick Start

Starting the Backend Server

macOS/Linux:

# Navigate to project directory
cd jobtoday-scraper

# Activate virtual environment
source venv/bin/activate

# Install dependencies (if not already done)
pip install -r requirements.txt
playwright install chromium

# Start the Flask API server
python scraper_api.py

Windows:

# Navigate to project directory
cd jobtoday-scraper

# Activate virtual environment
.\venv\Scripts\activate

# Install dependencies (if not already done)
pip install -r requirements.txt
playwright install chromium

# Start the Flask API server
python scraper_api.py

The server will start on http://localhost:5001 by default (or the port specified in the PORT environment variable).

Using the Browser Extension

  1. Start the backend server (see above)
  2. Click the extension icon in your Chrome toolbar
  3. Complete onboarding if this is your first time (or click Settings to reconfigure)
  4. View your dashboard - You'll see statistics and recent candidates
  5. Browse all candidates - Click "View All β†’" to see the full candidate list
  6. Search and filter - Use the search bar and filter buttons to find specific candidates
  7. View candidate details - Click any candidate card to see their full profile and chat history
  8. Start scraping - Click the "Start Scraping" button to initiate a new scraping session
  9. Monitor progress - Watch real-time progress updates on the dashboard

🌐 Browser Extension

Features Overview

The browser extension provides a complete, user-friendly interface for managing your candidate scraping workflow:

  • Dashboard: Overview of all candidates with quick statistics
  • Candidate List: Browse, search, and filter all scraped candidates
  • Candidate Detail: Comprehensive profile view with:
    • Contact information and location
    • Professional experience and qualifications
    • Full chat conversation history with AI summary
    • Languages and certificates

Key Benefits

  • βœ… No command-line knowledge required
  • βœ… Visual progress tracking
  • βœ… Instant access to candidate data
  • βœ… Beautiful chat interface
  • βœ… Mobile-friendly design

πŸ“‘ API Documentation

Base URL

http://localhost:5001

Endpoints

Health Check

GET /health

Returns the health status of the API.

Response:

{
  "status": "healthy",
  "service": "JobToday Scraper API",
  "timestamp": "2025-01-08T23:32:10.760000"
}

Get Status

GET /status

Returns the current scraper status and progress.

Response:

{
  "status": "idle",
  "last_run": "2025-01-08T20:00:00",
  "candidates_count": 58,
  "progress": {
    "section": "recommended",
    "candidate": "John Doe",
    "processed": 25,
    "total": 58
  }
}

Trigger Scraping

POST /trigger-scrape
Content-Type: application/json

Initiates a scraping run. Configuration can be provided in the request body; otherwise the stored configuration is used.

Request Body (optional):

{
  "job_id": "p3j9ox",
  "email": "your-email@example.com",
  "password": "your-password",
  "airtable_pat": "pat...",
  "airtable_base_id": "app...",
  "airtable_table_name": "Candidates",
  "n8n_webhook_url": "https://...",
  "openai_api_key": "sk-..."
}

Response:

{
  "status": "started",
  "message": "Scraper started in background",
  "check_status_at": "/status",
  "started_at": "2025-01-08T23:32:10"
}
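
Putting the two endpoints together, a client can trigger a run and then poll /status until the scraper returns to idle. A sketch, assuming the local default port; `is_done` and `run_scrape` are illustrative names:

```python
import time
import requests

BASE_URL = "http://localhost:5001"

def is_done(status: dict) -> bool:
    # /status reports "idle" once a run has finished
    return status.get("status") == "idle"

def run_scrape(poll_seconds: int = 5) -> dict:
    """Trigger a scrape using the stored configuration, then poll until it finishes."""
    resp = requests.post(f"{BASE_URL}/trigger-scrape", json={}, timeout=30)
    resp.raise_for_status()
    time.sleep(poll_seconds)  # give the background run a moment to leave "idle"
    while True:
        status = requests.get(f"{BASE_URL}/status", timeout=30).json()
        if is_done(status):
            return status
        progress = status.get("progress", {})
        print(f"Processed {progress.get('processed', '?')}/{progress.get('total', '?')}")
        time.sleep(poll_seconds)
```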

Get All Candidates

GET /api/candidates?limit=10&sort=date_desc

Retrieves all scraped candidates with optional pagination and sorting.

Query Parameters:

  • limit (optional): Maximum number of candidates to return
  • sort (optional): Sort order (date_desc, name_asc)

Response:

{
  "candidates": [...],
  "total": 58,
  "returned": 10,
  "scraped_at": "2025-01-08T19:56:07",
  "job_id": "p3j9ox"
}

Get Single Candidate

GET /api/candidates/{candidate_id}

Retrieves detailed information for a specific candidate.

Response:

{
  "name": "John Doe",
  "phone": "+1234567890",
  "email": "john@example.com",
  "location": "New York, NY",
  "chat_history": "...",
  "chat_summary": "...",
  ...
}

Configure Settings

POST /api/configure
Content-Type: application/json

Saves configuration settings (called automatically by the extension).

Validate Credentials

POST /api/validate-credentials
Content-Type: application/json

Tests JobToday login credentials without initiating a full scrape.

βš™οΈ Configuration

Environment Variables

All configuration can be set via environment variables or through the browser extension interface.

  • JOBTODAY_EMAIL (required): Your JobToday account email
  • JOBTODAY_PASSWORD (required): Your JobToday account password
  • JOB_ID (required): Job posting ID (e.g., p3j9ox)
  • AIRTABLE_PAT (optional): Airtable Personal Access Token
  • AIRTABLE_BASE_ID (optional): Airtable Base ID
  • AIRTABLE_TABLE_NAME (optional): Airtable table name (default: Candidates)
  • N8N_WEBHOOK_URL (optional): n8n webhook URL for notifications
  • OPENAI_API_KEY (optional): OpenAI API key for chat summaries
  • PORT (optional): Backend server port (default: 5001)
  • BACKEND_URL (optional): Backend URL for production deployments

Extension Configuration

The browser extension stores configuration locally and automatically syncs with the backend API. All settings can be managed through the extension's Settings button.

🚒 Deployment

Local Development

Simply run the Flask server as described in Quick Start.

Production Deployment

The application can be deployed to any platform that supports Python applications. Example deployment options:

Render

  • Connect your GitHub repository
  • Set environment variables in the Render dashboard
  • The app will automatically deploy using gunicorn (configured in render.yaml)

Docker

A Dockerfile is included for containerized deployment:

docker build -t jobtoday-scraper .
docker run -p 5001:5001 --env-file .env jobtoday-scraper

Other Platforms

The application can be deployed to:

  • Heroku
  • AWS Elastic Beanstalk
  • Google Cloud Run
  • Azure App Service
  • Any VPS with Python support

πŸ“ Project Structure

jobtoday-scraper/
β”œβ”€β”€ extension/                 # Browser extension files
β”‚   β”œβ”€β”€ manifest.json         # Extension manifest
β”‚   β”œβ”€β”€ popup.html            # Main popup UI
β”‚   β”œβ”€β”€ popup.js              # Popup logic
β”‚   β”œβ”€β”€ onboarding.html       # Onboarding flow
β”‚   β”œβ”€β”€ onboarding.js         # Onboarding logic
β”‚   β”œβ”€β”€ background.js         # Background service worker
β”‚   β”œβ”€β”€ styles.css            # Extension styling
β”‚   └── icons/                # Extension icons
β”œβ”€β”€ scraper_api.py            # Main Flask API server
β”œβ”€β”€ jobtoday_1.py             # Core scraping logic
β”œβ”€β”€ config_storage.py         # Configuration management
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ Dockerfile                # Docker configuration
β”œβ”€β”€ render.yaml               # Render deployment config
β”œβ”€β”€ install_dependencies.sh   # Installation script (macOS/Linux)
β”œβ”€β”€ install_dependencies.bat  # Installation script (Windows)
└── README.md                 # This file

πŸ”§ Development

Project Structure Explained

  • scraper_api.py: Main Flask API server with all endpoints
  • jobtoday_1.py: Core scraping engine using Playwright
  • config_storage.py: Secure configuration storage with encryption
  • extension/: Browser extension source code
    • popup.html/js: Main extension interface
    • onboarding.html/js: Setup wizard
    • background.js: Service worker for background tasks
    • styles.css: Modern UI styling

Running in Development Mode

# Enable debug mode for auto-reload and detailed error pages
# (FLASK_ENV is deprecated in Flask 2.3+; use FLASK_DEBUG instead)
export FLASK_DEBUG=1
python scraper_api.py

Code Style

The project follows PEP 8 style guidelines. For development, consider using:

  • black - Code formatting
  • flake8 - Linting
  • pylint - Advanced linting

πŸ› Troubleshooting

Common Issues

Port 5000/5001 already in use:

  • On macOS, port 5000 is often used by AirPlay Receiver
  • The app defaults to port 5001 to avoid conflicts
  • You can change it via the PORT environment variable

Playwright not found:

# Make sure you've installed playwright browsers
playwright install chromium

Extension not loading:

  • Ensure "Developer mode" is enabled in Chrome
  • Check the browser console for errors (chrome://extensions/ β†’ Details β†’ Inspect views)
  • Verify all files are in the extension folder

Cannot connect to backend:

  • Verify the Flask server is running (python scraper_api.py)
  • Check the backend URL in extension settings (default: http://localhost:5001)
  • Ensure no firewall is blocking the connection

Candidates not loading:

  • Verify candidates_detailed.json exists in the project root
  • Check that a scraping session has been completed at least once
  • Review backend logs for any errors

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“ License

This project is licensed under the MIT License.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

πŸ™ Acknowledgments

πŸ”’ Security & Privacy

  • Local Storage: All credentials are encrypted and stored locally in the extension
  • No Data Collection: The extension does not collect or transmit any user data to third parties
  • Secure Communication: All API communication uses HTTPS (in production)
  • Session Management: Login sessions are stored locally and never shared

πŸ“Š Data Collected

The scraper collects the following information from JobToday candidate profiles:

  • Personal information (name, phone, email, location)
  • Professional experience and work history
  • Languages and certifications
  • Complete chat conversation history
  • Application dates and job role information

All data is stored locally and can be exported to JSON/CSV format. When configured, data is also synced to your Airtable database.

πŸ› οΈ Built With

  • Python 3.8+ - Programming language
  • Flask - Web framework
  • Playwright - Browser automation
  • Chrome Extension API - Browser extension platform
  • OpenAI API - AI chat summarization
  • Airtable API - Database integration
  • n8n - Workflow automation

πŸ“ˆ Roadmap

Future enhancements may include:

  • Support for multiple job postings
  • Advanced filtering and sorting options
  • Export to Excel format
  • Dark mode theme
  • Candidate comparison view
  • Automated scheduling of scraping tasks
  • Email notifications
  • Mobile app companion

🀝 Other Ways to Contribute

Beyond pull requests, you can help by:

  1. Reporting Bugs: Open an issue with detailed information about the bug
  2. Suggesting Features: Share your ideas for new features
  3. Improving Documentation: Make the docs better for everyone
  4. Sharing Feedback: Let us know how we can improve

Please ensure your code follows the existing style and includes appropriate tests.

πŸ“§ Support & Contact

  • GitHub Issues: Open an issue
  • Questions: Check existing issues or open a new one
  • Feature Requests: We'd love to hear your ideas!

⭐ Show Your Support

If you find this project useful, please:

  • ⭐ Star the repository on GitHub
  • πŸ› Report bugs to help improve the project
  • πŸ’‘ Suggest features to make it even better
  • πŸ“’ Share with others who might benefit from it

Made with ❀️ for recruiters and HR professionals

⭐ Star on GitHub β€’ πŸ› Report Bug β€’ πŸ’‘ Request Feature
