A powerful, user-friendly browser extension and API solution for automating candidate data collection from JobToday with AI-powered chat summaries and seamless integrations.
- Overview
- Features
- Screenshots
- Architecture
- Installation
- Quick Start
- Browser Extension
- API Documentation
- Configuration
- Deployment
- Contributing
- License
JobToday Candidate Scraper is a comprehensive solution that automates the collection of candidate information from JobToday.com. It consists of:
- Browser Extension: A sleek, user-friendly Chrome extension with a modern UI that requires zero technical knowledge
- Backend API: A robust Flask API with Playwright-based web scraping
- AI Integration: Automatic chat history summarization using OpenAI GPT-4o-mini
- Data Management: Seamless integration with Airtable and n8n for automated workflows
Perfect for recruiters and HR professionals who want to streamline their candidate data collection process without writing a single line of code.
- Zero-Config Setup: Guided onboarding flow that walks you through setup in minutes
- Beautiful UI: Modern, sleek interface with gradient designs and smooth animations
- Real-Time Dashboard: Live stats showing total candidates, new today, and candidates with chat
- Candidate Cards: Visual cards displaying key information at a glance
- Chat Visualization: Beautiful chat-bubble interface for viewing conversation history
- Search & Filter: Quickly find candidates by name, phone, location, or filter by chat availability
- Detailed Profiles: Comprehensive candidate views with tabs for Overview, Chat, and Experience
- Automated Data Collection: Extracts comprehensive candidate information including:
- Personal details (name, phone, email, location)
- Professional experience and work history
- Certificates and qualifications
- Languages spoken
- Complete chat conversation history
- AI-Powered Summaries: Automatically generates concise chat summaries using OpenAI
- Smart Duplicate Prevention: Checks Airtable to avoid duplicate entries
- Session Management: Persistent login sessions reduce authentication overhead
- Error Handling: Robust retry logic for failed operations
- Progress Tracking: Real-time progress updates during scraping
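The retry behaviour listed under Error Handling could look roughly like the following. This is a minimal sketch, not the project's actual code: the helper name `retry`, the attempt count, and the backoff delays are all assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3,
          base_delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying with exponential backoff on any exception.

    Hypothetical helper: the real scraper may retry differently.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay before trying again.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; a scraping step would be wrapped as `retry(lambda: some_step())`, where `some_step` stands in for whatever operation may fail.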
- Airtable: Automatic syncing of new candidates to your database
- n8n: Webhook integration for custom automation workflows
- Local Export: JSON and CSV exports for backup and analysis
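One way the duplicate check before an Airtable sync could work is sketched below. It makes no Airtable calls: existing records are passed in as plain dictionaries. The matching rule (phone digits first, lowercased name as a fallback) and the function names are assumptions, since the README does not say which field the real check compares.

```python
import re
from typing import Dict, Iterable, List


def candidate_key(candidate: Dict[str, str]) -> str:
    """Build a comparison key (assumed rule: phone digits, else lowercased name)."""
    digits = re.sub(r"\D", "", candidate.get("phone") or "")
    if digits:
        return "phone:" + digits
    return "name:" + (candidate.get("name") or "").strip().lower()


def filter_new(scraped: Iterable[Dict[str, str]],
               existing: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return scraped candidates whose key is not already present."""
    seen = {candidate_key(c) for c in existing}
    fresh = []
    for candidate in scraped:
        key = candidate_key(candidate)
        if key not in seen:
            seen.add(key)  # also drops duplicates within the same batch
            fresh.append(candidate)
    return fresh
```

Only the candidates returned by `filter_new` would then be sent on to Airtable or the n8n webhook.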
The extension features a modern, gradient-based dashboard displaying:
- Real-time Statistics: Total candidates, candidates with chat history, and new candidates today
- Recent Candidates Preview: Quick access to the 5 most recent candidates with visual cards
- Progress Tracking: Live progress updates during scraping operations
- Quick Actions: One-click scraping and settings access
- Search Functionality: Instantly find candidates by name, phone number, or location
- Smart Filters: Filter by "All", "With Chat", or "New" candidates
- Visual Cards: Each candidate card displays key information with chat indicators
- Smooth Scrolling: Efficient navigation through large candidate lists
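The search and "With Chat" filter described above amount to a simple substring match. The sketch below expresses that logic in Python for consistency with the backend (the extension's own logic lives in `popup.js`). The field names follow the candidate response shown in the API documentation; the function name is hypothetical, and the "New" filter is left out because the README does not say which date field it uses.

```python
from typing import Dict, List


def search_candidates(candidates: List[Dict[str, str]], query: str = "",
                      only_with_chat: bool = False) -> List[Dict[str, str]]:
    """Case-insensitive substring search over name, phone, and location."""
    needle = query.strip().lower()
    results = []
    for candidate in candidates:
        # Skip candidates without chat history when the filter is active.
        if only_with_chat and not (candidate.get("chat_history") or "").strip():
            continue
        haystack = " ".join(
            (candidate.get(field) or "") for field in ("name", "phone", "location")
        ).lower()
        if needle in haystack:  # an empty query matches every candidate
            results.append(candidate)
    return results
```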
Comprehensive candidate profiles organized into three intuitive tabs:
- Overview Tab
  - Contact information (phone, email, location)
  - Personal "About" section
  - Languages spoken
  - Certificates and qualifications
  - Application date and job role
- Chat Tab
  - Beautiful chat-bubble interface with color-coded messages
  - Candidate messages (left-aligned, white bubbles)
  - Recruiter messages (right-aligned, gradient bubbles)
  - System messages (centered, highlighted)
  - Timestamps for each message
  - AI-generated chat summary at the top
  - Date separators for conversation organization
- Experience Tab
  - Formatted work experience
  - Company names and roles
  - Employment dates and durations
  - Detailed job descriptions
A guided, step-by-step setup process:
- Welcome Screen: Introduction to the extension
- Credentials Setup: Secure JobToday login configuration
- Job ID Configuration: Easy-to-follow instructions for finding your Job ID
- Optional Integrations: Airtable, n8n, and OpenAI setup (all optional)
- Validation: Real-time testing of credentials and connections
- Summary: Overview of your configuration before completion
    ┌─────────────────────────────────────────────────────────┐
    │               Browser Extension (Chrome)                │
    │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
    │  │  Dashboard   │  │  Candidate   │  │   Settings   │   │
    │  │     View     │  │ Detail View  │  │ & Onboarding │   │
    │  └──────────────┘  └──────────────┘  └──────────────┘   │
    └─────────────────────────────────────────────────────────┘
                         ↕ HTTP REST API
    ┌─────────────────────────────────────────────────────────┐
    │                    Flask Backend API                    │
    │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
    │  │  API Routes  │  │   Scraper    │  │    Config    │   │
    │  │    /api/*    │  │    Engine    │  │   Storage    │   │
    │  └──────────────┘  └──────────────┘  └──────────────┘   │
    └─────────────────────────────────────────────────────────┘
                           ↕ Playwright
    ┌─────────────────────────────────────────────────────────┐
    │                      JobToday.com                       │
    │                  (Web Scraping Target)                  │
    └─────────────────────────────────────────────────────────┘
                          ↕ Integrations
      ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
      │   Airtable   │  │     n8n      │  │    OpenAI    │
      │   Database   │  │   Webhooks   │  │     API      │
      └──────────────┘  └──────────────┘  └──────────────┘
- Python 3.8+ - Download Python
- pip - Usually included with Python
- Chrome Browser - Required for the extension
- Git - For cloning the repository (optional)
- Clone the repository

      git clone https://github.com/yourusername/jobtoday-scraper.git
      cd jobtoday-scraper

- Create a virtual environment

  Windows:

      python -m venv venv
      .\venv\Scripts\activate

  macOS/Linux:

      python3 -m venv venv
      source venv/bin/activate

- Install dependencies

      pip install -r requirements.txt

- Install Playwright browsers

      playwright install chromium

- Configure environment variables

  Create a `.env` file in the project root:

      # JobToday Credentials
      JOBTODAY_EMAIL=your-email@example.com
      JOBTODAY_PASSWORD=your-password

      # Job ID (optional, can be set via extension)
      JOB_ID=p3j9ox

      # Airtable Configuration (optional)
      AIRTABLE_PAT=your-airtable-token
      AIRTABLE_BASE_ID=your-base-id
      AIRTABLE_TABLE_NAME=Candidates

      # n8n Webhook URL (optional)
      N8N_WEBHOOK_URL=https://your-n8n-webhook-url

      # OpenAI API Key (optional, for chat summaries)
      OPENAI_API_KEY=sk-your-openai-key

- Load the extension in Chrome

  Open `chrome://extensions/`, then:
  1. Enable "Developer mode" (toggle in the top-right corner)
  2. Click the "Load unpacked" button
  3. Select the `extension` folder from this project

- Configure the extension
  - Click the JobToday Scraper icon in your Chrome toolbar
  - Follow the guided 5-step onboarding flow:
    - Welcome - Get introduced to the extension
    - Credentials - Enter your JobToday email and password
    - Job ID - Enter your job posting ID (found in the URL)
    - Integrations - Optionally set up Airtable, n8n, and OpenAI
    - Complete - Review your configuration and finish setup

Note: The backend URL is automatically configured. If running locally, it defaults to `http://localhost:5001`.
macOS/Linux:

    # Navigate to project directory
    cd jobtoday-scraper
    # Activate virtual environment
    source venv/bin/activate
    # Install dependencies (if not already done)
    pip install -r requirements.txt
    playwright install chromium
    # Start the Flask API server
    python scraper_api.py

Windows:

    # Navigate to project directory
    cd jobtoday-scraper
    # Activate virtual environment
    .\venv\Scripts\activate
    # Install dependencies (if not already done)
    pip install -r requirements.txt
    playwright install chromium
    # Start the Flask API server
    python scraper_api.py

The server will start on http://localhost:5001 by default (or the port specified in the `PORT` environment variable).
- Start the backend server (see above)
- Click the extension icon in your Chrome toolbar
- Complete onboarding if this is your first time (or click Settings to reconfigure)
- View your dashboard - You'll see statistics and recent candidates
- Browse all candidates - Click "View All →" to see the full candidate list
- Search and filter - Use the search bar and filter buttons to find specific candidates
- View candidate details - Click any candidate card to see their full profile and chat history
- Start scraping - Click the "Start Scraping" button to initiate a new scraping session
- Monitor progress - Watch real-time progress updates on the dashboard
The browser extension provides a complete, user-friendly interface for managing your candidate scraping workflow:
- Dashboard: Overview of all candidates with quick statistics
- Candidate List: Browse, search, and filter all scraped candidates
- Candidate Detail: Comprehensive profile view with:
- Contact information and location
- Professional experience and qualifications
- Full chat conversation history with AI summary
- Languages and certificates
- No command-line knowledge required
- Visual progress tracking
- Instant access to candidate data
- Beautiful chat interface
- Mobile-friendly design
Base URL: http://localhost:5001

GET /health

Returns the health status of the API.

Response:

    {
      "status": "healthy",
      "service": "JobToday Scraper API",
      "timestamp": "2025-01-08T23:32:10.760000"
    }

GET /status

Returns the current scraper status and progress.

Response:

    {
      "status": "idle",
      "last_run": "2025-01-08T20:00:00",
      "candidates_count": 58,
      "progress": {
        "section": "recommended",
        "candidate": "John Doe",
        "processed": 25,
        "total": 58
      }
    }

POST /trigger-scrape
Content-Type: application/json

Initiates a scraping process. Configuration can be provided in the request body; otherwise the stored configuration is used.

Request Body (optional):

    {
      "job_id": "p3j9ox",
      "email": "your-email@example.com",
      "password": "your-password",
      "airtable_pat": "pat...",
      "airtable_base_id": "app...",
      "airtable_table_name": "Candidates",
      "n8n_webhook_url": "https://...",
      "openai_api_key": "sk-..."
    }

Response:

    {
      "status": "started",
      "message": "Scraper started in background",
      "check_status_at": "/status",
      "started_at": "2025-01-08T23:32:10"
    }

GET /api/candidates?limit=10&sort=date_desc

Retrieves all scraped candidates with optional pagination and sorting.

Query Parameters:
- `limit` (optional): Maximum number of candidates to return
- `sort` (optional): Sort order (`date_desc`, `name_asc`)

Response:

    {
      "candidates": [...],
      "total": 58,
      "returned": 10,
      "scraped_at": "2025-01-08T19:56:07",
      "job_id": "p3j9ox"
    }

GET /api/candidates/{candidate_id}

Retrieves detailed information for a specific candidate.

Response:

    {
      "name": "John Doe",
      "phone": "+1234567890",
      "email": "john@example.com",
      "location": "New York, NY",
      "chat_history": "...",
      "chat_summary": "...",
      ...
    }

POST /api/configure
Content-Type: application/json

Saves configuration settings (called automatically by the extension).

POST /api/validate-credentials
Content-Type: application/json

Tests JobToday login credentials without initiating a full scrape.

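The endpoints above can be called from any HTTP client. The following is a small standard-library sketch, not an official client: the helper names (`candidates_url`, `progress_percent`, `request_json`) are made up for illustration, and only the paths, query parameters, and response fields documented above are used.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:5001"  # default backend URL from this README


def candidates_url(limit: Optional[int] = None, sort: Optional[str] = None,
                   base_url: str = BASE_URL) -> str:
    """Build the GET /api/candidates URL with optional query parameters."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if sort is not None:
        params["sort"] = sort
    query = urllib.parse.urlencode(params)
    return f"{base_url}/api/candidates" + (f"?{query}" if query else "")


def progress_percent(status: Dict[str, Any]) -> float:
    """Turn the `progress` object from GET /status into a percentage."""
    progress = status.get("progress") or {}
    total = progress.get("total") or 0
    if total <= 0:
        return 0.0
    return 100.0 * (progress.get("processed") or 0) / total


def request_json(method: str, path: str, body: Optional[dict] = None,
                 base_url: str = BASE_URL) -> Dict[str, Any]:
    """Send a JSON request to the backend and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `request_json("GET", "/health")` should return the health payload, `request_json("POST", "/trigger-scrape", {})` starts a scrape with the stored configuration, and `progress_percent(request_json("GET", "/status"))` reports how far the run has come.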
All configuration can be set via environment variables or through the browser extension interface.
| Variable | Description | Required |
|---|---|---|
| `JOBTODAY_EMAIL` | Your JobToday account email | Yes |
| `JOBTODAY_PASSWORD` | Your JobToday account password | Yes |
| `JOB_ID` | Job posting ID (e.g., `p3j9ox`) | Yes |
| `AIRTABLE_PAT` | Airtable Personal Access Token | No |
| `AIRTABLE_BASE_ID` | Airtable Base ID | No |
| `AIRTABLE_TABLE_NAME` | Airtable table name (default: `Candidates`) | No |
| `N8N_WEBHOOK_URL` | n8n webhook URL for notifications | No |
| `OPENAI_API_KEY` | OpenAI API key for chat summaries | No |
| `PORT` | Backend server port (default: `5001`) | No |
| `BACKEND_URL` | Backend URL (for production deployment) | No |
The browser extension stores configuration locally and automatically syncs with the backend API. All settings can be managed through the extension's Settings button.
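As an illustration of how the variables in the table fit together, the sketch below reads them from a mapping (by default the process environment), applies the documented defaults, and reports missing required values. It is an assumption-laden example rather than the backend's real loader: the function name `load_config` is hypothetical, and the actual backend may accept missing values because the extension can supply them later.

```python
import os
from typing import Dict, Mapping

REQUIRED = ("JOBTODAY_EMAIL", "JOBTODAY_PASSWORD", "JOB_ID")
OPTIONAL = ("AIRTABLE_PAT", "AIRTABLE_BASE_ID", "AIRTABLE_TABLE_NAME",
            "N8N_WEBHOOK_URL", "OPENAI_API_KEY", "PORT", "BACKEND_URL")
# Defaults taken from the table above.
DEFAULTS = {"AIRTABLE_TABLE_NAME": "Candidates", "PORT": "5001"}


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect settings from `env`, failing loudly on missing required ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name in OPTIONAL:
        value = env.get(name) or DEFAULTS.get(name)
        if value:  # leave unset optional integrations out entirely
            config[name] = value
    return config
```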
Simply run the Flask server as described in Quick Start.
The application can be deployed to any platform that supports Python applications. Example deployment options:
- Connect your GitHub repository
- Set environment variables in the Render dashboard
- The app will automatically deploy using `gunicorn` (configured in `render.yaml`)
A Dockerfile is included for containerized deployment:

    docker build -t jobtoday-scraper .
    docker run -p 5001:5001 --env-file .env jobtoday-scraper

The application can be deployed to:
- Heroku
- AWS Elastic Beanstalk
- Google Cloud Run
- Azure App Service
- Any VPS with Python support
    jobtoday-scraper/
    ├── extension/                  # Browser extension files
    │   ├── manifest.json           # Extension manifest
    │   ├── popup.html              # Main popup UI
    │   ├── popup.js                # Popup logic
    │   ├── onboarding.html         # Onboarding flow
    │   ├── onboarding.js           # Onboarding logic
    │   ├── background.js           # Background service worker
    │   ├── styles.css              # Extension styling
    │   └── icons/                  # Extension icons
    ├── scraper_api.py              # Main Flask API server
    ├── jobtoday_1.py               # Core scraping logic
    ├── config_storage.py           # Configuration management
    ├── requirements.txt            # Python dependencies
    ├── Dockerfile                  # Docker configuration
    ├── render.yaml                 # Render deployment config
    ├── install_dependencies.sh     # Installation script (macOS/Linux)
    ├── install_dependencies.bat    # Installation script (Windows)
    └── README.md                   # This file

- `scraper_api.py`: Main Flask API server with all endpoints
- `jobtoday_1.py`: Core scraping engine using Playwright
- `config_storage.py`: Secure configuration storage with encryption
- `extension/`: Browser extension source code
  - `popup.html/js`: Main extension interface
  - `onboarding.html/js`: Setup wizard
  - `background.js`: Service worker for background tasks
  - `styles.css`: Modern UI styling

    # Start with auto-reload (install python-dotenv if not installed)
    export FLASK_ENV=development
    python scraper_api.py

The project follows PEP 8 style guidelines. For development, consider using:
- `black` - Code formatting
- `flake8` - Linting
- `pylint` - Advanced linting

Port 5000/5001 already in use:
- On macOS, port 5000 is often used by AirPlay Receiver
- The app defaults to port 5001 to avoid conflicts
- You can change it via the `PORT` environment variable
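To find out whether a port is already taken before starting the server, a quick check along these lines can help. It is a generic sketch using Python's `socket` module; the function name `port_in_use` is hypothetical and not part of the project.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection attempt succeeds.
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use(5000)` and `port_in_use(5001)` shows which of the two ports is occupied, and `PORT` can then be set to a free one.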
Playwright not found:

    # Make sure you've installed playwright browsers
    playwright install chromium

Extension not loading:
- Ensure "Developer mode" is enabled in Chrome
- Check the browser console for errors (`chrome://extensions/` → Details → Inspect views)
- Verify all files are in the `extension` folder

Cannot connect to backend:
- Verify the Flask server is running (`python scraper_api.py`)
- Check the backend URL in extension settings (default: `http://localhost:5001`)
- Ensure no firewall is blocking the connection

Candidates not loading:
- Verify `candidates_detailed.json` exists in the project root
- Check that a scraping session has been completed at least once
- Review backend logs for any errors
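The first check can be scripted. The sketch below assumes `candidates_detailed.json` holds either a bare JSON list of candidates or an object with a `candidates` list, as in the `/api/candidates` response; the README does not document the file layout, so treat both that assumption and the function name `count_candidates` as hypothetical.

```python
import json
from pathlib import Path
from typing import Union


def count_candidates(path: Union[str, Path] = "candidates_detailed.json") -> int:
    """Return how many candidates the file holds, or raise a helpful error."""
    file = Path(path)
    if not file.exists():
        raise FileNotFoundError(f"{file} not found: run a scrape first")
    data = json.loads(file.read_text(encoding="utf-8"))
    # Assumed layouts: a bare list, or {"candidates": [...]}.
    if isinstance(data, dict):
        data = data.get("candidates", [])
    if not isinstance(data, list):
        raise ValueError(f"{file} does not contain a candidate list")
    return len(data)
```

A result of zero, or an error, points to a scrape that never completed rather than to a problem in the extension.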
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Playwright - Powerful browser automation
- Flask - Web framework
- OpenAI - AI chat summarization
- Airtable - Database integration
- n8n - Workflow automation
- Local Storage: All credentials are encrypted and stored locally in the extension
- No Data Collection: The extension does not collect or transmit user data to third parties, apart from the integrations you choose to configure (Airtable, n8n, and OpenAI)
- Secure Communication: All API communication uses HTTPS (in production)
- Session Management: Login sessions are stored locally and never shared
The scraper collects the following information from JobToday candidate profiles:
- Personal information (name, phone, email, location)
- Professional experience and work history
- Languages and certifications
- Complete chat conversation history
- Application dates and job role information
All data is stored locally and can be exported to JSON/CSV format. When configured, data is also synced to your Airtable database.
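A CSV export of this kind can be produced with Python's `csv` module. The sketch below is illustrative only: the column selection and the function name `candidates_to_csv` are assumptions, with field names borrowed from the candidate response in the API documentation.

```python
import csv
import io
from typing import Dict, Iterable, Sequence

# Assumed column set; the project's real export may include more fields.
FIELDS = ("name", "phone", "email", "location", "chat_summary")


def candidates_to_csv(candidates: Iterable[Dict[str, str]],
                      fields: Sequence[str] = FIELDS) -> str:
    """Render candidates as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    for candidate in candidates:
        # Missing fields become empty cells; unknown fields are dropped.
        writer.writerow({field: candidate.get(field, "") for field in fields})
    return buffer.getvalue()
```

The returned text can be written to a file with `Path("candidates.csv").write_text(...)`; values containing commas, such as locations, are quoted automatically.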
- Python 3.8+ - Programming language
- Flask - Web framework
- Playwright - Browser automation
- Chrome Extension API - Browser extension platform
- OpenAI API - AI chat summarization
- Airtable API - Database integration
- n8n - Workflow automation
Future enhancements may include:
- Support for multiple job postings
- Advanced filtering and sorting options
- Export to Excel format
- Dark mode theme
- Candidate comparison view
- Automated scheduling of scraping tasks
- Email notifications
- Mobile app companion
We welcome contributions! Here's how you can help:
- Report Bugs: Open an issue with detailed information about the bug
- Suggest Features: Share your ideas for new features
- Submit Pull Requests: Help improve the codebase
- Improve Documentation: Make the docs better for everyone
- Share Feedback: Let us know how we can improve
Please ensure your code follows the existing style and includes appropriate tests.
- GitHub Issues: Open an issue
- Questions: Check existing issues or open a new one
- Feature Requests: We'd love to hear your ideas!
If you find this project useful, please:
- Star the repository on GitHub
- Report bugs to help improve the project
- Suggest features to make it even better
- Share with others who might benefit from it
Made with ❤️ for recruiters and HR professionals