An automated data pipeline designed to streamline the job application process. This tool scrapes job listings from LinkedIn, processes the extracted data, and automatically generates tailored cover letters to give you a competitive edge in your job search.
- Automated Web Scraping: Extracts job postings, titles, locations, and posting dates directly from LinkedIn.
- Data Processing & Analysis: Cleans and analyzes scraped HTML data to identify key job requirements and details.
- Dynamic Cover Letter Generation: Automatically crafts personalized cover letters based on the extracted job data.
- Interactive Notebooks: Includes a Jupyter Notebook environment for data exploration and testing.
Warning
CRITICAL WARNING: You should not run this program on your local machine while signed in to your LinkedIn account on your web browser. Automated scraping can trigger security flags, which may result in your personal account and IP address being banned. Instead, it is highly recommended to run this anonymously through the Google Colab notebook.
This tool interacts with LinkedIn's front-end HTML elements (e.g., results-context-header, base-search-card). Web structures change frequently, which may require you to update the HTML tags in the scraping script. Furthermore, please be mindful of LinkedIn's Terms of Service regarding automated scraping and use this tool responsibly.
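Because those class names can change without notice, it may help to check a downloaded page for them before parsing it. The following is a minimal sketch of such a check using only the standard library; `CardCounter`, `check_markup`, and the sample HTML are hypothetical illustrations, not code from `scrape_data.py`, and the only names taken from this project are the two class names mentioned above.

```python
from html.parser import HTMLParser

# Class names taken from the note above. Keeping them in one place makes
# them easy to update when LinkedIn changes its markup.
CARD_CLASS = "base-search-card"
HEADER_CLASS = "results-context-header"


class CardCounter(HTMLParser):
    """Hypothetical helper: counts elements carrying the expected classes."""

    def __init__(self):
        super().__init__()
        self.card_count = 0
        self.header_found = False

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if CARD_CLASS in classes:
            self.card_count += 1
        if HEADER_CLASS in classes:
            self.header_found = True


def check_markup(html):
    """Return (header_found, card_count) for one page of search results."""
    parser = CardCounter()
    parser.feed(html)
    return parser.header_found, parser.card_count


# Made-up sample markup, used only to exercise the check.
sample = """
<div class="results-context-header"><h1>12 jobs</h1></div>
<ul>
  <li><div class="base-search-card base-card">Data Analyst</div></li>
  <li><div class="base-search-card base-card">Data Engineer</div></li>
</ul>
"""
print(check_markup(sample))  # → (True, 2)
```

If the check reports no header and zero cards on a page that visibly lists jobs, the class names have probably changed and the constants need updating.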
Note
⏱️ Execution Time & Rate Limiting Note: To mimic human behavior and avoid attracting attention from anti-bot protections, this program is intentionally designed to run slowly.
- Execution Time: Depending on how many job postings are being scraped, it can take over 5 minutes to run to completion. You can adjust this speed in the code by modifying the `wait_seconds = 2` variable.
- Scraping Limits: The script includes a limiter for how many postings can be scraped per run to prevent account flagging. Upon running, it will prompt you via user input to ask how many jobs you want to scrape (the current default limit is set to 100).
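The two settings above can be pictured with a short sketch. It assumes, without confirmation from the script itself, that 100 acts as both the fallback and the upper cap, and that the pause happens between consecutive requests. `parse_limit` and `scrape_throttled` are hypothetical names; only `wait_seconds` and the value 100 come from this README.

```python
import time

wait_seconds = 2      # pause between requests, as named in the note above
DEFAULT_LIMIT = 100   # per-run cap from the note above (assumed to be a maximum)


def parse_limit(raw, default=DEFAULT_LIMIT):
    """Hypothetical helper: turn the user's answer into a cap from 1 to `default`."""
    try:
        requested = int(raw)
    except (TypeError, ValueError):
        return default  # fall back when the answer is not a number
    return max(1, min(requested, default))


def scrape_throttled(urls, fetch, limit, delay=wait_seconds):
    """Call fetch(url) for at most `limit` URLs, sleeping `delay` seconds between calls."""
    results = []
    for index, url in enumerate(urls[:limit]):
        if index > 0:
            time.sleep(delay)  # no pause is needed before the first request
        results.append(fetch(url))
    return results
```

In use, the limit would come from something like `parse_limit(input("How many jobs to scrape? "))`. At the full 100 postings, the 99 pauses of 2 seconds add up to 198 seconds of waiting before any network time is counted, which is consistent with runs of over 5 minutes.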
Job_Scrapper/
├── letters/ # Directory containing generated cover letters
├── processed_data/ # Cleaned and structured data ready for analysis
├── scrapped_data/ # Raw HTML and JSON data scraped from LinkedIn
├── Job_Scapper_Ext.ipynb # Interactive Jupyter Notebook for extended analysis
├── analyze_data.py # Scripts for analyzing processed job data
├── extract_data.py # Extracts targeted information from raw HTML
├── generate_letters.py # Logic for drafting tailored cover letters
├── main.py # Main execution script to run the full pipeline
├── scrape_data.py # Web scraping logic utilizing LinkedIn HTML tags
└── README.md # Project documentation
To avoid local setup and protect your personal IP and accounts, you can run this scraper entirely in the cloud: Run in Google Colab
- Clone the repository:
    git clone https://github.com/jgarvey928/Job_Scrapper.git
    cd Job_Scrapper
- Set up a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install dependencies: (Note: Ensure you have a `requirements.txt` file, or install the necessary scraping/data libraries like `BeautifulSoup4`, `pandas`, `requests`, etc.)
    pip install -r requirements.txt
To run the complete pipeline from scraping to cover letter generation, execute the main script:
python main.py
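How `main.py` chains the stages is not shown in this README. As a rough sketch, assuming each stage is a standalone script run in the order of the step-by-step list in this section, an orchestrator could look like this (`PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical names):

```python
import subprocess
import sys

# Script names come from the project layout. The order is an assumption
# based on the step-by-step list, not a copy of what main.py does.
PIPELINE = [
    "scrape_data.py",
    "extract_data.py",
    "analyze_data.py",
    "generate_letters.py",
]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Build one command per stage, using the current Python interpreter by default."""
    return [[python, script] for script in scripts]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; check=True stops the run at the first failing stage."""
    for command in commands:
        runner(command, check=True)
```

Calling `run_pipeline(build_commands())` from the repository root would run the four stages back to back. Stopping at the first failure keeps later stages from working on missing or partial data.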
Step-by-Step Execution: If you prefer to run the modules individually:
- Run `python scrape_data.py` to fetch the latest job postings.
- Run `python extract_data.py` to parse the raw data into the `processed_data/` folder.
- Run `python analyze_data.py` to gain insights into the job market.
- Run `python generate_letters.py` to output tailored documents into the `letters/` folder.
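To give a feel for the last step, here is a minimal sketch of template-based letter generation. It is not the logic in `generate_letters.py`: the template wording, the `company` field, and the function names are all assumptions made for illustration. Only the job title, the location, and the `letters/` folder are mentioned in this README.

```python
from pathlib import Path
from string import Template

# Illustrative wording and field names, not the project's own template.
LETTER_TEMPLATE = Template(
    "Dear Hiring Team at $company,\n\n"
    "I am writing to apply for the $title position in $location.\n"
)


def render_letter(job):
    """Fill the template from one job record (a dict with company, title, location)."""
    return LETTER_TEMPLATE.substitute(job)


def save_letter(job, out_dir="letters"):
    """Write the rendered letter into out_dir and return the file path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Replace anything that is not a letter or digit so the file name is safe.
    raw_name = f"{job['company']}_{job['title']}"
    safe_name = "".join(c if c.isalnum() else "_" for c in raw_name)
    path = folder / f"{safe_name}.txt"
    path.write_text(render_letter(job), encoding="utf-8")
    return path
```

`Template.substitute` raises `KeyError` when a field is missing, so an incomplete job record fails loudly instead of producing a letter with a blank.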
You can also open Job_Scapper_Ext.ipynb in Jupyter Notebook or Google Colab for an interactive walk-through of the data.
John S. Garvey
- GitHub: @jgarvey928
- LinkedIn: John S. Garvey
- Portfolio: My Portfolio
If you find this project helpful, please consider giving it a ⭐!