jgarvey928/Job_Scrapper

🚀 Job Scrapper & Cover Letter Generator

An automated data pipeline designed to streamline the job application process. This tool scrapes job listings from LinkedIn, processes the extracted data, and automatically generates tailored cover letters to give you a competitive edge in your job search.

✨ Features

  • Automated Web Scraping: Extracts job postings, titles, locations, and posting dates directly from LinkedIn.
  • Data Processing & Analysis: Cleans and analyzes scraped HTML data to identify key job requirements and details.
  • Dynamic Cover Letter Generation: Automatically crafts personalized cover letters based on the extracted job data.
  • Interactive Notebooks: Includes a Jupyter Notebook environment for data exploration and testing.
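
As a rough illustration of how a cover letter could be filled in from extracted job data, here is a minimal sketch using Python's standard `string.Template`. The template wording, the `render_letter` helper, and the field names (`company`, `title`, `location`) are all assumptions made for this example; the actual logic in generate_letters.py may differ.

```python
from string import Template

# Hypothetical template: wording and placeholder names are assumptions,
# not taken from generate_letters.py.
LETTER_TEMPLATE = Template(
    "Dear Hiring Team at $company,\n\n"
    "I am writing to apply for the $title position in $location.\n"
)


def render_letter(job: dict) -> str:
    """Fill the template from one job record (a plain dict)."""
    # safe_substitute leaves unknown placeholders untouched instead of raising,
    # which makes missing fields easy to spot in the output.
    return LETTER_TEMPLATE.safe_substitute(job)


job = {"company": "Example Corp", "title": "Data Engineer", "location": "Remote"}
letter = render_letter(job)
print(letter)
```

Using `safe_substitute` rather than `substitute` is a design choice for this sketch: a record with a missing field still produces a letter, with the unfilled `$placeholder` left visible for review.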

⚠️ Disclaimer

Warning

CRITICAL WARNING: Do not run this program on your local machine while you are signed in to your LinkedIn account in your web browser. Automated scraping can trigger security flags, which may result in your personal account and IP address being banned. It is strongly recommended to run the tool anonymously through the Google Colab notebook instead.

This tool interacts with LinkedIn's front-end HTML elements (e.g., results-context-header, base-search-card). LinkedIn's page structure changes frequently, so you may need to update the HTML tags in the scraping script when it stops finding results. Please also be mindful of LinkedIn's Terms of Service regarding automated scraping, and use this tool responsibly.
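
One way to notice that the markup has changed is to count how many elements still carry the expected class names. The sketch below uses only the standard library's `html.parser`; the `SELECTORS` dictionary, the `ClassCounter` class, and the sample HTML are assumptions for illustration and are not taken from scrape_data.py.

```python
from html.parser import HTMLParser

# Names mentioned in this README; kept in one place so they are easy to update.
SELECTORS = {"card": "base-search-card", "header": "results-context-header"}


class ClassCounter(HTMLParser):
    """Count opening tags whose class attribute contains a target class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.count += 1


def count_elements(html: str, target_class: str) -> int:
    parser = ClassCounter(target_class)
    parser.feed(html)
    return parser.count


# Made-up sample markup, not real LinkedIn output.
sample = (
    '<ul><li><div class="base-search-card job">A</div></li>'
    '<li><div class="base-search-card">B</div></li></ul>'
)
print(count_elements(sample, SELECTORS["card"]))
```

If a count like this drops to zero on a page that visibly contains job listings, the names in the scraping script probably need updating.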

Note

⏱️ Execution Time & Rate Limiting Note: To mimic human behavior and avoid attracting attention from anti-bot protections, this program is intentionally designed to run slowly.

  • Execution Time: Depending on how many job postings are being scraped, a run can take more than 5 minutes to complete. You can adjust the delay between requests by changing the wait_seconds variable in the code (set to 2 by default).
  • Scraping Limits: To reduce the risk of account flagging, the script limits how many postings can be scraped per run. When started, it prompts you for the number of jobs to scrape (the current default limit is 100).
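
The delay and the per-run limit described above could be combined roughly as follows. This is a sketch under stated assumptions: `parse_limit` and `scrape_with_limit` are hypothetical helpers, `fetch` stands in for whatever function downloads one posting, and treating 100 as both the default and the upper bound is an assumption about how the limiter behaves.

```python
import time
from itertools import islice


def parse_limit(raw: str, default: int = 100) -> int:
    """Turn the user's answer into a job count, falling back to the default."""
    try:
        value = int(raw.strip())
    except ValueError:
        return default
    # Assumption: the default also acts as the maximum allowed per run.
    return max(1, min(value, default))


def scrape_with_limit(urls, fetch, max_jobs=100, wait_seconds=2, sleep=time.sleep):
    """Fetch at most max_jobs URLs, pausing wait_seconds between requests."""
    results = []
    for index, url in enumerate(islice(urls, max_jobs)):
        if index > 0:
            sleep(wait_seconds)  # pause before every request except the first
        results.append(fetch(url))
    return results


# Example with a stand-in fetch function and no real network access.
limit = parse_limit("2")
print(scrape_with_limit(["a", "b", "c"], str.upper, max_jobs=limit, wait_seconds=0))
```

Passing `sleep` as a parameter is only there to make the function easy to test without waiting; in normal use the default `time.sleep` applies.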

📁 Repository Structure

Job_Scrapper/
├── letters/                   # Directory containing generated cover letters
├── processed_data/            # Cleaned and structured data ready for analysis
├── scrapped_data/             # Raw HTML and JSON data scraped from LinkedIn
├── Job_Scapper_Ext.ipynb      # Interactive Jupyter Notebook for extended analysis
├── analyze_data.py            # Scripts for analyzing processed job data
├── extract_data.py            # Extracts targeted information from raw HTML 
├── generate_letters.py        # Logic for drafting tailored cover letters
├── main.py                    # Main execution script to run the full pipeline
├── scrape_data.py             # Web scraping logic utilizing LinkedIn HTML tags
└── README.md                  # Project documentation

🛠️ Installation & Setup

Option 1: Google Colab (Recommended)

To avoid local setup and protect your personal IP address and accounts, you can run this scraper entirely in the cloud using the Google Colab notebook (Run in Google Colab).

Option 2: Local Setup

  1. Clone the repository:

    git clone https://github.com/jgarvey928/Job_Scrapper.git
    
    cd Job_Scrapper
    
  2. Set up a virtual environment (recommended):

    python -m venv venv
    
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    
  3. Install dependencies: (Note: If the repository does not include a requirements.txt file, install the necessary scraping and data libraries, such as BeautifulSoup4, pandas, and requests, manually.)

    pip install -r requirements.txt
    

🚀 Usage

To run the complete pipeline from scraping to cover letter generation, execute the main script:

python main.py

Step-by-Step Execution: If you prefer to run the modules individually:

  1. Run python scrape_data.py to fetch the latest job postings.
  2. Run python extract_data.py to parse the raw data into the processed_data/ folder.
  3. Run python analyze_data.py to gain insights into the job market.
  4. Run python generate_letters.py to output tailored documents into the letters/ folder.
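
The four steps above can also be chained from a small driver script. The sketch below is one possible way to do that with `subprocess`; it is not a description of what main.py does, and the `STEPS` list and `run_steps` helper are names invented for this example.

```python
import subprocess
import sys

# Script names from the steps above, in the order they are meant to run.
STEPS = [
    "scrape_data.py",
    "extract_data.py",
    "analyze_data.py",
    "generate_letters.py",
]


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each script with the current interpreter, stopping on the first failure."""
    for script in steps:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so later steps never see incomplete data.
        runner([sys.executable, script], check=True)


# run_steps()  # uncomment to run all steps from the repository root
```

Using `sys.executable` keeps every step inside the same virtual environment that launched the driver.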

You can also open Job_Scapper_Ext.ipynb in Jupyter Notebook or Google Colab for an interactive walk-through of the data.

👨‍💻 Author

John S. Garvey


If you find this project helpful, please consider giving it a ⭐!
