This repository contains a project that automates the process of scraping LinkedIn profiles of recruiters, HR personnel, and hiring managers to collect relevant data such as email addresses. Using this data, the project leverages OpenAI's API to generate customized emails based on the company’s needs, job openings, and other factors. The end goal is to create a fully automated pipeline that sends personalized emails targeting specific recruiters and companies, showcasing how the sender can add value to the organization.
The scraping process is carried out using PhantomBuster and Apollo.io, while email generation and sending are managed using Python, OpenAI's API, and SMTP or SendGrid.
- Automated LinkedIn Profile Scraping using PhantomBuster and Apollo.io.
- Data storage in a structured format (PostgreSQL/SQLite).
- Personalized email generation based on recruiter/company data using OpenAI's API.
- Automatic email sending via SMTP or SendGrid.
- Monitoring and logging of sent emails.
- Scalable design for handling larger datasets.
- Use PhantomBuster to scrape LinkedIn profiles of recruiters and hiring managers based on specific search criteria (e.g., company, job title, location).
- Extract recruiter information such as Name, Job Title, Company, and LinkedIn URL.
- Use Apollo.io to retrieve email addresses based on LinkedIn profiles.
- Store the scraped data in a structured format (e.g., PostgreSQL or SQLite).
- Store the scraped recruiter and company data in a database.
- Database Tables:
recruiters
: (id, name, job_title, company, linkedin_url, email)companies
: (id, name, industry, technologies, job_postings_url)interactions
: (id, recruiter_id, email_content, date_sent, status)
- Fetch recruiter and company data from the database.
- Generate customized emails using OpenAI's GPT API.
- Prompt-based email customization based on:
- Recruiter’s role (Hiring Manager, HR, etc.)
- Company’s industry and job openings.
- The sender’s skills and experience.
- Automate email sending via SMTP (Gmail, Outlook, etc.) or SendGrid.
- Implement email templates with placeholders for personalized data.
- Track sent emails, including the recruiter’s name, email, and timestamp, for monitoring.
- Log sent emails and status (success, failure).
- Implement error handling and retries for failed emails.
- Use Python’s logging module or a dashboard for real-time monitoring.
- Set up a task scheduler (e.g., Celery) to automate follow-up emails if no response is received after a set time.
- A/B test email subject lines, body content, and personalization strategies for better engagement.
- Deploy the scraping and automation pipeline to the cloud using AWS EC2 or Google Cloud.
- For large-scale scraping and email automation, use Redis and Celery for task queuing.
- Monitor system performance and optimize the flow for parallel execution of tasks.
-
Clone the repository:
git clone https://github.com/username/repo-name.git
-
Install necessary dependencies:
pip install -r requirements.txt
-
Set up environment variables for API keys and database:
- OpenAI API Key
- PhantomBuster API Key
- Apollo API Key
- SMTP credentials (or SendGrid credentials)
-
Run the LinkedIn scraping script:
python linkedin_scraper.py
-
Run the email automation script:
python email_automation.py
This repository provides a structured workflow for scraping LinkedIn profiles and automating personalized emails at scale. Feel free to extend or modify the code to suit your needs!