OSINT Web Resources

https://osintframework.com/

Scraping usernames from LinkedIn

Install the WeakestLink Chrome extension: https://chrome.google.com/webstore/detail/weakestlink/jiobcfhamdgbhhhnkmoblghheddjfnpo

It will generate an Excel doc with the names formatted in the most common username formats.
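
To illustrate what "common username formats" can look like, here is a minimal sketch that builds the three formats used by the Python script later in this document (flast, first.last, firstlast). The function name `format_username` and the sample name "Jane Doe" are hypothetical, and this is not necessarily how WeakestLink formats its output.

```python
import re

def format_username(full_name, style):
    """Return a lowercase username for full_name, or None if it cannot be built."""
    # Drop commas, periods, and quotes before splitting into name parts
    parts = re.sub(r"[,.\"']", "", full_name).split()
    if len(parts) < 2:
        return None  # a first and a last name are both required
    # Like the script in this document, treat the second token as the last name
    first, last = parts[0].lower(), parts[1].lower()
    if style == "flast":
        return first[0] + last
    if style == "first.last":
        return first + "." + last
    if style == "firstlast":
        return first + last
    return None  # unknown style

for style in ("flast", "first.last", "firstlast"):
    print(style, "->", format_username("Jane Doe", style))
```

Returning None for single-word names avoids an index error when a profile shows only one name.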

Use of this extension is against LinkedIn TOS and your account may be restricted.

1. Sign into LinkedIn
2. Browse to the required company page
3. Click '# employees'
4. If there are more than 1000 results (100 pages) then use the filters to reduce the number and run multiple dumps
5. Click the Dump Users button and wait for the WeakestLink completion page
6. If you need to cancel the dump then close the LinkedIn tab

Python LinkedIn Scraper

This script outputs a list of employee names from LinkedIn, or formatted usernames if a username format is given. LinkedIn changes its page source from time to time, so the script may be broken, but at the very least it can serve as a starting point to tweak as required.

#!/usr/bin/env python3

from splinter import Browser
import argparse
import re

parser = argparse.ArgumentParser(description="LinkedIn Scraper. Author: Steve Campbell, @lpha3ch0")
parser.add_argument("-u", required=True, help="LinkedIn username")
parser.add_argument("-p", required=True, help="LinkedIn password")
parser.add_argument("-id", required=True, help="Company ID")
parser.add_argument("-n", required=True, help="Number of pages to scrape")
parser.add_argument("-uf", help="Username format: flast, first.last, firstlast. If omitted this script will print out employee names instead of formatted usernames.")
args = parser.parse_args()
username = args.u
password = args.p
company_id = args.id
pages = int(args.n)
uFormat = args.uf

# browser = Browser('firefox', user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/76.0.3809.87 Chrome/76.0.3809.87 Safari/537.36")
url = 'https://www.linkedin.com/search/results/people/?facetCurrentCompany=%5B%22{}%22%5D&page={}'
xpath = "//*[contains(@class, 'name actor-name')]"
employees = []
counter = 1
browser = Browser('firefox')
browser.visit('https://www.linkedin.com')
browser.find_by_text('Sign in').click()
browser.fill('session_key', username)
browser.fill('session_password', password)
browser.find_by_text('Sign in').click()
while counter <= pages:
  browser.visit(url.format(company_id, str(counter)))
  names = browser.find_by_xpath(xpath)
  for name in names:
    employees.append(name.text)
  counter += 1

browser.quit()

# Formatted usernames are collected here and printed after all names are processed
usernames = []
for employee in employees:
  # Strip commas, periods, and quotes that should not appear in a username
  employee = re.sub('[,."\']', '', employee)
  names = employee.split()
  if uFormat and len(names) < 2:
    # A single-word name cannot be formatted; skip it to avoid an IndexError
    continue
  if uFormat == 'flast':
    usernames.append(names[0][0] + names[1])
  elif uFormat == 'first.last':
    usernames.append(names[0] + '.' + names[1])
  elif uFormat == 'firstlast':
    usernames.append(names[0] + names[1])
  else:
    print(employee)
    

if uFormat:
  for username in usernames:
    print(username.lower())
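
The search URL in the script above carries the company ID as a percent-encoded list: `%5B%22{}%22%5D` decodes to `["{}"]`. The following sketch shows that encoding and how a result count could be turned into the page count passed to `-n`. The helper `page_urls`, the company ID "1234", and the 10-results-per-page figure (inferred from "1000 results (100 pages)" earlier in this document) are assumptions, and LinkedIn may change the URL format.

```python
import math
from urllib.parse import quote

SEARCH_URL = "https://www.linkedin.com/search/results/people/?facetCurrentCompany={}&page={}"

def page_urls(company_id, total_results, per_page=10):
    """Build one search URL per results page for the given company."""
    # per_page=10 is an assumption, not a documented LinkedIn value
    facet = quote('["{}"]'.format(company_id), safe="")  # encodes [ " ] as %5B %22 %5D
    pages = math.ceil(total_results / per_page)
    return [SEARCH_URL.format(facet, page) for page in range(1, pages + 1)]

urls = page_urls("1234", 25)  # "1234" is a hypothetical company ID
print(len(urls))
print(urls[0])
```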