noah-art3mis/intersect

Intersect - Personalized job matching

Find the job you actually want using AI.

Access here: https://intersect.streamlit.app

Intersect is a web app for job searching that uses NLP to reorder job postings by semantic similarity rather than by keyword matching alone. Unlike lexical search (BM25), which relies on exact word matches, semantic search represents meaning with dense vectors (Boykis, 2023; Mitchell, 2019; Schmidt, 2015), so results can be personalized when the query is user-provided text such as a CV. Intersect offers several information retrieval methods (semantic search, lexical search, reranking) to improve job discovery and reduce manual effort.

Intersect uncovers non-obvious job opportunities by enhancing traditional search methods with NLP. The methods return noticeably different orderings, which suggests that a hybrid approach combining keyword, semantic, and reranking techniques could yield the best results.
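
One common way to combine several rankings into a hybrid result is reciprocal rank fusion. The sketch below is only an illustration of that idea: nothing in this README says Intersect uses it, and the job ids, the function name `reciprocal_rank_fusion`, and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list (best first).

    Each id receives 1 / (k + rank) from every list it appears in,
    so ids that rank well in several lists rise to the top.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical orderings from the three retrieval methods.
lexical = ["job_a", "job_b", "job_c"]
semantic = ["job_c", "job_a", "job_d"]
reranked = ["job_a", "job_c", "job_b"]

combined = reciprocal_rank_fusion([lexical, semantic, reranked])
print(combined)
```

Because the fusion works on ranks rather than raw scores, it does not need BM25 scores and embedding similarities to be on the same scale.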

It involves:

  • Scraping job listings and vectorizing results with OpenAI's text-embedding-3-small.
  • Generating word clouds with TF-IDF.
  • Capturing user input and reordering results by computing similarity via dot product.
  • Visualizing clusters using PCA and KMeans.
  • Reordering results using BM25 (lexical search).
  • Reranking with Cohere’s cross-encoder.
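
The similarity step in the list above can be sketched as follows. This is a minimal illustration in plain Python, not the app's actual code: the three-dimensional vectors stand in for real embeddings, and the helper names are hypothetical. Vectors are normalized to unit length first, so the dot product equals cosine similarity.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query_vec, job_vecs):
    """Return job indices ordered from most to least similar to the query."""
    q = normalize(query_vec)
    scores = [dot(q, normalize(v)) for v in job_vecs]
    return sorted(range(len(job_vecs)), key=lambda i: scores[i], reverse=True)


# Toy data: made-up vectors, not real model output.
jobs = ["data engineer", "nurse", "ml researcher"]
job_vecs = [[0.9, 0.1, 0.3], [0.0, 1.0, 0.1], [0.8, 0.0, 0.6]]
query_vec = [0.7, 0.0, 0.7]  # stands in for the embedded user text

order = rank_by_similarity(query_vec, job_vecs)
print([jobs[i] for i in order])
```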

Implementation details:

  • web development
    • uv: environment and dependency management
    • streamlit: web framework (frontend and backend) and hosting
    • pypdf: PDF CV parsing
  • data science
    • semantic search: OpenAI's text-embedding-3-small
    • lexical search: bm25s (Lucene method)
      • preprocessing (tokenizer, stemmer, stop words)
    • visualization: PCA + KMeans from scikit-learn (other algorithms such as t-SNE, LSA, mean-shift, or DBSCAN might be more appropriate)
    • reranker: Cohere's reranking model
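
To make the lexical search entry more concrete, here is a rough plain-Python sketch of BM25 scoring in the Lucene style. It does not use bm25s and is not the app's code: the function name, the whitespace tokenizer, and the `k1` and `b` values are assumptions, and the stemming and stop-word removal mentioned above are left out.

```python
import math
from collections import Counter


def bm25_lucene_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document against the query with a Lucene-style BM25."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        # Longer-than-average documents are penalized via b.
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in set(query_tokens):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy corpus with naive whitespace tokenization.
corpus = [
    "python data engineer building pipelines",
    "registered nurse night shift",
    "senior python developer",
]
tokenized = [doc.split() for doc in corpus]
scores = bm25_lucene_scores("python engineer".split(), tokenized)
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

A posting that shares no term with the query scores zero here, which is the limitation of lexical search that semantic search is meant to address.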

References

TODO

  • fix currency showing up as none

  • work out why running the same search gives different results each time

  • fix viz labels again

  • add limits for embedding and for user submission

  • add user input sanitization

  • add topic modelling to name the clusters

  • add llm permutation

    • sync old indices with new indices
  • turn tables into cards

  • infer keyword and location from the text

  • find the last page automatically

  • add async to openai embedding

  • add local

    • semantic search
    • reranker
    • llm permutation
  • prepend other cols before embedding

  • features

    • add sponsor column by comparing to the UKVI Excel spreadsheet
    • add tracking the Bluesky firehose for AI jobs
    • 'tell me who your friends are' mode where you give other people's CVs and average the vectors
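
The vector averaging mentioned in the last item could look roughly like this. It is a sketch of a feature that is not implemented yet, so the helper names and the two-dimensional toy vectors are purely hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def normalize(vec):
    """Scale a vector to unit length so dot products stay comparable."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Made-up embeddings of two friends' CVs.
friend_cv_vecs = [[1.0, 0.0], [0.0, 1.0]]

# The averaged, re-normalized vector would act as the query vector.
profile = normalize(mean_vector(friend_cv_vecs))
print(profile)
```

Re-normalizing after averaging matters because the mean of unit vectors is generally shorter than unit length.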

Job boards

Some info on this here and here. Each one of these would need a bespoke scraping strategy.

Other sources: https://theirstack.com/en/docs/data/job/sources#job-data-sources
