Intersect - Personalized job matching

Find the job you actually want using AI.

Access here: https://intersect.streamlit.app

Intersect is a web app for job searching that uses NLP to reorder job postings by semantic similarity rather than by traditional keyword matching. Unlike lexical search (BM25), which relies on exact word matches, semantic search uses dense vectors to represent meaning (Boykis, 2023; Mitchell, 2019; Schmidt, 2015), so results can be tailored to user-provided text. Intersect offers several information retrieval methods (original ranking, semantic search, lexical search, semantic delta, reranking) to enhance job discovery and reduce manual effort.
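As a rough illustration of ranking by embedding similarity, the sketch below scores a few jobs against a CV using the dot product. The three-dimensional vectors and the function names are made up for this example; real embeddings from text-embedding-3-small have far more dimensions, and this is not Intersect's actual code.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query_vec, job_vecs):
    """Return job indices sorted from most to least similar to the query."""
    scores = [dot(query_vec, v) for v in job_vecs]
    return sorted(range(len(job_vecs)), key=lambda i: scores[i], reverse=True)


# Toy unit-length "embeddings" (hypothetical values, for illustration only).
cv = [0.6, 0.8, 0.0]
jobs = [
    [0.0, 0.0, 1.0],  # unrelated direction
    [0.6, 0.8, 0.0],  # same direction as the CV
    [0.8, 0.6, 0.0],  # close to the CV
]

order = rank_by_similarity(cv, jobs)
print(order)  # [1, 2, 0]
```

When the vectors are unit length, as in this toy data, the dot product equals cosine similarity, so the ranking reflects the angle between the CV and each job.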

Intersect uncovers non-obvious job opportunities by enhancing traditional search methods with NLP. Because the methods produce different rankings, a hybrid approach that combines keyword, semantic, and reranking techniques could yield better results than any single method.

It involves:

- Fetching job listings via APIs (currently Reed API) and vectorizing results with OpenAI's text-embedding-3-small.
- Capturing user input (text or PDF CV) and reordering results by computing similarity via dot product.
- Visualizing clusters using UMAP and HDBSCAN.
- Displaying original ranking from the job API.
- Reordering results using BM25 (lexical search).
- Reordering results using semantic search (embedding similarity).
- Identifying semantic delta (jobs that rank differently between lexical and semantic search).
- Reranking with Cohere's cross-encoder.
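The semantic delta step can be pictured as a comparison of rank positions. The sketch below is one possible interpretation, assuming the delta is the lexical rank minus the semantic rank for each job; the function name, the job IDs, and this exact definition are assumptions, not taken from the Intersect source.

```python
def semantic_delta(lexical_order, semantic_order):
    """Return (job, delta) pairs, largest delta first.

    delta = lexical position - semantic position, so a large positive value
    means semantic search ranks the job much higher than BM25 does.
    Both inputs are assumed to contain the same job IDs.
    """
    lex_pos = {job: i for i, job in enumerate(lexical_order)}
    sem_pos = {job: i for i, job in enumerate(semantic_order)}
    deltas = {job: lex_pos[job] - sem_pos[job] for job in lex_pos}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical job IDs ordered best-first by each method.
lexical = ["a", "b", "c", "d"]
semantic = ["d", "a", "b", "c"]

result = semantic_delta(lexical, semantic)
print(result[0])  # ('d', 3)
```

Here job "d" is last in the lexical ranking but first in the semantic one, which is the kind of posting a keyword search alone would bury.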

Implementation details:

web development
- uv: environment and dependency management
- streamlit: web framework (frontend and backend) and hosting
- pypdf: PDF CV parsing
data science
- semantic search: OpenAI's text-embedding-3-small
- lexical search: bm25s (Lucene method)
  - preprocessing (tokenizer, stemmer, stop words)
- visualization: UMAP + HDBSCAN (umap-learn, hdbscan)
- reranker: Cohere's reranking model (rerank-v3.5)

References

Boykis, V. (2023). What are embeddings? Retrieved from https://github.com/veekaybee/what_are_embeddings
Mitchell, M. (2019). Artificial Intelligence: A Guide for Thinking Humans. Pelican Books.
Sanseviero, O. (2024). Sentence Embeddings. Cross-encoders and Re-ranking. hackerllama. Retrieved from https://osanseviero.github.io/hackerllama/blog/posts/sentence_embeddings2/
Schmidt, B. (2015). Vector Space Models for the Digital Humanities. Bookworm. Retrieved from https://bookworm.benschmidt.org/posts/2015-10-25-Word-Embeddings.html
Sun, W., Yan, L., Ma, X., Wang, S., Ren, P., Chen, Z., Yin, D., & Ren, Z. (2024). Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents (arXiv:2304.09542). arXiv. https://doi.org/10.48550/arXiv.2304.09542
