Find the job you actually want using AI.
Access here: https://intersect.streamlit.app
Intersect (web app) is a job-search tool that uses NLP to reorder job postings by semantic similarity rather than by traditional keyword search. Unlike lexical search (BM25), which relies on exact word matches, semantic search represents meaning with dense vectors (Boykis, 2023; Mitchell, 2019; Schmidt, 2015), which gives more personalized results when the query is user-provided text such as a CV. By offering several information retrieval methods (semantic search, lexical search, reranking), Intersect aims to improve job discovery and reduce manual effort.
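A minimal sketch of the semantic-ranking idea, assuming the postings and the user's text have already been embedded (for example with text-embedding-3-small). The three-dimensional vectors are made up for illustration; real embeddings are much longer. OpenAI's documentation describes its embeddings as normalized to length 1, in which case the dot product equals cosine similarity; the sketch normalizes anyway so it also works for other vectors. `rank_by_similarity` is a hypothetical helper, not Intersect's code.

```python
# Minimal sketch: rank postings by dot-product similarity to a query embedding.
# The tiny 3-dimensional vectors are hypothetical; real embeddings are longer.
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query_vec: list[float], job_vecs: list[list[float]]) -> tuple[list[int], list[float]]:
    """Return (job indices from most to least similar, similarity scores)."""
    q = normalize(query_vec)
    scores = [dot(q, normalize(v)) for v in job_vecs]
    order = sorted(range(len(job_vecs)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    query = [0.9, 0.1, 0.0]   # hypothetical embedding of the user's CV
    jobs = [
        [0.0, 1.0, 0.0],      # hypothetical posting about sales
        [0.8, 0.2, 0.1],      # hypothetical posting close to the CV
        [0.5, 0.5, 0.5],      # hypothetical generalist posting
    ]
    order, scores = rank_by_similarity(query, jobs)
    print(order)  # → [1, 2, 0]
```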
Intersect uncovers non-obvious job opportunities by enhancing traditional search methods with NLP. Because the methods produce varied rankings, a hybrid approach that combines keyword, semantic, and reranking techniques could yield the best results.
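One common way to build such a hybrid is reciprocal rank fusion (RRF), which merges ranked lists using only positions, so BM25 scores and embedding similarities never need to be put on the same scale. This is a swapped-in technique for illustration, not a description of how Intersect combines results; the job IDs are hypothetical and k = 60 is the value usually quoted for RRF.

```python
# Minimal sketch of reciprocal rank fusion (RRF), one way to merge rankings
# from different retrievers. Illustrative only; not necessarily Intersect's method.


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of job IDs: each ID scores sum(1 / (k + rank)), rank 1-based."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    bm25_order = ["job_a", "job_b", "job_c"]      # hypothetical lexical ranking
    semantic_order = ["job_c", "job_a", "job_d"]  # hypothetical semantic ranking
    print(reciprocal_rank_fusion([bm25_order, semantic_order]))
    # → ['job_a', 'job_c', 'job_b', 'job_d']
```

Postings that rank well in both lists rise to the top, while a posting found by only one method still keeps a place in the fused list.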
It involves:
- Scraping job listings and vectorizing results with OpenAI's text-embedding-3-small.
- Generating word clouds with TF-IDF.
- Capturing user input and reordering results by computing similarity via dot product.
- Visualizing clusters using PCA and KMeans.
- Reordering results using BM25 (lexical search).
- Reranking with Cohere's cross-encoder.
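The word-cloud step weights terms by TF-IDF, so words that are frequent in one posting but rare across the corpus stand out. A minimal sketch, assuming a simple lowercase regex tokenizer and scikit-learn's smoothed IDF formula (without scikit-learn's default row normalization). The resulting {term: weight} dict is the kind of input a word-cloud library can size words from, for example the wordcloud package's `generate_from_frequencies`, assuming that is the library in use.

```python
# Minimal sketch: TF-IDF term weights that could size words in a word cloud.
# Assumptions: regex tokenization and smoothed IDF,
# idf = ln((1 + n_docs) / (1 + df)) + 1; real preprocessing may differ.
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    weights = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts within this document
        weights.append({t: count * idf[t] for t, count in tf.items()})
    return weights


def top_terms(weights: dict[str, float], n: int = 3) -> list[str]:
    """Highest-weighted terms first; ties broken alphabetically."""
    return sorted(weights, key=lambda t: (-weights[t], t))[:n]


if __name__ == "__main__":
    postings = [
        "python data engineer python pipelines",
        "data analyst excel reporting",
        "python backend engineer",
    ]
    w = tfidf_weights(postings)
    print(top_terms(w[0]))  # → ['python', 'pipelines', 'data']
```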
- web development
  - uv: environment and dependency management
  - streamlit: web framework (frontend and backend) and hosting
  - pypdf: PDF CV parsing
- data science
  - semantic search: OpenAI's text-embedding-3-small
  - lexical search: bm25s (Lucene method)
    - preprocessing (tokenizer, stemmer, stop words)
  - visualization: PCA + KMeans (scikit-learn); other algorithms such as t-SNE, LSA, mean-shift and DBSCAN might be more appropriate
  - reranker: Cohere's reranking model
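The lexical side can be sketched as classic BM25 over preprocessed tokens. This is a from-scratch illustration, not the bm25s implementation: the stop-word list and suffix-stripping stemmer are toy stand-ins for real ones (such as a Snowball stemmer), k1 = 1.5 and b = 0.75 are common defaults chosen as assumptions, and the IDF uses the Lucene-style ln(1 + (N - df + 0.5) / (df + 0.5)) form, which is never negative.

```python
# Minimal sketch of BM25 scoring with simple preprocessing.
# Toy stop words and stemmer; Lucene-style IDF; not the bm25s internals.
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with"}


def stem(token: str) -> str:
    """Toy stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


class BM25:
    def __init__(self, corpus: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [preprocess(d) for d in corpus]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in preprocess(query):
            if term not in tf:
                continue
            f = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def rank(self, query: str) -> list[int]:
        scores = [self.score(query, i) for i in range(len(self.docs))]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    jobs = [
        "Senior Python developer building data pipelines",
        "Marketing manager for a consumer brand",
        "Junior developer working with Python and SQL",
    ]
    print(BM25(jobs).rank("python developer"))  # → [2, 0, 1]
```

With these toy inputs the shorter Python posting (index 2) edges out the longer one because b penalizes document length, and the marketing posting scores zero since it shares no terms with the query.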
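One possible source of the run-to-run differences noted in "how is running the same thing getting different results each time" is random initialization in the clustering step; this is an assumption, since the scraped listings or the embedding API may also vary between runs. In scikit-learn, fixing `random_state` on `KMeans` makes it repeatable. The pure-Python sketch below shows the same idea with plain Lloyd's k-means and a seeded random generator; it is not scikit-learn's implementation (which uses k-means++ initialization by default).

```python
# Minimal sketch: k-means (Lloyd's algorithm) with a seeded initialization, so
# repeated runs on the same points give the same clusters.
import random


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[list[float]], k: int, seed: int = 0, max_iter: int = 100) -> list[int]:
    """Return a cluster label for each point."""
    rng = random.Random(seed)  # private RNG: same seed -> same initial centroids
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]  # e.g. 2-D PCA output
    print(kmeans(pts, k=2, seed=42) == kmeans(pts, k=2, seed=42))  # → True
```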
- Boykis, V. (2023). What are embeddings? Retrieved from https://github.com/veekaybee/what_are_embeddings
- Mitchell, M. (2019). Artificial Intelligence: A Guide for Thinking Humans. Pelican Books.
- Sanseviero, O. (2024). Sentence Embeddings. Cross-encoders and Re-ranking. hackerllama. Retrieved from https://osanseviero.github.io/hackerllama/blog/posts/sentence_embeddings2/
- Schmidt, B. (2015). Vector Space Models for the Digital Humanities. Bookworm. Retrieved from https://bookworm.benschmidt.org/posts/2015-10-25-Word-Embeddings.html
- Sun, W., Yan, L., Ma, X., Wang, S., Ren, P., Chen, Z., Yin, D., & Ren, Z. (2024). Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents (arXiv:2304.09542). arXiv. https://doi.org/10.48550/arXiv.2304.09542
- fix currency showing up as None
- investigate why running the same search gives different results each time
- fix viz labels again
- add limits for embedding and for user submissions
- sanitize user input
- add topic modelling to name the clusters
- add LLM permutation
  - sync old indices with new indices
- turn tables into cards
- infer keyword and location from the text
- find the last page automatically
- make the OpenAI embedding calls async
- add local alternatives for:
  - semantic search
  - reranker
  - LLM permutation
- prepend other columns before embedding
- features
  - add a sponsor column by comparing against the UKVI Excel spreadsheet
  - add tracking of the Bluesky firehose for AI jobs
  - 'tell me who your friends are' mode, where you provide other people's CVs and average their vectors
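For the input-limit and sanitization items, a minimal sketch: apply Unicode normalization, drop control characters, collapse whitespace and cap the length before the text is embedded. `MAX_CHARS` is a hypothetical character budget standing in for the embedding model's token limit (counting tokens properly would need a tokenizer), and the rules are illustrative rather than a security guarantee.

```python
# Minimal sketch: clean and cap user-submitted text before it is embedded.
# MAX_CHARS is a hypothetical budget standing in for the model's token limit.
import re
import unicodedata

MAX_CHARS = 20_000  # hypothetical cap, not a documented limit


def sanitize_user_text(text: str, max_chars: int = MAX_CHARS) -> str:
    # NFKC normalization (e.g. full-width letters and ligatures become plain ones).
    text = unicodedata.normalize("NFKC", text)
    # Drop control characters (category Cc) except newlines and tabs.
    text = "".join(ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc")
    # Collapse runs of spaces/tabs, trim spaces around newlines, cap blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    if not text:
        raise ValueError("submission is empty after cleaning")
    return text[:max_chars]


if __name__ == "__main__":
    raw = "  Data\x00  scientist\t\tLondon \n\n\n\nPython "
    print(repr(sanitize_user_text(raw)))  # → 'Data scientist London\n\nPython'
```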
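For the LLM permutation item and its "sync old indices with new indices" sub-task: permutation-style rerankers such as the one studied by Sun et al. (2024) ask the model to output an ordering like [2] > [3] > [1]. A minimal sketch of mapping that output back onto the original postings, assuming 1-based labels and a model that may repeat, skip or invent numbers. Any stray digits in the response would also be picked up, so the prompt should ask for the ordering only; both helper names are hypothetical.

```python
# Minimal sketch: turn an LLM permutation string into a full 0-based ordering
# of the original postings, then reorder them.
import re


def parse_permutation(response: str, n_items: int) -> list[int]:
    """Return a complete 0-based ordering of n_items derived from the response."""
    seen: set[int] = set()
    order: list[int] = []
    for match in re.findall(r"\d+", response):
        idx = int(match) - 1  # convert the model's 1-based labels to 0-based
        if 0 <= idx < n_items and idx not in seen:  # drop out-of-range and repeats
            seen.add(idx)
            order.append(idx)
    # Items the model forgot keep their original relative order at the end.
    order.extend(i for i in range(n_items) if i not in seen)
    return order


def apply_permutation(items: list, order: list[int]) -> list:
    return [items[i] for i in order]


if __name__ == "__main__":
    jobs = ["job A", "job B", "job C", "job D"]
    order = parse_permutation("[3] > [1] > [3] > [9]", len(jobs))
    print(order)                           # → [2, 0, 1, 3]
    print(apply_permutation(jobs, order))  # → ['job C', 'job A', 'job B', 'job D']
```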
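For making the embedding calls async, a minimal sketch that splits texts into batches and runs the batches concurrently under a semaphore. `embed_batch` is any async function from a list of strings to a list of vectors; with OpenAI's Python SDK it could wrap the `AsyncOpenAI` client's `embeddings.create` method, though that wiring is an assumption left out here. `fake_embed_batch` is a hypothetical stand-in so the example runs offline, and the batch size and concurrency values are arbitrary.

```python
# Minimal sketch: embed many texts concurrently in batches, capping the number
# of in-flight requests with a semaphore.
import asyncio
from typing import Awaitable, Callable

EmbedFn = Callable[[list[str]], Awaitable[list[list[float]]]]


async def embed_all(texts: list[str], embed_batch: EmbedFn,
                    batch_size: int = 100, max_concurrency: int = 4) -> list[list[float]]:
    semaphore = asyncio.Semaphore(max_concurrency)
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

    async def run(batch: list[str]) -> list[list[float]]:
        async with semaphore:  # at most max_concurrency batches in flight
            return await embed_batch(batch)

    # gather preserves input order, so results line up with texts.
    results = await asyncio.gather(*(run(b) for b in batches))
    return [vec for batch_result in results for vec in batch_result]


async def fake_embed_batch(batch: list[str]) -> list[list[float]]:
    await asyncio.sleep(0)  # pretend to wait on the network
    return [[float(len(t))] for t in batch]


if __name__ == "__main__":
    vectors = asyncio.run(embed_all(["a", "bb", "ccc"], fake_embed_batch, batch_size=2))
    print(vectors)  # → [[1.0], [2.0], [3.0]]
```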
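The "tell me who your friends are" mode could be sketched as averaging the CV embeddings element-wise and re-normalizing, then ranking postings against the result exactly as for a single CV. A minimal sketch, assuming all vectors come from the same embedding model (same dimension); a plain mean is one choice among several, and a weighted mean would work the same way.

```python
# Minimal sketch: merge several CV embeddings into one query vector.
import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equally sized vectors, rescaled to unit length."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Re-normalize so the result is comparable by dot product like a single CV embedding.
    return [x / norm for x in mean] if norm else mean


if __name__ == "__main__":
    friends = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical CV embeddings
    print(mean_vector(friends))  # ≈ [0.707, 0.707]
```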
- With API
  - https://publicapi.dev/category/jobs
  - https://api.theirstack.com/#tag/jobs/post/v1/jobs/search
    - rate limit: 300 requests per minute
    - 200 free credits, $100 per 5k credits
  - https://fantastic.jobs/api
    - https://rapidapi.com/fantastic-jobs-fantastic-jobs-default/api/active-jobs-db/pricing
    - https://rapidapi.com/fantastic-jobs-fantastic-jobs-default/api/linkedin-job-search-api/pricing
    - https://rapidapi.com/fantastic-jobs-fantastic-jobs-default/api/internships-api/pricing
      - free: 250 per month / 25 requests per month
    - https://rapidapi.com/fantastic-jobs-fantastic-jobs-default/api/free-y-combinator-jobs-api/pricing
      - free
    - https://rapidapi.com/fantastic-jobs-fantastic-jobs-default/api/upwork-jobs-api2/pricing
      - free: 500 per month
  - https://rapidapi.com/techmap-io-techmap-io-default/api/daily-international-job-postings/pricing
    - 1,000 free per month
  - https://publicapi.dev/jobdata-api
    - without a valid API key, there is an hourly rate limit of a handful of requests
  - https://www.reed.co.uk/developers/Jobseeker
    - no terms
  - https://publicapi.dev/adzuna-api
    - https://developer.adzuna.com/docs/terms_of_service
      - 25 hits per minute, 250 per day, 1,000 per week, 2,500 per month
      - display logo
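To stay within per-minute quotas like those listed above (for example 25 hits per minute for Adzuna), a sliding-window limiter can gate each request. A minimal, single-threaded sketch; `SlidingWindowLimiter` is a hypothetical helper whose clock and sleep functions are injectable so it can be tested offline, and daily, weekly or monthly caps would need separate (ideally persisted) counters.

```python
# Minimal sketch: a sliding-window rate limiter for per-minute API quotas.
# Single-threaded; clock and sleep are injectable for offline testing.
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_calls, self.period = max_calls, period
        self.clock, self.sleep = clock, sleep
        self.calls: deque = deque()  # timestamps of calls inside the window

    def acquire(self) -> None:
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest call in the window expires.
            self.sleep(self.period - (now - self.calls[0]))


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_calls=25, period=60.0)  # e.g. 25 requests/minute
    for page in range(3):
        limiter.acquire()
        print("would request page", page)  # replace with the real API call
```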
Some info on this here and here. Each one of these would need a bespoke scraping strategy.
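One reusable piece of such a strategy is the "find the last page automatically" item: if pages 1 to N have results and later pages are empty, exponential search followed by binary search finds N in O(log N) page requests instead of walking every page. A minimal sketch, assuming a hypothetical `page_has_results(n)` callable that fetches page n of a given site and reports whether it lists any jobs.

```python
# Minimal sketch: find the last results page with exponential + binary search.
# page_has_results is a hypothetical callable: True for pages 1..last, False after.
from typing import Callable


def find_last_page(page_has_results: Callable[[int], bool], max_page: int = 10_000) -> int:
    """Return the last page with results, or 0 if even page 1 is empty."""
    if not page_has_results(1):
        return 0
    lo, hi = 1, 2
    # Double hi until we hit an empty page (or pass the safety cap).
    while hi <= max_page and page_has_results(hi):
        lo, hi = hi, hi * 2
    hi = min(hi, max_page + 1)
    # Invariant: page lo has results; page hi is empty or beyond the cap.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if page_has_results(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(find_last_page(lambda n: n <= 37))  # → 37
```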
- General
  - CV-Library
  - Indeed
  - Indeed UK
  - Adzuna
  - Reed
  - Google for Jobs
  - Monster
  - Totaljobs
  - Jobserve
  - r/forhire
  - https://www.jobsite.co.uk/
  - https://www.lhh.com/uk/en/
  - https://www.prospects.ac.uk/
  - Glassdoor
  - CWJobs
  - Guardian Jobs
  - https://uk.whatjobs.com/
- Tech
  - Technojobs
  - Jobserve
  - Hacker News
  - https://wellfound.com/jobs
  - https://weworkremotely.com/
  - https://workinstartups.com/
  - https://www.haystackapp.io/
  - https://wearetechwomen.com/
  - https://jobs.revoco-talent.co.uk/jobs.aspx
  - https://devitjobs.uk/
  - https://www.cwjobs.co.uk/
  - https://www.f6s.com/jobs
  - Jora
  - https://remotejobs.careers/job-location/uk/
- Responsible Tech
Other sources: https://theirstack.com/en/docs/data/job/sources#job-data-sources