
Commit 8d6c0e2

Update prompt and scoring logic
1 parent 5149c45 commit 8d6c0e2

7 files changed

Lines changed: 181 additions & 125 deletions


hire-shark/prompts/data_connectors_and_models.md

Lines changed: 5 additions & 9 deletions
@@ -41,16 +41,12 @@ This reference summarizes every external connector and model configuration that
 
 ## 3. JD Skill Extraction Flow
 - **Used for:** Building the JD skill list that powers match scoring (`hire-shark/src/lib/skillExtractor.ts`).
-- **Inputs:** Job title, sanitized Adzuna description (up to ~2.5k chars) and any keywords returned by the API.
-- **LLM layer (optional):**
+- **Inputs:** Job title, sanitized Adzuna description (up to ~3.5k chars) and any keywords returned by the API.
+- **LLM layer (required when enabled):**
   - Model: `gemini-2.5-flash` via `@google/generative-ai`.
   - Generation config: `temperature=0`, `topK=1`, `topP=0.1` for deterministic output.
-  - Prompt: see `prompts.md` ("JD Skill Extraction").
-- **Heuristic fallback (always runs first):**
-  - Keyword seeding + dictionary lookup (`COMMON_SKILL_CANDIDATES`).
-  - Bullet/phrase parsing with strict sanitization (stop-word filtering, descriptive suffix ban, short-token whitelist).
-  - Signals (uppercase, special characters, domain vocabulary) must be present for single-word skills.
+  - Prompt: see `prompts.md` ("JD Skill Extraction"). Asks for inferred skills (implied by responsibilities), excludes location/schedule/pay/benefits, outputs JSON array of 1–4 word skill phrases.
+- **Fallback when LLM unavailable:** returns provided keywords only (no heuristic phrase parsing).
 - **Post-processing:**
-  - Merge LLM output with heuristic skills (dedupe + sanitization) only when new items pass validation.
-  - Stable ordering by first appearance in the JD text, then alphabetically.
+  - Sanitization + deduping of LLM output; stable ordering by first appearance in the JD text, then alphabetically.
   - In-memory cache keyed by title/description/keywords/limit so repeated matches reuse the same list.
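The cache described in the last bullet can be sketched as a small memoizing wrapper. This is a minimal sketch under stated assumptions, not the code in `skillExtractor.ts`: `SkillQuery`, `makeSkillCacheKey`, and `memoizeSkills` are hypothetical names, and the key here is a JSON encoding of the four inputs rather than the hash mentioned in the extractor's doc comment. Including `limit` in the key keeps a request for 5 skills from reusing a cached list of 10.

```typescript
// Hypothetical sketch: names and key format are illustrative, not from skillExtractor.ts.
type SkillQuery = {
  title: string;
  description: string;
  keywords: string[];
  limit: number;
};

// Every input that can change the output is part of the key.
// JSON encoding keeps field boundaries unambiguous ("a b" + "c" vs "a" + "b c").
function makeSkillCacheKey(q: SkillQuery): string {
  return JSON.stringify([q.title, q.description, q.keywords, q.limit]);
}

const skillCache = new Map<string, Promise<string[]>>();

function memoizeSkills(
  q: SkillQuery,
  compute: (q: SkillQuery) => Promise<string[]>,
): Promise<string[]> {
  const key = makeSkillCacheKey(q);
  const cached = skillCache.get(key);
  if (cached) return cached;
  // Caching the pending promise also collapses concurrent requests for the same job.
  const pending = compute(q);
  skillCache.set(key, pending);
  // Evict on failure so a transient error is not served from the cache later.
  pending.catch(() => skillCache.delete(key));
  return pending;
}
```

Unlike the committed code, which stores the finalized array, this sketch caches the pending promise so two concurrent matches for the same posting share one model call; that choice belongs to the sketch only.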

hire-shark/prompts/prompts.md

Lines changed: 5 additions & 5 deletions
@@ -64,10 +64,10 @@ ${JSON.stringify(editedResume.parsed, null, 2)}
 
 ## JD Skill Extraction (hire-shark/src/lib/skillExtractor.ts)
 ```
-You are identifying the most important skills required for a job posting.
-- Read the provided job title and description.
-- Return ONLY a JSON array (no additional commentary) listing up to ${limit} unique skill phrases.
-- Each skill should be 1-4 words, concrete, and deduplicated (e.g., "React", "Stakeholder Management", "AWS Cloud").
+You extract the most important skills a candidate needs for this job. Infer required skills even if only implied by responsibilities.
+- Focus on technologies, tools, frameworks, domain skills, certifications, and relevant soft skills.
+- Exclude locations, schedules (hours/week), pay/benefits, employment type, headcount, and generic nouns.
+- Return ONLY a JSON array (no commentary) of up to ${limit} unique skill phrases, each 1-4 words, title-cased when appropriate.
 
 Job Title: ${title || "(missing)"}
 
@@ -78,4 +78,4 @@ Existing keywords: ${keywords.length ? keywords.join(", ") : "none"}
 ```
 Notes:
 - Called with `temperature: 0`, `topK: 1`, `topP: 0.1` for deterministic output.
-- Response is merged with the heuristic fallback list only when new skills are returned.
+- LLM-only: no heuristic fallback; if Gemini unavailable, it falls back to provided keywords.
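The prompt asks for a bare JSON array, but a defensive parser still helps if a model wraps the array in prose or a Markdown fence. The sketch below assumes that failure mode; `parseSkillArraySketch` is a hypothetical name, and the repository's own `parseSkillArray` body is not part of this diff. It also re-applies the prompt's constraints (1-4 words, case-insensitive dedupe, at most `limit` items) so a non-compliant response cannot widen the list.

```typescript
// Hypothetical sketch; the real parseSkillArray in skillExtractor.ts is not shown here.
function parseSkillArraySketch(payload: string, limit: number): string[] {
  // Take the outermost [...] so surrounding prose or a ```json fence is ignored.
  const start = payload.indexOf("[");
  const end = payload.lastIndexOf("]");
  if (start === -1 || end <= start) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(payload.slice(start, end + 1));
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];

  const seen = new Set<string>();
  const out: string[] = [];
  for (const item of parsed) {
    if (typeof item !== "string") continue;
    const skill = item.replace(/\s+/g, " ").trim();
    // Enforce the prompt's 1-4 word rule.
    if (!skill || skill.split(" ").length > 4) continue;
    const key = skill.toLowerCase();
    if (seen.has(key)) continue; // first spelling wins
    seen.add(key);
    out.push(skill);
    if (out.length >= limit) break;
  }
  return out;
}
```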

hire-shark/src/lib/gemini_parser.ts

Lines changed: 8 additions & 5 deletions
@@ -197,11 +197,14 @@ export async function parseResumeWithGemini(file: File): Promise<ResumeParsed> {
 }
 }
 
-Important: Extract all education entries including degrees, certifications, and educational qualifications.
-Include the degree type (e.g., Bachelor's, Master's, PhD), field of study, institution name, location, dates, GPA (if available), and any honors or distinctions.
-If you cannot find any information for a field, leave it as an empty string or an empty array.
-If the uploaded file is not a resume, try your best and you should still return a JSON object.
-When nothing is found for a section, The confidence for that section should be low.
+Important:
+- Skills must be exhaustive: include hard skills, tools, frameworks, domains, certifications, and relevant soft skills.
+- Infer skills from projects/responsibilities even if not in a "Skills" section.
+- Include common aliases where helpful (e.g., ["C++", "C Plus Plus"]), keep items 1-4 words, deduplicated and normalized.
+- Extract all education entries including degrees, certifications, and educational qualifications with degree, field, institution, location, dates, GPA (if available), and honors.
+- If you cannot find any information for a field, leave it as an empty string or an empty array.
+- If the uploaded file is not a resume, try your best and you should still return a JSON object.
+- When nothing is found for a section, the confidence for that section should be low.
 `;
 
   const generativePart = await buildGenerativePart(file);
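The alias instruction matters because the matcher's first check is an exact lookup (`resumeSkillMap.has(jobSkill.normalized)` in `jobMatcher.ts`). A minimal sketch, assuming that map is keyed by a normalized skill string: `simpleNormalize`, `buildResumeSkillMap`, and `exactHits` are hypothetical names, and `simpleNormalize` mirrors the pre-commit `normalizeSkill` (lowercase plus whitespace collapse).

```typescript
// Hypothetical sketch; the real resumeSkillMap construction is not shown in this diff.
// Mirrors the pre-commit normalizeSkill: lowercase + collapse whitespace.
function simpleNormalize(value: string): string {
  return value.toLowerCase().replace(/\s+/g, " ").trim();
}

// Each alias becomes its own key, so any spelling the resume lists can hit exactly.
function buildResumeSkillMap(skills: string[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const skill of skills) {
    const key = simpleNormalize(skill);
    if (key && !map.has(key)) map.set(key, skill); // first spelling wins
  }
  return map;
}

// Job skills that would be matched without any embedding comparison.
function exactHits(jobSkills: string[], resumeMap: Map<string, string>): string[] {
  return jobSkills.filter(skill => resumeMap.has(simpleNormalize(skill)));
}
```

With only "C++" on the resume, a posting that says "C Plus Plus" has to rely on embedding similarity; listing both spellings turns it into an exact hit.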

hire-shark/src/lib/jobMatcher.ts

Lines changed: 31 additions & 8 deletions
@@ -2,7 +2,7 @@ import type { ResumeParsed, MatchResult, JobPreferences } from "../types";
 import { fetchAdzunaJobs } from "./adzunaApi";
 import { extractJobSkills } from "./skillExtractor";
 
-const MATCH_THRESHOLD = 0.75;
+const MATCH_THRESHOLD = 0.5;
 const MAX_SKILL_BADGES = 10;
 
 type SkillVector = {
@@ -57,7 +57,7 @@ export async function matchJobs(resume: ResumeParsed, preferences: JobPreference
   const embeddingEngine = skillCorpus.length ? await buildSkillEmbeddingEngine(skillCorpus) : null;
   const vectorCache = new Map<string, number[]>();
   const getVector = (text: string): number[] => {
-    const key = text.toLowerCase();
+    const key = normalizeSkill(text);
     if (vectorCache.has(key)) {
       return vectorCache.get(key)!;
     }
@@ -84,12 +84,14 @@ export async function matchJobs(resume: ResumeParsed, preferences: JobPreference
 
     const matchedSkills: string[] = [];
     const missingSkills: string[] = [];
+    let coverageSum = 0;
 
     for (const jobSkill of jobSkillVectors) {
       if (!jobSkill.normalized) continue;
 
       if (resumeSkillMap.has(jobSkill.normalized)) {
         matchedSkills.push(jobSkill.raw);
+        coverageSum += 1;
         continue;
       }
 
@@ -104,17 +106,19 @@ export async function matchJobs(resume: ResumeParsed, preferences: JobPreference
         }
       }
 
+      const contribution = Math.max(0, Math.min(1, bestScore));
+      coverageSum += contribution;
       if (bestScore >= MATCH_THRESHOLD) {
         matchedSkills.push(jobSkill.raw);
-      } else {
+      } else if (contribution < 1) {
         missingSkills.push(jobSkill.raw);
       }
     }
 
     const totalRequired = jobSkills.length;
     const matchedCount = matchedSkills.length;
     const missingCount = missingSkills.length;
-    const coverage = totalRequired ? matchedCount / totalRequired : 0;
+    const coverage = totalRequired ? coverageSum / totalRequired : 0;
 
     return {
       jobId: job.id,
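The coverage change replaces a matched/total ratio with partial credit: exact matches add 1, and every other job skill adds its best similarity clamped to [0, 1], whether or not it clears `MATCH_THRESHOLD`. The sketch below reproduces that rule with similarity scores passed in directly instead of computed from embeddings (an assumption to keep it self-contained); `ScoredSkill` and `scoreCoverage` are hypothetical names.

```typescript
// Hypothetical sketch of the post-commit coverage rule from jobMatcher.ts.
// Scores are passed in directly; the real code derives them from skill embeddings.
const MATCH_THRESHOLD = 0.5;

type ScoredSkill = { raw: string; exact: boolean; bestScore: number };

function scoreCoverage(skills: ScoredSkill[]): { matched: string[]; missing: string[]; coverage: number } {
  const matched: string[] = [];
  const missing: string[] = [];
  let coverageSum = 0;

  for (const skill of skills) {
    if (skill.exact) {
      // Exact normalized match: full credit.
      matched.push(skill.raw);
      coverageSum += 1;
      continue;
    }
    // Partial credit: best similarity clamped to [0, 1], counted even below the threshold.
    const contribution = Math.max(0, Math.min(1, skill.bestScore));
    coverageSum += contribution;
    if (skill.bestScore >= MATCH_THRESHOLD) {
      matched.push(skill.raw);
    } else if (contribution < 1) {
      // Kept from the diff; whenever bestScore < MATCH_THRESHOLD this condition holds.
      missing.push(skill.raw);
    }
  }

  const coverage = skills.length ? coverageSum / skills.length : 0;
  return { matched, missing, coverage };
}
```

For the four skills used in the test (one exact match, then scores 0.62, 0.3 and -0.1), the old formula gives 2 / 4 = 0.5 while the new one gives (1 + 0.62 + 0.3 + 0) / 4 = 0.48: a near-miss now adds some credit, but an above-threshold match no longer counts as a full point.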
@@ -140,7 +144,26 @@ export async function matchJobs(resume: ResumeParsed, preferences: JobPreference
 }
 
 function normalizeSkill(value: string): string {
-  return value.toLowerCase().replace(/\s+/g, " ").trim();
+  const lowered = (value || "")
+    .toLowerCase()
+    .replace(/[^\w\s#+/./-]/g, " ")
+    .replace(/\s+/g, " ")
+    .trim();
+  return stemSkill(lowered);
+}
+
+function stemSkill(value: string): string {
+  let result = value;
+  result = result.replace(/\b(development|developing)\b/g, "develop");
+  result = result.replace(/\b(engineering|engineer|engineers)\b/g, "engineer");
+  result = result.replace(/\b(programming|programmer|programmers)\b/g, "program");
+  result = result.replace(/\b(designer|designers|designing)\b/g, "design");
+  result = result.replace(/\b(optimization|optimisation|optimizing)\b/g, "optimize");
+  result = result.replace(/\b(maintenance|maintaining)\b/g, "maintain");
+  result = result.replace(/\b(problem solving|problem-solving)\b/g, "problem solving");
+  // Drop simple plurals.
+  result = result.replace(/\b(\w+?)s\b/g, "$1");
+  return result.trim();
 }
 
 function dedupeSkills(list: string[]): string[] {
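Because `getVector` and both embedding back ends now run text through `normalizeSkill`, its output decides which spellings are treated as the same skill. The two functions below are copied from the hunk above with only comments and usage lines added, so the outputs can be checked directly.

```typescript
// normalizeSkill and stemSkill as committed in jobMatcher.ts (copied from the hunk above).
function normalizeSkill(value: string): string {
  const lowered = (value || "")
    .toLowerCase()
    // Keep word characters, whitespace and # + / . - so tokens like c#, c++, ci/cd, node.js survive.
    .replace(/[^\w\s#+/./-]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return stemSkill(lowered);
}

function stemSkill(value: string): string {
  let result = value;
  result = result.replace(/\b(development|developing)\b/g, "develop");
  result = result.replace(/\b(engineering|engineer|engineers)\b/g, "engineer");
  result = result.replace(/\b(programming|programmer|programmers)\b/g, "program");
  result = result.replace(/\b(designer|designers|designing)\b/g, "design");
  result = result.replace(/\b(optimization|optimisation|optimizing)\b/g, "optimize");
  result = result.replace(/\b(maintenance|maintaining)\b/g, "maintain");
  result = result.replace(/\b(problem solving|problem-solving)\b/g, "problem solving");
  // Drop simple plurals: strips a trailing "s" from every word.
  result = result.replace(/\b(\w+?)s\b/g, "$1");
  return result.trim();
}

console.log(normalizeSkill("Software Engineers"));   // software engineer
console.log(normalizeSkill("Software Engineering")); // software engineer
console.log(normalizeSkill("Problem-Solving"));      // problem solving
console.log(normalizeSkill("Data Pipelines"));       // data pipeline
console.log(normalizeSkill("AWS"));                  // aw
console.log(normalizeSkill("Business Analysis"));    // busines analysi
```

The targeted rules fold variants such as "Engineers" and "Engineering" together. The final plural rule applies to every word ending in "s", so acronyms and singular words ("AWS", "CSS", "Node.js", "Analysis") are shortened too. As long as job and resume skills go through the same function, those stems still compare equal, but they are also the strings the embedding vectorizers receive.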
@@ -165,7 +188,7 @@ async function buildSkillEmbeddingEngine(corpus: string[]): Promise<SkillEmbeddi
   const adapter = await getSkipGramAdapter(corpus);
   if (adapter) {
     return {
-      vectorize: (text: string) => adapter.vectorizeText(text),
+      vectorize: (text: string) => adapter.vectorizeText(normalizeSkill(text)),
       cosine: (a: number[], b: number[]) => adapter.cosineSimilarity(a, b),
     };
   }
@@ -174,9 +197,9 @@ async function buildSkillEmbeddingEngine(corpus: string[]): Promise<SkillEmbeddi
   }
 
   const { buildVectorizer, cosineSimilarity } = await import("./embeddings");
-  const vectorizer = buildVectorizer(corpus);
+  const vectorizer = buildVectorizer(corpus.map(normalizeSkill));
   return {
-    vectorize: (text: string) => vectorizer.vectorize(text),
+    vectorize: (text: string) => vectorizer.vectorize(normalizeSkill(text)),
     cosine: (a: number[], b: number[]) => cosineSimilarity(a, b),
   };
 }
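The fallback embedding path now normalizes at two points: when the vectorizer is built from the corpus and when each skill is vectorized. A toy vectorizer shows why both are needed. `buildBagOfWordsVectorizer`, `tokenize`, and `normalize` are hypothetical stand-ins; the real `./embeddings` module and the skip-gram adapter are not shown in this commit and may work quite differently.

```typescript
// Hypothetical stand-in for ./embeddings: a whitespace-token bag-of-words
// vectorizer over a fixed vocabulary. Not the repository's implementation.
type Vectorizer = { vectorize: (text: string) => number[] };

function tokenize(text: string): string[] {
  return text.split(/\s+/).filter(Boolean);
}

function buildBagOfWordsVectorizer(corpus: string[]): Vectorizer {
  const vocab = new Map<string, number>();
  for (const doc of corpus) {
    for (const token of tokenize(doc)) {
      if (!vocab.has(token)) vocab.set(token, vocab.size);
    }
  }
  return {
    vectorize(text: string): number[] {
      const vec = new Array<number>(vocab.size).fill(0);
      for (const token of tokenize(text)) {
        const idx = vocab.get(token);
        if (idx !== undefined) vec[idx] += 1; // out-of-vocabulary tokens are ignored
      }
      return vec;
    },
  };
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Zero vectors (no known tokens) get similarity 0 instead of NaN.
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Simplified normalizer (the pre-commit lowercase + whitespace rule).
const normalize = (s: string): string => s.toLowerCase().replace(/\s+/g, " ").trim();

const vectorizer = buildBagOfWordsVectorizer(["Data Pipelines", "SQL"].map(normalize));
// Corpus and query both normalized: identical vectors, similarity 1.
console.log(cosineSimilarity(vectorizer.vectorize(normalize("DATA  pipelines")), vectorizer.vectorize(normalize("Data Pipelines"))));
// Query left raw: no token is in the normalized vocabulary, similarity 0.
console.log(cosineSimilarity(vectorizer.vectorize("Data Pipelines"), vectorizer.vectorize(normalize("Data Pipelines"))));
```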

hire-shark/src/lib/skillExtractor.ts

Lines changed: 21 additions & 80 deletions
@@ -1,6 +1,6 @@
 import { hasGeminiApiKeyConfigured, runWithGeminiModel } from "./gemini_parser";
 
-const DEFAULT_SKILL_LIMIT = 12;
+const DEFAULT_SKILL_LIMIT = 10;
 const COMMON_SKILL_CANDIDATES = [
   "python",
   "java",
@@ -184,9 +184,8 @@ type ExtractJobSkillsInput = {
 
 /**
  * Extracts a concise set of skills required for a job posting.
- * 1. Deterministic fallback extraction (keyword + heuristic parsing).
- * 2. Optional Gemini enrichment with low-temperature settings.
- * 3. Stable ordering + caching guarded by a hash of the JD content.
+ * Uses Gemini (deterministic settings) only; no heuristic fallback parsing.
+ * Stable ordering + caching guarded by a hash of the JD content.
  */
 export async function extractJobSkills(input: ExtractJobSkillsInput): Promise<string[]> {
   const {
@@ -196,7 +195,7 @@ export async function extractJobSkills(input: ExtractJobSkillsInput): Promise<st
     limit = DEFAULT_SKILL_LIMIT,
   } = input;
 
-  const cleanDescription = stripHtml(description).slice(0, 2500);
+  const cleanDescription = stripHtml(description).slice(0, 3500);
   const keywordHints = keywords.filter(Boolean);
   if (!cleanDescription.trim() && !title.trim() && !keywordHints.length) {
     return [];
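The description cap moved from 2,500 to 3,500 characters. A minimal sketch of that preparation step, reusing the `stripHtml` shown in this diff; `DESCRIPTION_CHAR_LIMIT` and `prepareDescription` are hypothetical names. Tags are replaced with spaces before slicing and whitespace is not collapsed in between, so markup-heavy postings spend part of the budget on blanks.

```typescript
// Hypothetical name for the cap used in extractJobSkills (3500 after this commit, 2500 before).
const DESCRIPTION_CHAR_LIMIT = 3500;

// Same body as stripHtml in skillExtractor.ts: every tag becomes a single space.
function stripHtml(value?: string): string {
  if (!value) return "";
  return value.replace(/<[^>]+>/g, " ");
}

// Hypothetical helper: strip tags first, then truncate the plain text.
function prepareDescription(description?: string): string {
  return stripHtml(description).slice(0, DESCRIPTION_CHAR_LIMIT);
}
```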
@@ -207,20 +206,13 @@ export async function extractJobSkills(input: ExtractJobSkillsInput): Promise<st
     return SKILL_CACHE.get(cacheKey)!;
   }
 
-  const fallbackSkills = fallbackSkillExtraction({
-    title,
-    description: cleanDescription,
-    keywords: keywordHints,
-    limit: Math.max(limit, DEFAULT_SKILL_LIMIT),
-  });
-
   const shouldUseGemini = !GEMINI_DISABLED && hasGeminiApiKeyConfigured();
-  let mergedSkills = fallbackSkills;
+  let rawSkills: string[] = [];
 
   if (shouldUseGemini) {
     try {
       const prompt = buildPrompt({ title, description: cleanDescription, keywords: keywordHints, limit });
-      const parsed = await runWithGeminiModel(
+      rawSkills = await runWithGeminiModel(
         async model => {
           const result = await model.generateContent(prompt);
           const response = await result.response;
@@ -236,27 +228,32 @@ export async function extractJobSkills(input: ExtractJobSkillsInput): Promise<st
           },
         },
       );
-
-      if (parsed.length) {
-        mergedSkills = mergeSkillLists(fallbackSkills, parsed);
-      }
     } catch (error) {
-      console.warn("Gemini skill extraction failed, using fallback skills.", error);
+      console.warn("Gemini skill extraction failed, returning keywords only.", error);
+      rawSkills = [];
     }
+  } else {
+    console.warn("Gemini disabled or not configured; returning keywords only.");
   }
 
-  const finalSkills = finalizeSkillList(mergedSkills, cleanDescription, limit);
+  // If LLM unavailable or empty, at least return any provided keywords.
+  if (!rawSkills.length && keywordHints.length) {
+    rawSkills = keywordHints;
+  }
+
+  const finalSkills = finalizeSkillList(rawSkills, cleanDescription, limit);
   SKILL_CACHE.set(cacheKey, finalSkills);
   return finalSkills;
 }
 
 function buildPrompt(args: { title: string; description: string; keywords: string[]; limit: number }): string {
   const { title, description, keywords, limit } = args;
   return `
-You are identifying the most important skills required for a job posting.
-- Read the provided job title and description.
-- Return ONLY a JSON array (no additional commentary) listing up to ${limit} unique skill phrases.
-- Each skill should be 1-4 words, concrete, and deduplicated (e.g., "React", "Stakeholder Management", "AWS Cloud").
+You extract the most important skills a candidate needs for this job. Infer required skills even if only implied by responsibilities.
+- Focus on technologies, tools, frameworks, domain skills, certifications, and relevant soft skills.
+- Exclude locations, schedules (hours/week), pay/benefits, employment type, headcount, and generic nouns.
+- Return ONLY a JSON array (no commentary) of up to ${limit} unique skill phrases, each 1-4 words, title-cased when appropriate.
+- Prioritize the top ${limit} most critical skills; avoid long tail/overly specific variants.
 
 Job Title: ${title || "(missing)"}
 
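After this commit the extractor has a single source of skills and a single fallback. The control flow can be sketched with the model call injected as a function. `extractSkillsSketch` and `LlmCall` are hypothetical; `callLlm` stands in for `runWithGeminiModel` plus response parsing, and `llmEnabled` stands in for `!GEMINI_DISABLED && hasGeminiApiKeyConfigured()`. Logging and the final `finalizeSkillList` step are left out.

```typescript
// Hypothetical sketch of the post-commit control flow in extractJobSkills.
type LlmCall = (prompt: string) => Promise<string[]>;

async function extractSkillsSketch(
  keywordHints: string[],
  prompt: string,
  llmEnabled: boolean,
  callLlm: LlmCall,
): Promise<string[]> {
  let rawSkills: string[] = [];
  if (llmEnabled) {
    try {
      rawSkills = await callLlm(prompt);
    } catch {
      // The commit logs a warning here; any error degrades to the keyword fallback.
      rawSkills = [];
    }
  }
  // Disabled LLM, failed call and empty output all fall back to provided keywords.
  if (!rawSkills.length && keywordHints.length) {
    rawSkills = keywordHints;
  }
  return rawSkills;
}
```

One consequence visible in the diff: `SKILL_CACHE.set(cacheKey, finalSkills)` runs on every path, so a keywords-only result produced after a transient Gemini error is reused for that posting for as long as the cache entry lives.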
@@ -282,67 +279,11 @@ function parseSkillArray(payload: string): string[] {
   return [];
 }
 
-function fallbackSkillExtraction(args: { title: string; description: string; keywords: string[]; limit: number }): string[] {
-  const { title, description, keywords, limit } = args;
-  const haystack = `${title}\n${description}`.toLowerCase();
-
-  const addSkill = (acc: string[], value?: string) => {
-    const sanitized = sanitizeSkill(value);
-    if (!sanitized) return;
-    if (acc.some(skill => skill.toLowerCase() === sanitized.toLowerCase())) return;
-    acc.push(sanitized);
-  };
-
-  const collected: string[] = [];
-  for (const keyword of keywords) {
-    if (collected.length >= limit) break;
-    addSkill(collected, keyword);
-  }
-
-  for (const candidate of COMMON_SKILL_CANDIDATES) {
-    if (collected.length >= limit) break;
-    if (haystack.includes(candidate)) {
-      addSkill(collected, candidate);
-    }
-  }
-
-  if (collected.length < limit) {
-    const quickPhrases = description
-      .split(/[\n,;\.]/)
-      .map(chunk => chunk.trim())
-      .filter(chunk => chunk && chunk.length <= 40 && chunk.split(/\s+/).length <= 4);
-    for (const phrase of quickPhrases) {
-      if (collected.length >= limit) break;
-      addSkill(collected, phrase);
-    }
-  }
-
-  if (!collected.length && title) {
-    addSkill(collected, title);
-  }
-
-  return collected.slice(0, limit);
-}
-
 function stripHtml(value?: string): string {
   if (!value) return "";
   return value.replace(/<[^>]+>/g, " ");
 }
 
-function mergeSkillLists(base: string[], extras: string[]): string[] {
-  const seen = new Set(base.map(s => s.toLowerCase()));
-  const result = [...base];
-  for (const skill of extras) {
-    const sanitized = sanitizeSkill(skill);
-    if (!sanitized) continue;
-    const normalized = sanitized.toLowerCase();
-    if (seen.has(normalized)) continue;
-    seen.add(normalized);
-    result.push(sanitized);
-  }
-  return result;
-}
-
 function finalizeSkillList(skills: string[], description: string, limit: number): string[] {
   const deduped = dedupeSkills(skills);
   const loweredDescription = description.toLowerCase();
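The post-processing bullet in `data_connectors_and_models.md` describes ordering by first appearance in the JD text, then alphabetically; the body of `finalizeSkillList` is cut off in this diff. A minimal sketch of that ordering, assuming "first appearance" means a case-insensitive substring search; `finalizeSkillListSketch` is a hypothetical name and its dedupe step is a simplified stand-in for `dedupeSkills`.

```typescript
// Hypothetical sketch of "stable ordering by first appearance in the JD text,
// then alphabetically"; not the repository's finalizeSkillList.
function finalizeSkillListSketch(skills: string[], description: string, limit: number): string[] {
  // Case-insensitive dedupe; the first spelling wins.
  const seen = new Set<string>();
  const deduped: string[] = [];
  for (const s of skills) {
    const trimmed = s.trim();
    const key = trimmed.toLowerCase();
    if (!trimmed || seen.has(key)) continue;
    seen.add(key);
    deduped.push(trimmed);
  }

  const lowered = description.toLowerCase();
  // Skills never mentioned verbatim sort after every mentioned one.
  const position = (skill: string): number => {
    const idx = lowered.indexOf(skill.toLowerCase());
    return idx === -1 ? Number.POSITIVE_INFINITY : idx;
  };

  return deduped
    .map(skill => ({ skill, pos: position(skill) }))
    // Compare positions first; equal positions (including two unmentioned skills) fall back to alphabetical.
    .sort((a, b) => (a.pos !== b.pos ? a.pos - b.pos : a.skill.localeCompare(b.skill)))
    .slice(0, limit)
    .map(entry => entry.skill);
}
```

Under this rule, skills that the new prompt infers from responsibilities but that never appear verbatim in the posting sort after every verbatim mention, alphabetically among themselves.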
