Commit 575b8ba

feat(api): added Groq Llama 3.3 70B as cross-provider fallback
- replaced Google Flash/Flash-Lite fallbacks with Groq (independent quota, 14,400 RPD)
- moved JSON validation inside retry loop so bad responses trigger next provider
- per-provider timeouts [35s, 20s] fit within Vercel's 60s maxDuration
- removed all Cerebras references from docs and .env.example
- updated about page, README, and all Starlight docs with new provider chain
- added Google/Groq rate limit doc links where RPM/RPD/TPM tables appear
- added scripts/test-providers.mjs for dry-run provider testing
1 parent 89a6d62 commit 575b8ba

13 files changed: +399 −69 lines

.env.example

Lines changed: 2 additions & 2 deletions
```diff
@@ -2,9 +2,9 @@
 # Get yours at https://aistudio.google.com/apikey
 GEMINI_API_KEY=
 
-# Optional fallback providers (for self-hosting)
+# Groq API key (recommended fallback, Llama 3.3 70B)
+# Get yours at https://console.groq.com/keys
 # GROQ_API_KEY=
-# CEREBRAS_API_KEY=
 
 # Firebase configuration (required)
 # Get these from Firebase Console → Project Settings → Your Apps
```

.prettierignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -6,3 +6,5 @@ coverage/
 playwright-report/
 test-results/
 pnpm-lock.yaml
+docs/.astro/
+docs/pnpm-lock.yaml
```

README.md

Lines changed: 12 additions & 12 deletions
```diff
@@ -74,18 +74,18 @@ Each profile is based on research into the platform's documented parsing and mat
 
 ## Tech Stack
 
-| Layer | Choice | Why |
-| --- | --- | --- |
-| **Framework** | SvelteKit 5 (Svelte 5 runes) | Compiled to vanilla JS, ~15KB runtime. No VDOM overhead. |
-| **Styling** | Scoped CSS + CSS custom properties | Dark glassmorphic design. No Tailwind. Component-scoped. |
-| **PDF Parsing** | pdfjs-dist (Web Worker) | Mozilla-maintained, fully client-side. |
-| **DOCX Parsing** | mammoth | Client-side Word to text extraction. |
-| **NLP** | Custom TF-IDF + tokenizer + skills taxonomy | Lightweight, browser-native, supports 8+ industries. |
-| **LLM** | Gemma 3 27B (primary), Gemini 2.5 Flash (fallback) | 14,400 RPD free tier via Google Generative Language API. Groq + Cerebras available for self-host. |
-| **Auth** | Firebase Authentication | Google + email/password sign-in. Free Spark plan. |
-| **Storage** | Cloud Firestore | Scan history per user. Free Spark plan. |
-| **Hosting** | Vercel | Free hobby tier. Edge functions for API. |
-| **Testing** | Vitest + Playwright + @testing-library/svelte | Unit, integration, and E2E coverage. |
+| Layer | Choice | Why |
+| --- | --- | --- |
+| **Framework** | SvelteKit 5 (Svelte 5 runes) | Compiled to vanilla JS, ~15KB runtime. No VDOM overhead. |
+| **Styling** | Scoped CSS + CSS custom properties | Dark glassmorphic design. No Tailwind. Component-scoped. |
+| **PDF Parsing** | pdfjs-dist (Web Worker) | Mozilla-maintained, fully client-side. |
+| **DOCX Parsing** | mammoth | Client-side Word to text extraction. |
+| **NLP** | Custom TF-IDF + tokenizer + skills taxonomy | Lightweight, browser-native, supports 8+ industries. |
+| **LLM** | Gemma 3 27B (primary), Llama 3.3 70B via Groq (fallback) | Cross-provider fallback: Google (14,400 RPD) + Groq (14,400 RPD) on independent free tiers. |
+| **Auth** | Firebase Authentication | Google + email/password sign-in. Free Spark plan. |
+| **Storage** | Cloud Firestore | Scan history per user. Free Spark plan. |
+| **Hosting** | Vercel | Free hobby tier. Edge functions for API. |
+| **Testing** | Vitest + Playwright + @testing-library/svelte | Unit, integration, and E2E coverage. |
 
 **Total infrastructure cost: $0.** Everything runs on free tiers.
```

docs/src/content/docs/api/rate-limits.md

Lines changed: 10 additions & 8 deletions
```diff
@@ -50,14 +50,16 @@ When you receive a `429` response:
 
 When self-hosting, rate limits are configurable. The actual bottleneck becomes your LLM provider's free tier:
 
-| Provider | Model | RPM | RPD |
-| -------- | --------------- | --- | ------ |
-| Gemma | 3 27B (primary) | 30 | 14,400 |
-| Gemini | 2.5 Flash | 5 | 20 |
-| Gemini | 2.5 Flash Lite | 10 | 20 |
-| Groq | Llama 3.3 70B | 30 | 14,400 |
-| Cerebras | Llama 3.3 70B | 30 | 1,000 |
+| Provider | Model | RPM | RPD | TPM |
+| -------- | ------------- | ---- | ------ | --- |
+| Google | Gemma 3 27B | 30 | 14,400 | 15K |
+| Groq | Llama 3.3 70B | 1000 | 14,400 | 12K |
+
+For the latest limits, see the official documentation:
+
+- [Google AI rate limits](https://ai.google.dev/gemini-api/docs/rate-limits)
+- [Groq rate limits](https://console.groq.com/docs/rate-limits)
 
 :::tip
-The hosted version uses Gemma 3 27B as the primary model (14,400 RPD), giving roughly 14,000+ scans per day on the free tier. Groq and Cerebras are available as optional fallbacks for self-hosted instances.
+The hosted version uses Gemma 3 27B as the primary model with Llama 3.3 70B via Groq as fallback. Both run on independent free tiers. The binding constraint is TPM (tokens per minute), not RPD. Each scan uses ~8,000 tokens total (prompt + response), giving a realistic combined throughput of roughly 4,500 scans per day under sustained load.
 :::
```
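The throughput claim in the tip can be sanity-checked with a back-of-envelope calculation. This is a hypothetical sketch (not code from the repo), assuming ~8,000 tokens per scan and the RPM/RPD/TPM values from the table above:

```javascript
// Back-of-envelope check: with ~8,000 tokens per scan, the per-minute token
// budget (TPM) caps throughput long before the daily request cap (RPD) does.
const TOKENS_PER_SCAN = 8_000;

const providers = [
  { name: 'Google', tpm: 15_000, rpd: 14_400 },
  { name: 'Groq', tpm: 12_000, rpd: 14_400 }
];

const scansPerDay = providers.map((p) => {
  // sustained scans/minute allowed by TPM, extrapolated to a full day
  const tpmCeiling = Math.floor((p.tpm / TOKENS_PER_SCAN) * 60 * 24);
  return Math.min(tpmCeiling, p.rpd); // whichever limit binds first
});

const combined = scansPerDay.reduce((a, b) => a + b, 0);
console.log(scansPerDay, combined); // → [ 2700, 2160 ] 4860
```

Under these assumptions both providers saturate TPM at a fraction of their 14,400 RPD, and the combined ceiling of ~4,860 scans/day is consistent with the "roughly 4,500" figure quoted above.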

docs/src/content/docs/getting-started/introduction.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -56,7 +56,7 @@ Built with performance and privacy in mind:
 - **SvelteKit 5** with Svelte 5 runes for the frontend
 - **pdfjs-dist** (Web Worker) for client-side PDF parsing
 - **mammoth** for client-side DOCX parsing
-- **Gemma 3 27B** (primary) with Gemini fallbacks for AI-powered analysis
+- **Gemma 3 27B** (primary) with **Llama 3.3 70B** via Groq as fallback for AI-powered analysis
 - **Firebase** for authentication and scan history
 - **Vercel** for hosting (free tier)
```

docs/src/content/docs/self-hosting/configuration.md

Lines changed: 22 additions & 23 deletions
````diff
@@ -7,39 +7,37 @@ description: Environment variables and configuration options for self-hosted ins
 
 All configuration is done through environment variables in the `.env` file.
 
-| Variable | Required | Description |
-| --- | --- | --- |
-| `GEMINI_API_KEY` | Yes | Google AI API key (used for Gemma 3 + Gemini models) |
-| `GROQ_API_KEY` | Optional | Groq API key (optional fallback) |
-| `CEREBRAS_API_KEY` | Optional | Cerebras API key (optional fallback) |
+| Variable | Required | Description |
+| --- | --- | --- |
+| `GEMINI_API_KEY` | Yes | Google AI API key (powers Gemma 3 27B primary) |
+| `GROQ_API_KEY` | Recommended | Groq API key (Llama 3.3 70B fallback) |
 
 :::caution
 Never commit your `.env` file to version control. It's already in `.gitignore`, but double-check before pushing.
 :::
 
 ## Provider Priority
 
-The LLM fallback chain follows this order:
+The LLM fallback chain uses cross-provider redundancy so quota limits on one provider don't cascade:
 
-1. **Gemma 3 27B** (primary, 14,400 RPD via `GEMINI_API_KEY`)
-2. **Gemini 2.5 Flash** (fallback, 20 RPD via `GEMINI_API_KEY`)
-3. **Gemini 2.5 Flash Lite** (fallback, 20 RPD via `GEMINI_API_KEY`)
-4. **Groq Llama 3.3 70B** (if `GROQ_API_KEY` is set)
-5. **Cerebras Llama 3.3 70B** (if `CEREBRAS_API_KEY` is set)
+1. **Gemma 3 27B** via Google (primary, `GEMINI_API_KEY`)
+2. **Llama 3.3 70B** via Groq (fallback, `GROQ_API_KEY`)
 
-If a provider fails (timeout, rate limit, malformed response), the system automatically tries the next one. All Google models (Gemma + Gemini) use the same API key.
+If a provider fails (timeout, rate limit, malformed response), the system automatically tries the next one. Because each provider uses a separate API key, their quotas are completely independent.
 
 ## Free Tier Limits
 
-| Provider | Model | RPM | RPD | Cost |
-| --- | --- | --- | --- | --- |
-| Gemma | 3 27B (primary) | 30 | 14,400 | Free (blocks at limit) |
-| Gemini | 2.5 Flash | 5 | 20 | Free (blocks at limit) |
-| Gemini | 2.5 Flash Lite | 10 | 20 | Free (blocks at limit) |
-| Groq | Llama 3.3 70B | 30 | 14,400 | Free |
-| Cerebras | Llama 3.3 70B | 30 | 1,000 | Free |
+| Provider | Model | RPM | RPD | TPM | Cost |
+| --- | --- | --- | --- | --- | --- |
+| Google | Gemma 3 27B | 30 | 14,400 | 15K | Free |
+| Groq | Llama 3.3 70B | 1000 | 14,400 | 12K | Free |
 
-**Key detail about Google AI:** The free tier will **block** requests at the limit, never auto-charge. You cannot accidentally incur costs.
+Both providers block at their limits and never auto-charge. You cannot accidentally incur costs.
+
+For the latest limits, see the official documentation:
+
+- [Google AI rate limits](https://ai.google.dev/gemini-api/docs/rate-limits)
+- [Groq rate limits](https://console.groq.com/docs/rate-limits)
 
 ## Rate Limiting
 
@@ -56,10 +54,11 @@ Adjust these values based on your expected traffic and API key limits.
 
 ## Timeouts
 
-The default timeout for LLM requests is 60 seconds:
+Each provider has its own timeout. [Vercel Fluid Compute](https://vercel.com/docs/fluid-compute) is enabled by default and allows up to 300 seconds on the Hobby plan:
 
 ```typescript
-const PROVIDER_TIMEOUT_MS = 60_000;
+// Gemma: 90s, Groq: 30s → worst case total: 120s
+const PROVIDER_TIMEOUTS_MS = [90_000, 30_000];
 ```
 
-Increase this if you're experiencing timeouts with longer resumes.
+Gemma 3 27B typically takes 30-45 seconds for the full scoring prompt but can spike under load. The 90s timeout gives generous headroom. Groq responds in under 1 second but gets 30s for safety. If both providers fail, the system falls back to rule-based scoring on the client side.
````
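The per-provider-timeout plus validate-inside-the-loop behavior described by this commit can be sketched as follows. This is a hypothetical illustration (names like `callWithFallback` are invented here, not the repo's actual implementation): each provider races its own timeout, and JSON parsing happens inside the loop so a malformed response triggers fallback exactly like a timeout or rate-limit error would.

```javascript
// Hypothetical sketch of the fallback loop described in the docs above.
const PROVIDER_TIMEOUTS_MS = [90_000, 30_000];

async function callWithFallback(providers, prompt) {
  for (let i = 0; i < providers.length; i++) {
    try {
      const raw = await Promise.race([
        providers[i](prompt),
        new Promise((_, reject) => {
          const t = setTimeout(
            () => reject(new Error('timeout')),
            PROVIDER_TIMEOUTS_MS[i]
          );
          t.unref?.(); // don't keep Node alive for a losing race (Node-only)
        })
      ]);
      // validation inside the retry loop: malformed JSON throws,
      // which falls through to the next provider
      return JSON.parse(raw);
    } catch (err) {
      console.warn(`provider ${i} failed: ${err.message}`);
    }
  }
  throw new Error('all providers failed');
}
```

Called with a Google fetcher first and a Groq fetcher second, a bad Gemma response (invalid JSON) silently triggers the Groq call instead of surfacing an error to the user.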

docs/src/content/docs/self-hosting/deployment.md

Lines changed: 1 addition & 2 deletions
```diff
@@ -24,8 +24,7 @@ In the Vercel dashboard:
 1. Go to your project > **Settings** > **Environment Variables**
 2. Add your API keys:
    - `GEMINI_API_KEY` (required)
-   - `GROQ_API_KEY` (optional fallback)
-   - `CEREBRAS_API_KEY` (optional fallback)
+   - `GROQ_API_KEY` (recommended fallback)
 3. Add your Firebase config (all `PUBLIC_FIREBASE_*` variables from `.env.example`)
 
 :::tip
```

docs/src/content/docs/self-hosting/setup.md

Lines changed: 3 additions & 9 deletions
```diff
@@ -9,7 +9,7 @@ ATS Screener can be self-hosted for free. You'll need at least one LLM API key.
 
 - **Node.js** 18+ (20 recommended)
 - **pnpm** 8+ (package manager)
-- A free API key from [Google AI Studio](https://aistudio.google.com/apikey) (required for Gemma/Gemini models)
+- A free API key from [Google AI Studio](https://aistudio.google.com/apikey) (required for Gemma 3 27B)
 
 ## Installation
 
@@ -33,20 +33,14 @@ cp .env.example .env
 2. Click "Create API Key"
 3. Add to `.env`: `GEMINI_API_KEY=your_key_here`
 
-### Groq (Optional Fallback)
+### Groq (Recommended Fallback)
 
 1. Go to [Groq Console](https://console.groq.com/keys)
 2. Create a new API key
 3. Add to `.env`: `GROQ_API_KEY=your_key_here`
 
-### Cerebras (Optional Fallback)
-
-1. Go to [Cerebras Cloud](https://cloud.cerebras.ai/)
-2. Generate an API key
-3. Add to `.env`: `CEREBRAS_API_KEY=your_key_here`
-
 :::tip
-You only need the **Google AI (Gemini) API key** to run the app. It powers Gemma 3 27B (14,400 RPD) as the primary model plus Gemini models as fallbacks. Groq and Cerebras are optional for additional availability.
+You need the **Google AI API key** to run the app (Gemma 3 27B primary, 14,400 RPD). Adding a **Groq API key** is strongly recommended as it provides a completely independent fallback (Llama 3.3 70B, 14,400 RPD) so users never see failures during peak traffic.
 :::
 
 ## Run Locally
```
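Putting the two key steps together, a minimal `.env` for self-hosting after this change might look like this (placeholder values, following the doc's own `your_key_here` convention):

```shell
# Minimal .env for self-hosting — placeholder values, not real keys
GEMINI_API_KEY=your_key_here   # required: powers Gemma 3 27B (primary)
GROQ_API_KEY=your_key_here     # recommended: Llama 3.3 70B fallback
```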

eslint.config.js

Lines changed: 2 additions & 1 deletion
```diff
@@ -41,7 +41,8 @@ export default ts.config(
 			'playwright-report/',
 			'test-results/',
 			'docs/',
-			'static/docs/'
+			'static/docs/',
+			'scripts/'
 		]
 	}
 );
```

scripts/test-gemma-json.mjs

Lines changed: 128 additions & 0 deletions
New file (`@@ -0,0 +1,128 @@`):

````javascript
import { readFileSync } from 'fs';

const envContent = readFileSync('.env', 'utf-8');
const GEMINI_KEY = envContent.match(/GEMINI_API_KEY=(.+)/)?.[1]?.trim();

if (!GEMINI_KEY) {
	console.error('no GEMINI_API_KEY in .env');
	process.exit(1);
}

// simulate the real scoring prompt (similar size to buildFullScoringPrompt)
const resumeText =
	'Sunny Patel. (437) 216-1611. Software Engineer Intern at IBM. IT Technician at Canadas Wonderland. Supported tenant-to-tenant migration during Six Flags acquisition for 3000+ directory objects. Authored 10+ PowerShell/ConnectWise scripts automating workstation imaging. Built and deployed MDT task sequences. Managed Active Directory accounts, security groups, and GPOs. System Support Specialist at Mackenzie Health. Migrated 400+ Surface tablets to bedside iPads. Skills: Java, Python, Go, Scala, PowerShell, C++, C#, YAML, Kotlin, Assembly, Django, Ruby on Rails, MongoDB, PostgreSQL, MySQL, Express.js, ASP.NET Core, Spring Boot, Kafka, React.js, JavaScript, Flutter, TypeScript, WebGL, GraphQL, Tailwind CSS, Three.js, Vue.js, Git, Docker, Kubernetes, Azure, GCP, AWS, Jamf Pro, Datadog. Education: Ontario Tech University, Honours BSc Computer Science. Projects: Axelot collaborative document platform with Next.js 16 and WebRTC, Netdash Electron networking toolkit with 15+ tools, SecureBank CTF banking app for SQL injection training, Sunnify Spotify downloader with PyQt5. Certifications: Microsoft GH-300 GitHub Copilot Intermediate, MongoDB Python Developer Path, GitHub Foundations, ConnectWise Automate Certified Enterprise Scripting Architect, Google IT Automation with Python.';

const scoringPrompt = `You are a senior talent acquisition technology analyst. Analyze this resume from the perspective of 6 enterprise ATS platforms.

<RESUME>
${resumeText}
</RESUME>

MODE: general ATS readiness. Evaluate formatting, structure, and keyword density.

## PLATFORM SPECIFICATIONS
### 1. WORKDAY RECRUITING - strict parser, skips headers/footers, penalizes creative formats
### 2. ORACLE TALEO - literal exact keyword match, strictest matching
### 3. iCIMS - semantic ML-based matching, most forgiving parser
### 4. GREENHOUSE - LLM-based parser, no auto-scoring, human review focused
### 5. LEVER - stemming-based matching, no ranking system
### 6. SAP SUCCESSFACTORS - Textkernel parser, taxonomy normalization

Score each platform on: formatting (0-100), keywordMatch (0-100), sections (0-100), experience (0-100), education (0-100), overallScore (0-100).

Respond ONLY with valid JSON matching this structure:
{
  "results": [
    {
      "system": "Workday",
      "vendor": "Workday Inc.",
      "overallScore": 75,
      "passesFilter": true,
      "breakdown": {
        "formatting": { "score": 80, "issues": [], "details": [] },
        "keywordMatch": { "score": 70, "matched": [], "missing": [], "synonymMatched": [] },
        "sections": { "score": 85, "present": [], "missing": [] },
        "experience": { "score": 75, "quantifiedBullets": 5, "totalBullets": 10, "actionVerbCount": 7, "highlights": [] },
        "education": { "score": 90, "notes": [] }
      },
      "suggestions": []
    }
  ]
}

Return exactly 6 results: Workday, Taleo, iCIMS, Greenhouse, Lever, SuccessFactors.`;

console.log('prompt length:', scoringPrompt.length, 'chars');
console.log('estimated tokens:', Math.ceil(scoringPrompt.length / 3.5));
console.log('');

async function test() {
	const start = Date.now();
	const res = await fetch(
		`https://generativelanguage.googleapis.com/v1beta/models/gemma-3-27b-it:generateContent?key=${GEMINI_KEY}`,
		{
			method: 'POST',
			headers: { 'Content-Type': 'application/json' },
			body: JSON.stringify({
				contents: [{ parts: [{ text: scoringPrompt }] }],
				generationConfig: { temperature: 0.3, topP: 0.85, maxOutputTokens: 16384 }
			})
		}
	);

	const elapsed = Date.now() - start;
	console.log('status:', res.status, `(${elapsed}ms)`);

	if (!res.ok) {
		const err = await res.text();
		console.log('ERROR:', err.slice(0, 500));
		return;
	}

	const data = await res.json();
	const text = data.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
	console.log('response length:', text.length, 'chars');

	// try JSON parse (same logic as extractJSON in +server.ts)
	const trimmed = text.trim();

	// attempt 1: direct parse
	try {
		JSON.parse(trimmed);
		console.log('JSON parse: DIRECT SUCCESS');
		return;
	} catch {
		/* continue */
	}

	// attempt 2: strip markdown fences
	const cleaned = trimmed.replace(/```json\n?|\n?```/g, '').trim();
	try {
		JSON.parse(cleaned);
		console.log('JSON parse: SUCCESS (after fence strip)');
		return;
	} catch {
		/* continue */
	}

	// attempt 3: find { ... } block
	const s = cleaned.indexOf('{');
	const e = cleaned.lastIndexOf('}');
	if (s !== -1 && e > s) {
		try {
			JSON.parse(cleaned.slice(s, e + 1));
			console.log('JSON parse: SUCCESS (extracted { } block)');
			return;
		} catch {
			/* continue */
		}
	}

	console.log('JSON parse: FAILED - this is why Gemma falls through to Groq');
	console.log('--- first 500 chars of response ---');
	console.log(text.slice(0, 500));
	console.log('--- last 200 chars of response ---');
	console.log(text.slice(-200));
}

test().catch(console.error);
````
