
Commit bebe395 (1 parent: 89a6d62)

feat(api): added Groq Llama 3.3 70B as cross-provider fallback

- replaced Google Flash/Flash-Lite fallbacks with Groq (independent quota, 14,400 RPD)
- moved JSON validation inside retry loop so bad responses trigger next provider
- per-provider timeouts [35s, 20s] fit within Vercel's 60s maxDuration
- removed all Cerebras references from docs and .env.example
- updated about page, README, and all Starlight docs with new provider chain
- added Google/Groq rate limit doc links where RPM/RPD/TPM tables appear
- added scripts/test-providers.mjs for dry-run provider testing

11 files changed (+236, −57 lines)

.env.example — 2 additions, 2 deletions

```diff
@@ -2,9 +2,9 @@
 # Get yours at https://aistudio.google.com/apikey
 GEMINI_API_KEY=

-# Optional fallback providers (for self-hosting)
+# Groq API key (recommended fallback, Llama 3.3 70B)
+# Get yours at https://console.groq.com/keys
 # GROQ_API_KEY=
-# CEREBRAS_API_KEY=

 # Firebase configuration (required)
 # Get these from Firebase Console → Project Settings → Your Apps
```

README.md — 1 addition, 1 deletion

```diff
@@ -81,7 +81,7 @@ Each profile is based on research into the platform's documented parsing and mat
 | **PDF Parsing** | pdfjs-dist (Web Worker) | Mozilla-maintained, fully client-side. |
 | **DOCX Parsing** | mammoth | Client-side Word to text extraction. |
 | **NLP** | Custom TF-IDF + tokenizer + skills taxonomy | Lightweight, browser-native, supports 8+ industries. |
-| **LLM** | Gemma 3 27B (primary), Gemini 2.5 Flash (fallback) | 14,400 RPD free tier via Google Generative Language API. Groq + Cerebras available for self-host. |
+| **LLM** | Gemma 3 27B (primary), Llama 3.3 70B via Groq (fallback) | Cross-provider fallback: Google (14,400 RPD) + Groq (14,400 RPD) on independent free tiers. |
 | **Auth** | Firebase Authentication | Google + email/password sign-in. Free Spark plan. |
 | **Storage** | Cloud Firestore | Scan history per user. Free Spark plan. |
 | **Hosting** | Vercel | Free hobby tier. Edge functions for API. |
```

docs/src/content/docs/api/rate-limits.md — 10 additions, 8 deletions

```diff
@@ -50,14 +50,16 @@ When you receive a `429` response:

 When self-hosting, rate limits are configurable. The actual bottleneck becomes your LLM provider's free tier:

-| Provider | Model           | RPM | RPD    |
-| -------- | --------------- | --- | ------ |
-| Gemma    | 3 27B (primary) | 30  | 14,400 |
-| Gemini   | 2.5 Flash       | 5   | 20     |
-| Gemini   | 2.5 Flash Lite  | 10  | 20     |
-| Groq     | Llama 3.3 70B   | 30  | 14,400 |
-| Cerebras | Llama 3.3 70B   | 30  | 1,000  |
+| Provider | Model         | RPM  | RPD    | TPM |
+| -------- | ------------- | ---- | ------ | --- |
+| Google   | Gemma 3 27B   | 30   | 14,400 | 15K |
+| Groq     | Llama 3.3 70B | 1000 | 14,400 | 12K |
+
+For the latest limits, see the official documentation:
+
+- [Google AI rate limits](https://ai.google.dev/gemini-api/docs/rate-limits)
+- [Groq rate limits](https://console.groq.com/docs/rate-limits)

 :::tip
-The hosted version uses Gemma 3 27B as the primary model (14,400 RPD), giving roughly 14,000+ scans per day on the free tier. Groq and Cerebras are available as optional fallbacks for self-hosted instances.
+The hosted version uses Gemma 3 27B as the primary model with Llama 3.3 70B via Groq as fallback. Both run on independent free tiers. The binding constraint is TPM (tokens per minute), not RPD. Each scan uses ~8,000 tokens total (prompt + response), giving a realistic combined throughput of roughly 4,500 scans per day under sustained load.
 :::
```
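
The "roughly 4,500 scans per day" figure in the new tip can be sanity-checked with a quick estimate. This is a sketch of my own, not project code: the TPM/RPD numbers come from the table above and the ~8,000-tokens-per-scan figure from the tip; the binding limit per provider is whichever of RPD or sustained TPM runs out first.

```javascript
// Back-of-the-envelope daily throughput under sustained load.
const TOKENS_PER_SCAN = 8_000; // prompt + response, per the tip above

const providers = [
  { name: 'Google (Gemma 3 27B)', rpd: 14_400, tpm: 15_000 },
  { name: 'Groq (Llama 3.3 70B)', rpd: 14_400, tpm: 12_000 }
];

// Per provider: TPM sustained over 1,440 minutes, capped by RPD.
const scansPerDay = (p) =>
  Math.min(p.rpd, Math.floor((p.tpm / TOKENS_PER_SCAN) * 1_440));

const total = providers.reduce((sum, p) => sum + scansPerDay(p), 0);
console.log(total); // 2,700 + 2,160 = 4860 — TPM binds long before RPD
```

So the combined ceiling is about 4,860 scans/day, which the docs round down to "roughly 4,500" to leave headroom.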

docs/src/content/docs/getting-started/introduction.md — 1 addition, 1 deletion

```diff
@@ -56,7 +56,7 @@ Built with performance and privacy in mind:
 - **SvelteKit 5** with Svelte 5 runes for the frontend
 - **pdfjs-dist** (Web Worker) for client-side PDF parsing
 - **mammoth** for client-side DOCX parsing
-- **Gemma 3 27B** (primary) with Gemini fallbacks for AI-powered analysis
+- **Gemma 3 27B** (primary) with **Llama 3.3 70B** via Groq as fallback for AI-powered analysis
 - **Firebase** for authentication and scan history
 - **Vercel** for hosting (free tier)
```

docs/src/content/docs/self-hosting/configuration.md — 22 additions, 23 deletions

````diff
@@ -7,39 +7,37 @@ description: Environment variables and configuration options for self-hosted ins

 All configuration is done through environment variables in the `.env` file.

-| Variable           | Required | Description                                          |
-| ------------------ | -------- | ---------------------------------------------------- |
-| `GEMINI_API_KEY`   | Yes      | Google AI API key (used for Gemma 3 + Gemini models) |
-| `GROQ_API_KEY`     | Optional | Groq API key (optional fallback)                     |
-| `CEREBRAS_API_KEY` | Optional | Cerebras API key (optional fallback)                 |
+| Variable         | Required    | Description                                    |
+| ---------------- | ----------- | ---------------------------------------------- |
+| `GEMINI_API_KEY` | Yes         | Google AI API key (powers Gemma 3 27B primary) |
+| `GROQ_API_KEY`   | Recommended | Groq API key (Llama 3.3 70B fallback)          |

 :::caution
 Never commit your `.env` file to version control. It's already in `.gitignore`, but double-check before pushing.
 :::

 ## Provider Priority

-The LLM fallback chain follows this order:
+The LLM fallback chain uses cross-provider redundancy so quota limits on one provider don't cascade:

-1. **Gemma 3 27B** (primary, 14,400 RPD via `GEMINI_API_KEY`)
-2. **Gemini 2.5 Flash** (fallback, 20 RPD via `GEMINI_API_KEY`)
-3. **Gemini 2.5 Flash Lite** (fallback, 20 RPD via `GEMINI_API_KEY`)
-4. **Groq Llama 3.3 70B** (if `GROQ_API_KEY` is set)
-5. **Cerebras Llama 3.3 70B** (if `CEREBRAS_API_KEY` is set)
+1. **Gemma 3 27B** via Google (primary, `GEMINI_API_KEY`)
+2. **Llama 3.3 70B** via Groq (fallback, `GROQ_API_KEY`)

-If a provider fails (timeout, rate limit, malformed response), the system automatically tries the next one. All Google models (Gemma + Gemini) use the same API key.
+If a provider fails (timeout, rate limit, malformed response), the system automatically tries the next one. Because each provider uses a separate API key, their quotas are completely independent.

 ## Free Tier Limits

-| Provider | Model           | RPM | RPD    | Cost                   |
-| -------- | --------------- | --- | ------ | ---------------------- |
-| Gemma    | 3 27B (primary) | 30  | 14,400 | Free (blocks at limit) |
-| Gemini   | 2.5 Flash       | 5   | 20     | Free (blocks at limit) |
-| Gemini   | 2.5 Flash Lite  | 10  | 20     | Free (blocks at limit) |
-| Groq     | Llama 3.3 70B   | 30  | 14,400 | Free                   |
-| Cerebras | Llama 3.3 70B   | 30  | 1,000  | Free                   |
+| Provider | Model         | RPM  | RPD    | TPM | Cost |
+| -------- | ------------- | ---- | ------ | --- | ---- |
+| Google   | Gemma 3 27B   | 30   | 14,400 | 15K | Free |
+| Groq     | Llama 3.3 70B | 1000 | 14,400 | 12K | Free |

-**Key detail about Google AI:** The free tier will **block** requests at the limit, never auto-charge. You cannot accidentally incur costs.
+Both providers block at their limits and never auto-charge. You cannot accidentally incur costs.
+
+For the latest limits, see the official documentation:
+
+- [Google AI rate limits](https://ai.google.dev/gemini-api/docs/rate-limits)
+- [Groq rate limits](https://console.groq.com/docs/rate-limits)

 ## Rate Limiting

@@ -56,10 +54,11 @@ Adjust these values based on your expected traffic and API key limits.

 ## Timeouts

-The default timeout for LLM requests is 60 seconds:
+Each provider has its own timeout to ensure the total worst-case fits within Vercel's 60s function limit:

 ```typescript
-const PROVIDER_TIMEOUT_MS = 60_000;
+// Gemma: 35s, Groq: 20s → worst case total: 55s
+const PROVIDER_TIMEOUTS_MS = [35_000, 20_000];
 ```

-Increase this if you're experiencing timeouts with longer resumes.
+The Vercel function `maxDuration` is set to 60 seconds. If both providers timeout, the system falls back to rule-based scoring on the client side.
````
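
The commit message highlights that JSON validation now lives *inside* the retry loop, so a syntactically broken response advances the chain just like a timeout or a 429 does. A minimal sketch of that pattern under stated assumptions: the provider calls are injected as plain async functions, and the names (`runFallbackChain`, the provider signatures) are illustrative, not the actual `+server.ts` implementation.

```javascript
// Sketch: try providers in order with per-provider timeouts. A response that
// fails JSON validation is treated as a failure and triggers the next provider.
const PROVIDER_TIMEOUTS_MS = [35_000, 20_000]; // Gemma: 35s, Groq: 20s → 55s worst case

async function runFallbackChain(providers, prompt) {
  for (let i = 0; i < providers.length; i++) {
    const ctrl = new AbortController();
    const timer = setTimeout(() => ctrl.abort(), PROVIDER_TIMEOUTS_MS[i]);
    try {
      const raw = await providers[i](prompt, ctrl.signal);
      const parsed = JSON.parse(raw); // validation INSIDE the loop: throws on bad JSON
      if (parsed && typeof parsed === 'object') return parsed;
      // non-object JSON (e.g. a bare string) falls through to the next provider
    } catch {
      // timeout, rate limit, network error, or malformed JSON → try next provider
    } finally {
      clearTimeout(timer);
    }
  }
  return null; // all providers failed; caller falls back to rule-based scoring
}
```

Keeping validation inside the loop is what makes the Groq fallback useful for quality failures, not just availability failures.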

docs/src/content/docs/self-hosting/deployment.md — 1 addition, 2 deletions

```diff
@@ -24,8 +24,7 @@ In the Vercel dashboard:
 1. Go to your project > **Settings** > **Environment Variables**
 2. Add your API keys:
    - `GEMINI_API_KEY` (required)
-   - `GROQ_API_KEY` (optional fallback)
-   - `CEREBRAS_API_KEY` (optional fallback)
+   - `GROQ_API_KEY` (recommended fallback)
 3. Add your Firebase config (all `PUBLIC_FIREBASE_*` variables from `.env.example`)

 :::tip
```

docs/src/content/docs/self-hosting/setup.md — 2 additions, 8 deletions

```diff
@@ -33,20 +33,14 @@ cp .env.example .env
 2. Click "Create API Key"
 3. Add to `.env`: `GEMINI_API_KEY=your_key_here`

-### Groq (Optional Fallback)
+### Groq (Recommended Fallback)

 1. Go to [Groq Console](https://console.groq.com/keys)
 2. Create a new API key
 3. Add to `.env`: `GROQ_API_KEY=your_key_here`

-### Cerebras (Optional Fallback)
-
-1. Go to [Cerebras Cloud](https://cloud.cerebras.ai/)
-2. Generate an API key
-3. Add to `.env`: `CEREBRAS_API_KEY=your_key_here`
-
 :::tip
-You only need the **Google AI (Gemini) API key** to run the app. It powers Gemma 3 27B (14,400 RPD) as the primary model plus Gemini models as fallbacks. Groq and Cerebras are optional for additional availability.
+You need the **Google AI API key** to run the app (Gemma 3 27B primary, 14,400 RPD). Adding a **Groq API key** is strongly recommended as it provides a completely independent fallback (Llama 3.3 70B, 14,400 RPD) so users never see failures during peak traffic.
 :::

 ## Run Locally
```

eslint.config.js — 2 additions, 1 deletion

```diff
@@ -41,7 +41,8 @@ export default ts.config(
       'playwright-report/',
       'test-results/',
       'docs/',
-      'static/docs/'
+      'static/docs/',
+      'scripts/'
     ]
   }
 );
```

scripts/test-providers.mjs — new file, 155 additions

````js
/**
 * dry run: tests each LLM provider matching the fallback chain in +server.ts
 * reads keys from .env, never logs them.
 *
 * usage: node scripts/test-providers.mjs
 */

import { readFileSync } from 'fs';

const envFile = readFileSync('.env', 'utf-8');
const envVars = Object.fromEntries(
  envFile
    .split('\n')
    .filter((l) => l && !l.startsWith('#'))
    .map((l) => {
      const eq = l.indexOf('=');
      return eq > 0 ? [l.slice(0, eq).trim(), l.slice(eq + 1).trim()] : null;
    })
    .filter(Boolean)
);

const GEMINI_KEY = envVars.GEMINI_API_KEY;
const GROQ_KEY = envVars.GROQ_API_KEY;

function extractJSON(raw) {
  const trimmed = raw.trim();
  try { return JSON.parse(trimmed); } catch {}
  const cleaned = trimmed.replace(/```json\n?|\n?```/g, '').trim();
  try { return JSON.parse(cleaned); } catch {}
  const s = cleaned.indexOf('{'), e = cleaned.lastIndexOf('}');
  if (s !== -1 && e > s) { try { return JSON.parse(cleaned.slice(s, e + 1)); } catch {} }
  return null;
}

const SMALL_PROMPT = 'Return ONLY valid JSON: {"test": true, "score": 85}';

// ~6K token resume prompt matching real usage
const BIG_RESUME = (
  'Experienced software engineer with expertise in distributed systems, cloud computing, and full-stack development. ' +
  'Built scalable microservices handling 10M+ requests per day using Go, Kubernetes, and AWS. Led team of 5 engineers. '
).repeat(60);
const BIG_PROMPT = `You are an ATS scoring engine. Analyze this resume against 6 ATS platforms (Workday, Taleo, iCIMS, Greenhouse, Lever, SuccessFactors). Return ONLY valid JSON with a "results" array containing objects with "system", "overallScore", and "passesFilter" fields. Resume: ${BIG_RESUME}`;

const PROVIDERS = [
  {
    name: 'gemma-3-27b (Google)',
    key: GEMINI_KEY,
    build: (prompt) => ({
      url: `https://generativelanguage.googleapis.com/v1beta/models/gemma-3-27b-it:generateContent?key=${GEMINI_KEY}`,
      opts: {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          contents: [{ parts: [{ text: prompt }] }],
          generationConfig: { temperature: 0.3, topP: 0.85, maxOutputTokens: 16384 }
        })
      }
    }),
    extract: (d) => d.candidates?.[0]?.content?.parts?.[0]?.text ?? ''
  },
  {
    name: 'llama-3.3-70b (Groq)',
    key: GROQ_KEY,
    build: (prompt) => ({
      url: 'https://api.groq.com/openai/v1/chat/completions',
      opts: {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${GROQ_KEY}` },
        body: JSON.stringify({
          model: 'llama-3.3-70b-versatile',
          messages: [{ role: 'user', content: prompt }],
          temperature: 0.3, top_p: 0.85, max_tokens: 16384,
          response_format: { type: 'json_object' }
        })
      }
    }),
    extract: (d) => d.choices?.[0]?.message?.content ?? ''
  },
  {
    name: 'gemini-2.5-flash (Google)',
    key: GEMINI_KEY,
    build: (prompt) => ({
      url: `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=${GEMINI_KEY}`,
      opts: {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          contents: [{ parts: [{ text: prompt }] }],
          generationConfig: { temperature: 0.3, topP: 0.85, maxOutputTokens: 16384, responseMimeType: 'application/json' }
        })
      }
    }),
    extract: (d) => d.candidates?.[0]?.content?.parts?.[0]?.text ?? ''
  }
];

async function callProvider(provider, prompt, timeoutMs = 30000) {
  if (!provider.key) return { status: 'SKIP', ms: 0, detail: 'no key' };

  const { url, opts } = provider.build(prompt);
  const t = performance.now();
  try {
    const ctrl = new AbortController();
    const timer = setTimeout(() => ctrl.abort(), timeoutMs);
    const res = await fetch(url, { ...opts, signal: ctrl.signal });
    clearTimeout(timer);
    const ms = Math.round(performance.now() - t);

    if (!res.ok) {
      const err = await res.text().catch(() => '');
      return { status: 'HTTP_ERR', ms, httpStatus: res.status, detail: err.slice(0, 150) };
    }

    const data = await res.json();
    const text = provider.extract(data);
    if (!text) return { status: 'EMPTY', ms };

    const parsed = extractJSON(text);
    if (!parsed || typeof parsed !== 'object') return { status: 'BAD_JSON', ms, detail: text.slice(0, 150) };

    return { status: 'OK', ms, keys: Object.keys(parsed).slice(0, 5) };
  } catch (err) {
    const ms = Math.round(performance.now() - t);
    const isTimeout = err.name === 'AbortError';
    return { status: isTimeout ? 'TIMEOUT' : 'ERROR', ms, detail: err.message };
  }
}

function log(name, r) {
  const tag = r.status === 'OK' ? 'OK' : r.status === 'SKIP' ? 'SKIP' : 'FAIL';
  const info = r.status === 'OK' ? `keys: [${r.keys}]` : (r.detail || r.httpStatus || '');
  console.log(`  ${tag.padEnd(4)} ${name.padEnd(28)} ${String(r.ms).padStart(5)}ms ${info}`);
}

console.log('=== test 1: small prompt (connectivity) ===\n');
for (const p of PROVIDERS) log(p.name, await callProvider(p, SMALL_PROMPT));

console.log('\n=== test 2: large prompt (~6K tokens, realistic resume) ===\n');
console.log(`  prompt size: ${BIG_PROMPT.length} chars (~${Math.round(BIG_PROMPT.length / 4)} tokens)\n`);
for (const p of PROVIDERS) log(p.name, await callProvider(p, BIG_PROMPT, 45000));

console.log('\n=== test 3: fallback chain simulation ===\n');
let resolved = false;
for (const p of PROVIDERS) {
  const r = await callProvider(p, BIG_PROMPT, 45000);
  if (r.status === 'OK') {
    console.log(`  resolved: ${p.name} (${r.ms}ms)`);
    resolved = true;
    break;
  }
  console.log(`  ${p.name}: ${r.status} (${r.ms}ms) → next`);
}
if (!resolved) console.log('  ALL FAILED → 503');

console.log('\n=== done ===');
````
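
The `extractJSON` helper in the script above is worth calling out: it is what decides whether a response counts as valid or triggers the next provider, and it tolerates the three shapes LLM responses commonly take. A standalone copy (taken verbatim from the script) with the cases it handles:

````javascript
// Verbatim copy of extractJSON from scripts/test-providers.mjs.
function extractJSON(raw) {
  const trimmed = raw.trim();
  try { return JSON.parse(trimmed); } catch {}                 // 1) bare JSON
  const cleaned = trimmed.replace(/```json\n?|\n?```/g, '').trim();
  try { return JSON.parse(cleaned); } catch {}                 // 2) markdown-fenced JSON
  const s = cleaned.indexOf('{'), e = cleaned.lastIndexOf('}');
  if (s !== -1 && e > s) { try { return JSON.parse(cleaned.slice(s, e + 1)); } catch {} } // 3) JSON buried in chatter
  return null;                                                 // 4) unrecoverable → next provider
}

console.log(extractJSON('{"score": 85}'));                          // bare
console.log(extractJSON('```json\n{"score": 85}\n```'));            // fenced
console.log(extractJSON('Sure! Here it is: {"score": 85} Enjoy.')); // chatty
console.log(extractJSON('no json here'));                           // null
````

Returning `null` rather than throwing keeps the caller's control flow simple: a falsy result means "advance the fallback chain".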

src/routes/about/+page.svelte — 2 additions, 1 deletion

```diff
@@ -219,7 +219,8 @@
 <div class="tech-card">
   <h4>AI</h4>
   <ul>
-    <li>Google Gemini 2.5 Flash-Lite (primary)</li>
+    <li>Gemma 3 27B via Google (primary)</li>
+    <li>Llama 3.3 70B via Groq (fallback)</li>
     <li>Rule-based fallback engine</li>
     <li>TF-IDF keyword matching</li>
     <li>Skills taxonomy (500+ terms)</li>
```
