
Commit 569f472

feat(api): add Groq Llama 3.3 70B as cross-provider fallback
- replaced Google Flash/Flash-Lite fallbacks with Groq (independent quota, 14,400 RPD)
- moved JSON validation inside the retry loop so bad responses trigger the next provider
- per-provider timeouts [35s, 20s] fit within Vercel's 60s maxDuration
- removed all Cerebras references from docs and .env.example
- updated about page, README, and all Starlight docs with new provider chain
- added Google/Groq rate limit doc links where RPM/RPD/TPM tables appear
- added scripts/test-providers.mjs for dry-run provider testing
1 parent 89a6d62 commit 569f472

File tree

10 files changed: +234 −56 lines


.env.example

Lines changed: 2 additions & 2 deletions

```diff
@@ -2,9 +2,9 @@
 # Get yours at https://aistudio.google.com/apikey
 GEMINI_API_KEY=
 
-# Optional fallback providers (for self-hosting)
+# Groq API key (recommended fallback, Llama 3.3 70B)
+# Get yours at https://console.groq.com/keys
 # GROQ_API_KEY=
-# CEREBRAS_API_KEY=
 
 # Firebase configuration (required)
 # Get these from Firebase Console → Project Settings → Your Apps
```

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -81,7 +81,7 @@ Each profile is based on research into the platform's documented parsing and mat
 | **PDF Parsing** | pdfjs-dist (Web Worker) | Mozilla-maintained, fully client-side. |
 | **DOCX Parsing** | mammoth | Client-side Word to text extraction. |
 | **NLP** | Custom TF-IDF + tokenizer + skills taxonomy | Lightweight, browser-native, supports 8+ industries. |
-| **LLM** | Gemma 3 27B (primary), Gemini 2.5 Flash (fallback) | 14,400 RPD free tier via Google Generative Language API. Groq + Cerebras available for self-host. |
+| **LLM** | Gemma 3 27B (primary), Llama 3.3 70B via Groq (fallback) | Cross-provider fallback: Google (14,400 RPD) + Groq (14,400 RPD) on independent free tiers. |
 | **Auth** | Firebase Authentication | Google + email/password sign-in. Free Spark plan. |
 | **Storage** | Cloud Firestore | Scan history per user. Free Spark plan. |
 | **Hosting** | Vercel | Free hobby tier. Edge functions for API. |
```

docs/src/content/docs/api/rate-limits.md

Lines changed: 10 additions & 8 deletions

```diff
@@ -50,14 +50,16 @@ When you receive a `429` response:
 
 When self-hosting, rate limits are configurable. The actual bottleneck becomes your LLM provider's free tier:
 
-| Provider | Model | RPM | RPD |
-| -------- | --------------- | --- | ------ |
-| Gemma | 3 27B (primary) | 30 | 14,400 |
-| Gemini | 2.5 Flash | 5 | 20 |
-| Gemini | 2.5 Flash Lite | 10 | 20 |
-| Groq | Llama 3.3 70B | 30 | 14,400 |
-| Cerebras | Llama 3.3 70B | 30 | 1,000 |
+| Provider | Model | RPM | RPD | TPM |
+| -------- | ------------- | ---- | ------ | --- |
+| Google | Gemma 3 27B | 30 | 14,400 | 15K |
+| Groq | Llama 3.3 70B | 1000 | 14,400 | 12K |
+
+For the latest limits, see the official documentation:
+
+- [Google AI rate limits](https://ai.google.dev/gemini-api/docs/rate-limits)
+- [Groq rate limits](https://console.groq.com/docs/rate-limits)
 
 :::tip
-The hosted version uses Gemma 3 27B as the primary model (14,400 RPD), giving roughly 14,000+ scans per day on the free tier. Groq and Cerebras are available as optional fallbacks for self-hosted instances.
+The hosted version uses Gemma 3 27B as the primary model with Llama 3.3 70B via Groq as fallback. Both run on independent free tiers with 14,400 RPD each, giving roughly 28,000+ potential scans per day.
 :::
```

docs/src/content/docs/getting-started/introduction.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -56,7 +56,7 @@ Built with performance and privacy in mind:
 - **SvelteKit 5** with Svelte 5 runes for the frontend
 - **pdfjs-dist** (Web Worker) for client-side PDF parsing
 - **mammoth** for client-side DOCX parsing
-- **Gemma 3 27B** (primary) with Gemini fallbacks for AI-powered analysis
+- **Gemma 3 27B** (primary) with **Llama 3.3 70B** via Groq as fallback for AI-powered analysis
 - **Firebase** for authentication and scan history
 - **Vercel** for hosting (free tier)
 
```
docs/src/content/docs/self-hosting/configuration.md

Lines changed: 22 additions & 23 deletions

````diff
@@ -7,39 +7,37 @@ description: Environment variables and configuration options for self-hosted ins
 
 All configuration is done through environment variables in the `.env` file.
 
-| Variable | Required | Description |
-| ------------------ | -------- | ---------------------------------------------------- |
-| `GEMINI_API_KEY` | Yes | Google AI API key (used for Gemma 3 + Gemini models) |
-| `GROQ_API_KEY` | Optional | Groq API key (optional fallback) |
-| `CEREBRAS_API_KEY` | Optional | Cerebras API key (optional fallback) |
+| Variable | Required | Description |
+| ---------------- | ----------- | ---------------------------------------------- |
+| `GEMINI_API_KEY` | Yes | Google AI API key (powers Gemma 3 27B primary) |
+| `GROQ_API_KEY` | Recommended | Groq API key (Llama 3.3 70B fallback) |
 
 :::caution
 Never commit your `.env` file to version control. It's already in `.gitignore`, but double-check before pushing.
 :::
 
 ## Provider Priority
 
-The LLM fallback chain follows this order:
+The LLM fallback chain uses cross-provider redundancy so quota limits on one provider don't cascade:
 
-1. **Gemma 3 27B** (primary, 14,400 RPD via `GEMINI_API_KEY`)
-2. **Gemini 2.5 Flash** (fallback, 20 RPD via `GEMINI_API_KEY`)
-3. **Gemini 2.5 Flash Lite** (fallback, 20 RPD via `GEMINI_API_KEY`)
-4. **Groq Llama 3.3 70B** (if `GROQ_API_KEY` is set)
-5. **Cerebras Llama 3.3 70B** (if `CEREBRAS_API_KEY` is set)
+1. **Gemma 3 27B** via Google (primary, `GEMINI_API_KEY`)
+2. **Llama 3.3 70B** via Groq (fallback, `GROQ_API_KEY`)
 
-If a provider fails (timeout, rate limit, malformed response), the system automatically tries the next one. All Google models (Gemma + Gemini) use the same API key.
+If a provider fails (timeout, rate limit, malformed response), the system automatically tries the next one. Because each provider uses a separate API key, their quotas are completely independent.
 
 ## Free Tier Limits
 
-| Provider | Model | RPM | RPD | Cost |
-| -------- | --------------- | --- | ------ | ---------------------- |
-| Gemma | 3 27B (primary) | 30 | 14,400 | Free (blocks at limit) |
-| Gemini | 2.5 Flash | 5 | 20 | Free (blocks at limit) |
-| Gemini | 2.5 Flash Lite | 10 | 20 | Free (blocks at limit) |
-| Groq | Llama 3.3 70B | 30 | 14,400 | Free |
-| Cerebras | Llama 3.3 70B | 30 | 1,000 | Free |
+| Provider | Model | RPM | RPD | TPM | Cost |
+| -------- | ------------- | ---- | ------ | --- | ---- |
+| Google | Gemma 3 27B | 30 | 14,400 | 15K | Free |
+| Groq | Llama 3.3 70B | 1000 | 14,400 | 12K | Free |
 
-**Key detail about Google AI:** The free tier will **block** requests at the limit, never auto-charge. You cannot accidentally incur costs.
+Both providers block at their limits and never auto-charge. You cannot accidentally incur costs.
+
+For the latest limits, see the official documentation:
+
+- [Google AI rate limits](https://ai.google.dev/gemini-api/docs/rate-limits)
+- [Groq rate limits](https://console.groq.com/docs/rate-limits)
 
 ## Rate Limiting
 
@@ -56,10 +54,11 @@ Adjust these values based on your expected traffic and API key limits.
 
 ## Timeouts
 
-The default timeout for LLM requests is 60 seconds:
+Each provider has its own timeout to ensure the total worst-case fits within Vercel's 60s function limit:
 
 ```typescript
-const PROVIDER_TIMEOUT_MS = 60_000;
+// Gemma: 35s, Groq: 20s → worst case total: 55s
+const PROVIDER_TIMEOUTS_MS = [35_000, 20_000];
 ```
 
-Increase this if you're experiencing timeouts with longer resumes.
+The Vercel function `maxDuration` is set to 60 seconds. If both providers time out, the system falls back to rule-based scoring on the client side.
````

docs/src/content/docs/self-hosting/deployment.md

Lines changed: 1 addition & 2 deletions

```diff
@@ -24,8 +24,7 @@ In the Vercel dashboard:
 1. Go to your project > **Settings** > **Environment Variables**
 2. Add your API keys:
    - `GEMINI_API_KEY` (required)
-   - `GROQ_API_KEY` (optional fallback)
-   - `CEREBRAS_API_KEY` (optional fallback)
+   - `GROQ_API_KEY` (recommended fallback)
 3. Add your Firebase config (all `PUBLIC_FIREBASE_*` variables from `.env.example`)
 
 :::tip
```

docs/src/content/docs/self-hosting/setup.md

Lines changed: 2 additions & 8 deletions

```diff
@@ -33,20 +33,14 @@ cp .env.example .env
 2. Click "Create API Key"
 3. Add to `.env`: `GEMINI_API_KEY=your_key_here`
 
-### Groq (Optional Fallback)
+### Groq (Recommended Fallback)
 
 1. Go to [Groq Console](https://console.groq.com/keys)
 2. Create a new API key
 3. Add to `.env`: `GROQ_API_KEY=your_key_here`
 
-### Cerebras (Optional Fallback)
-
-1. Go to [Cerebras Cloud](https://cloud.cerebras.ai/)
-2. Generate an API key
-3. Add to `.env`: `CEREBRAS_API_KEY=your_key_here`
-
 :::tip
-You only need the **Google AI (Gemini) API key** to run the app. It powers Gemma 3 27B (14,400 RPD) as the primary model plus Gemini models as fallbacks. Groq and Cerebras are optional for additional availability.
+You need the **Google AI API key** to run the app (Gemma 3 27B primary, 14,400 RPD). Adding a **Groq API key** is strongly recommended as it provides a completely independent fallback (Llama 3.3 70B, 14,400 RPD) so users never see failures during peak traffic.
 :::
 
 ## Run Locally
```

scripts/test-providers.mjs

Lines changed: 155 additions & 0 deletions (new file)

````js
/**
 * dry run: tests each LLM provider matching the fallback chain in +server.ts
 * reads keys from .env, never logs them.
 *
 * usage: node scripts/test-providers.mjs
 */

import { readFileSync } from 'fs';

const envFile = readFileSync('.env', 'utf-8');
const envVars = Object.fromEntries(
  envFile
    .split('\n')
    .filter((l) => l && !l.startsWith('#'))
    .map((l) => {
      const eq = l.indexOf('=');
      return eq > 0 ? [l.slice(0, eq).trim(), l.slice(eq + 1).trim()] : null;
    })
    .filter(Boolean)
);

const GEMINI_KEY = envVars.GEMINI_API_KEY;
const GROQ_KEY = envVars.GROQ_API_KEY;

function extractJSON(raw) {
  const trimmed = raw.trim();
  try { return JSON.parse(trimmed); } catch {}
  const cleaned = trimmed.replace(/```json\n?|\n?```/g, '').trim();
  try { return JSON.parse(cleaned); } catch {}
  const s = cleaned.indexOf('{'), e = cleaned.lastIndexOf('}');
  if (s !== -1 && e > s) { try { return JSON.parse(cleaned.slice(s, e + 1)); } catch {} }
  return null;
}

const SMALL_PROMPT = 'Return ONLY valid JSON: {"test": true, "score": 85}';

// ~6K token resume prompt matching real usage
const BIG_RESUME = (
  'Experienced software engineer with expertise in distributed systems, cloud computing, and full-stack development. ' +
  'Built scalable microservices handling 10M+ requests per day using Go, Kubernetes, and AWS. Led team of 5 engineers. '
).repeat(60);
const BIG_PROMPT = `You are an ATS scoring engine. Analyze this resume against 6 ATS platforms (Workday, Taleo, iCIMS, Greenhouse, Lever, SuccessFactors). Return ONLY valid JSON with a "results" array containing objects with "system", "overallScore", and "passesFilter" fields. Resume: ${BIG_RESUME}`;

const PROVIDERS = [
  {
    name: 'gemma-3-27b (Google)',
    key: GEMINI_KEY,
    build: (prompt) => ({
      url: `https://generativelanguage.googleapis.com/v1beta/models/gemma-3-27b-it:generateContent?key=${GEMINI_KEY}`,
      opts: {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          contents: [{ parts: [{ text: prompt }] }],
          generationConfig: { temperature: 0.3, topP: 0.85, maxOutputTokens: 16384 }
        })
      }
    }),
    extract: (d) => d.candidates?.[0]?.content?.parts?.[0]?.text ?? ''
  },
  {
    name: 'llama-3.3-70b (Groq)',
    key: GROQ_KEY,
    build: (prompt) => ({
      url: 'https://api.groq.com/openai/v1/chat/completions',
      opts: {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${GROQ_KEY}` },
        body: JSON.stringify({
          model: 'llama-3.3-70b-versatile',
          messages: [{ role: 'user', content: prompt }],
          temperature: 0.3, top_p: 0.85, max_tokens: 16384,
          response_format: { type: 'json_object' }
        })
      }
    }),
    extract: (d) => d.choices?.[0]?.message?.content ?? ''
  },
  {
    name: 'gemini-2.5-flash (Google)',
    key: GEMINI_KEY,
    build: (prompt) => ({
      url: `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=${GEMINI_KEY}`,
      opts: {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          contents: [{ parts: [{ text: prompt }] }],
          generationConfig: { temperature: 0.3, topP: 0.85, maxOutputTokens: 16384, responseMimeType: 'application/json' }
        })
      }
    }),
    extract: (d) => d.candidates?.[0]?.content?.parts?.[0]?.text ?? ''
  }
];

async function callProvider(provider, prompt, timeoutMs = 30000) {
  if (!provider.key) return { status: 'SKIP', ms: 0, detail: 'no key' };

  const { url, opts } = provider.build(prompt);
  const t = performance.now();
  try {
    const ctrl = new AbortController();
    const timer = setTimeout(() => ctrl.abort(), timeoutMs);
    const res = await fetch(url, { ...opts, signal: ctrl.signal });
    clearTimeout(timer);
    const ms = Math.round(performance.now() - t);

    if (!res.ok) {
      const err = await res.text().catch(() => '');
      return { status: 'HTTP_ERR', ms, httpStatus: res.status, detail: err.slice(0, 150) };
    }

    const data = await res.json();
    const text = provider.extract(data);
    if (!text) return { status: 'EMPTY', ms };

    const parsed = extractJSON(text);
    if (!parsed || typeof parsed !== 'object') return { status: 'BAD_JSON', ms, detail: text.slice(0, 150) };

    return { status: 'OK', ms, keys: Object.keys(parsed).slice(0, 5) };
  } catch (err) {
    const ms = Math.round(performance.now() - t);
    const isTimeout = err.name === 'AbortError';
    return { status: isTimeout ? 'TIMEOUT' : 'ERROR', ms, detail: err.message };
  }
}

function log(name, r) {
  const tag = r.status === 'OK' ? 'OK' : r.status === 'SKIP' ? 'SKIP' : 'FAIL';
  const info = r.status === 'OK' ? `keys: [${r.keys}]` : (r.detail || r.httpStatus || '');
  console.log(`  ${tag.padEnd(4)} ${name.padEnd(28)} ${String(r.ms).padStart(5)}ms ${info}`);
}

console.log('=== test 1: small prompt (connectivity) ===\n');
for (const p of PROVIDERS) log(p.name, await callProvider(p, SMALL_PROMPT));

console.log('\n=== test 2: large prompt (~6K tokens, realistic resume) ===\n');
console.log(`  prompt size: ${BIG_PROMPT.length} chars (~${Math.round(BIG_PROMPT.length / 4)} tokens)\n`);
for (const p of PROVIDERS) log(p.name, await callProvider(p, BIG_PROMPT, 45000));

console.log('\n=== test 3: fallback chain simulation ===\n');
let resolved = false;
for (const p of PROVIDERS) {
  const r = await callProvider(p, BIG_PROMPT, 45000);
  if (r.status === 'OK') {
    console.log(`  resolved: ${p.name} (${r.ms}ms)`);
    resolved = true;
    break;
  }
  console.log(`  ${p.name}: ${r.status} (${r.ms}ms) → next`);
}
if (!resolved) console.log('  ALL FAILED → 503');

console.log('\n=== done ===');
````

src/routes/about/+page.svelte

Lines changed: 2 additions & 1 deletion

```diff
@@ -219,7 +219,8 @@
 <div class="tech-card">
 <h4>AI</h4>
 <ul>
-<li>Google Gemini 2.5 Flash-Lite (primary)</li>
+<li>Gemma 3 27B via Google (primary)</li>
+<li>Llama 3.3 70B via Groq (fallback)</li>
 <li>Rule-based fallback engine</li>
 <li>TF-IDF keyword matching</li>
 <li>Skills taxonomy (500+ terms)</li>
```
