CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Remote EU job board aggregator. Next.js 16 frontend + GraphQL API backed by Cloudflare D1 (SQLite), with an AI/ML pipeline for job classification, skill extraction, and resume matching.


Common commands

```shell
# Dev
pnpm dev                          # Start Next.js dev server (localhost:3000)
pnpm build                        # Production build
pnpm lint                         # ESLint (next lint)

# GraphQL codegen — run after modifying any schema/**/*.graphql file
pnpm codegen

# Database
pnpm db:generate                  # Generate Drizzle migration files
pnpm db:migrate                   # Apply locally with Drizzle Kit
pnpm db:push                      # Apply migrations to remote D1
pnpm db:studio                    # Drizzle Studio

# Evaluation
pnpm eval:langfuse                # Run LLM classification eval with Langfuse Datasets

# Strategy enforcement
pnpm strategy:check               # Validate staged changes against optimization strategy
pnpm strategy:check:all           # Validate all tracked files

# Scripts
pnpm jobs:ingest                  # Ingest jobs from ATS platforms
pnpm jobs:enhance                 # Enhance all jobs with ATS data
pnpm jobs:status                  # Check ingestion status
pnpm jobs:extract-skills          # Extract skills during ingestion
pnpm skills:extract               # Extract skills from jobs
pnpm skills:seed                  # Seed skill taxonomy
pnpm boards:discover              # Discover Ashby boards
pnpm janitor:trigger              # Manually trigger janitor worker

# Workers
wrangler deploy --config wrangler.d1-gateway.toml   # Deploy D1 gateway
wrangler tail --config wrangler.d1-gateway.toml     # Stream gateway logs
cd workers/ashby-crawler && wrangler dev             # Ashby crawler local dev

# Deployment
pnpm deploy                       # Vercel deploy (runs scripts/deploy.ts)
wrangler deploy --config workers/ashby-crawler/wrangler.toml  # Ashby crawler
```

Architecture

Database access

Production:   Next.js (Vercel) → D1 Gateway Worker (CF) → D1 Database (binding)
Dev fallback: Next.js → Cloudflare REST API → D1 Database

The D1 Gateway Worker (workers/d1-gateway.ts) supports batched queries — prefer batching for multi-query operations. Authenticated via API_KEY secret. Database schema is in src/db/schema.ts (Drizzle ORM), migrations in migrations/.

Data flow

```text
1. Discovery:      ATS Sources (D1) --[Cron Worker]--> Trigger Ingestion
2. Board Crawl:    Common Crawl CDX --[ashby-crawler (Rust)]--> Ashby boards → D1
3. Ingestion:      ATS APIs (Greenhouse/Lever/Ashby) --[Insert Worker]--> D1
4. Enhancement:    Job IDs --[Trigger.dev / GraphQL Mutation]--> ATS API --> D1
5. Classification: Unprocessed jobs --[process-jobs (Python) / DeepSeek]--> is_remote_eu --> D1
6. Skill Extract:  Job descriptions --[LLM pipeline]--> Skills → D1
7. Resume Match:   Resumes --[resume-rag (Python) / Vectorize]--> Vector search
8. Serving:        Browser --[Apollo Client]--> /api/graphql --[D1 HTTP]--> Gateway --> D1
9. Evaluation:     Langfuse / Vitest --[LLM calls]--> Accuracy scores
```

GraphQL codegen

Configuration in codegen.ts. Generates from schema/**/*.graphql into src/__generated__/:

  • Client preset (typed gql function, fragment masking)
  • hooks.tsx — React Apollo hooks
  • types.ts — TypeScript types (strict scalars)
  • resolvers-types.ts — Resolver types with GraphQLContext

Custom scalars: DateTime/URL/EmailAddress → string, Upload → File, JSON → any.

Workers

| Worker | Config | Runtime | Key details |
| --- | --- | --- | --- |
| janitor | wrangler.toml | TypeScript | Daily midnight UTC, triggers ATS ingestion |
| d1-gateway | wrangler.d1-gateway.toml | TypeScript | On-demand HTTP, D1 binding |
| insert-jobs | wrangler.insert-jobs.toml | TypeScript | Queue-based, still uses Turso (legacy) |
| process-jobs | workers/process-jobs/wrangler.jsonc | Python/LangGraph | Every 6h + queue, DeepSeek classification |
| ashby-crawler | workers/ashby-crawler/wrangler.toml | Rust/WASM | Common Crawl → D1, rig_compat module |
| resume-rag | workers/resume-rag/wrangler.jsonc | Python | Vectorize + Workers AI |

API routes

| Route | Purpose |
| --- | --- |
| /api/graphql | Apollo Server GraphQL endpoint (main API) |
| /api/text-to-sql | Natural language → SQL query |
| /api/enhance-greenhouse-jobs | Trigger Greenhouse job enhancement |
| /api/companies/bulk-import | Bulk import companies |
| /api/companies/enhance | Enhance company data |

GraphQL Playground: http://localhost:3000/api/graphql. Vercel routes have 60s max duration (vercel.json).
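The 60s ceiling is typically declared via the `functions` block of vercel.json — a minimal sketch, assuming an App Router glob (the exact matcher in this repo's vercel.json may differ):

```json
{
  "functions": {
    "src/app/api/**/route.ts": { "maxDuration": 60 }
  }
}
```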


Tech stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 16, React 19, App Router |
| Language | TypeScript 5.9 |
| Database | Cloudflare D1 (SQLite) via D1 Gateway Worker |
| ORM | Drizzle ORM |
| API | Apollo Server 5 (GraphQL) |
| Auth | Clerk |
| AI/ML | Vercel AI SDK, Anthropic Claude (+ Agent SDK), DeepSeek, OpenRouter |
| Background jobs | Trigger.dev, Cloudflare Workers (cron + queues) |
| Observability | Langfuse, LangSmith, OpenTelemetry (partially active) |
| Evaluation | Langfuse, Vitest |
| Deployment | Vercel (app), Cloudflare Workers (workers) |
| Package manager | pnpm 10.10 |
| UI | Radix UI (Themes + Icons) |

Key structural patterns

  • GraphQL schema lives in schema/ (by domain: base/, jobs/, companies/, applications/, prompts/). Query/mutation/fragment documents are in src/graphql/.
  • Resolvers are in src/apollo/resolvers/ — job resolvers in src/apollo/resolvers/job/.
  • ATS ingestion fetchers: src/ingestion/{greenhouse,lever,ashby}.ts — primary job discovery channel.
  • Skills subsystem: src/lib/skills/ — taxonomy, extraction, vector ops, filtering.
  • AI agents: src/agents/ (Vercel AI SDK — SQL, admin, strategy enforcer), src/anthropic/ (Claude client, MCP, sub-agents, architect).
  • Database tools for agents: src/tools/database/ (introspection + SQL execution).
  • Rust worker: workers/ashby-crawler/src/lib.rs — the rig_compat module implements VectorStore, Pipeline, and Tool patterns for WASM (swap to rig::* when rig-core ships wasm32 support).

Optimization Strategy (Two-Layer Model)

See OPTIMIZATION-STRATEGY.md for the full strategy document. Key constraints:

| Meta Approach | Status | What It Guarantees |
| --- | --- | --- |
| Eval-First | PRIMARY | Every prompt/model change tested against a >= 80% accuracy bar |
| Grounding-First | PRIMARY | LLM outputs schema-constrained; skills validated against taxonomy |
| Multi-Model Routing | SECONDARY | Cheap model first (Workers AI), escalate on low confidence only |
| Spec-Driven | CROSS-CUTTING | GraphQL + Drizzle + Zod schemas as formal contracts |
| Observability | EMERGING | Langfuse scoring active; production tracing partial |

The strategy enforcer (src/agents/strategy-enforcer.ts) is available as a plain async function.


Coding conventions

  • Files: kebab-case (jobs-query.ts). Components: PascalCase (JobsSearchBar.tsx).
  • DB columns: snake_case. GraphQL fields: camelCase. Variables: camelCase.
  • Path alias: @/* maps to ./src/* (tsconfig.json).
  • Module type: ES Modules ("type": "module").
  • Use Drizzle ORM for all DB queries — no raw SQL strings.
  • Mutations that modify production data must include isAdminEmail() guard (from src/lib/admin.ts).
  • Prefer generated types from src/__generated__/resolvers-types.ts over any.
  • React providers: *-provider.tsx in src/components/.
  • Bash scripts: never create .sh files or bash scripts in the repository. Use the Bash tool for simple commands only (e.g., git status, npm run build); complex operations should use the Task tool with agents.

Environment variables

Copy .env.example to .env.local. Key groups: D1 Gateway (or Cloudflare REST API for dev), Clerk auth, AI provider keys (Anthropic, DeepSeek, OpenAI, Gemini), Langfuse/LangSmith observability, admin email, app URL. See .env.example for full list.


Known issues

Performance

  • Full table scan in src/apollo/resolvers/job/enhance-job.ts — fetches all jobs to find one by external_id.
  • N+1 queries for skills, company, and ATS board sub-fields — no DataLoader.
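The DataLoader fix for the N+1 issue can be sketched without the library — a minimal batcher that coalesces all load() calls made in one tick into a single batched query. This is illustrative only (class and function names are hypothetical); the real fix would wire the dataloader package into GraphQLContext:

```typescript
// Batch function: receives all keys queued this tick, returns one value per key.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

class SimpleLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];
  private scheduled = false;

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush once per microtask tick: N load() calls → one batched query.
        queueMicrotask(() => this.flush());
      }
    });
  }

  private async flush() {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    const values = await this.batchFn(batch.map((b) => b.key));
    batch.forEach((b, i) => b.resolve(values[i]));
  }
}
```

A field resolver would then call something like `context.loaders.jobSkills.load(parent.id)`, and all sibling jobs in one response resolve their skills with a single `WHERE job_id IN (...)` query.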

Security

  • CORS on D1 Gateway is *.
  • No GraphQL query complexity/depth limiting.
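A common mitigation for the missing depth limit is a validation rule passed to the Apollo Server constructor. A configuration sketch, assuming the graphql-depth-limit package were adopted (it is not currently a dependency, and the depth of 8 is an arbitrary illustration):

```typescript
import depthLimit from "graphql-depth-limit";
import { ApolloServer } from "@apollo/server";

const server = new ApolloServer({
  typeDefs,
  resolvers,
  // Reject queries nested deeper than 8 levels before execution begins.
  validationRules: [depthLimit(8)],
});
```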

Type safety

  • ignoreBuildErrors: true in next.config.ts masks TS errors in builds.
  • 283+ any types in resolvers.

Dead code

  • scripts/ingest-jobs.ts still documents Turso env vars in its help text (stale).
  • @libsql/client and pg are likely unused after D1 migration — can be removed from dependencies.

Dependencies

  • @ai-sdk/anthropic pinned to "latest" — should use specific version.

ashby-crawler (Rust/WASM) reference

```shell
cargo install worker-build                    # Install WASM build tool (once)
cd workers/ashby-crawler && wrangler dev      # Local dev
wrangler deploy --config workers/ashby-crawler/wrangler.toml  # Deploy
wrangler d1 execute nomadically-work-db --remote \
  --file workers/ashby-crawler/migrations/0001_init/up.sql    # Apply migrations
```

Key endpoints: /crawl (paginated CC crawl), /boards (list/search), /search (TF-IDF vector search), /enrich and /enrich-all (enrichment pipeline), /tools (OpenAI function-calling schemas), /indexes, /progress, /stats.


Spec-Driven Development (SDD)

Uses agent-teams-lite for spec-driven development. The orchestrator (.claude/commands/sdd.md) delegates work to specialized sub-agents via the Task tool, using native agent teams (TeamCreate, shared task lists, messaging) for parallel phases.

Structure

| Path | Contents |
| --- | --- |
| .claude/commands/sdd.md | SDD orchestrator — routing, DAG detection, team/subagent dispatch |
| .claude/skills/sdd-*/SKILL.md | 9 SDD sub-agent skill files (explore, propose, spec, design, tasks, apply, verify, archive, init) |
| .claude/skills/improve-*/SKILL.md | 6 job search self-improvement skills (mine, audit, evolve, apply, verify, meta) |
| .claude/skills/codefix-*/SKILL.md | 6 codebase self-improvement skills (mine, audit, evolve, apply, verify, meta) |
| openspec/ | Specs, change proposals, designs, and task breakdowns (created by sdd-init) |
| .claude/commands/ | Project-specific commands (build-and-push, gql-agent, improve, codefix, sdd) |

SDD Commands

| Command | Action | Mode |
| --- | --- | --- |
| `/sdd:init` | Bootstrap openspec/ in current project | single subagent |
| `/sdd:explore <topic>` | Investigate an idea (no files created) | single subagent |
| `/sdd:new <change-name>` | Start a new change (creates proposal) | single subagent |
| `/sdd:continue` | Create next artifact in dependency chain | auto-detect |
| `/sdd:ff <change-name>` | Fast-forward: create all planning artifacts | agent team (specs+design parallel) |
| `/sdd:apply` | Implement tasks | agent team if multi-phase |
| `/sdd:verify` | Validate implementation against specs | single subagent |
| `/sdd:archive` | Sync specs + archive completed change | single subagent |

Orchestrator Rules

  1. The lead agent NEVER executes phase work inline — always delegate to sub-agents
  2. Use subagents for single sequential phases (explore, propose, verify, archive)
  3. Use native agent teams when specs+design or multi-phase apply can parallelize
  4. Between phases, show the user what was done and ask to proceed
  5. Keep orchestrator context minimal — pass file paths, not file contents
  6. Require plan approval for apply teammates (they write code)
  7. Do NOT force SDD on small tasks (single file edits, quick fixes, questions)

Dependency Graph

```text
proposal → specs ──→ tasks → apply → verify → archive
              ↕
           design
```

Specs and design run in parallel (agent team in /sdd:ff); tasks depends on both; verify is optional but recommended before archive.


Autonomous Self-Improvement Team

Goal-driven team of 6 specialists focused on helping find a fully remote EU AI engineering role. Grounded in autonomous agent research (AutoRefine, Meta Context Engineering, CASTER, ROMA, Phase Transition theory).

Structure

| Path | Agent | Mission |
| --- | --- | --- |
| .claude/skills/improve-mine/SKILL.md | Pipeline Monitor | Is the pipeline healthy? Are AI jobs flowing? |
| .claude/skills/improve-audit/SKILL.md | Discovery Expander | Find more companies hiring AI engineers remotely in EU |
| .claude/skills/improve-evolve/SKILL.md | Classifier Tuner | Reduce missed opportunities in remote EU classification |
| .claude/skills/improve-apply/SKILL.md | Skill Optimizer | Better AI/ML skill taxonomy, extraction, and matching |
| .claude/skills/improve-verify/SKILL.md | Application Coach | Learn from application patterns, improve interview prep |
| .claude/skills/improve-meta/SKILL.md | Strategy Brain | Coordinate toward the goal: get hired |
| .claude/commands/improve.md | Orchestrator | Entry point and team coordination |

Commands

| Command | Action |
| --- | --- |
| `/improve` | Full autonomous cycle (Strategy Brain decides what to do) |
| `/improve status` | Pipeline health check |
| `/improve discover` | Find new AI engineering job sources |
| `/improve classify` | Tune classification accuracy |
| `/improve skills` | Optimize AI/ML skill matching |
| `/improve coach` | Application coaching and prep improvement |

Goal Phases

| Phase | Focus | Trigger |
| --- | --- | --- |
| BUILDING | Discovery + classification | < 5 AI jobs/week |
| OPTIMIZING | Classifier + skills | Jobs flowing but low relevance |
| APPLYING | Coaching + prep | Good jobs surfacing, need to convert |
| INTERVIEWING | Deep prep + research | Applications converting to interviews |

Integration

  • stop_hook.py + improvement_agent.py → session scoring and learning
  • Langfuse → observability and score trends
  • Strategy Enforcer → validates changes align with optimization strategy
  • State files in ~/.claude/state/ → continuity across sessions

Autonomous Codebase Self-Improvement Team

Separate team focused purely on code quality, performance, type safety, security, and dead code — independent of business goals.

Structure

| Path | Agent | Role |
| --- | --- | --- |
| .claude/skills/codefix-mine/SKILL.md | Trajectory Miner | Mine session transcripts for code quality patterns |
| .claude/skills/codefix-audit/SKILL.md | Codebase Auditor | Deep code investigation with file:line findings |
| .claude/skills/codefix-evolve/SKILL.md | Skill Evolver | Improve skills, prompts, CLAUDE.md |
| .claude/skills/codefix-apply/SKILL.md | Code Improver | Implement fixes (perf, types, security, dead code) |
| .claude/skills/codefix-verify/SKILL.md | Verification Gate | Validate changes, run builds, catch regressions |
| .claude/skills/codefix-meta/SKILL.md | Meta-Optimizer | Coordinate, prioritize, track progress |
| .claude/commands/codefix.md | Orchestrator | Entry point |

Commands

| Command | Action |
| --- | --- |
| `/codefix` | Full autonomous cycle |
| `/codefix audit [target]` | Targeted audit (resolvers, workers, security, types, etc.) |
| `/codefix apply` | Implement pending findings |
| `/codefix verify` | Verify recent changes |
| `/codefix status` | Show meta-state |

Pipeline: mine → audit → evolve/apply → verify

Safety: Max 3 code changes + 2 skill evolutions per cycle. Phase detection (IMPROVEMENT/SATURATION/COLLAPSE_RISK). Mandatory verification. State in ~/.claude/state/codefix-*.json.


Additional resources


Domain-specific patterns

An MCPDoc MCP server is configured in .claude/settings.json with docs for Drizzle, Next.js, Vercel AI SDK, Trigger.dev, Cloudflare Workers. Call list_doc_sources to see available sources, then fetch_docs on specific URLs when you need deeper detail on any API.


Drizzle ORM + D1

Setup — always cast D1HttpClient as any (it implements a subset of the CF binding interface):

```typescript
import { drizzle } from "drizzle-orm/d1";
import { createD1HttpClient } from "@/db/d1-http";

const db = drizzle(createD1HttpClient() as any);
```

Querying — use Drizzle expressions, never raw SQL template literals:

```typescript
import { eq, and, or, like, inArray, desc, count, sql } from "drizzle-orm";
import { jobs, jobSkillTags } from "@/db/schema";

// Paginate with hasMore trick (avoids extra COUNT on first page)
const rows = await db.select().from(jobs).where(eq(jobs.is_remote_eu, true))
  .orderBy(desc(jobs.posted_at)).limit(limit + 1).offset(offset);
const hasMore = rows.length > limit;

// Subquery
const skillFilter = inArray(
  jobs.id,
  db.select({ job_id: jobSkillTags.job_id }).from(jobSkillTags)
    .where(inArray(jobSkillTags.tag, skills))
    .groupBy(jobSkillTags.job_id)
    .having(sql`count(distinct ${jobSkillTags.tag}) = ${skills.length}`)
);
```

Types — always derive from schema, never hand-write:

```typescript
import type { Job, NewJob, Company } from "@/db/schema";
// typeof jobs.$inferSelect  →  Job
// typeof jobs.$inferInsert  →  NewJob
```

Migration workflow — schema change → generate → apply:

```shell
pnpm db:generate   # creates migration file in migrations/
pnpm db:migrate    # applies locally
pnpm db:push       # applies to remote D1
```

Anti-patterns:

  • Never write raw SQL strings in resolvers — use Drizzle ORM methods.
  • Never use db.execute(sql`...`) for application queries — use the typed query builder.
  • Never import from drizzle-orm/pg-core or drizzle-orm/mysql-core — use drizzle-orm/sqlite-core.

Docs: fetch_docs on https://orm.drizzle.team/docs/overview


Apollo Server 5 resolver patterns

Context — always type context as GraphQLContext:

```typescript
import type { GraphQLContext } from "../../context";
import type { QueryJobsArgs, JobResolvers } from "@/__generated__/resolvers-types";

// Query resolver
async function jobsQuery(_parent: unknown, args: QueryJobsArgs, context: GraphQLContext) {
  return context.db.select().from(jobs)...;
}

// Field resolver — parent type is the raw Drizzle row (Job)
const Job: JobResolvers<GraphQLContext, Job> = {
  async skills(parent, _args, context) {
    return context.loaders.jobSkills.load(parent.id); // always use DataLoaders
  },
  async company(parent, _args, context) {
    if (!parent.company_id) return null;
    return context.loaders.company.load(parent.company_id);
  },
};
```

JSON column pattern — D1 stores JSON as text; always parse in field resolvers:

```typescript
departments(parent) {
  if (!parent.departments) return [];
  try { return JSON.parse(parent.departments); }
  catch { return []; }
},
```

Boolean columns — D1 returns 0/1 for SQLite integers. Fields defined with { mode: "boolean" } in schema are auto-coerced by Drizzle. Fields without it need manual coercion in resolvers:

```typescript
is_remote_eu(parent) {
  return (parent.is_remote_eu as unknown) === 1 || parent.is_remote_eu === true;
},
```

Admin guard — any mutation that modifies production data must check:

```typescript
import { isAdminEmail } from "@/lib/admin";

if (!context.userId || !isAdminEmail(context.userEmail)) {
  throw new Error("Forbidden");
}
```

Anti-patterns:

  • Never query the DB directly inside field resolvers — always go through context.loaders.* DataLoaders to avoid N+1.
  • Never use any for context — use the generated GraphQLContext type.
  • Never edit files in src/__generated__/ — they are overwritten by pnpm codegen.
  • Prefer generated types from @/__generated__/resolvers-types.ts over any in resolver signatures.

GraphQL codegen

Run pnpm codegen after any change to schema/**/*.graphql. Generates into src/__generated__/:

| File | Contents |
| --- | --- |
| types.ts | TS types for schema (strict scalars) |
| resolvers-types.ts | Resolver types with GraphQLContext |
| hooks.tsx | React Apollo hooks |
| typeDefs.ts | Merged type definitions |

Custom scalar mappings (in codegen.ts): DateTime/URL/EmailAddress → string, JSON → any, Upload → File.
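With graphql-codegen, these mappings live under a `scalars` key in the config — a rough sketch (the actual plugin layout and output targets in codegen.ts may differ):

```typescript
import type { CodegenConfig } from "@graphql-codegen/cli";

const config: CodegenConfig = {
  schema: "schema/**/*.graphql",
  generates: {
    "src/__generated__/types.ts": {
      plugins: ["typescript"],
      config: {
        // Map custom scalars to concrete TS types so codegen
        // doesn't fall back to `any` for them.
        scalars: {
          DateTime: "string",
          URL: "string",
          EmailAddress: "string",
          Upload: "File",
          JSON: "any",
        },
      },
    },
  },
};

export default config;
```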

Anti-patterns:

  • Never skip codegen after schema changes — stale types cause silent runtime mismatches.
  • Never manually edit src/__generated__/ files.

Trigger.dev tasks

Tasks live in src/trigger/ and must be registered. Pattern:

```typescript
import { task, logger } from "@trigger.dev/sdk/v3";
import { drizzle } from "drizzle-orm/d1";
import { createD1HttpClient } from "../db/d1-http";

// Lazy DB init — don't create at module level
function getDb() {
  return drizzle(createD1HttpClient() as any);
}

export const myTask = task({
  id: "my-task",           // unique kebab-case, matches trigger.config.ts registration
  maxDuration: 120,        // seconds
  retry: {
    maxAttempts: 3,
    minTimeoutInMs: 2000,
    maxTimeoutInMs: 30000,
    factor: 2,
  },
  queue: { concurrencyLimit: 5 },

  run: async (payload: MyPayload) => {
    logger.info("Starting task", { ...payload });  // use logger, not console
    const db = getDb();
    // ... do work
    return { success: true };
  },

  handleError: async (payload, error) => {
    const msg = error instanceof Error ? error.message : String(error);
    if (msg.includes("404")) {
      logger.info("Resource not found, skipping retry");
      return { skipRetrying: true };  // prevents retry for known terminal errors
    }
    logger.error("Task failed", { error: msg });
    // return nothing = allow retry
  },
});
```

Anti-patterns:

  • Never import from @trigger.dev/sdk — use @trigger.dev/sdk/v3.
  • Never create the DB client at module level — always lazy-init inside run or a factory function.
  • Never use console.log inside tasks — use logger.* so logs appear in the Trigger.dev dashboard.
  • Never forget to export the task — unregistered tasks silently fail to trigger.

Docs: fetch_docs on https://trigger.dev/docs/tasks-overview


D1 Gateway / batch queries

Prefer batching when making multiple independent queries in one request:

```typescript
// Single exec (via Drizzle — preferred for typed queries)
const result = await context.db.select().from(jobs).where(eq(jobs.id, id));

// Raw batch (for admin scripts / multi-statement operations)
const client = createD1HttpClient();
const [jobsResult, companiesResult] = await client.batch([
  "SELECT count(*) FROM jobs",
  "SELECT count(*) FROM companies",
]);
```

The D1HttpClient singleton is cached (_cachedClient) — do not re-instantiate per request.

Anti-patterns:

  • Never make N sequential fetch() calls to the D1 gateway when a batch would work.
  • Never bypass createD1HttpClient() by constructing D1HttpClient directly — the factory handles env var selection and caching.
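The caching behavior behind createD1HttpClient() boils down to a memoized factory — a self-contained sketch under simplified assumptions (the real factory also selects between gateway and REST-fallback env vars; names besides _cachedClient are illustrative):

```typescript
// Minimal client shape for the sketch — the real client exposes more.
type D1Client = { batch(statements: string[]): Promise<unknown[]> };

let _cachedClient: D1Client | null = null;

function makeClient(): D1Client {
  // Imagine reading env config and constructing the HTTP client here.
  return { batch: async (stmts) => stmts.map(() => ({ results: [] })) };
}

function createClient(): D1Client {
  // Reuse one instance per process instead of constructing per request,
  // so connection/config setup happens once.
  if (!_cachedClient) _cachedClient = makeClient();
  return _cachedClient;
}
```

Every call site gets the same instance, which is why constructing D1HttpClient directly defeats both the caching and the env-var selection.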