CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Remote EU job board aggregator. Next.js 16 frontend + GraphQL API backed by Cloudflare D1 (SQLite), with an AI/ML pipeline for job classification, skill extraction, and resume matching.


Common commands

```shell
# Dev
pnpm dev                          # Start Next.js dev server (localhost:3000)
pnpm build                        # Production build
pnpm lint                         # ESLint (next lint)

# GraphQL codegen — run after modifying any schema/**/*.graphql file
pnpm codegen

# Database
pnpm db:generate                  # Generate Drizzle migration files
pnpm db:migrate                   # Apply locally with Drizzle Kit
pnpm db:push                      # Apply migrations to remote D1
pnpm db:studio                    # Drizzle Studio

# Evaluation
pnpm eval:langfuse                # Run LLM classification eval with Langfuse Datasets

# Strategy enforcement
pnpm strategy:check               # Validate staged changes against optimization strategy
pnpm strategy:check:all           # Validate all tracked files

# Scripts
pnpm jobs:ingest                  # Ingest jobs from ATS platforms
pnpm jobs:enhance                 # Enhance all jobs with ATS data
pnpm jobs:status                  # Check ingestion status
pnpm jobs:extract-skills          # Extract skills during ingestion
pnpm skills:extract               # Extract skills from jobs
pnpm skills:seed                  # Seed skill taxonomy
pnpm boards:discover              # Discover Ashby boards
pnpm janitor:trigger              # Manually trigger janitor worker

# Workers
wrangler deploy --config wrangler.d1-gateway.toml   # Deploy D1 gateway
wrangler tail --config wrangler.d1-gateway.toml     # Stream gateway logs
cd workers/ashby-crawler && wrangler dev             # Ashby crawler local dev

# Deployment
pnpm deploy                       # Vercel deploy (runs scripts/deploy.ts)
wrangler deploy --config workers/ashby-crawler/wrangler.toml  # Ashby crawler
```

Architecture

Database access

Production:   Next.js (Vercel) → D1 Gateway Worker (CF) → D1 Database (binding)
Dev fallback: Next.js → Cloudflare REST API → D1 Database

The D1 Gateway Worker (workers/d1-gateway.ts) supports batched queries — prefer batching for multi-query operations. Authenticated via API_KEY secret. Database schema is in src/db/schema.ts (Drizzle ORM), migrations in migrations/.

Data flow

```text
1. Discovery:      ATS Sources (D1) --[Cron Worker]--> Trigger Ingestion
2. Board Crawl:    Common Crawl CDX --[ashby-crawler (Rust)]--> Ashby boards → D1
3. Ingestion:      ATS APIs (Greenhouse/Lever/Ashby) --[Insert Worker]--> D1
4. Enhancement:    Job IDs --[Trigger.dev / GraphQL Mutation]--> ATS API --> D1
5. Classification: Unprocessed jobs --[process-jobs (Python) / DeepSeek]--> is_remote_eu --> D1
6. Skill Extract:  Job descriptions --[LLM pipeline]--> Skills → D1
7. Resume Match:   Resumes --[resume-rag (Python) / Vectorize]--> Vector search
8. Serving:        Browser --[Apollo Client]--> /api/graphql --[D1 HTTP]--> Gateway --> D1
9. Evaluation:     Langfuse / Vitest --[LLM calls]--> Accuracy scores
```

GraphQL codegen

Configuration in codegen.ts. Generates from schema/**/*.graphql into src/__generated__/:

  • Client preset (typed gql function, fragment masking)
  • hooks.tsx — React Apollo hooks
  • types.ts — TypeScript types (strict scalars)
  • resolvers-types.ts — Resolver types with GraphQLContext

Custom scalars: DateTime/URL/EmailAddress → string, Upload → File, JSON → any.

Workers

| Worker | Config | Runtime | Key details |
| --- | --- | --- | --- |
| janitor | wrangler.toml | TypeScript | Daily midnight UTC, triggers ATS ingestion |
| d1-gateway | wrangler.d1-gateway.toml | TypeScript | On-demand HTTP, D1 binding |
| insert-jobs | wrangler.insert-jobs.toml | TypeScript | Queue-based, still uses Turso (legacy) |
| process-jobs | workers/process-jobs/wrangler.jsonc | Python/LangGraph | Every 6h + queue, DeepSeek classification |
| ashby-crawler | workers/ashby-crawler/wrangler.toml | Rust/WASM | Common Crawl → D1, rig_compat module |
| resume-rag | workers/resume-rag/wrangler.jsonc | Python | Vectorize + Workers AI |

API routes

| Route | Purpose |
| --- | --- |
| /api/graphql | Apollo Server GraphQL endpoint (main API) |
| /api/text-to-sql | Natural language → SQL query |
| /api/enhance-greenhouse-jobs | Trigger Greenhouse job enhancement |
| /api/companies/bulk-import | Bulk import companies |
| /api/companies/enhance | Enhance company data |

GraphQL Playground: http://localhost:3000/api/graphql. Vercel routes have 60s max duration (vercel.json).
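The 60s ceiling is typically declared via the `functions` block of vercel.json — a minimal sketch, assuming an App Router glob (the exact matcher in this repo's vercel.json may differ):

```json
{
  "functions": {
    "src/app/api/**/route.ts": { "maxDuration": 60 }
  }
}
```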


Tech stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 16, React 19, App Router |
| Language | TypeScript 5.9 |
| Database | Cloudflare D1 (SQLite) via D1 Gateway Worker |
| ORM | Drizzle ORM |
| API | Apollo Server 5 (GraphQL) |
| Auth | Clerk |
| AI/ML | Vercel AI SDK, Anthropic Claude (+ Agent SDK), DeepSeek, OpenRouter |
| Background jobs | Trigger.dev, Cloudflare Workers (cron + queues) |
| Observability | Langfuse, LangSmith, OpenTelemetry (partially active) |
| Evaluation | Langfuse, Vitest |
| Deployment | Vercel (app), Cloudflare Workers (workers) |
| Package manager | pnpm 10.10 |
| UI | Radix UI (Themes + Icons) |

Key structural patterns

  • GraphQL schema lives in schema/ (by domain: base/, jobs/, companies/, applications/, prompts/). Query/mutation/fragment documents are in src/graphql/.
  • Resolvers are in src/apollo/resolvers/ — job resolvers in src/apollo/resolvers/job/.
  • ATS ingestion fetchers: src/ingestion/{greenhouse,lever,ashby}.ts — primary job discovery channel.
  • Skills subsystem: src/lib/skills/ — taxonomy, extraction, vector ops, filtering.
  • AI agents: src/agents/ (Vercel AI SDK — SQL, admin, strategy enforcer), src/anthropic/ (Claude client, MCP, sub-agents, architect).
  • Database tools for agents: src/tools/database/ (introspection + SQL execution).
  • Rust worker: workers/ashby-crawler/src/lib.rs — the rig_compat module implements VectorStore, Pipeline, and Tool patterns for WASM (swap to rig::* when rig-core ships wasm32 support).

Optimization Strategy (Two-Layer Model)

See OPTIMIZATION-STRATEGY.md for the full strategy document. Key constraints:

| Meta Approach | Status | What It Guarantees |
| --- | --- | --- |
| Eval-First | PRIMARY | Every prompt/model change tested against a >= 80% accuracy bar |
| Grounding-First | PRIMARY | LLM outputs schema-constrained; skills validated against taxonomy |
| Multi-Model Routing | SECONDARY | Cheap model first (Workers AI), escalate on low confidence only |
| Spec-Driven | CROSS-CUTTING | GraphQL + Drizzle + Zod schemas as formal contracts |
| Observability | EMERGING | Langfuse scoring active; production tracing partial |

The strategy enforcer (src/agents/strategy-enforcer.ts) is available as a plain async function.


Coding conventions

  • Files: kebab-case (jobs-query.ts). Components: PascalCase (JobsSearchBar.tsx).
  • DB columns: snake_case. GraphQL fields: camelCase. Variables: camelCase.
  • Path alias: @/* maps to ./src/* (tsconfig.json).
  • Module type: ES Modules ("type": "module").
  • Use Drizzle ORM for all DB queries — no raw SQL strings.
  • Mutations that modify production data must include isAdminEmail() guard (from src/lib/admin.ts).
  • Prefer generated types from src/__generated__/resolvers-types.ts over any.
  • React providers: *-provider.tsx in src/components/.
  • Bash scripts: never create .sh files or bash scripts in the repository. Use the Bash tool for simple commands only (e.g., git status, npm run build); complex operations should use the Task tool with agents.

Environment variables

Copy .env.example to .env.local. Key groups: D1 Gateway (or Cloudflare REST API for dev), Clerk auth, AI provider keys (Anthropic, DeepSeek, OpenAI, Gemini), Langfuse/LangSmith observability, admin email, app URL. See .env.example for full list.


Known issues

Performance

  • Full table scan in src/apollo/resolvers/job/enhance-job.ts — fetches all jobs to find one by external_id.
  • N+1 queries for skills, company, and ATS board sub-fields — no DataLoader.
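The DataLoader fix for the N+1 issue can be sketched without the library — a minimal batcher that coalesces all load() calls made in one tick into a single batched query. This is illustrative only (class and function names are hypothetical); the real fix would wire the dataloader package into GraphQLContext:

```typescript
// Batch function: receives all keys queued this tick, returns one value per key.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

class SimpleLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];
  private scheduled = false;

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush once per microtask tick: N load() calls → one batched query.
        queueMicrotask(() => this.flush());
      }
    });
  }

  private async flush() {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    const values = await this.batchFn(batch.map((b) => b.key));
    batch.forEach((b, i) => b.resolve(values[i]));
  }
}
```

A field resolver would then call something like `context.loaders.jobSkills.load(parent.id)`, and all sibling jobs in one response resolve their skills with a single `WHERE job_id IN (...)` query.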

Security

  • CORS on D1 Gateway is *.
  • No GraphQL query complexity/depth limiting.
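A common mitigation for the missing depth limit is a validation rule passed to the Apollo Server constructor. A configuration sketch, assuming the graphql-depth-limit package were adopted (it is not currently a dependency, and the depth of 8 is an arbitrary illustration):

```typescript
import depthLimit from "graphql-depth-limit";
import { ApolloServer } from "@apollo/server";

const server = new ApolloServer({
  typeDefs,
  resolvers,
  // Reject queries nested deeper than 8 levels before execution begins.
  validationRules: [depthLimit(8)],
});
```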

Type safety

  • ignoreBuildErrors: true in next.config.ts masks TS errors in builds.
  • 283+ any types in resolvers.

Dead code

  • scripts/ingest-jobs.ts still documents Turso env vars in its help text (stale).
  • @libsql/client and pg are likely unused after D1 migration — can be removed from dependencies.

Dependencies

  • @ai-sdk/anthropic pinned to "latest" — should use specific version.

ashby-crawler (Rust/WASM) reference

```shell
cargo install worker-build                    # Install WASM build tool (once)
cd workers/ashby-crawler && wrangler dev      # Local dev
wrangler deploy --config workers/ashby-crawler/wrangler.toml  # Deploy
wrangler d1 execute nomadically-work-db --remote \
  --file workers/ashby-crawler/migrations/0001_init/up.sql    # Apply migrations
```

Key endpoints: /crawl (paginated CC crawl), /boards (list/search), /search (TF-IDF vector search), /enrich and /enrich-all (enrichment pipeline), /tools (OpenAI function-calling schemas), /indexes, /progress, /stats.


Spec-Driven Development (SDD)

Uses agent-teams-lite for spec-driven development. The orchestrator (.claude/commands/sdd.md) delegates work to specialized sub-agents via the Task tool, using native agent teams (TeamCreate, shared task lists, messaging) for parallel phases.

Structure

| Path | Contents |
| --- | --- |
| .claude/commands/sdd.md | SDD orchestrator — routing, DAG detection, team/subagent dispatch |
| .claude/skills/sdd-*/SKILL.md | 9 SDD sub-agent skill files (explore, propose, spec, design, tasks, apply, verify, archive, init) |
| .claude/skills/improve-*/SKILL.md | 6 job search self-improvement skills (mine, audit, evolve, apply, verify, meta) |
| .claude/skills/codefix-*/SKILL.md | 6 codebase self-improvement skills (mine, audit, evolve, apply, verify, meta) |
| openspec/ | Specs, change proposals, designs, and task breakdowns (created by sdd-init) |
| .claude/commands/ | Project-specific commands (build-and-push, gql-agent, improve, codefix, sdd) |

SDD Commands

| Command | Action | Mode |
| --- | --- | --- |
| `/sdd:init` | Bootstrap openspec/ in current project | single subagent |
| `/sdd:explore <topic>` | Investigate an idea (no files created) | single subagent |
| `/sdd:new <change-name>` | Start a new change (creates proposal) | single subagent |
| `/sdd:continue` | Create next artifact in dependency chain | auto-detect |
| `/sdd:ff <change-name>` | Fast-forward: create all planning artifacts | agent team (specs+design parallel) |
| `/sdd:apply` | Implement tasks | agent team if multi-phase |
| `/sdd:verify` | Validate implementation against specs | single subagent |
| `/sdd:archive` | Sync specs + archive completed change | single subagent |

Orchestrator Rules

  1. The lead agent NEVER executes phase work inline — always delegate to sub-agents
  2. Use subagents for single sequential phases (explore, propose, verify, archive)
  3. Use native agent teams when specs+design or multi-phase apply can parallelize
  4. Between phases, show the user what was done and ask to proceed
  5. Keep orchestrator context minimal — pass file paths, not file contents
  6. Require plan approval for apply teammates (they write code)
  7. Do NOT force SDD on small tasks (single file edits, quick fixes, questions)

Dependency Graph

```text
proposal → specs ──→ tasks → apply → verify → archive
              ↕
           design
```

Specs and design run in parallel (agent team in /sdd:ff); tasks depends on both; verify is optional but recommended before archive.


Autonomous Self-Improvement Team

Goal-driven team of 6 specialists focused on helping find a fully remote EU AI engineering role. Grounded in autonomous agent research (AutoRefine, Meta Context Engineering, CASTER, ROMA, Phase Transition theory).

Structure

| Path | Agent | Mission |
| --- | --- | --- |
| .claude/skills/improve-mine/SKILL.md | Pipeline Monitor | Is the pipeline healthy? Are AI jobs flowing? |
| .claude/skills/improve-audit/SKILL.md | Discovery Expander | Find more companies hiring AI engineers remotely in EU |
| .claude/skills/improve-evolve/SKILL.md | Classifier Tuner | Reduce missed opportunities in remote EU classification |
| .claude/skills/improve-apply/SKILL.md | Skill Optimizer | Better AI/ML skill taxonomy, extraction, and matching |
| .claude/skills/improve-verify/SKILL.md | Application Coach | Learn from application patterns, improve interview prep |
| .claude/skills/improve-meta/SKILL.md | Strategy Brain | Coordinate toward the goal: get hired |
| .claude/commands/improve.md | Orchestrator | Entry point and team coordination |

Commands

| Command | Action |
| --- | --- |
| `/improve` | Full autonomous cycle (Strategy Brain decides what to do) |
| `/improve status` | Pipeline health check |
| `/improve discover` | Find new AI engineering job sources |
| `/improve classify` | Tune classification accuracy |
| `/improve skills` | Optimize AI/ML skill matching |
| `/improve coach` | Application coaching and prep improvement |

Goal Phases

| Phase | Focus | Trigger |
| --- | --- | --- |
| BUILDING | Discovery + classification | < 5 AI jobs/week |
| OPTIMIZING | Classifier + skills | Jobs flowing but low relevance |
| APPLYING | Coaching + prep | Good jobs surfacing, need to convert |
| INTERVIEWING | Deep prep + research | Applications converting to interviews |

Integration

  • stop_hook.py + improvement_agent.py → session scoring and learning
  • Langfuse → observability and score trends
  • Strategy Enforcer → validates changes align with optimization strategy
  • State files in ~/.claude/state/ → continuity across sessions

Autonomous Codebase Self-Improvement Team

Separate team focused purely on code quality, performance, type safety, security, and dead code — independent of business goals.

Structure

| Path | Agent | Role |
| --- | --- | --- |
| .claude/skills/codefix-mine/SKILL.md | Trajectory Miner | Mine session transcripts for code quality patterns |
| .claude/skills/codefix-audit/SKILL.md | Codebase Auditor | Deep code investigation with file:line findings |
| .claude/skills/codefix-evolve/SKILL.md | Skill Evolver | Improve skills, prompts, CLAUDE.md |
| .claude/skills/codefix-apply/SKILL.md | Code Improver | Implement fixes (perf, types, security, dead code) |
| .claude/skills/codefix-verify/SKILL.md | Verification Gate | Validate changes, run builds, catch regressions |
| .claude/skills/codefix-meta/SKILL.md | Meta-Optimizer | Coordinate, prioritize, track progress |
| .claude/commands/codefix.md | Orchestrator | Entry point |

Commands

| Command | Action |
| --- | --- |
| `/codefix` | Full autonomous cycle |
| `/codefix audit [target]` | Targeted audit (resolvers, workers, security, types, etc.) |
| `/codefix apply` | Implement pending findings |
| `/codefix verify` | Verify recent changes |
| `/codefix status` | Show meta-state |

Pipeline: mine → audit → evolve/apply → verify

Safety: Max 3 code changes + 2 skill evolutions per cycle. Phase detection (IMPROVEMENT/SATURATION/COLLAPSE_RISK). Mandatory verification. State in ~/.claude/state/codefix-*.json.


Additional resources


Domain-specific patterns

An MCPDoc MCP server is configured in .claude/settings.json with docs for Drizzle, Next.js, Vercel AI SDK, Trigger.dev, Cloudflare Workers. Call list_doc_sources to see available sources, then fetch_docs on specific URLs when you need deeper detail on any API.


Drizzle ORM + D1

Setup — always cast D1HttpClient as any (it implements a subset of the CF binding interface):

```typescript
import { drizzle } from "drizzle-orm/d1";
import { createD1HttpClient } from "@/db/d1-http";

const db = drizzle(createD1HttpClient() as any);
```

Querying — use Drizzle expressions, never raw SQL template literals:

```typescript
import { eq, and, or, like, inArray, desc, count, sql } from "drizzle-orm";
import { jobs, jobSkillTags } from "@/db/schema";

// Paginate with hasMore trick (avoids extra COUNT on first page)
const rows = await db.select().from(jobs).where(eq(jobs.is_remote_eu, true))
  .orderBy(desc(jobs.posted_at)).limit(limit + 1).offset(offset);
const hasMore = rows.length > limit;

// Subquery
const skillFilter = inArray(
  jobs.id,
  db.select({ job_id: jobSkillTags.job_id }).from(jobSkillTags)
    .where(inArray(jobSkillTags.tag, skills))
    .groupBy(jobSkillTags.job_id)
    .having(sql`count(distinct ${jobSkillTags.tag}) = ${skills.length}`)
);
```

Types — always derive from schema, never hand-write:

```typescript
import type { Job, NewJob, Company } from "@/db/schema";
// typeof jobs.$inferSelect  →  Job
// typeof jobs.$inferInsert  →  NewJob
```

Migration workflow — schema change → generate → apply:

```shell
pnpm db:generate   # creates migration file in migrations/
pnpm db:migrate    # applies locally
pnpm db:push       # applies to remote D1
```

Anti-patterns:

  • Never write raw SQL strings in resolvers — use Drizzle ORM methods.
  • Never use db.execute(sql`...`) for application queries — use the typed query builder.
  • Never import from drizzle-orm/pg-core or drizzle-orm/mysql-core — use drizzle-orm/sqlite-core.

Docs: fetch_docs on https://orm.drizzle.team/docs/overview


Apollo Server 5 resolver patterns

Context — always type context as GraphQLContext:

```typescript
import type { GraphQLContext } from "../../context";
import type { QueryJobsArgs, JobResolvers } from "@/__generated__/resolvers-types";

// Query resolver
async function jobsQuery(_parent: unknown, args: QueryJobsArgs, context: GraphQLContext) {
  return context.db.select().from(jobs)...;
}

// Field resolver — parent type is the raw Drizzle row (Job)
const Job: JobResolvers<GraphQLContext, Job> = {
  async skills(parent, _args, context) {
    return context.loaders.jobSkills.load(parent.id); // always use DataLoaders
  },
  async company(parent, _args, context) {
    if (!parent.company_id) return null;
    return context.loaders.company.load(parent.company_id);
  },
};
```

JSON column pattern — D1 stores JSON as text; always parse in field resolvers:

```typescript
departments(parent) {
  if (!parent.departments) return [];
  try { return JSON.parse(parent.departments); }
  catch { return []; }
},
```

Boolean columns — D1 returns 0/1 for SQLite integers. Fields defined with { mode: "boolean" } in schema are auto-coerced by Drizzle. Fields without it need manual coercion in resolvers:

```typescript
is_remote_eu(parent) {
  return (parent.is_remote_eu as unknown) === 1 || parent.is_remote_eu === true;
},
```

Admin guard — any mutation that modifies production data must check:

```typescript
import { isAdminEmail } from "@/lib/admin";

if (!context.userId || !isAdminEmail(context.userEmail)) {
  throw new Error("Forbidden");
}
```

Anti-patterns:

  • Never query the DB directly inside field resolvers — always go through context.loaders.* DataLoaders to avoid N+1.
  • Never use any for context — use the generated GraphQLContext type.
  • Never edit files in src/__generated__/ — they are overwritten by pnpm codegen.
  • Prefer generated types from @/__generated__/resolvers-types.ts over any in resolver signatures.

GraphQL codegen

Run pnpm codegen after any change to schema/**/*.graphql. Generates into src/__generated__/:

| File | Contents |
| --- | --- |
| types.ts | TS types for schema (strict scalars) |
| resolvers-types.ts | Resolver types with GraphQLContext |
| hooks.tsx | React Apollo hooks |
| typeDefs.ts | Merged type definitions |

Custom scalar mappings (in codegen.ts): DateTime/URL/EmailAddress → string, JSON → any, Upload → File.
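With graphql-codegen, these mappings live under a `scalars` key in the config — a rough sketch (the actual plugin layout and output targets in codegen.ts may differ):

```typescript
import type { CodegenConfig } from "@graphql-codegen/cli";

const config: CodegenConfig = {
  schema: "schema/**/*.graphql",
  generates: {
    "src/__generated__/types.ts": {
      plugins: ["typescript"],
      config: {
        // Map custom scalars to concrete TS types so codegen
        // doesn't fall back to `any` for them.
        scalars: {
          DateTime: "string",
          URL: "string",
          EmailAddress: "string",
          Upload: "File",
          JSON: "any",
        },
      },
    },
  },
};

export default config;
```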

Anti-patterns:

  • Never skip codegen after schema changes — stale types cause silent runtime mismatches.
  • Never manually edit src/__generated__/ files.

Trigger.dev tasks

Tasks live in src/trigger/ and must be registered. Pattern:

```typescript
import { task, logger } from "@trigger.dev/sdk/v3";
import { drizzle } from "drizzle-orm/d1";
import { createD1HttpClient } from "../db/d1-http";

// Lazy DB init — don't create at module level
function getDb() {
  return drizzle(createD1HttpClient() as any);
}

export const myTask = task({
  id: "my-task",           // unique kebab-case, matches trigger.config.ts registration
  maxDuration: 120,        // seconds
  retry: {
    maxAttempts: 3,
    minTimeoutInMs: 2000,
    maxTimeoutInMs: 30000,
    factor: 2,
  },
  queue: { concurrencyLimit: 5 },

  run: async (payload: MyPayload) => {
    logger.info("Starting task", { ...payload });  // use logger, not console
    const db = getDb();
    // ... do work
    return { success: true };
  },

  handleError: async (payload, error) => {
    const msg = error instanceof Error ? error.message : String(error);
    if (msg.includes("404")) {
      logger.info("Resource not found, skipping retry");
      return { skipRetrying: true };  // prevents retry for known terminal errors
    }
    logger.error("Task failed", { error: msg });
    // return nothing = allow retry
  },
});
```

Anti-patterns:

  • Never import from @trigger.dev/sdk — use @trigger.dev/sdk/v3.
  • Never create the DB client at module level — always lazy-init inside run or a factory function.
  • Never use console.log inside tasks — use logger.* so logs appear in the Trigger.dev dashboard.
  • Never forget to export the task — unregistered tasks silently fail to trigger.

Docs: fetch_docs on https://trigger.dev/docs/tasks-overview


D1 Gateway / batch queries

Prefer batching when making multiple independent queries in one request:

```typescript
// Single exec (via Drizzle — preferred for typed queries)
const result = await context.db.select().from(jobs).where(eq(jobs.id, id));

// Raw batch (for admin scripts / multi-statement operations)
const client = createD1HttpClient();
const [jobsResult, companiesResult] = await client.batch([
  "SELECT count(*) FROM jobs",
  "SELECT count(*) FROM companies",
]);
```

The D1HttpClient singleton is cached (_cachedClient) — do not re-instantiate per request.

Anti-patterns:

  • Never make N sequential fetch() calls to the D1 gateway when a batch would work.
  • Never bypass createD1HttpClient() by constructing D1HttpClient directly — the factory handles env var selection and caching.
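The caching behavior behind createD1HttpClient() boils down to a memoized factory — a self-contained sketch under simplified assumptions (the real factory also selects between gateway and REST-fallback env vars; names besides _cachedClient are illustrative):

```typescript
// Minimal client shape for the sketch — the real client exposes more.
type D1Client = { batch(statements: string[]): Promise<unknown[]> };

let _cachedClient: D1Client | null = null;

function makeClient(): D1Client {
  // Imagine reading env config and constructing the HTTP client here.
  return { batch: async (stmts) => stmts.map(() => ({ results: [] })) };
}

function createClient(): D1Client {
  // Reuse one instance per process instead of constructing per request,
  // so connection/config setup happens once.
  if (!_cachedClient) _cachedClient = makeClient();
  return _cachedClient;
}
```

Every call site gets the same instance, which is why constructing D1HttpClient directly defeats both the caching and the env-var selection.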