A technical interview template for implementing efficient character prefix conditioning algorithms for language model code completion.
This project simulates a real technical interview at Cursor, focusing on a core algorithmic challenge in language model-based code completion: character prefix conditioning.
When using language models for code completion, we need to produce completions that begin with exactly what the user has typed. However, language models operate on tokens, not characters. If the user's cursor doesn't lie on a token boundary, naively tokenizing the typed text yields token sequences the model would rarely produce on its own, and completion quality suffers.
Your task: Design and implement an algorithm that samples token sequences conditional on a character prefix, ensuring the concatenated token representations start with the given prefix.
Sample a sequence s = t₁, t₂, ..., tₙ from the distribution p(s), where:

p(s) = p(t₁, t₂, ..., tₙ) = ∏ₖ₌₁ⁿ p(tₖ | t₁, ..., tₖ₋₁)

- Constraint: P is a prefix of repr(t₁) + repr(t₂) + ... + repr(tₙ)
- Goal: Sample from q(s) = p(s | s starts with P)
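Concretely, the constraint pins down which tokens may appear while part of the prefix is still unmatched. A minimal sketch of that compatibility test (names are illustrative, not part of the template's files):

```typescript
/**
 * True if `tokenText` may appear next while `remaining` characters of the
 * user's prefix are still unmatched: the token either covers the rest of
 * the prefix or consumes an initial chunk of it.
 */
function isCompatible(tokenText: string, remaining: string): boolean {
  return tokenText.startsWith(remaining) || remaining.startsWith(tokenText);
}

// Hypothetical vocabulary, user has typed "functio":
console.log(isCompatible("function", "functio")); // true: covers the prefix
console.log(isCompatible("fun", "functio"));      // true: consumes "fun"
console.log(isCompatible("var", "functio"));      // false: contradicts it
```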
Project structure:

```
src/
├── lib/
│   ├── prefixConditioning.ts   # 🎯 Main algorithm implementation
│   ├── tokenFilter.ts          # 🔍 Token filtering and constraint logic
│   ├── probabilitySampler.ts   # 📊 Probability sampling with normalization
│   └── mockLanguageModel.ts    # 🤖 Mock tokenizer and language model
└── app/
    └── page.tsx                # 🖥️ Interactive demo interface
```
To get started:

1. Install dependencies:

   ```bash
   npm install
   ```

2. Start the development server:

   ```bash
   npm run dev
   ```

3. Open the challenge: navigate to http://localhost:3000
Design an efficient system architecture focusing on:
- Tokenization Strategy: Map between character positions and token boundaries
- Token Filtering Architecture: Efficiently identify valid candidate tokens
- Probability Distribution Management: Handle constraint-based filtering effects
- Optimization & Caching: Minimize language model API calls (a memoization sketch follows this list)
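On the caching point above, a minimal memoization sketch (hypothetical names throughout; `scoreContext` stands in for whatever next-token call the mock model exposes):

```typescript
declare function scoreContext(context: number[]): number[]; // assumed model call

// Constrained sampling can revisit the same token context many times, e.g.
// when retrying after a dead end, so memoize its next-token distribution.
const distributionCache = new Map<string, number[]>();

function cachedDistribution(context: number[]): number[] {
  const key = context.join(","); // token ids form a stable cache key
  let probs = distributionCache.get(key);
  if (probs === undefined) {
    probs = scoreContext(context);
    distributionCache.set(key, probs);
  }
  return probs;
}
```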
Key questions to consider:
- How do you efficiently find tokens that could start with prefix P? (a trie-based sketch follows this list)
- How do you track which token combinations still satisfy the constraint?
- How do you maintain probability distributions while filtering?
- What edge cases exist (empty prefix, no valid tokens, Unicode)?
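On the first question, one workable structure (a sketch assuming the vocabulary fits in memory; `TokenTrie` and its methods are illustrative names, not template APIs) is a trie over token strings. It collects every compatible token in time proportional to the prefix length plus the number of matches, rather than scanning all V entries:

```typescript
class TrieNode {
  children = new Map<string, TrieNode>();
  tokenIds: number[] = []; // tokens whose full text ends exactly at this node
}

class TokenTrie {
  private root = new TrieNode();

  insert(tokenText: string, tokenId: number): void {
    let node = this.root;
    for (const ch of tokenText) {
      let next = node.children.get(ch);
      if (next === undefined) {
        next = new TrieNode();
        node.children.set(ch, next);
      }
      node = next;
    }
    node.tokenIds.push(tokenId);
  }

  /** Tokens that are a prefix of `remaining`, plus tokens that extend it. */
  findCompatible(remaining: string): number[] {
    const out: number[] = [];
    let node = this.root;
    for (const ch of remaining) {
      out.push(...node.tokenIds); // text seen so far is a proper prefix of `remaining`
      const next = node.children.get(ch);
      if (next === undefined) return out; // no vocabulary entry continues this way
      node = next;
    }
    // Everything in this subtree starts with `remaining` (or equals it).
    const stack: TrieNode[] = [node];
    while (stack.length > 0) {
      const n = stack.pop()!;
      out.push(...n.tokenIds);
      for (const child of n.children.values()) stack.push(child);
    }
    return out;
  }
}
```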
Implement the core algorithm in the provided file structure:
- Core Algorithm (`src/lib/prefixConditioning.ts`)
  - Main character prefix conditioning function
  - Autoregressive sampling with character constraints
  - Integration with the mock language model interface
- Token Filtering (`src/lib/tokenFilter.ts`)
  - Find tokens compatible with the character prefix
  - Validate token sequences against the prefix constraint
  - Efficient prefix-matching algorithms
- Probability Sampling (`src/lib/probabilitySampler.ts`)
  - Sample from filtered probability distributions (a minimal sketch follows this list)
  - Implement proper normalization after filtering
  - Handle edge cases (no valid tokens, zero probabilities)
- Mock Infrastructure (`src/lib/mockLanguageModel.ts`)
  - Simple tokenizer implementation
  - Mock language model with probability distributions
  - Test data for algorithm validation
- Interactive Demo (`src/app/page.tsx`)
  - Input field for the character prefix
  - Real-time algorithm execution
  - Display of sampled tokens and the resulting text
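For the probability-sampling task, a minimal sketch of drawing from the surviving candidates (names are illustrative; this is one shape the `sampleFromDistribution` helper used in the loop below could take). Scaling the sampling threshold by the filtered total renormalizes implicitly, which is what makes the result a draw from q(s) = p(s | s starts with P) rather than from p(s):

```typescript
function sampleFromFiltered(
  candidates: number[], // token ids that satisfy the prefix constraint
  probs: number[],      // unnormalized p(tₖ | t₁..tₖ₋₁) for each candidate
): number {
  const total = probs.reduce((a, b) => a + b, 0);
  if (total === 0) {
    throw new Error("no probability mass on valid tokens"); // edge case from the task list
  }
  let r = Math.random() * total; // threshold scaled by total = implicit renormalization
  for (let i = 0; i < candidates.length; i++) {
    r -= probs[i];
    if (r <= 0) return candidates[i];
  }
  return candidates[candidates.length - 1]; // guard against floating-point rounding
}
```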
A sketch of the core sampling loop (the helper functions correspond to the modules above and are yours to implement):

```typescript
function sampleWithCharacterPrefix(prefix: string, maxTokens: number): number[] {
  const result: number[] = []; // sampled token ids
  let currentPrefix = prefix;  // characters of the prefix not yet covered

  for (let position = 1; position <= maxTokens; position++) {
    // Valid candidates either extend past the remaining prefix
    // or consume an initial chunk of it.
    const candidates =
      position === 1
        ? findFirstTokenCandidates(currentPrefix)
        : findNextTokenCandidates(result, currentPrefix);
    if (candidates.length === 0) break; // dead end: no token satisfies the constraint

    const probabilities = getFilteredProbabilities(candidates, result);
    const token = sampleFromDistribution(candidates, probabilities);
    result.push(token);

    const tokenText = getTokenRepresentation(token);
    if (tokenText.length >= currentPrefix.length) {
      // Prefix fully satisfied: switch to normal, unconstrained sampling.
      return [...result, ...sampleNormally(maxTokens - position)];
    }
    // Token consumed part of the prefix; shrink the remaining constraint.
    currentPrefix = currentPrefix.slice(tokenText.length);
  }
  return result;
}
```

Implementation tips:
- Start with a simple tokenizer (space-separated words; a sketch follows these tips)
- Implement exact string matching before optimization
- Add comprehensive test cases for edge conditions
- Consider time complexity: aim for O(V) per token, where V is the vocabulary size
- Handle the transition from constrained to unconstrained sampling
- Ensure proper probability normalization after filtering
- Consider Unicode characters and multi-byte sequences
- Implement efficient caching for repeated prefix queries
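The simple tokenizer the first tip suggests might look like the following (an illustration, not the required `mockLanguageModel.ts` interface): whitespace-delimited pieces, each mapped to an integer id, with the whitespace kept as tokens so that concatenating token representations reproduces the text exactly:

```typescript
function splitPieces(text: string): string[] {
  // The capture group keeps the whitespace separators as their own pieces.
  return text.split(/(\s+)/).filter((piece) => piece.length > 0);
}

function buildVocab(corpus: string): Map<string, number> {
  const vocab = new Map<string, number>();
  for (const piece of splitPieces(corpus)) {
    if (!vocab.has(piece)) vocab.set(piece, vocab.size);
  }
  return vocab;
}

function tokenize(text: string, vocab: Map<string, number>): number[] {
  return splitPieces(text).map((piece) => {
    const id = vocab.get(piece);
    if (id === undefined) throw new Error(`out-of-vocabulary piece: ${piece}`);
    return id;
  });
}
```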
Testing recommendations:
- Create deterministic test cases with known outcomes
- Test edge cases: empty prefix, no valid tokens, partial matches (one such invariant check is sketched after this list)
- Validate probability distributions sum to 1 after filtering
- Test with various temperature and sampling parameters
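As one concrete invariant check across these cases (assuming the `sampleWithCharacterPrefix` and `getTokenRepresentation` names from the loop sketched above): whatever tokens come back, their concatenated text and the requested prefix must agree on their common length, even when sampling dead-ends early:

```typescript
import { strict as assert } from "node:assert";

const prefix = "functio"; // example prefix that ends mid-token on purpose
for (let trial = 0; trial < 100; trial++) {
  const text = sampleWithCharacterPrefix(prefix, 8)
    .map(getTokenRepresentation)
    .join("");
  // Either the output covers the whole prefix, or (after an early dead end)
  // it is itself a prefix of it; anything else violates the constraint.
  assert.ok(
    text.startsWith(prefix) || prefix.startsWith(text),
    `prefix constraint violated: ${JSON.stringify(text)}`,
  );
}
```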
Evaluation criteria:
- Correctness: Character prefix constraint satisfaction
- Efficiency: Token filtering and probability sampling performance
- Edge Cases: Proper handling of boundary conditions
- Code Quality: Clear structure, testing, and documentation
- Understanding: Demonstrated knowledge of language model sampling principles
Development commands:

```bash
# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Run linter
npm run lint

# Run type checking
npm run type-check
```

This is a template for conducting technical interviews. The implementation should focus on:
- Algorithmic thinking: How to efficiently solve the constraint satisfaction problem
- System design: Scalable architecture for real-world usage
- Code quality: Clean, maintainable, and well-documented code
- Testing: Comprehensive coverage of edge cases and scenarios
Good luck with the implementation! 🚀