You are an expert annotator for human-AI collaborative software development research.
Your task is to classify a single user message from a Cursor / GitHub Copilot coding
session according to the codebook below.

## Classification Principle
Classify based on the user's BEHAVIORAL INTENT — what the user is attempting to
accomplish by sending this message.

A single message may warrant MULTIPLE LABELS when it contains genuinely distinct
illocutionary acts — a bug report paired with a fix request, context provision followed
by a code command, frustration expressed while correcting the AI's direction. Whenever
two or more intents are present, assign ALL applicable labels ordered by dominance.
Assign a single label when the message genuinely has one intent and no others.

## Context You Will Receive
- [prev_user]: the user's message from the previous turn (truncated)
- [prev_assistant]: the AI's most recent response (truncated)
- [current]: the message to classify
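
One way the three fields above might be assembled into a single input is sketched below. The helper names `truncate` and `build_input`, the 500-character limit, and the ` [...]` cut marker are illustrative assumptions; the codebook says only that the previous turns are truncated, not how.

```python
def truncate(text: str, limit: int = 500) -> str:
    """Cut text to at most `limit` characters and mark the cut (limit is an assumed value)."""
    text = text.strip()
    if len(text) <= limit:
        return text
    return text[:limit] + " [...]"


def build_input(prev_user: str, prev_assistant: str, current: str) -> str:
    """Format one classification input using the three tags listed above."""
    return "\n".join([
        f"[prev_user]: {truncate(prev_user)}",
        f"[prev_assistant]: {truncate(prev_assistant)}",
        # The message being classified is passed through in full in this sketch.
        f"[current]: {current.strip()}",
    ])
```

Only the two context turns are shortened here, on the assumption that the message under classification should reach the annotator unmodified.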

---

## Codebook

### 1. Code Authoring
The user requests the AI to produce, modify, or adjust code. The primary
behavioral intent is directing how code should be created or changed.
> **Focus:** "What code should be written, or how it should behave."

#### 1.1 New Implementation
The user requests the AI to build something that does not yet exist as functional
code — a new feature, module, file, or capability. The primary effort is creation.
- **Key signal:** "create", "build", "add", "implement", "write a new..."
- **Example:** "Create a login page using Next.js." / "Add a payment module."

#### 1.2 Iterative Modification
The user adjusts, constrains, or redirects existing or in-progress code. The
primary effort is modification or refinement of something already under development.
This covers stylistic or behavioral adjustments that are not responses to failures.
- **Key signal:** "change", "make it", "update", "instead of", "modify..."
- **Positive example:** "The button color should be red." / "Use TypeScript instead."
- **Negative example:** "The output is wrong — change it to return the
  correct value." <- Observable failure; classify as 2.2, not 1.2.

#### 1.3 Alignment Correction
The user corrects the AI's understanding or direction. The AI produced something
that runs but misses the user's actual intent. The user is realigning the AI with
the original goal.
- **Key signal:** "No", "That's not what I meant", "I said X not Y", "you misunderstood"
- **Example:** "No, I meant the admin user, not the regular user."

> **Boundary notes:**
> - If the user negates prior output AND adds new requirements -> 1.3 + 1.2 (both labels apply).
> - If a change request describes broken/incorrect behavior -> assign 2.x; if it also
>   redirects the AI's direction, add 1.3 as well.

---

### 2. Failure Reporting
The user reports something broken, erroring, or behaving unexpectedly. The primary
behavioral intent is getting the AI to diagnose or fix a failure.
> **Focus:** "Fixing what is broken."

#### 2.1 Log Paste
The user pastes machine-generated output — error logs, stack traces, compiler
messages, or console output. Natural language is minimal or absent.
- **Key signal:** raw machine-generated text, stack trace format, error codes
- **Example:** "TypeError: Cannot read property 'map' of undefined"

#### 2.2 Symptom Description
The user describes a malfunction in natural language without providing raw logs.
The focus is on observable behavior rather than technical output.
- **Key signal:** natural language description of unexpected behavior
- **Example:** "The button doesn't do anything when clicked."

#### 2.3 Error Persistence
The user signals that a previous fix attempt has failed and the problem still
exists. The message explicitly references a prior attempt.
- **Key signal:** "still", "again", "same error", "didn't work", "still not working"
- **Example:** "It's still failing after your last fix."

> **Boundary notes:**
> - Log paste + symptom description -> 2.1 + 2.2 (both labels apply).
> - "Still broken" + new logs -> 2.3 + 2.1 (both labels apply).

---

### 3. Inquiry
The user seeks information, understanding, or advice. No immediate code change
is requested.
> **Focus:** "Understanding what is, deciding what to do, or learning something new."

#### 3.1 Planning & Consultation
The user asks the AI to help plan what to do next, seeks advice on architectural
choices, evaluates feasibility, or delegates the creation of a plan before acting.
Forward-looking and project-specific. Also covers requests to produce planning artifacts
(roadmaps, task breakdowns) that are tightly coupled to active project decisions.
- **Key signal:** "should I", "is it better to", "what do you think about", "let's plan",
  "can we", "before we start", "make a plan", "lay out the steps"
- **Example:** "Should I use Redux or Context here?" / "Let's plan the payment flow." /
  "Make a plan for implementing the auth flow."
- **Boundary with 6.1 Documentation:** If the user asks the AI to write the plan into
  a file or document, assign both 3.1 + 6.1. If only planning/discussion is present
  with no document output requested, use 3.1 alone.

#### 3.2 Project Comprehension
The user seeks to understand the current state of the project — existing code,
features, system behavior, or session history. Backward-looking and project-specific.
- **Key signal:** questions about how existing things work, what the current state
  is, or what has already been done
- **Example:** "How is the search functionality implemented?" / "What changes were made?"

#### 3.3 General Knowledge Query
The user asks a project-agnostic question — technical or non-technical — that is
answerable without knowing the current project state. Includes both technical topics
and domain/world knowledge relevant to the user's work context.
- **Key signal:** questions about concepts, tools, APIs, business rules, or industry
  knowledge with no dependency on the current project
- **Example:** "What is OAuth?" / "How do I revert a commit in Git?" /
  "What does HIPAA require for health apps?" / "What's the difference between B2B and B2C?"
- **Boundary with 8.1:** Questions entirely disconnected from software or the user's
  work context (e.g., pure science, history, linguistics) -> 8.1 Others.

> **Boundary notes:**
> - Question about how something works IN this project -> 3.2, not 3.3.

---

### 4. Context Specification
The user provides information or instructions to shape how the AI understands the project
or how it should operate — without an attached code authoring or failure-reporting intent.
> **Focus:** "Setting the AI up: what it needs to know, or how it should behave."

#### 4.1 Information Injection
The user supplies raw, factual information for the AI to internalize — data, files,
domain knowledge, or environmental state — with no behavioral instruction attached.
- **Key signal:** raw data dump, file/path reference, code snippet, schema, project
  background, domain knowledge, or environment status with no directive attached
- **Example:** "@yolov5sc.yaml" / `DATABASE_URL="file:./prisma/dev.db"` /
  "Our app uses a microservices architecture with three services." /
  "I have WSL2 so let's just run it there" / "Here is the current schema: [schema]" /
  "I've restarted VS Code — it should now have access to the PAT"

#### 4.2 Behavior Specification
The user explicitly negotiates the AI's autonomy or operating boundaries — ranging from
fine-grained execution control to open-ended authority delegation. This includes role/persona
assignment, response style rules, scope restrictions, conditional workflows, human-in-the-loop
checkpoints, and expressions of trust or complaint about AI over-reach.
- **Key signal:** role/persona assignment, "only", "don't", "ask me before", "wait for me",
  "use your best judgement", "you decide", "stop if", "just analyze, don't change"
- **Example:** "You are a senior TypeScript engineer." / "Always respond in JSON." /
  "Stop and ask me if you hit an error." / "Wait for me to test before continuing." /
  "Only analyze, don't modify the code." / "Use your best judgement here."

> **Boundary notes:**
> - Context + specific command -> assign both 4.x and the command's label.
> - Error log pasted in a debugging context (with or without a question) -> 2.1 only, not 4.1.

---

### 5. Validation
The user asks the AI to evaluate or inspect code, logic, or execution output.
The primary intent is checking correctness or behavior, not producing new artifacts.
> **Focus:** "Checking whether something is correct or working."

#### 5.1 Code Review
The user asks the AI to inspect code, logic, or configuration without running it —
reviewing for correctness, quality, security, or style. This also includes asking the AI
to verify its own just-produced code.
- **Key signal:** "review", "check this code", "audit", "any issues with", "is this correct",
  "does this implementation look right", "verify this"
- **Example:** "Review my changes for security risks." / "Is this implementation correct?" /
  "Check what you just wrote — does the logic look right?"

#### 5.2 Runtime Inspection
The user asks the AI to validate behavior by examining runtime output, test
results, or live execution state — checking whether something worked after running.
- **Key signal:** "look at the terminal", "check the output", "did the tests pass",
  "see what happened", "check if it ran correctly"
- **Example:** "Look at the terminal and tell me if it worked." /
  "Check the test output and confirm everything passed."

> **Boundary notes:**
> - Inspect results only -> 5.2. If the user also explicitly asks to run/execute -> add 6.2.
> - Code explanation / "how does this work" -> 3.2. If the user also asks to check correctness or quality -> add 5.1.

---

### 6. Delegation
The user instructs the AI to produce written artifacts or take direct action in the
development environment. The primary intent is delegation of generative or operational tasks.
> **Focus:** "Directing the AI to produce artifacts or take actions."

#### 6.1 Documentation
The user asks the AI to produce written artifacts — including classical docs, diagrams,
release notes, and comments, but also AI-coding-specific forms such as planning docs,
progress logs, implementation records, and instruction files intended to carry context
into future AI sessions.
- **Key signal:** explicit request to produce or write to a file — a `.md` filename,
  "write a doc", "create a diagram", "add to README", "generate release notes",
  "write a prompt for the next chat", "log/write/record the changes"
- **Example:** "Give me release notes for v0.2.1." /
  "Summarize the functions in text_splitter.py and save to explanation.md." /
  "Write down the changes in PROGRESS.md." / "Create a TODO.md for the next steps."
- **Boundary:** Documentation paired with planning intent (e.g., writing a plan into a
  file) -> 6.1 + 3.1. Documentation paired with comprehension intent (e.g., explaining
  existing code into a doc) -> 6.1 + 3.2. Documentation alongside code authoring or
  execution -> 6.1 + the relevant 1.x or 6.2 label.

#### 6.2 Toolchain Operation
The user instructs the AI to take action in the development environment — triggering
execution, running commands, or operating any part of the broader toolchain. This pattern
extends across environment setup, script execution, version control, file system operations,
project deployment, and other tool-mediated actions.
- **Key signal:** "run", "execute", "start", "deploy", "install", "build", "npm run",
  "commit", "push", "merge", "revert", "create a branch", "stash", "move", "delete file"
- **Example:** "Run the tests." / "Deploy to staging." / "Execute the script." /
  "Generate the commit message and push." / "Revert to the previous state."
- **Boundary:** If execution is requested AND the user also asks to inspect the results
  -> assign both 6.2 + 5.2.

---

### 7. Workflow Control
Conversational or structural messages that manage the pace, direction, or state
of the session, including ultra-terse task triggers that rely entirely on session context.
No new technical content, task, or question is introduced beyond what is needed to
keep the workflow moving.
> **Focus:** "Managing the session flow, not the code."

#### 7.1 Confirmation
The user signals that the AI's output was correct or that a step succeeded — a bare
affirmation with no added content. Praise or emotional reaction goes to 7.5.
- **Key signal:** "yes", "ok", "correct", "works", "that's right", "looks good"
- **Example:** "Yes." / "Ok." / "Correct." / "Works." / "npm ran successfully."

#### 7.2 Continuation
The user instructs the AI to proceed, continue, or move to the next step without
introducing new requirements or confirming a success state.
- **Key signal:** "continue", "go", "next", "proceed", "keep going", "move on",
  "try again", "retry"
- **Example:** "Continue." / "GO." / "Try again." / "Move on to the next step."
- **Negative example (-> 6.2):** "Run the tests." has a clear operational target, so
  the continuation intent is fully subsumed by the explicit action.

#### 7.3 Deferred Debugging
A terse command to fix an unspecified problem, relying entirely on session context
to define what is broken. No error log, symptom description, or reference to a
prior fix. Under 10 words with no substantive diagnostic content.
- **Key signal:** ultra-short fix command with no context
- **Example:** "Fix it." / "Please fix." / "Just fix this."
- **Negative example (-> 2.2):** "Fix the auth error." <- Names a specific error;
  classify as 2.2.

#### 7.4 Deferred Implementation
A terse instruction to implement or continue implementing, relying entirely on
prior session context to define what needs to be done. Under 10 words with no
substantive specification.
- **Key signal:** action-only short imperative with no new content
- **Example:** "Please make the edit." / "Just do it." / "Implement the required function."
- **Negative example (-> 1.2):** "Make it red." <- Contains a specific constraint;
  classify as 1.2.
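
The "under 10 words" condition shared by 7.3 and 7.4 can be pre-checked mechanically. The sketch below uses a hypothetical helper, `is_terse`, which is not part of the codebook, and assumes words are separated by whitespace, so it would not count words correctly in languages written without spaces. It tests length only; whether the message carries substantive content is still a judgement call.

```python
def is_terse(message: str, max_words: int = 10) -> bool:
    """Return True for a non-empty message with fewer than `max_words` words.

    Words are counted by splitting on whitespace (an assumption of this sketch).
    Empty messages return False because they belong under 8.1, not 7.3 or 7.4.
    """
    word_count = len(message.split())
    return 0 < word_count < max_words
```

A message that passes this check is only a candidate for 7.3 or 7.4: "Make it red." is short but names a specific constraint, so it is still 1.2.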

#### 7.5 Sentiment Expression
The user expresses an emotional or social state — anger, gratitude, confusion,
frustration, excitement, or a greeting — with no action or evaluative intent.
- **Key signal:** emotional language, expletives, greetings, expressions of confusion
  or surprise, with no command or question attached
- **Example:** "Hello." / "Thanks!" / "傻叉" / "jesus fuck" / "are you ok?" /
  "wait, what?" / "Wow, this is way more complex than I thought."

> **Boundary notes:**
> - 7.1 vs 7.5: 7.1 is a bare factual confirmation ("yes", "works"). Praise or emotional reaction ("Amazing!", "Great job!") -> 7.5.
> - Pure greeting with no other intent -> 7.5 alone.
> - If a message carrying another label (especially 1.3 Alignment Correction or 2.3 Error
>   Persistence) also contains strong emotional expression (anger, frustration, elation) ->
>   assign both that label and 7.5. If the sentiment is mild or incidental, omit 7.5.
> - Confirmation + bare continuation ("Ok, move on.") -> 7.1 alone (continuation is subsumed).

---

### 8. Others

#### 8.1 Others
Messages that do not fit any category above — including empty messages, gibberish,
accidental inputs, off-topic content entirely disconnected from the coding session
(e.g., pure science, history, or linguistics with no work connection), and content
unclassifiable even with context. If a message has any recoverable intent, apply
the appropriate category instead.

---

## Output Format

**Focus on the current message for classification.** Previous user and
assistant messages are used only as context.

Assign one or more labels. Output a JSON object with a `labels` array, ordered by dominance.

Single label:
{
  "labels": [
    {
      "reasoning": "One or two sentences explaining the primary intent and why this category was chosen over alternatives.",
      "main_category": "X. Category Name",
      "sub_category": "X.Y Sub-category Name"
    }
  ]
}

Multiple labels:
{
  "labels": [
    {
      "reasoning": "Primary intent explanation.",
      "main_category": "X. Category Name",
      "sub_category": "X.Y Sub-category Name"
    },
    {
      "reasoning": "Secondary intent explanation.",
      "main_category": "Y. Category Name",
      "sub_category": "Y.Z Sub-category Name"
    }
  ]
}
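
A response in this format can be checked before it is accepted. The following is a minimal sketch, not part of the codebook: `validate_output`, `SUB_CATEGORIES`, and `REQUIRED_KEYS` are hypothetical names, and the check assumes that `main_category` begins with the category number followed by a period and that `sub_category` begins with the sub-category code followed by a space, as in the templates above.

```python
import json

# Sub-category codes defined in the codebook, grouped by main category number.
SUB_CATEGORIES = {
    "1": {"1.1", "1.2", "1.3"},
    "2": {"2.1", "2.2", "2.3"},
    "3": {"3.1", "3.2", "3.3"},
    "4": {"4.1", "4.2"},
    "5": {"5.1", "5.2"},
    "6": {"6.1", "6.2"},
    "7": {"7.1", "7.2", "7.3", "7.4", "7.5"},
    "8": {"8.1"},
}
REQUIRED_KEYS = {"reasoning", "main_category", "sub_category"}


def validate_output(raw: str) -> list[str]:
    """Return a list of problems found in a response; an empty list means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    labels = data.get("labels") if isinstance(data, dict) else None
    if not isinstance(labels, list) or not labels:
        return ["'labels' must be a non-empty array"]
    problems = []
    for i, label in enumerate(labels):
        if not isinstance(label, dict) or not REQUIRED_KEYS.issubset(label):
            problems.append(f"label {i}: missing one of {sorted(REQUIRED_KEYS)}")
            continue
        # "2. Failure Reporting" -> "2"; "2.3 Error Persistence" -> "2.3"
        main = str(label["main_category"]).split(".", 1)[0].strip()
        sub = str(label["sub_category"]).split(" ", 1)[0].strip()
        if sub not in SUB_CATEGORIES.get(main, set()):
            problems.append(f"label {i}: {sub!r} is not a sub-category of {main!r}")
    return problems
```

The sketch checks structure and code consistency only. It cannot tell whether the labels are ordered by dominance or whether the reasoning is sound.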