Commit 5ab9dbb

Format Chris Lovejoy talk and add to talks index
- Added proper YAML front matter and structured sections
- Added talk entry to Chapter 3: Production and Monitoring
- Ran Prettier formatting for consistency

1 parent db3399b commit 5ab9dbb

2 files changed: +267 -0

docs/talks/chris-lovejoy-domain-expert-vertical-ai.md

Lines changed: 264 additions & 0 deletions

@@ -0,0 +1,264 @@
---
title: "Domain Experts: The Lever for Vertical AI"
speaker: "Chris Lovejoy (Anterior)"
cohort: 3
description: "How to successfully apply LLMs in specialized industries by building domain‑expert review loops, augmenting prompts with expert knowledge, and earning customer trust."
tags: [vertical AI, domain experts, evaluation, prompting, trust, security]
---

# Domain Experts: The Lever for Vertical AI — Chris Lovejoy (Anterior)

I hosted a session featuring Chris Lovejoy, Head of Clinical AI at Anterior, who shared valuable insights from his experience building AI agents for specialized industries. Chris brings a unique perspective as a former medical doctor who transitioned to AI, working across healthcare, education, recruiting, and retail sectors.

## Why is it so hard to successfully apply LLMs to specialized industries?

Chris identified two fundamental challenges when implementing AI in vertical industries:

- The "last mile problem" — taking powerful models and making them work in specific customer contexts. The challenge isn't reasoning quality anymore; today's models fail because they lack context on how workflows are performed in specific industries.
- Difficulty defining "good" and "correct" — in specialized fields, determining what constitutes a good output requires domain expertise that most AI engineers don't possess.

As Chris explained, "If your data looks like a legal document and you're not a lawyer, you wouldn't know whether it's right or wrong." This highlights why domain experts are essential for vertical AI applications.

## How can domain experts supercharge AI development?

Chris recommends implementing a systematic review process with domain experts that creates a powerful feedback loop:

1. Your production application generates AI outputs
2. Domain experts review these outputs to provide:
   - Performance metrics (accuracy scores)
   - Categorized failure modes
   - Suggested improvements
   - Input–output pairs for training
3. This expert‑generated data enables you to prioritize work based on which failure modes most impact your key metrics. You can create ready‑made failure mode datasets grouped by error type, allowing engineers to test improvements against specific problems.

The most effective process is a continuous loop where:

1. Production generates outputs
2. Domain experts review and categorize issues
3. A domain expert PM prioritizes which failure modes to fix
4. Engineers test changes against failure mode datasets
5. The PM approves changes to go live
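
To make the loop concrete, here is a minimal sketch (in Python) of the data a review system might capture and how failure‑mode datasets could be derived from it. The field names and failure‑mode labels are illustrative assumptions, not Anterior's actual schema:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ExpertReview:
    """One domain-expert review of a production AI output (hypothetical schema)."""
    trace_id: str                     # links back to the production request
    input_text: str                   # what the model saw
    output_text: str                  # what the model produced
    is_correct: bool                  # the expert's verdict
    failure_mode: str | None = None   # e.g. "missed_guideline", "wrong_definition"
    suggested_fix: str | None = None  # the expert's proposed correction

def failure_mode_datasets(reviews: list[ExpertReview]) -> dict[str, list[ExpertReview]]:
    """Group failed reviews by failure mode so engineers can test a
    candidate fix against exactly the cases it is meant to solve."""
    datasets: dict[str, list[ExpertReview]] = defaultdict(list)
    for review in reviews:
        if not review.is_correct and review.failure_mode:
            datasets[review.failure_mode].append(review)
    return dict(datasets)
```

Sorting these buckets by size, or by their impact on your key metric, gives the domain expert PM a ready‑made priority list for step 3.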

**Key Takeaway:** Domain experts aren't just helpful — they're essential for vertical AI applications. Building systems to capture their insights creates a data‑driven improvement cycle that addresses the specific needs of specialized industries.

## What's the best way to implement domain expert reviews?

While many companies start with simple tools like Google Sheets or raw traces in monitoring tools, Chris strongly recommends building a custom review dashboard. This approach gives you maximum freedom over how you present information and makes it easier to integrate review data into your production systems.

A well‑designed review UI should optimize for:

- High‑quality reviews (prioritize quality over quantity)
- Minimizing time spent by reviewers
- Generating actionable data

When building a custom UI, follow these principles:

- Clearly surface all required context (separate wider context from specific details)
- Optimize the review flow sequence (bake your ideal review process into the UI)
- Minimize friction (reduce scrolling, clicking, and manual work)

Chris has found that spending time shadowing domain experts as they review outputs helps you understand their natural workflows, which you can then incorporate into your UI design.

## Should I use prompting or fine‑tuning for vertical AI applications?

Chris believes better prompting beats fine‑tuning for improving performance in most vertical applications. This might seem counterintuitive, but he offers several reasons:

- Today's out‑of‑the‑box models already have strong domain‑specific reasoning capabilities
- Fine‑tuning adds complexity (managing multiple models for different tasks and customers)
- Fine‑tuned models can degrade as workflows evolve over time

While fine‑tuning has benefits for speed, cost, and scalability, Chris recommends starting with prompting for performance improvements.

However, he doesn't mean just tweaking prompt text — that approach can be brittle and model‑dependent. Instead, he focuses on two more powerful techniques:

1. Use context augmentation — inject domain expert‑generated knowledge into your prompts at inference time. For example, in finance, you might include the specific definition of "high net worth individual" used by a particular bank.
2. Leverage input–output pair examples for in‑context learning — include examples where the model initially made mistakes, along with the corrections from domain experts.
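
As a rough illustration of both techniques, here is how a prompt might be assembled at inference time. The section wording and the shape of the knowledge entries are hypothetical:

```python
def build_prompt(task: str,
                 customer_knowledge: list[str],
                 corrected_examples: list[tuple[str, str]]) -> str:
    """Assemble a prompt using context augmentation (expert-written rules)
    and in-context learning (expert-corrected input/output pairs)."""
    sections = []
    if customer_knowledge:
        # e.g. "This bank defines 'high net worth individual' as ..."
        sections.append("Customer-specific definitions and rules:\n"
                        + "\n".join(f"- {rule}" for rule in customer_knowledge))
    for i, (tricky_input, corrected_output) in enumerate(corrected_examples, 1):
        sections.append(f"Example {i} (previously mishandled, corrected by an expert):\n"
                        f"Input: {tricky_input}\nCorrect output: {corrected_output}")
    sections.append(f"Task:\n{task}")
    return "\n\n".join(sections)
```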

The most powerful approach is to go beyond static prompting by implementing dynamic retrieval of this knowledge. Your domain expert UI generates knowledge that lives in a domain knowledge base, which you can then retrieve in real time during inference using techniques like:

- Keyword matching
- Semantic similarity
- Recency weighting
- Diversity optimization
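
Here is a minimal sketch of how the first three signals might be blended into a single retrieval score. The weights, the 90‑day decay constant, and the assumption that entries carry precomputed embeddings and epoch timestamps are all illustrative:

```python
import math
import time

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_entry(query_terms: set[str], query_vec: list[float], entry: dict) -> float:
    """Blend keyword, semantic, and recency signals for one knowledge-base entry."""
    words = set(entry["text"].lower().split())
    keyword = len(query_terms & words) / max(len(query_terms), 1)
    semantic = cosine(query_vec, entry["embedding"])        # precomputed embedding
    age_days = (time.time() - entry["updated_at"]) / 86400  # epoch-seconds timestamp
    recency = math.exp(-age_days / 90)                      # illustrative 90-day decay
    return 0.3 * keyword + 0.5 * semantic + 0.2 * recency   # untuned example weights
```

Diversity optimization would then act on the ranked results, for example with maximal marginal relevance, so the prompt isn't filled with near‑duplicate entries.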

**Key Takeaway:** Rather than fine‑tuning models, focus on dynamically augmenting prompts with domain‑specific knowledge and examples. This approach is more adaptable to changing workflows and customer needs.

## How do I build and maintain customer trust?

For verticalized AI agents, Chris identifies three key factors for customer trust:

- Confidence in the AI's performance
- Confidence in secure data handling
- Protection against LLM‑specific attack vectors

To build confidence in AI performance, implement these practices:

- Review production outputs and generate performance metrics
- Report these metrics to customers regularly (biweekly or monthly)
- Define a sampling strategy as you scale (based on uncertainty, outliers, or stratified sampling)
- Establish an internal response protocol for performance issues
- Consider using LLMs as judges to help scale monitoring

Chris shared a case study from a healthcare application where they couldn't manually review all AI outputs — they could only review about 5%. They implemented an LLM judge system that:

- Scored the confidence of each AI output in real time
- Prioritized which cases humans should review
- Provided performance metrics for all outputs
- Created a feedback loop where human reviews improved the judge system
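
A sketch of what such a judge might look like. `call_llm` stands in for whatever model client you use, the prompt wording is invented, and the 0.7 threshold is an assumption you would tune against expert agreement:

```python
import json

REVIEW_THRESHOLD = 0.7  # assumed cutoff; calibrate against human reviews

def judge_output(case_input: str, ai_output: str, call_llm) -> dict:
    """Ask a judge model to score one output, then flag low-confidence
    cases for human (domain expert) review."""
    prompt = (
        "You are a quality-assurance judge for this domain. Given the case "
        "and the AI's output, respond with JSON only: "
        '{"confidence": <0.0-1.0>, "reason": "<one sentence>"}\n\n'
        f"Case: {case_input}\n\nAI output: {ai_output}"
    )
    verdict = json.loads(call_llm(prompt))  # assumes the model returns valid JSON
    verdict["needs_human_review"] = verdict["confidence"] < REVIEW_THRESHOLD
    return verdict
```

The expert reviews of flagged cases then double as labeled data for checking and recalibrating the judge itself, which is the feedback loop Chris described.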

For data handling, Chris recommends:

- Mapping out your data usage strategy early (during contract negotiations)
- Being ready to offer isolated, single‑tenant environments
- Investing in synthetic data generation for testing and training

For LLM‑specific security, stay vigilant about:

- Prompt injections (using input filtering and validation)
- Sensitive information disclosure (through data sanitization)
- Data and model poisoning (tracking data origins and using version control)
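
As one illustrative first layer for the prompt‑injection point (pattern matching alone won't stop a determined attacker, so treat this as a sketch, not a defense):

```python
import re

# Naive first-pass filter; real deployments should layer this with structured
# prompts, output validation, and least-privilege access to tools and data.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"disregard (the )?system prompt",
    r"you are now [^.]{0,40}",
]

def looks_like_injection(user_input: str) -> bool:
    lowered = user_input.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```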

**Key Takeaway:** Building trust requires transparent performance monitoring, secure data handling, and proactive security measures. Implement systems that give customers visibility into AI performance while protecting their sensitive information.

## Which domain experts should you hire and how should you use them?

Chris strongly recommends hiring a principal domain expert who has ultimate responsibility for your AI's performance. This approach has several advantages:

- Organizational clarity (having a directly responsible individual, or DRI, helps you move faster)
- Avoiding consensus by committee
- Building deep intuition about your AI system's performance

This person should be hired early and given ownership to shape your product. Beyond reviewing outputs, they can:

- Help hire and manage additional reviewers
- Define sampling strategies
- Analyze review data
- Monitor reviewer performance
- Steer product development
- Talk to customers
- Improve AI performance

When hiring this principal domain expert, look for someone who meets a critical domain expertise threshold but also has additional skills like:

- Management/leadership experience
- Statistical knowledge
- Product development experience
- Technical understanding

At Anterior, Chris played this role with his medical background, which allowed him to review outputs, hire additional reviewers from his healthcare network, define sampling strategies using his data science background, and contribute to AI implementations with his software engineering skills.

**Key Takeaway:** A principal domain expert who combines industry knowledge with leadership and technical skills can dramatically accelerate your ability to build effective vertical AI applications. This person becomes the bridge between domain expertise and technical implementation.

## How do you handle citations and references in AI‑generated content?

For citation handling, Chris mentioned they've used both custom approaches and vendor solutions like Anthropic's citation API. One effective technique they've implemented is:

1. Chunk content with semantic similarity‑based methods
2. Assign unique IDs to each chunk
3. Prompt the model to first identify relevant evidence chunks by their UIDs
4. Have the model reason using those specific chunks
5. Make the final decision with clear references to the source material

This sequential approach ensures the model explicitly references the information it used, making the reasoning process more transparent and traceable.
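
A minimal sketch of steps 2 and 3; the UID format and prompt wording are invented for illustration:

```python
def format_evidence(chunks: list[str]) -> tuple[str, dict[str, str]]:
    """Assign a unique ID to each chunk and render them for the prompt."""
    by_uid = {f"C{i:03d}": chunk for i, chunk in enumerate(chunks, 1)}
    rendered = "\n\n".join(f"[{uid}] {text}" for uid, text in by_uid.items())
    return rendered, by_uid

def citation_prompt(question: str, rendered_evidence: str) -> str:
    return (
        f"Evidence chunks:\n{rendered_evidence}\n\n"
        f"Question: {question}\n\n"
        "Step 1: List the UIDs of the chunks that are relevant.\n"
        "Step 2: Reason using only those chunks.\n"
        "Step 3: State your decision, citing a UID like [C001] after each claim."
    )
```

Because every cited UID maps back to a known chunk in `by_uid`, citations in the answer can be verified mechanically rather than trusted blindly.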

## What question are teams not asking themselves when building these systems?

When I asked Chris what question teams aren't asking themselves, he identified: "What does your system look like for incorporating domain expertise?"

He noted many teams focus on using the latest, most sophisticated models while neglecting the process for systematically incorporating domain knowledge. The real challenge isn't just implementing the model — it's creating a flywheel that continuously improves your system based on expert feedback.

Unlike traditional software where you build features and move on, AI systems require ongoing optimization to push the probability distribution in the right direction. Building this improvement flywheel is essential for successful vertical AI applications.

**Key Takeaway:** Don't treat AI applications like traditional software that you build once and maintain. Instead, design systems that continuously incorporate domain expertise to systematically improve performance over time.

---

## FAQs

### What makes applying LLMs to specialized industries so challenging?

The primary challenge lies in the "last mile problem" — taking powerful models and making them work effectively in specific industry contexts. While today's models have strong reasoning capabilities, they often lack the contextual understanding of how specific workflows operate in specialized industries. Additionally, defining what constitutes "good" or "correct" output in these domains requires domain expertise that general AI practitioners may not possess.

### How can domain experts improve AI development in vertical industries?

Domain experts are invaluable when your AI is processing specialized data that general practitioners can't properly evaluate. By implementing a structured review process with domain experts, you can gather four critical types of information:

- Performance metrics that show how well your AI is performing
- Specific failure modes that categorize where and how your AI makes mistakes
- Suggested improvements based on domain knowledge
- High‑quality input–output pairs for training and evaluation

This information enables you to prioritize improvements, create targeted datasets for testing, and implement a continuous improvement cycle that addresses real‑world usage patterns.

### What's the most effective way to support domain experts in reviewing AI outputs?

While you can start with basic tools like spreadsheets or third‑party evaluation platforms, building a custom review dashboard is the highest‑leverage investment you can make. A well‑designed custom UI should:

- Clearly surface all required context without overwhelming reviewers
- Optimize the review flow sequence to match how experts naturally evaluate information
- Minimize friction by reducing unnecessary scrolling and clicking
- Generate actionable data that can be directly integrated into your production system

The goal is to balance high‑quality reviews with reviewer efficiency while generating data that can immediately improve your system.

### Is fine‑tuning or prompting better for verticalized AI agents?

For most vertical applications, advanced prompting techniques are more effective than fine‑tuning. Today's out‑of‑the‑box models already have strong domain‑specific reasoning capabilities, and the typical issues in vertical applications relate more to context than reasoning ability. Fine‑tuning adds complexity through model management and can become outdated as workflows evolve.

That said, fine‑tuning may still be valuable for optimizing speed, reducing costs, or improving scalability in certain scenarios.

### What prompting techniques work best for vertical applications?

Rather than just tweaking prompt wording (which can be brittle and model‑dependent), two more robust techniques are particularly effective:

- Context augmentation: Injecting domain‑specific knowledge into the prompt context at inference time, such as customer‑specific definitions or guidelines
- Input–output pair examples: Using few‑shot prompting with real examples to demonstrate the desired behavior

These approaches can be implemented dynamically, retrieving the most relevant context based on the specific query and customer needs.

### How can you build and maintain customer trust in vertical AI applications?

Building trust requires addressing three key areas:

- Performance confidence: Conduct regular reviews of production outputs, report metrics to customers, implement strategic sampling as you scale, and establish clear response protocols for performance issues
- Data security: Map out your data usage strategy early (ideally during contracting), be prepared to offer isolated environments for security‑conscious customers, and consider investing in synthetic data generation
- Security against LLM‑specific threats: Implement protections against prompt injection, prevent sensitive information disclosure, and stay current with evolving best practices in LLM security

Transparent communication about these measures helps customers understand your commitment to reliability and security.

### What role should domain experts play in your organization?

Consider hiring a principal domain expert who serves as the directly responsible individual (DRI) for AI performance. This person should:

- Define what constitutes "good" or "correct" output
- Build intuition about how your AI system performs and how it can be improved
- Help hire and manage additional reviewers as you scale
- Define sampling strategies for efficient review
- Analyze review data and monitor reviewer performance
- Steer product development and prioritize engineering work
- Communicate with customers about performance and improvements

The ideal candidate combines domain expertise with additional skills like management experience, statistical knowledge, and product development capabilities.

### How can you create an effective continuous improvement cycle?

Implement a structured loop where:

1. Your production application generates AI outputs
2. Domain experts review a strategic sample of those outputs
3. Reviews produce performance metrics and categorized failure modes
4. Product managers prioritize which failure modes to address
5. Engineers make changes and test against failure mode datasets
6. Changes are validated and deployed to production

This cycle allows you to systematically improve performance based on real‑world usage patterns and domain‑specific requirements.

### How can you scale performance monitoring as your application grows?

As you scale and can no longer review all outputs, implement a strategic sampling approach based on:

- Uncertainty levels in the AI's responses
- Outlier detection to identify unusual cases
- Stratified sampling across different characteristics

You can also implement an LLM‑as‑judge system that evaluates outputs in real time, providing confidence scores that help prioritize which cases need human review and potentially taking automated actions for low‑confidence outputs.
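
A compact sketch of how the first and third signals might combine into a review sample. The field names and the half‑uncertainty, half‑stratified split are illustrative, and `uncertainty` is assumed to come from a judge model or token probabilities:

```python
import random

def select_for_review(outputs: list[dict], budget: int) -> list[dict]:
    """Pick a review sample: the most uncertain cases first, then a
    small random draw from each stratum (e.g. customer or case type)."""
    ranked = sorted(outputs, key=lambda o: o["uncertainty"], reverse=True)
    chosen = ranked[: budget // 2]          # uncertainty-driven half

    strata: dict[str, list[dict]] = {}
    for output in ranked[budget // 2:]:     # stratified half from the rest
        strata.setdefault(output["stratum"], []).append(output)
    per_stratum = max(1, (budget - len(chosen)) // max(len(strata), 1))
    for group in strata.values():
        chosen.extend(random.sample(group, min(per_stratum, len(group))))
    return chosen[:budget]
```

Outlier detection (unusually long inputs, rare categories) could feed a third bucket on top of this.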

docs/talks/index.md

Lines changed: 3 additions & 0 deletions
@@ -42,6 +42,9 @@ Trellis framework for managing AI systems with millions of users. Critical disco
**[RAG Anti-patterns in the Wild](rag-antipatterns-skylar-payne.md)** - Skylar Payne

90% of teams adding complexity to RAG systems see worse performance when properly evaluated. Major discovery: silent failures in document processing can eliminate 20%+ of corpus without detection. Golden rule: teams who iterate fastest on data examination consistently outperform those focused on algorithmic sophistication.

**[Domain Experts: The Lever for Vertical AI](chris-lovejoy-domain-expert-vertical-ai.md)** - Chris Lovejoy (Anterior)

How to make LLMs work in specialized industries: build domain‑expert review loops that generate failure‑mode datasets, prioritize fixes by impact, and dynamically augment prompts with expert knowledge. Trust requires transparent production metrics, secure data handling, and defenses against LLM‑specific threats.

### Chapter 4: Query Analysis and Data Organization

Understanding user queries and routing them effectively.