feat(core): support searchArea for extract-type methods to improve accuracy on small elements by Postroggy · Pull Request #2439 · web-infra-dev/midscene

Postroggy · 2026-05-07T03:35:02Z

Summary

When aiAssert / aiBoolean / aiQuery evaluate a full-screen screenshot, small target elements (e.g. a 40×40px icon) occupy too little of the image for the model to judge accurately.

Reproduction data:

| Model | Image | Accuracy |
| --- | --- | --- |
| doubao-seed-2.0 | full 1280×800 | 10% (1/10) |
| doubao-seed-2.0 | cropped top 12.5% (1280×100) | 70% (7/10) |

Same model, same prompt, same source screenshot: cropping to the target region alone raised accuracy from 10% to 70% (10 runs each).

This PR adds an optional searchArea field to ServiceExtractOption, letting extract-type methods (aiAssert, aiBoolean, aiQuery, aiNumber, aiString) crop the screenshot to a sub-region before sending it to the model. A companion locatePrompt field on AgentAssertOpt and MidsceneYamlFlowItemAIAssert lets users describe the target in natural language; aiLocate resolves it to a rect automatically. This is especially useful in YAML scripts where passing coordinates directly is error-prone — users only need to describe what they're looking for, and the model figures out where it is.

Motivation

Vision models struggle when the target element is small relative to the full screenshot. The issue is not the model's capability, but the signal-to-noise ratio: a 40×40px element on a 1280×800 screen takes up only ~0.16% of the image area (1,600 of 1,024,000 pixels), making subtle visual details (e.g. the direction an icon is facing) nearly impossible to judge reliably.

The fix is simple: crop the screenshot to the relevant region before sending it to the model. This PR makes that possible without any changes to prompts or the model-calling layer.
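To make the area figures concrete, here is a small sketch of the arithmetic. It assumes, purely for illustration, that the 40×40px example element lies entirely inside the 1280×100 crop used in the reproduction data; `areaFraction` and the `Size` type are hypothetical helpers, not part of Midscene.

```typescript
interface Size {
  width: number;
  height: number;
}

// Fraction (0..1) of the image area covered by an element of the given size.
function areaFraction(element: Size, image: Size): number {
  return (element.width * element.height) / (image.width * image.height);
}

const icon: Size = { width: 40, height: 40 };
const fullScreenshot: Size = { width: 1280, height: 800 };
const croppedScreenshot: Size = { width: 1280, height: 100 };

const fullShare = areaFraction(icon, fullScreenshot); // 1600 / 1024000
const croppedShare = areaFraction(icon, croppedScreenshot); // 1600 / 128000

console.log(`${(fullShare * 100).toFixed(2)}% of the full screenshot`);
console.log(`${(croppedShare * 100).toFixed(2)}% of the cropped screenshot`);
console.log(`${(croppedShare / fullShare).toFixed(1)}x larger share after cropping`);
```

Under that assumption the element's share of the image grows eightfold, simply because the crop keeps one eighth of the original height.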

Changes

New field: `searchArea` on `ServiceExtractOption`

All extract-type methods (aiAssert, aiBoolean, aiQuery, aiNumber, aiString) accept an optional searchArea: Rect. When provided, the screenshot is cropped to that region before being sent to the model. When screenshotIncluded is false, the crop is skipped because there is no image to forward.

Scenario: you already know the coordinates (TypeScript)

// Assert on a small icon at a known position
await agent.aiAssert('The eyes are facing right', 'IP eyes are not facing right', {
  searchArea: { left: 580, top: 20, width: 120, height: 80 },
});

// Boolean check on a badge counter
await agent.aiBoolean('The cart badge number is greater than 0', {
  searchArea: { left: 1100, top: 0, width: 180, height: 80 },
});

// Query within a specific panel
await agent.aiQuery('Extract the current price', {
  searchArea: { left: 0, top: 600, width: 400, height: 200 },
});
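The cropping rule for `searchArea` can be sketched as a single decision: crop only when a `searchArea` is set and `screenshotIncluded` is not `false`. The snippet below is a minimal illustration of that rule, not the code in this PR. `ExtractOptionSketch`, `CropFn`, and `selectImageForModel` are hypothetical names, and the crop function is injected because the real `cropByRect` signature is not shown here.

```typescript
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Illustrative option shape; only the two field names come from this PR.
interface ExtractOptionSketch {
  searchArea?: Rect;
  screenshotIncluded?: boolean;
}

// Hypothetical stand-in for the real cropping utility.
type CropFn<Image> = (image: Image, rect: Rect) => Image;

// Returns the image to forward to the model: cropped when a searchArea is
// given and the screenshot is included, otherwise the input unchanged.
function selectImageForModel<Image>(
  image: Image,
  opt: ExtractOptionSketch,
  crop: CropFn<Image>,
): Image {
  if (opt.searchArea === undefined || opt.screenshotIncluded === false) {
    return image;
  }
  return crop(image, opt.searchArea);
}

// Fake crop that only records which rect was requested.
const fakeCrop: CropFn<string> = (image, rect) =>
  `${image}[${rect.left},${rect.top},${rect.width}x${rect.height}]`;
const area: Rect = { left: 580, top: 20, width: 120, height: 80 };

const untouched = selectImageForModel('screenshot', {}, fakeCrop);
const cropped = selectImageForModel('screenshot', { searchArea: area }, fakeCrop);
const skipped = selectImageForModel(
  'screenshot',
  { searchArea: area, screenshotIncluded: false },
  fakeCrop,
);
console.log({ untouched, cropped, skipped });
```

Injecting the crop function keeps the decision testable without decoding any image, which mirrors the cases listed under Validation.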

New field: `locatePrompt` on `AgentAssertOpt`

For cases where the exact rect is unknown, locatePrompt accepts a natural-language description. Internally, aiAssert calls aiLocate (which supports deepLocate for small or hard-to-find elements) to resolve the rect, then uses it as searchArea. This avoids the need for users to manually calculate screenshot-space coordinates.

Scenario: you don't know the coordinates, describe the target instead (TypeScript)

// aiLocate resolves the element rect automatically
await agent.aiAssert('The eyes are facing right', 'IP eyes are not facing right', {
  locatePrompt: 'the green IP mascot character',
});
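The `locatePrompt` → `aiLocate` → `searchArea` flow can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `resolveSearchArea`, `AssertOptionSketch`, and `LocateFn` are hypothetical names, the locator is a stand-in for `aiLocate` reduced to "description in, rect out", and letting an explicit `searchArea` take precedence over `locatePrompt` is an assumption the PR description does not state.

```typescript
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Illustrative option shape; the field names come from this PR.
interface AssertOptionSketch {
  searchArea?: Rect;
  locatePrompt?: string;
  screenshotIncluded?: boolean;
}

// Hypothetical stand-in for aiLocate.
type LocateFn = (prompt: string) => Promise<Rect | undefined>;

async function resolveSearchArea(
  opt: AssertOptionSketch,
  locate: LocateFn,
): Promise<Rect | undefined> {
  // No screenshot is sent, so neither locating nor cropping is needed.
  if (opt.screenshotIncluded === false) return undefined;
  // Assumption: an explicit rect wins over a natural-language description.
  if (opt.searchArea) return opt.searchArea;
  if (opt.locatePrompt) return locate(opt.locatePrompt);
  return undefined;
}

// Fake locator that counts its calls and always returns the same rect.
let locateCalls = 0;
const fakeLocate: LocateFn = async (_prompt) => {
  locateCalls += 1;
  return { left: 580, top: 20, width: 120, height: 80 };
};

resolveSearchArea({ locatePrompt: 'the green IP mascot character' }, fakeLocate).then(
  (rect) => console.log('resolved searchArea:', rect),
);
```

Checking `screenshotIncluded` first avoids a model call to locate an element whose rect would never be used.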

Scenario: YAML scripts — locatePrompt is the only practical option

In YAML there is no way to run aiLocate and pass its result to a subsequent step, and writing raw screenshot-space coordinates by hand is error-prone. locatePrompt solves both problems:

- aiAssert: The small IP mascot's eyes are facing right
  locatePrompt: the green IP mascot character
  errorMessage: IP eyes did not turn to the right

Implementation

| File | Change |
| --- | --- |
| `packages/core/src/yaml.ts` | `ServiceExtractOption` adds `searchArea?: Rect`; `MidsceneYamlFlowItemAIAssert` adds `locatePrompt?: string` |
| `packages/core/src/types.ts` | `AgentAssertOpt` adds `locatePrompt?: string` |
| `packages/core/src/agent/agent.ts` | `aiAssert` resolves `locatePrompt` → `aiLocate` → `searchArea`; both are skipped when `screenshotIncluded` is `false` |
| `packages/core/src/ai-model/inspect.ts` | `AiExtractElementInfo` crops screenshot via existing `cropByRect` when `searchArea` is set and `screenshotIncluded !== false` |

No breaking changes — all new fields are optional.

Validation

pnpm run lint
npx nx build @midscene/core
npx nx test @midscene/core — 825 passed

New test file packages/core/tests/unit-test/aiassert-search-area.test.ts covers:

- No crop when searchArea is absent
- cropByRect called with correct rect when searchArea is provided
- No crop when screenshotIncluded: false even if searchArea is set
- Cropped imageBase64 forwarded to model instead of original full screenshot

Add optional `region` (Rect) to `ServiceExtractOption` and `focusLocate` (string) to `AgentAssertOpt` / `MidsceneYamlFlowItemAIAssert`. When a region is provided the screenshot is cropped before being sent to the model, improving accuracy on small target elements from ~10% to ~70%.

- `region` is forwarded through the extract pipeline and applied in `AiExtractElementInfo` via the existing `cropByRect` utility.
- `focusLocate` triggers an `aiLocate` call first to derive the rect automatically from a natural-language description.
- No breaking changes: all new options are optional.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions Bot added the change: feat label May 7, 2026

Postroggy wants to merge 1 commit into web-infra-dev:main from Postroggy:feat/aiassert-region-focus
