Commit 072a034

[cli]: 1. limit image size to read
1 parent 0efc744 commit 072a034

2 files changed: +96 -2 lines changed

aworld-cli/src/aworld_cli/inner_plugins/smllc/agents/media_comprehension/prompt.txt

Lines changed: 59 additions & 2 deletions
@@ -4,15 +4,71 @@ Your mission is to read media files, comprehend their content, and respond to us

 ## Core Operational Workflow
 You must tackle every user request by following this workflow:
-1. **Read File First:** Use the `CAST_SEARCH__read_file` tool to read the file content. For image/audio/video files, the tool will return the content (e.g., base64-encoded data or metadata) that you can interpret.
+1. **Read File First:** Use the `CAST_SEARCH__read_file` tool to read the file content. For image/audio/video files, the tool will return the content (e.g., base64-encoded data or metadata) that you can interpret. **For images:** You MUST check file size first; if >50KB, compress to under 50KB before reading.
 2. **Install Dependencies:** Before understanding, install any required dependencies (e.g., ffmpeg, whisper, Python packages) via `terminal_tool` if they are not already available.
 3. **Understand Content:** Analyze and comprehend the media content—recognize visual elements in images, transcribe or summarize audio, understand video scenes.
 4. **Respond to User:** Based on your understanding and the user's specific requests (e.g., description, analysis, comparison, extraction), provide a clear and helpful response.
 5. **Iterate if Needed:** If the user has follow-up questions or additional requests, repeat the process until the request is fully resolved.

 ## File Type Process Methods
 ### Image
-* Directly use `CAST_SEARCH__read_file` to read the file; the model will identify and interpret the content.
+* Before reading, you MUST check the file size and compress if needed. Use `CAST_SEARCH__read_file` to read the (possibly compressed) file; the model will identify and interpret the content.
+
+#### Image Processing Workflow
+**Step 1: Detect Image File and Check Size**
+```bash
+# Check file size (output in bytes)
+stat -f%z <image_file> 2>/dev/null || stat -c%s <image_file>
+# Or: ls -l <image_file>
+```
+Threshold: 50KB (51200 bytes). If file size > 50KB, you MUST compress before reading.
+
+**Step 2: Compress if Over 50KB**
+If the image exceeds 50KB, compress it to under 50KB using the `terminal_tool` before calling `CAST_SEARCH__read_file`. Save the compressed file to a new path (e.g. `image_compressed.jpg`) in the current directory.
+
+*Python Script (compress_image.py):*
+```python
+from PIL import Image
+import os
+import sys
+
+def compress_to_under_50kb(path, max_kb=50):
+    size_kb = os.path.getsize(path) / 1024
+    if size_kb <= max_kb:
+        print(path) # no compression needed
+        return path
+    img = Image.open(path)
+    if img.mode in ('RGBA', 'LA', 'P'):
+        img = img.convert('RGB')
+    base, ext = os.path.splitext(path)
+    out_path = f"{base}_compressed.jpg"
+    quality = 85
+    while quality >= 10:
+        img.save(out_path, 'JPEG', quality=quality, optimize=True)
+        if os.path.getsize(out_path) / 1024 <= max_kb:
+            print(out_path)
+            return out_path
+        quality -= 15
+    # If still too large, resize
+    w, h = img.size
+    for scale in [0.75, 0.5, 0.25]:
+        new_size = (int(w * scale), int(h * scale))
+        img.resize(new_size, Image.Resampling.LANCZOS).save(out_path, 'JPEG', quality=70, optimize=True)
+        if os.path.getsize(out_path) / 1024 <= max_kb:
+            print(out_path)
+            return out_path
+    print(out_path)
+    return out_path
+
+compress_to_under_50kb(sys.argv[1])
+```
+```bash
+pip install Pillow -q
+python compress_image.py <image_file>
+```
+
+**Step 3: Read and Analyze**
+Use `CAST_SEARCH__read_file` on the original file (if ≤50KB) or the compressed output file (if >50KB).

 ### Audio
 * Do NOT use `CAST_SEARCH__read_file` to read audio file content; use the `terminal_tool` to analyze audio files.
@@ -243,6 +299,7 @@ You are equipped with multiple assistants. It is your job to know which to use a

 ## Critical Guardrails
 - **Read First:** For any media file the user refers to, you MUST use `read_file` to read its content before analyzing or responding.
+- **Image Size Limit:** For image files, you MUST check the file size and compress to under 50KB before reading if the file exceeds 50KB.
 - **One Tool Per Step:** You MUST call only one tool at a time. Do not chain multiple tool calls in a single response.
 - **Honest Capability Assessment:** If a user's request is beyond the combined capabilities of your available assistants, you must terminate the task and clearly explain to the user why it cannot be completed.
 - **Working Directory:** Always treat the current directory as your working directory for all actions: run shell commands from it, and use it (or paths under it) for any temporary or output files when such operations are permitted (e.g. non-code tasks). You MUST NOT redirect work or temporary files to /tmp; Always use the current directory so outputs stay with the user's context.
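Steps 1 and 3 of the new image workflow can also be expressed in Python, which sidesteps the BSD/GNU `stat` flag difference that the `||` fallback in Step 1 handles. The helpers `needs_compression` and `path_to_read` below are hypothetical and not part of this commit; they assume the 51200-byte threshold stated in the prompt and the `<base>_compressed.jpg` output name that `compress_image.py` writes.

```python
import os

# Threshold from Step 1: 50KB expressed in bytes (50 * 1024 = 51200).
THRESHOLD_BYTES = 50 * 1024


def needs_compression(path: str, threshold_bytes: int = THRESHOLD_BYTES) -> bool:
    """Return True when the file is larger than the threshold (hypothetical helper)."""
    # os.path.getsize reports bytes, like `stat -c%s` (GNU) or `stat -f%z` (BSD/macOS).
    return os.path.getsize(path) > threshold_bytes


def path_to_read(path: str, threshold_bytes: int = THRESHOLD_BYTES) -> str:
    """Pick the file to pass to CAST_SEARCH__read_file, following Step 3.

    Assumes compress_image.py has already run for oversized files and wrote
    its output next to the original as <base>_compressed.jpg.
    """
    if not needs_compression(path, threshold_bytes):
        return path
    base, _ = os.path.splitext(path)
    return f"{base}_compressed.jpg"
```

Because the comparison is `>`, a file of exactly 51200 bytes is read as-is, matching the prompt's "if file size > 50KB" wording.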

aworld/experimental/cast/tools/cast_search_tool.py

Lines changed: 37 additions & 0 deletions
@@ -10,10 +10,26 @@
 """

 import json
+import os
 import traceback
 from pathlib import Path
 from typing import Dict, List, Optional, Any, Union, Tuple

+# Multimedia extensions (image, audio, video) - must match searchers._get_multimedia_mime_type
+_MULTIMEDIA_EXTENSIONS = frozenset({
+    '.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp', '.ico', '.tiff', '.tif',
+    '.mp3', '.wav', '.ogg', '.m4a', '.flac', '.aac',
+    '.mp4', '.webm', '.avi', '.mov', '.mkv', '.m4v',
+})
+
+# Multimedia file size limit: default 50KB; set CAST_MEDIA_SIZE_LIMIT_KB to override (e.g. 100 for 100KB)
+def _get_media_size_limit_bytes() -> int:
+    try:
+        kb = int(os.environ.get("CAST_MEDIA_SIZE_LIMIT_KB", "50"))
+        return max(1, kb) * 1024
+    except (ValueError, TypeError):
+        return 50 * 1024
+
 from aworld.config import ToolConfig
 from aworld.core.common import Observation, ActionModel, ActionResult, ToolActionInfo, ParamInfo
 from aworld.core.context.amni import AmniContext
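The override rules in `_get_media_size_limit_bytes` are easiest to see with the environment passed in explicitly. The sketch below restates the same parsing as a hypothetical `media_size_limit_bytes(env)` helper (not part of this commit), so each case can be checked without touching `os.environ`.

```python
from typing import Mapping

DEFAULT_LIMIT_KB = 50


def media_size_limit_bytes(env: Mapping[str, str]) -> int:
    """Same parsing as _get_media_size_limit_bytes, with the environment passed in."""
    try:
        kb = int(env.get("CAST_MEDIA_SIZE_LIMIT_KB", str(DEFAULT_LIMIT_KB)))
        # Values below 1 are clamped to 1KB rather than disabling the limit.
        return max(1, kb) * 1024
    except (ValueError, TypeError):
        # Non-integer strings such as "abc" or "1.5" fall back to the default.
        return DEFAULT_LIMIT_KB * 1024
```

Because of the `max(1, kb)` clamp, setting `CAST_MEDIA_SIZE_LIMIT_KB` to `0` or a negative number does not turn the check off; it tightens the limit to 1KB. A fractional value such as `1.5` is rejected by `int()` and silently falls back to 50KB.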
@@ -530,6 +546,14 @@ async def _glob_search(self,
             logger.error(f"Glob search failed: {e}")
             raise

+    def _resolve_file_path(self, file_path: Union[str, Path]) -> Path:
+        """Resolve file path relative to search root."""
+        p = Path(file_path)
+        if p.is_absolute():
+            return p
+        root = self._root_path or (self.acast.search_engine.root_path if self.acast.search_engine else None) or Path.cwd()
+        return Path(root) / file_path
+
     async def _read_file(self,
                          file_path: Union[str, Path],
                          limit: int = 2000,
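The fallback order in `_resolve_file_path` (absolute path unchanged, otherwise the tool's root, then the search engine's root, then the current directory) can be checked with a free-function version. In this sketch `resolve_file_path`, `root_path`, and `engine_root` are hypothetical names; the two parameters stand in for `self._root_path` and `self.acast.search_engine.root_path`.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_file_path(file_path: Union[str, Path],
                      root_path: Optional[Path] = None,
                      engine_root: Optional[Path] = None) -> Path:
    """Same fallback chain as _resolve_file_path, with the roots passed explicitly."""
    p = Path(file_path)
    if p.is_absolute():
        # Absolute paths are returned unchanged.
        return p
    # The first root that is not None wins: tool root, engine root, then the CWD.
    root = root_path or engine_root or Path.cwd()
    return Path(root) / file_path
```

In the committed method, an unset `self._root_path` or a missing `search_engine` falls through to the next candidate, ending at `Path.cwd()`. The resolved path is used only for the size check; `self.acast.read` still receives the original `file_path`.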
@@ -553,6 +577,19 @@ async def _read_file(self,
             >>> print(result.output)
         """
         try:
+            resolved_path = self._resolve_file_path(file_path)
+            if resolved_path.exists():
+                ext = resolved_path.suffix.lower()
+                if ext in _MULTIMEDIA_EXTENSIONS:
+                    size_bytes = resolved_path.stat().st_size
+                    limit_bytes = _get_media_size_limit_bytes()
+                    if size_bytes > limit_bytes:
+                        limit_kb = limit_bytes // 1024
+                        raise ValueError(
+                            f"Multimedia file size ({size_bytes} bytes) exceeds limit ({limit_kb}KB). "
+                            f"File must be smaller than {limit_kb}KB. "
+                            "Compress the file before reading."
+                        )
             result = await self.acast.read(
                 file_path=file_path,
                 limit=limit,
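The size guard added to `_read_file` can be exercised on its own. The sketch below is a simplified, synchronous restatement rather than the commit's code: `check_media_size` is a hypothetical name, the extension set is an abbreviated subset of `_MULTIMEDIA_EXTENSIONS`, and the limit is passed in instead of being read from `CAST_MEDIA_SIZE_LIMIT_KB`.

```python
from pathlib import Path
from typing import Union

# Abbreviated subset of _MULTIMEDIA_EXTENSIONS, for illustration only.
MEDIA_EXTENSIONS = frozenset({".jpg", ".jpeg", ".png", ".mp3", ".mp4"})


def check_media_size(path: Union[str, Path], limit_bytes: int = 50 * 1024) -> None:
    """Raise ValueError for an existing media file larger than limit_bytes."""
    p = Path(path)
    # Missing files and non-media extensions skip the check, as in the commit.
    if not p.exists() or p.suffix.lower() not in MEDIA_EXTENSIONS:
        return
    size_bytes = p.stat().st_size
    # Strict ">" comparison: a file of exactly limit_bytes is accepted.
    if size_bytes > limit_bytes:
        limit_kb = limit_bytes // 1024
        raise ValueError(
            f"Multimedia file size ({size_bytes} bytes) exceeds limit ({limit_kb}KB). "
            f"File must be smaller than {limit_kb}KB. "
            "Compress the file before reading."
        )
```

One detail the sketch makes visible: because the comparison is a strict `>`, a file of exactly the limit (51200 bytes by default) passes, even though the committed error message says the file "must be smaller than" the limit. Files that do not exist skip the check and are left for `self.acast.read` to report.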
