# lookit

LLM-first vision toolkit for GUI grounding, OCR, and more. Built with LangChain and Qwen3-VL. Outputs minimal plain text optimized for token efficiency.
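A representative call (command shape matches the Arguments table below; the screenshot name and coordinates are illustrative):

```bash
# Ask for a UI element; lookit replies with a single plain-text action
lookit "click the submit button" -s screenshot.png -m computer
# => left_click 960,324
```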
Choose your setup method:
| Method | Use Case |
|---|---|
| Skills Setup | Claude Code, DeepAgents, or other agent frameworks (recommended) |
| CLI Installation | Standalone command-line usage |
## Skills Setup

Skills are self-contained and auto-download the binary on first run. No Python or dependencies required.
| Skill | Description |
|---|---|
| `computer-use` | GUI grounding for desktop screenshots |
| `mobile-use` | GUI grounding for mobile screenshots |
| `ocr` | Text extraction from screenshots |
### Prerequisites

Choose ONE backend option:
**Option A: Ollama Cloud** (recommended, no local setup)
- Create an Ollama account at ollama.com
- Go to ollama.com/settings/keys
- Click "Create new key" and copy the API key
You'll use these settings:
```
LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1
```
**Option B: Ollama Local** (requires local setup)
- Install Ollama: ollama.com/download
- Pull the model: `ollama pull qwen3-vl`
- Start Ollama (runs automatically after install, or run `ollama serve`)
You'll use these settings:
```
LOOKIT_API_KEY=ollama
LOOKIT_MODEL=qwen3-vl
LOOKIT_BASE_URL=http://localhost:11434/v1
```
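If you want to confirm the model downloaded before wiring it up:

```bash
# Show locally available models; qwen3-vl should appear in the list
ollama list
```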
**Option C: LM Studio** (local GUI app)
- Download LM Studio: lmstudio.ai
- Search for and download a Qwen3-VL model (e.g., `qwen/qwen3-vl-8b`)
- Start the local server (Server tab → Start Server)
You'll use these settings (model name uses `owner/model` format):

```
LOOKIT_API_KEY=lmstudio
LOOKIT_MODEL=qwen/qwen3-vl-8b
LOOKIT_BASE_URL=http://127.0.0.1:1234/v1
```
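Whichever backend you pick, you can sanity-check that the endpoint is reachable before configuring lookit. The `/models` route is standard for OpenAI-compatible servers; adjust the base URL to match your option:

```bash
# Ollama Local or LM Studio: list served models
curl -s http://localhost:11434/v1/models   # Ollama Local
curl -s http://127.0.0.1:1234/v1/models    # LM Studio

# Ollama Cloud: same route, authenticated with your API key
curl -s https://ollama.com/v1/models -H "Authorization: Bearer $LOOKIT_API_KEY"
```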
### Claude Code
**Step 1: Download skills**

```bash
# Set version to download
VERSION="0.1.1"
# Create skill directories
mkdir -p ~/.claude/skills/{computer-use,mobile-use,ocr}/{bin,config}
# Download computer-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/SKILL.md" -o ~/.claude/skills/computer-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/bin/lookit" -o ~/.claude/skills/computer-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/config/lookit.env.example" -o ~/.claude/skills/computer-use/config/lookit.env.example
chmod +x ~/.claude/skills/computer-use/bin/lookit
# Download mobile-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/SKILL.md" -o ~/.claude/skills/mobile-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/bin/lookit" -o ~/.claude/skills/mobile-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/config/lookit.env.example" -o ~/.claude/skills/mobile-use/config/lookit.env.example
chmod +x ~/.claude/skills/mobile-use/bin/lookit
# Download ocr skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/SKILL.md" -o ~/.claude/skills/ocr/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/bin/lookit" -o ~/.claude/skills/ocr/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/config/lookit.env.example" -o ~/.claude/skills/ocr/config/lookit.env.example
chmod +x ~/.claude/skills/ocr/bin/lookit
# Create config files from examples
cp ~/.claude/skills/computer-use/config/lookit.env.example ~/.claude/skills/computer-use/config/lookit.env
cp ~/.claude/skills/mobile-use/config/lookit.env.example ~/.claude/skills/mobile-use/config/lookit.env
cp ~/.claude/skills/ocr/config/lookit.env.example ~/.claude/skills/ocr/config/lookit.env
```
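A quick listing confirms everything landed where the skills expect it:

```bash
# Each skill dir should contain SKILL.md, bin/lookit, and config/lookit.env
ls -R ~/.claude/skills/{computer-use,mobile-use,ocr}
```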
**Step 2: Configure API settings**

Edit each config file with your API settings from the Prerequisites section:

```bash
# Edit each config (use any text editor: nano, vim, code, etc.)
nano ~/.claude/skills/computer-use/config/lookit.env
nano ~/.claude/skills/mobile-use/config/lookit.env
nano ~/.claude/skills/ocr/config/lookit.env
```

Example config for Ollama Cloud:

```
LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1
```

**Step 3: Verify setup**

```bash
# Test the skill (downloads binary on first run)
~/.claude/skills/computer-use/bin/lookit --help
```
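With a real screenshot you can exercise the full pipeline; the path and prompt here are illustrative:

```bash
# GUI grounding against an actual screenshot
~/.claude/skills/computer-use/bin/lookit "find the OK button" \
  -s ~/Desktop/screen.png -m computer
```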
### DeepAgents CLI

DeepAgents is an agent framework built on LangChain and LangGraph.
**Step 1: Download skills**

```bash
# Install deepagents CLI
pip install deepagents-cli
# Set version to download
VERSION="0.1.1"
# Create skill directories
mkdir -p ~/.deepagents/default/skills/{computer-use,mobile-use,ocr}/{bin,config}
# Download computer-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/SKILL.md" -o ~/.deepagents/default/skills/computer-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/bin/lookit" -o ~/.deepagents/default/skills/computer-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/config/lookit.env.example" -o ~/.deepagents/default/skills/computer-use/config/lookit.env.example
chmod +x ~/.deepagents/default/skills/computer-use/bin/lookit
# Download mobile-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/SKILL.md" -o ~/.deepagents/default/skills/mobile-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/bin/lookit" -o ~/.deepagents/default/skills/mobile-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/config/lookit.env.example" -o ~/.deepagents/default/skills/mobile-use/config/lookit.env.example
chmod +x ~/.deepagents/default/skills/mobile-use/bin/lookit
# Download ocr skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/SKILL.md" -o ~/.deepagents/default/skills/ocr/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/bin/lookit" -o ~/.deepagents/default/skills/ocr/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/config/lookit.env.example" -o ~/.deepagents/default/skills/ocr/config/lookit.env.example
chmod +x ~/.deepagents/default/skills/ocr/bin/lookit
# Create config files from examples
cp ~/.deepagents/default/skills/computer-use/config/lookit.env.example ~/.deepagents/default/skills/computer-use/config/lookit.env
cp ~/.deepagents/default/skills/mobile-use/config/lookit.env.example ~/.deepagents/default/skills/mobile-use/config/lookit.env
cp ~/.deepagents/default/skills/ocr/config/lookit.env.example ~/.deepagents/default/skills/ocr/config/lookit.env
```

**Step 2: Configure API settings**

Edit each config file with your API settings from the Prerequisites section:

```bash
# Edit each config (use any text editor: nano, vim, code, etc.)
nano ~/.deepagents/default/skills/computer-use/config/lookit.env
nano ~/.deepagents/default/skills/mobile-use/config/lookit.env
nano ~/.deepagents/default/skills/ocr/config/lookit.env
```

Example config for Ollama Cloud:

```
LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1
```

**Step 3: Verify setup**

```bash
# Test the skill (downloads binary on first run)
~/.deepagents/default/skills/computer-use/bin/lookit --help
# Verify skills are detected
deepagents skills list
```

### Programmatic Integration
For integrating skills into your own LangChain agents, see deepagents PR #611 (WIP).

```bash
# Install in your project
pip install lookit
# Create .env file with your API settings from Prerequisites section
cat << 'EOF' > .env
LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1
EOF
```

```python
from deepagents import create_deep_agent
from deepagents.backends.filesystem import FilesystemBackend
from deepagents.middleware import SkillsMiddleware
# Create backend and skills middleware
backend = FilesystemBackend()
skills_middleware = SkillsMiddleware(
    backend=backend,
    registries=[
        {"path": "/skills/user/", "name": "user"},
        {"path": "/skills/project/", "name": "project"},
    ],
)
# Create agent with skills middleware
agent = create_deep_agent(
    model="openai:gpt-4o",
    middleware=[skills_middleware],
)
# Agent will automatically discover and use lookit skills
result = agent.invoke({
    "messages": [{"role": "user", "content": "Click the submit button in screenshot.png"}]
})
```

## CLI Installation

For standalone command-line usage (requires Python). First complete the Prerequisites section above to get your API settings.
**Step 1: Install**

```bash
pip install lookit
```

**Step 2: Configure**
Add to your shell profile (`~/.zshrc` on macOS, `~/.bashrc` on Linux):

For Ollama Cloud (recommended, see Prerequisites for API key):

```bash
export LOOKIT_API_KEY="your-api-key-here"
export LOOKIT_MODEL="qwen3-vl:235b-cloud"
export LOOKIT_BASE_URL="https://ollama.com/v1"For Ollama Local (see Prerequisites for setup):
export LOOKIT_API_KEY="ollama"
export LOOKIT_MODEL="qwen3-vl"
export LOOKIT_BASE_URL="http://localhost:11434/v1"Then reload: source ~/.zshrc
**Step 3: Verify**

```bash
lookit --help
```

## Usage

The same screenshot with different modes and prompts produces different results. Computer and mobile modes return one plain-text action per request, for example:
```
left_click 960,324
type "hello world"
swipe 500,800 to 500,200
key Control+c
scroll -100
```
OCR mode returns the extracted text directly.
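For concreteness, these are the kinds of invocations that produce such output (screenshot names and prompts are illustrative):

```bash
# Desktop GUI grounding -> e.g. "left_click 960,324"
lookit "click the submit button" -s screen.png -m computer

# Mobile GUI grounding -> e.g. "swipe 500,800 to 500,200"
lookit "scroll down the feed" -s phone.png -m mobile

# OCR -> extracted text
lookit "extract all text" -s screen.png -m ocr
```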
| Argument | Description |
|---|---|
| `query` | Natural language instruction |
| `-s, --screenshot` | Path to screenshot (required) |
| `-m, --mode` | `computer`, `mobile`, or `ocr` (required) |
| `--debug` | Debug mode (for humans): prints info to stderr, saves an annotated image |
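A debug run might look like this (paths illustrative); the plain-text action still goes to stdout while diagnostics go to stderr:

```bash
# Inspect grounding visually: details on stderr, plus a saved annotated image
lookit "find the login field" -s screen.png -m computer --debug
```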
Computer mode actions: `left_click`, `right_click`, `double_click`, `type`, `key`, `scroll`, `mouse_move`

Mobile mode actions: `click`, `long_press`, `swipe`, `type`, `system_button`
## License

MIT

