pmbstyle/fara-agent

Fara Browser Automation Agent

A local browser automation agent based on Microsoft's Fara-7B model, optimized for LM Studio inference.

Run browser automation locally on a consumer-grade GPU using any of several quantized variants of the model.

Features

  • ✅ 100% local AI browser agent
  • ✅ Quantized model support
  • ✅ Completely self-contained (no external dependencies)
  • ✅ Optimized for LM Studio
  • ✅ Browser automation via Playwright

Setup

1. Install Dependencies

pip install -r requirements.txt
playwright install firefox

2. Setup LM Studio

  1. Download and install LM Studio
  2. Download the Fara-7B model (GGUF format):
    • Search for: microsoft_fara-7b
    • Recommended: Q5_K_M quantization (6GB)
  3. Load the model in LM Studio
  4. Start the local server (default port: 1234)
  5. In model settings:
    • Context Length: 8192+
    • Temperature: 0.0
    • Top P: 0.9
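
Before running the agent, it can help to confirm that the local server is up and the model is loaded. The following is a minimal sketch, not part of this repository, and it assumes LM Studio is serving its OpenAI-compatible API at the default endpoint http://localhost:1234/v1. The helper names (models_url, parse_model_ids, list_loaded_models) are hypothetical.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL for an OpenAI-compatible endpoint."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: str) -> list:
    """Extract model identifiers from the JSON body of a /models response."""
    data = json.loads(payload)
    return [entry["id"] for entry in data.get("data", [])]


def list_loaded_models(base_url: str = "http://localhost:1234/v1",
                       timeout: float = 5.0) -> list:
    """Ask the local server which models it exposes.

    Raises OSError (urllib.error.URLError is a subclass) if the server
    is not reachable, for example when it has not been started.
    """
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

Calling list_loaded_models() with the server running should return a list that includes the Fara-7B model identifier as LM Studio names it. An OSError usually means the server is not started or is listening on a different port.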

3. Run the Agent

python run_agent.py --task "Go to wikipedia.org and search for cats" --headful

Optional debug options (the overlay and click markers are enabled by default in headful mode):

  • headful: displays a browser window
  • show_overlay: bottom-right HUD with latest model responses (hidden during screenshots)
  • show_click_markers: transient markers for clicks/hover/type coordinates (hidden during screenshots)

Configuration

Edit config.json to change:

  • Model endpoint (default: http://localhost:1234/v1)
  • Model name
  • Max rounds
  • Screenshot settings
  • Max images to keep in context (max_n_images, default 1)
  • Downloads folder for saving files
  • Debug overlay and click markers (show_overlay, show_click_markers)
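
To illustrate how settings like these might be read, here is a rough sketch of a loader that overlays config.json on a few defaults. It is not the project's actual loader. Only max_n_images (default 1) and the endpoint default come from the list above; the key name base_url and the function load_config are assumptions made for the example, so check config.json for the real key names.

```python
import json
from pathlib import Path

# Defaults taken from this README. "base_url" is a hypothetical key name;
# "max_n_images" is the documented key with its documented default.
DEFAULTS = {
    "base_url": "http://localhost:1234/v1",
    "max_n_images": 1,
}


def load_config(path: str = "config.json") -> dict:
    """Return DEFAULTS overlaid with whatever the JSON file at path defines."""
    config = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    if config["max_n_images"] < 1:
        raise ValueError("max_n_images must be at least 1")
    return config
```

Keys missing from the file fall back to the defaults, so a partial config.json is enough to override a single setting.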

How It Works

  1. Browser Control: Uses Playwright to control Firefox
  2. Vision: Takes screenshots and sends them to the model
  3. Actions: Model returns tool calls (click, type, scroll, navigate, hover, keypress, wait, memorize facts)
  4. Single-Image Mode: Only sends the latest screenshot to LM Studio (better compatibility)
  5. Loop Guard: Tracks scroll position and warns the model when it oscillates up/down
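
Steps 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the agent's real implementation. It assumes the conversation is held as OpenAI-style chat messages whose content is a list of parts with image_url entries, and it treats an "A, B, A, B" pattern of scroll positions as oscillation. The names keep_latest_images and ScrollLoopGuard are hypothetical.

```python
from collections import deque


def keep_latest_images(messages, max_n_images=1):
    """Return a copy of messages keeping only the newest max_n_images images.

    Text parts and plain-string messages are left untouched; the input
    list and its messages are not mutated.
    """
    remaining = max_n_images
    pruned = []
    for message in reversed(messages):  # walk from newest to oldest
        content = message.get("content")
        if isinstance(content, list):
            kept = []
            for part in reversed(content):
                if part.get("type") == "image_url":
                    if remaining <= 0:
                        continue  # drop older screenshots
                    remaining -= 1
                kept.append(part)
            kept.reverse()
            message = {**message, "content": kept}
        pruned.append(message)
    pruned.reverse()
    return pruned


class ScrollLoopGuard:
    """Flags when recent scroll positions alternate between two values."""

    def __init__(self, window=4):
        self.positions = deque(maxlen=window)

    def record(self, scroll_y):
        """Record a scroll position; return True if an oscillation is seen."""
        self.positions.append(scroll_y)
        if len(self.positions) < self.positions.maxlen:
            return False
        p = list(self.positions)
        # Oscillation: positions alternate A, B, A, B with A != B.
        return p[0] != p[1] and all(p[i] == p[i % 2] for i in range(len(p)))
```

When record() returns True, an agent loop could append a short warning to the next prompt so the model tries a different action instead of scrolling up and down again.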

Limitations

  • Quantized models have reduced capabilities compared with the full-precision model
  • LM Studio has issues with multiple images in the conversation history
  • Some complex tasks may cause loops (scrolling, navigation)

Troubleshooting

Browser not visible?

  • Make sure you're using the --headful flag

Model not responding?

  • Check that the LM Studio server is running on port 1234
  • Verify that the model is loaded in LM Studio

Agent looping?

  • Try reducing the temperature in LM Studio to 0.0
  • Reduce max_rounds in config.json

License

MIT License - Based on Microsoft Fara-7B
