goldeneye is a simple and growing unified interface for geospatial vision-language models. Run any supported geospatial VLM with just a few lines of code.

```bash
pip install goldeneye
```

```python
import goldeneye

# List available agents (models)
print(goldeneye.assets())

# Dispatch an agent for collecting intel
model = goldeneye.dispatch_agent("DescribeEarth")
report = model.recon("assets/sample.jpg", "Describe this image.")
print(report)
# Report(
#     image='assets/sample.jpg',
#     prompt='Describe this image.',
#     response='The image depicts an aerial view of a
#     residential area surrounded by dense greenery,
#     likely trees and shrubs. The houses are
#     scattered across the landscape, with varying
#     sizes and designs, some featuring pitched roofs
#     and others flat-roofed structures. The roads
#     are visible as light-colored lines
#     crisscrossing the area, connecting'
# )
```

- 3B models (GeoR1, ZoomEarth, DescribeEarth): ~6GB VRAM at fp16
- 4B models (EarthDial): ~8GB VRAM at fp16
- 7B models (GeoChat, GeoLLaVA): ~14GB VRAM at fp16
- 8B models (GeoZero): ~16GB VRAM at fp16
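
The figures above follow from a simple rule of thumb: weight memory is roughly the parameter count times the bytes per parameter (2 bytes at fp16). The sketch below reproduces that arithmetic. The helper `estimate_weight_vram_gb` and the `BYTES_PER_PARAM` table are illustrative names, not part of goldeneye, and the estimate covers weights only, so real usage will be higher once activations, the KV cache, and framework overhead are included.

```python
# Rough, weight-only VRAM estimate: parameter count x bytes per parameter.
# Illustrative helper; not part of the goldeneye API.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0}


def estimate_weight_vram_gb(params_billion: float, dtype: str = "fp16") -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


# Model sizes (in billions of parameters) taken from the list above.
MODEL_SIZES_B = {"DescribeEarth": 3, "EarthDial": 4, "GeoChat": 7, "GeoZero": 8}

for name, size in MODEL_SIZES_B.items():
    print(f"{name}: ~{estimate_weight_vram_gb(size):.0f} GB at fp16")
```

By the same arithmetic, loading a 7B model in 8-bit would need roughly half the fp16 weight memory, which is why quantization is suggested for the larger models.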

```python
import torch
import goldeneye
from PIL import Image
from transformers import BitsAndBytesConfig

# Load a model (auto-detects device)
model = goldeneye.dispatch_agent("DescribeEarth")

# Or specify device/dtype
model = goldeneye.dispatch_agent("DescribeEarth", device="cuda", dtype=torch.bfloat16)

# Or use quantization for larger models
config = BitsAndBytesConfig(load_in_8bit=True)
model = goldeneye.dispatch_agent("GeoChat", quantization_config=config)

# Run inference with file path or PIL Image
report = model.recon("satellite_image.jpg", "Describe this image.")
report = model.recon(Image.open("satellite_image.jpg"), "Describe this image.", max_new_tokens=256)
```

| Dataset | Samples | Paper | HuggingFace |
|---|---|---|---|
| DE-Dataset | ~321k | DescribeEarth: A Global Vision-Language Dataset for Aerial and Satellite Image Captioning | earth-insights/DE-Dataset |
| XLRS-Bench-lite | ~2.8k | XLRS-Bench: A Benchmark for Cross-Lingual Visual Reasoning in Remote Sensing | initiacms/XLRS-Bench-lite |
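
With roughly 321k samples in DE-Dataset, it is often practical to look at only the first few streamed samples. The sketch below shows one way to cap a stream with `itertools.islice`, assuming a stream is a plain Python iterable of dicts with an `"image"` key. `fake_stream` and `take` are hypothetical stand-ins for illustration and are not part of goldeneye.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def fake_stream(n: int) -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for a streamed dataset; yields dicts with an "image" key."""
    for i in range(n):
        yield {"image": f"image_{i}.jpg"}


def take(stream: Iterable[dict], limit: int) -> List[dict]:
    """Return at most `limit` samples without consuming the rest of the stream."""
    return list(islice(stream, limit))


# Only the first three items are ever generated.
samples = take(fake_stream(321_000), limit=3)
print([s["image"] for s in samples])
```

Because `islice` stops pulling from the iterator once the limit is reached, the remaining samples are never generated or downloaded.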

```python
from goldeneye.datasets import stream_de_dataset

# Stream samples from geospatial benchmarks
# (`model` is an agent created with goldeneye.dispatch_agent, as shown earlier)
for sample in stream_de_dataset(split="train"):
    report = model.recon(sample["image"], "Describe this satellite image.")
    break
```

See CONTRIBUTING.md for development setup and guidelines.

