
GoldenEye is a library of geospatial vision-language models -- run any geospatial VLM in three lines of code


isaaccorley/goldeneye


goldeneye logo

PyPI · Python 3.13+ · License: MIT

goldeneye is a simple, growing, unified interface for geospatial vision-language models. Run any supported geospatial VLM with just a few lines of code.

Installation

pip install goldeneye

Quick Start

import goldeneye

# List available agents (models)
print(goldeneye.assets())

# Dispatch an agent for collecting intel
model = goldeneye.dispatch_agent("DescribeEarth")
report = model.recon("assets/sample.jpg", "Describe this image.")
print(report)

# Report(
#    image='assets/sample.jpg',
#
#    prompt='Describe this image.',
#
#    response='The image depicts an aerial view of a
#    residential area surrounded by dense greenery,
#    likely trees and shrubs. The houses are
#    scattered across the landscape, with varying
#    sizes and designs, some featuring pitched roofs
#    and others flat-roofed structures. The roads
#    are visible as light-colored lines
#    crisscrossing the area, connecting'
# )
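The result prints like a dataclass with image, prompt, and response fields. Assuming it is one (an inference from the repr above, not a documented guarantee), reports serialize cleanly for logging runs; the Report class below is a hypothetical stand-in mirroring those printed fields:

```python
import dataclasses
import json

# Hypothetical mirror of the fields shown in the repr above
@dataclasses.dataclass
class Report:
    image: str
    prompt: str
    response: str

report = Report("assets/sample.jpg", "Describe this image.", "An aerial view ...")

# A dataclass converts straight to a dict, and from there to JSON
print(json.dumps(dataclasses.asdict(report), indent=2))
```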

sample satellite image

Supported Models

| Model | Size | Paper | Code |
| ----- | ---- | ----- | ---- |
| DescribeEarth | 3B | DescribeEarth: A Global Vision-Language Dataset for Aerial and Satellite Image Captioning | github |
| ZoomEarth | 3B | ZoomEarth: A Unified Remote Sensing Framework for Multi-scale Vision-Language Tasks | github |
| EarthDial | 4B | EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues | github |
| GeoChat | 7B | GeoChat: Grounded Large Vision-Language Model for Remote Sensing | github |
| GeoLLaVA-8K | 7B | GeoLLaVA-8K: A Large Vision-Language Model for High-Resolution Remote Sensing Applications | github |
| GeoZero | 8B | GeoZero: Zero-shot Geospatial Reasoning with Multimodal LLMs | github |
| Geo-R1 (8 variants) | 3B | Geo-R1: Unleashing the Power of Reinforcement Learning in Generalist Geospatial Foundation Model | github |

Memory Requirements

  • 3B models (Geo-R1, ZoomEarth, DescribeEarth): ~6GB VRAM at fp16
  • 4B models (EarthDial): ~8GB VRAM at fp16
  • 7B models (GeoChat, GeoLLaVA-8K): ~14GB VRAM at fp16
  • 8B models (GeoZero): ~16GB VRAM at fp16
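These estimates follow from two bytes per parameter at fp16 and count weights only; activations and the KV cache add overhead on top. A back-of-the-envelope check, using decimal GB:

```python
def weight_vram_gb(params_billion: float, bytes_per_param: float = 2) -> float:
    """Weight memory only: parameter count (billions) x bytes per parameter.

    fp16/bf16 = 2 bytes, 8-bit quantization = 1 byte, 4-bit = 0.5 bytes.
    """
    return params_billion * bytes_per_param

for size in (3, 4, 7, 8):
    print(f"{size}B at fp16: ~{weight_vram_gb(size):.0f} GB")
```

The same arithmetic is why 8-bit quantization roughly halves the footprint: an 8B model drops from ~16GB to ~8GB of weight memory.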

Usage

import torch
import goldeneye
from PIL import Image
from transformers import BitsAndBytesConfig

# Load a model (auto-detects device)
model = goldeneye.dispatch_agent("DescribeEarth")

# Or specify device/dtype
model = goldeneye.dispatch_agent("DescribeEarth", device="cuda", dtype=torch.bfloat16)

# Or use quantization for larger models
config = BitsAndBytesConfig(load_in_8bit=True)
model = goldeneye.dispatch_agent("GeoChat", quantization_config=config)

# Run inference with file path or PIL Image
report = model.recon("satellite_image.jpg", "Describe this image.")
report = model.recon(Image.open("satellite_image.jpg"), "Describe this image.", max_new_tokens=256)

Benchmark Datasets

| Dataset | Samples | Paper | HuggingFace |
| ------- | ------- | ----- | ----------- |
| DE-Dataset | ~321k | DescribeEarth: A Global Vision-Language Dataset for Aerial and Satellite Image Captioning | earth-insights/DE-Dataset |
| XLRS-Bench-lite | ~2.8k | XLRS-Bench: A Benchmark for Cross-Lingual Visual Reasoning in Remote Sensing | initiacms/XLRS-Bench-lite |

from goldeneye.datasets import stream_de_dataset

# Stream samples from geospatial benchmarks
for sample in stream_de_dataset(split="train"):
    report = model.recon(sample["image"], "Describe this satellite image.")
    break
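The break above stops after a single sample; for a bounded smoke test over any stream, the standard library's itertools.islice does the same job (nothing goldeneye-specific below):

```python
from itertools import islice

def take(stream, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(stream, n))

# e.g. take(stream_de_dataset(split="train"), 8) for a quick smoke test
print(take(iter(range(100)), 5))  # [0, 1, 2, 3, 4]
```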

Contributing

See CONTRIBUTING.md for development setup and guidelines.
