Kathy sid/gpt5.5 grounded spatial reasoning#2694
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 38c2eb4baf
"id": "grounded-spatial-00",
"metadata": {},
"source": [
"# Evaluating Grounded Spatial Reasoning with GPT-5.5"
Add the notebook to registry.yaml
This commit adds a new cookbook notebook, but registry.yaml is unchanged, and a repo-wide search for grounded_spatial_reasoning_layouts only finds this notebook. Because the static site is generated from registry.yaml, this page will not appear on cookbook.openai.com until it has a registry entry with title/path/date/tags/authors, which the repo instructions call out as required for new content.
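A check for the fields named above could look like the following minimal sketch. It assumes a registry entry can be read as a plain mapping with the keys title, path, date, tags, and authors; the real registry.yaml schema may differ, and every value in the example entry is a placeholder rather than this PR's actual metadata.

```python
# Required keys are taken from the review comment; the real registry schema may differ.
REQUIRED_FIELDS = ("title", "path", "date", "tags", "authors")


def missing_registry_fields(entry: dict) -> list[str]:
    """Return the required fields that are absent or empty in a registry entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


# Hypothetical entry: the path, date, tags, and authors below are placeholders.
example_entry = {
    "title": "Evaluating Grounded Spatial Reasoning with GPT-5.5",
    "path": "path/to/notebook.ipynb",
    "date": "YYYY-MM-DD",
    "tags": ["example-tag"],
    "authors": ["example-author"],
}

print(missing_registry_fields(example_entry))       # complete entry
print(missing_registry_fields({"title": "Draft"}))  # entry with only a title
```

Treating empty values as missing is a design choice of this sketch: an entry with an empty tags list would be flagged the same way as one with no tags key at all.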
Fixed in follow-up commit 6d59f1f. The branch now includes the required registry.yaml entry and authors.yaml metadata.
Pull request overview
Adds a new multimodal notebook describing an evaluation workflow for “grounded spatial reasoning” on office layout generation, using GPT-5.5 to produce a machine-checkable layout spec and separating deterministic validity checks from semantic/reference scoring.
Changes:
- Introduces a new notebook narrative covering task setup (floorplan + SOP + catalog), output spec, and a 3-part eval scheme.
- Includes embedded figures (from examples/multimodal/images/) illustrating the workflow and selected hillclimb runs.
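To make the idea of a deterministic validity check concrete, here is a minimal sketch that flags items placed outside a room and items that overlap each other. The `Rect` type, its field names, and the sample furniture are assumptions for illustration; the notebook's actual layout spec and checks are not shown in this conversation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; field names are hypothetical, not the notebook's schema."""
    name: str
    x: float
    y: float
    w: float
    h: float


def inside(item: Rect, room: Rect) -> bool:
    """True when the item lies entirely within the room bounds."""
    return (item.x >= room.x and item.y >= room.y
            and item.x + item.w <= room.x + room.w
            and item.y + item.h <= room.y + room.h)


def overlaps(a: Rect, b: Rect) -> bool:
    """True when two rectangles share interior area; touching edges do not count."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def validity_errors(room: Rect, items: list[Rect]) -> list[str]:
    """Collect out-of-bounds errors first, then pairwise overlap errors."""
    errors = [f"{i.name} is outside {room.name}" for i in items if not inside(i, room)]
    errors += [f"{a.name} overlaps {b.name}"
               for a, b in combinations(items, 2) if overlaps(a, b)]
    return errors


room = Rect("room", 0, 0, 10, 8)
desk = Rect("desk", 1, 1, 2, 1)
sofa = Rect("sofa", 2, 1.5, 3, 1)   # intersects the desk
shelf = Rect("shelf", 9, 0, 2, 1)   # extends past the right wall
print(validity_errors(room, [desk, sofa, shelf]))
```

Because these checks are pure geometry, they return the same result on every run, which is what lets them be kept apart from semantic or reference-based scoring.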
"\n",
"The generation step separates the problem into three inputs: 1) visual evidence from the floorplan, 2) planning policy from the SOP, and 3) physical scale from the furniture catalog.\n",
"\n",
"The visible evidence is the empty floorplan image, `empty.png`. The model infers walls, doors, stairs, restrooms, partitions, openings, and usable room shapes from the image itself. It does not see the filled reference image or the gold JSON during generation; those artifacts are held back for evaluation.\n",
"</p>\n",
"<p align=\"center\"><em>Figure 1: The model sees only the empty floorplan during generation. The filled reference is reserved for evaluation and qualitative comparison.</em></p>\n",
"\n",
"The reusable SOP in `constraints.json` defines the planning policy. It describes the functional spaces the layout should include, the anchor furniture required for those spaces, and the spatial rules the plan should try to satisfy.\n",
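As an illustration of how an SOP of this kind could be checked against a candidate layout, the sketch below reports required anchor furniture that the layout never places. The dictionary shapes, space names, and furniture types are all hypothetical; the real structure of `constraints.json` is not visible in this conversation.

```python
# Hypothetical SOP shape: each functional space lists the anchor furniture it requires.
sop = {
    "spaces": [
        {"name": "meeting_room", "anchors": ["conference_table"]},
        {"name": "open_office", "anchors": ["desk", "task_chair"]},
    ]
}

# Hypothetical candidate layout: placed items tagged with the space they belong to.
layout = {
    "items": [
        {"type": "conference_table", "space": "meeting_room"},
        {"type": "desk", "space": "open_office"},
    ]
}


def missing_anchors(sop: dict, layout: dict) -> dict[str, list[str]]:
    """Map each space to the required anchor types the layout does not place there."""
    placed = {(item["space"], item["type"]) for item in layout["items"]}
    gaps: dict[str, list[str]] = {}
    for space in sop["spaces"]:
        absent = [a for a in space["anchors"] if (space["name"], a) not in placed]
        if absent:
            gaps[space["name"]] = absent
    return gaps


print(missing_anchors(sop, layout))  # only open_office is missing an anchor
```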
"source": [
"# Evaluating Grounded Spatial Reasoning with GPT-5.5"
]
"source": [
"## Eval setup\n",
"\n",
"Generation and evaluation are handled as separate stages. GPT-5.5 first produces a candidate layout spec, `raw_layout.json`. Promptfoo then runs the eval pass over that saved spec. The config uses an echo provider because the layout has already been generated; Promptfoo is not creating the plan, only coordinating checks over the candidate JSON.\n",
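The separation described in that cell can be sketched in plain Python: the eval pass only reads a saved spec from disk and runs checks over it, with no model call. This is not the Promptfoo config from the PR. The check names and the `items`, `x`, and `y` keys are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

Check = Callable[[dict], bool]


def has_items(spec: dict) -> bool:
    """The spec must contain a non-empty list of placed items (hypothetical key)."""
    return isinstance(spec.get("items"), list) and len(spec["items"]) > 0


def items_have_positions(spec: dict) -> bool:
    """Every placed item must carry x and y coordinates (hypothetical keys)."""
    return all({"x", "y"} <= set(item) for item in spec.get("items", []))


CHECKS: dict[str, Check] = {
    "has_items": has_items,
    "items_have_positions": items_have_positions,
}


def evaluate_saved_spec(path: Path) -> dict[str, bool]:
    """Load an already generated layout spec and run every check; no generation happens."""
    spec = json.loads(path.read_text(encoding="utf-8"))
    return {name: check(spec) for name, check in CHECKS.items()}


# Stand-in for the generation stage: write a tiny candidate spec to disk.
with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "raw_layout.json"
    saved.write_text(json.dumps({"items": [{"type": "desk", "x": 1, "y": 2}]}),
                     encoding="utf-8")
    results = evaluate_saved_spec(saved)

print(results)
```

Keeping the candidate on disk means the same spec can be re-scored after a check changes without paying for another generation run.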
Summary
Briefly describe the changes and the goal of this PR. Make sure the PR title summarizes the changes effectively.
Motivation
Why are these changes necessary? How do they improve the cookbook?
For new content
When contributing new content, read through our contribution guidelines, and mark the following action items as completed:
We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.