Structured output support #40

Open · wants to merge 3 commits into main

Conversation


@dhicks dhicks commented Mar 23, 2025

This PR provides a minimal implementation of the approach to structured outputs I sketched in #39. Here's a not-quite-minimal working example of using structured outputs:

library(mall)

library(rjson) # For parsing structured JSON output

library(dplyr) # For efficiently wrangling parsed JSON columns
library(tidyr)
library(purrr)

# Define a JSON schema as a list to constrain a model's output
format <- list(
  type = "object",
  properties = list(
    name = list(type = "string"),
    capital = list(type = "string"),
    languages = list(
      type = "array",
      items = list(type = "string")
    )
  ),
  required = list("name", "capital", "languages")
)

# Set up the model, passing `output` and `format`
llm_use('ollama', 'llama3.2', 
        output = 'structured', 
        format = format, 
        seed = 2025-03-23)  # NB: unquoted, this evaluates arithmetically (2025 - 3 - 23 = 1999)

# Vectorized version can be piped directly into JSON parser
llm_vec_custom('Canada', 'tell me about the following country') |> 
  fromJSON()
# $name
# [1] "Canada"
# 
# $capital
# [1] "Ottawa"
# 
# $languages
# [1] "English"              "French"              
# [3] "indigenous languages"

# Data frame version requires more wrangling
dataf = data.frame(country = c('Canada', 'Mexico', 'United States'))

llm_custom(dataf, country, 'tell me about the following country') |>
  mutate(output = map(.pred, fromJSON)) |> 
  unnest_wider(output) |> 
  str()
# tibble [3 × 5] (S3: tbl_df/tbl/data.frame)
# $ country  : chr [1:3] "Canada" "Mexico" "United States"
# $ .pred    : chr [1:3] "{ \"name\": \"Canada\", \"capital\": \"Ottawa\", \"languages\": [\"English\",\"French\" , \"indigenous language"| __truncated__ "{\"name\": \"Mexico\", \"capital\": \"Mexico City\", \"languages\": [\"Spanish\", \"Maya\", \"Nahua\", \"Zapote"| __truncated__ "{\n\"name\": \"United States of America\",\n\"capital\": \"Washington D.C.\",\n\"languages\": [\"English\", \"S"| __truncated__
# $ name     : chr [1:3] "Canada" "Mexico" "United States of America"
# $ capital  : chr [1:3] "Ottawa" "Mexico City" "Washington D.C."
# $ languages:List of 3
# ..$ : chr [1:3] "English" "French" "indigenous languages"
# ..$ : chr [1:5] "Spanish" "Maya" "Nahua" "Zapotec" ...
# ..$ : chr [1:14] "English" "Spanish" "French" "Chinese" ...

Text outputs appear to be working as before. Four tests fail: three are due to small differences in code snapshots, e.g., the order of arguments:

  • Failure (test-llm-classify.R:38:3): Preview works
  • Failure (test-llm-use.R:28:3): Stops cache
  • Failure (test-llm-verify.R:36:3): Preview works

The fourth test appears to relate to the number of objects in Ollama's cache:

  • Failure (test-zzz-cache.R:3:3): Ollama cache exists and delete (actual is 61.0 vs. expected 59.0)

Since I'm not sure how these tests work, I'm not going to update the snapshots or make other changes to the tests at this time. I'm also not adding a test for the structured output use case. It would probably be desirable to define functions llm_structured() and llm_structured_vec() (the latter for vectorized output), and then write tests for those.
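For what it's worth, a hypothetical llm_structured_vec() could be a thin wrapper around llm_vec_custom() plus a parse step. The helper below sketches just the parse-and-validate half (the function name is illustrative, not part of mall; it uses the rjson package from the example above):

```r
library(rjson)

# Hypothetical helper: parse one structured response and check that the
# schema's required fields are all present
parse_structured <- function(response, required = character(0)) {
  parsed <- fromJSON(response)
  missing <- setdiff(required, names(parsed))
  if (length(missing) > 0) {
    stop('response is missing required fields: ',
         paste(missing, collapse = ', '))
  }
  parsed
}

parse_structured('{"name": "Canada", "capital": "Ottawa"}',
                 required = c('name', 'capital'))
```

A wrapper along these lines would fail loudly when the model returns malformed output, rather than silently producing NAs downstream.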

@edgararuiz
Collaborator

Hi @dhicks, thank you for this PR. I'm wondering whether llm_use() is really the best place for this feature. Maybe it's something we should add to llm_custom() to start with, something like: llm_vec_custom('Canada', 'tell me about the following country', format = format). We could default format to NULL, so that if you pass a format, the output is inferred to be a structured object. Thoughts?

@dhicks
Author

dhicks commented May 14, 2025

I'm not tied to any particular implementation — this was the result of me playing around with the package to see if it would work for my needs — so llm_custom() would probably be fine.

It did occur to me a couple weeks ago, though, that structured outputs would provide a better way of controlling output throughout the whole package. IIRC currently invalid responses are detected after they're returned and converted to NAs. Structured outputs can be used to ensure valid responses on the model side, as in this script:

library(tidyverse)

library(readxl)
library(here)
library(tictoc)

library(mall)

source('/Users/danhicks/Google Drive/Coding/*ST text mining/R/load_coding.R')

data_dir = here('data')
out_dir = here('out')

## Load comment text and manual coding ----
dataf = load_manual_coding() |> 
    select(comment_id, support) |> 
    left_join(read_excel(file.path(out_dir, '06_docs_2024-01-16.xlsx')) |> 
                  select(comment_id, text), 
              by = 'comment_id')

## Values are `oppose`, `support`, and `NA`
count(dataf, support)
# # A tibble: 3 × 2
# support     n
# <chr>   <int>
# 1 oppose    625
# 2 support   156
# 3 NA         33

## Set up LLM ----
schema = list(
    type = 'object', 
    properties = list(
        think = list(type = 'string'), 
        support = list(enum = c('support', 
                                'oppose', 
                                'unclear'))
    ),
    required = list('think', 'support')
)

prompt = 'This is a comment on Strengthening Transparency in Regulatory Science, a rule proposed by the Environmental Protection Agency in the first Trump Administration. The rule would have introduced a strong open data requirement at EPA. Classify the comment as supporting the rule (`support`) or opposing it (`oppose`), or as `unclear` if you are uncertain whether the comment supports or opposes the rule.'

llm_use('ollama', 'gemma3:12b', seed = 2025-03-26, .cache = '',  # NB: unquoted, the seed evaluates arithmetically to 1996
        output = 'structured', 
        format = schema)


tic()
llm_vec_custom(dataf$text[1], prompt = prompt)
toc()

tic()
out_df = dataf |> 
    # slice(1:100) |>
    llm_custom(text, 
               prompt = prompt)
toc()

out_df |> 
    rename(gt = support) |> 
    mutate(output = map(.pred, rjson::fromJSON)) |> 
    unnest_wider(output) |> 
    count(gt, support)

## llama3.2 ----
# # A tibble: 8 × 3
# support .classify     n
# <chr>   <chr>     <int>
# 1 oppose  oppose      620
# 2 oppose  NA            5
# 3 support oppose      145
# 4 support support       7
# 5 support NA            4
# 6 NA      oppose       31
# 7 NA      support       1
# 8 NA      NA            1

(Sorry for just dumping that; it's finals week here and I need to finish my grading!)
