llm-order-eval is a benchmark study comparing 11 large language models (LLMs) on parsing natural-language shopping-cart commands into structured JSON. The evaluation explores how prompt design and model architecture influence performance on structured information extraction.
The goal is to evaluate LLMs on their ability to convert free-text shopping-cart instructions into a structured JSON object with three fields:
```
{
  "action": "add" | "remove",
  "product": "string",
  "quantity": integer (default = 1)
}
```
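
For illustration, the sketch below shows how a model response might be parsed and validated against this schema. It is a minimal Python example under assumed conventions; the function name `parse_cart_command` and its error handling are hypothetical and not part of llm-order-eval itself.

```python
import json

VALID_ACTIONS = {"add", "remove"}

def parse_cart_command(raw: str) -> dict:
    """Parse a model's raw JSON output and validate it against the cart-command schema.

    Hypothetical helper for illustration only.
    """
    obj = json.loads(raw)  # raises json.JSONDecodeError on malformed output

    action = obj.get("action")
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be 'add' or 'remove', got {action!r}")

    product = obj.get("product")
    if not isinstance(product, str) or not product:
        raise ValueError("product must be a non-empty string")

    quantity = obj.get("quantity", 1)  # schema default: 1
    # Exclude bool explicitly, since bool is a subclass of int in Python.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
        raise ValueError(f"quantity must be a positive integer, got {quantity!r}")

    return {"action": action, "product": product, "quantity": quantity}


if __name__ == "__main__":
    # e.g. the instruction "add two bananas" should yield this JSON
    print(parse_cart_command('{"action": "add", "product": "bananas", "quantity": 2}'))
```

For example, the instruction "add two bananas" should map to `{"action": "add", "product": "bananas", "quantity": 2}`, which the validator accepts; an output that omits `quantity` falls back to the schema default of 1.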