Experimental comparison of agents querying heterogeneous MCP APIs versus agents querying NLWeb websites #390
Aaron9812 started this conversation in Show and tell
Absolutely fascinating. Would love to chat with you about this.
Could you email me at ***@***.***?
On Thu, Dec 4, 2025, 8:27 AM Aaron Steiner <***@***.***> wrote:
Hi all,
We have performed an experimental comparison of agents that perform
shopping tasks on websites that expose product offers via heterogeneous MCP
APIs and, alternatively, on websites that expose product offers via NLWeb
and represent product descriptions using schema.org terms.
We benchmark the agents in a local testbed environment that consists of
four e-shops. The agents may issue as many queries as they want
against each e-shop and decide autonomously when they have gathered enough
information for the task at hand. More information about our experimental
setup can be found at https://wbsg-uni-mannheim.github.io/WebMall-Interfaces/
Averaged over all models that we test (GPT5, GPT5-mini, GPT4.1, Sonnet 4),
the NLWeb agent showed a slightly better task completion rate and F1 score
than the MCP API agent. Interestingly, the NLWeb agent consumed less than
half as many input tokens as the MCP agent.
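| Interface | Completion rate (CR) | Precision (P) | Recall (R) | F1 | Input Tokens | Output Tokens | Cost ($) | Runtime (s) |
|---|---|---|---|---|---|---|---|---|
| MCP Agent | 0.62 | 0.73 | 0.81 | 0.75 | 119,710 | 1,915 | $0.25 | 57 |
| NLWeb Agent | 0.63 | 0.75 | 0.81 | 0.76 | 55,657 | 2,183 | $0.08 | 49 |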
The detailed results of our experiments are found here:
https://wbsg-uni-mannheim.github.io/WebMall-Interfaces/#results
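To make the efficiency difference concrete, the following minimal sketch computes how much lower the NLWeb agent's averages are relative to the MCP agent. The numbers are copied from the table above; the dictionary layout and the helper name `relative_reduction` are illustrative assumptions, not part of the benchmark code.

```python
# Averages copied from the results table above.
results = {
    "MCP Agent": {"input_tokens": 119_710, "cost_usd": 0.25, "runtime_s": 57},
    "NLWeb Agent": {"input_tokens": 55_657, "cost_usd": 0.08, "runtime_s": 49},
}


def relative_reduction(baseline: float, value: float) -> float:
    """Return the fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


mcp, nlweb = results["MCP Agent"], results["NLWeb Agent"]

# One relative reduction per metric, with the MCP agent as the baseline.
reductions = {metric: relative_reduction(mcp[metric], nlweb[metric]) for metric in mcp}

for metric, reduction in reductions.items():
    print(f"{metric}: {reduction:.1%} lower for the NLWeb agent")
```

With these inputs, the reductions come out at roughly 54% for input tokens, 68% for cost, and 14% for runtime.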
We were surprised that the heterogeneity of the JSON formats returned by
the different shops via their MCP endpoints reduced the task completion
rate and F1 score by only 1 percentage point compared to the NLWeb
endpoints, which all return the same nicely formatted schema.org data.
Below, we provide examples of product descriptions from the MCP and NLWeb
responses that illustrate the structural differences.
------------------------------
Heterogeneous Answers by MCP Endpoints
Agent query:
"tool_arguments": {
  "query": "Canon EOS R5 Mark II",
  "per_page": 20,
  "page": 1,
  "include_descriptions": true
}
Example Product Description from MCP Answer of Shop 1:
{
  "ID": "1963",
  "label": "EOS R5 Mark II Body",
  "desc": {
    "longVersion": "<p>The EOS R5 Mark II is a worthy successor to the incredibly […]</p>",
    "quickPitch": "<p>The EOS R5 Mark II […]</p>"
  },
  "stock": {
    "itemCode": "1963",
    "status": "In stock",
    "leftOverCount": 0
  },
  "labels": {
    "categories": ["Canon"]
  },
  "priceInfo": {
    "current": "1500.0",
    "usual": ""
  },
  "addresses": {
    "selfLink": "https://webmall-1.de/mark-ii-body/",
    "shareLink": "https://webmall-1.de/r5-mark-ii-body/"
  }
}
Example Product Description from MCP Answer of Shop 2:
{
  "catalog_entry_id": "3526",
  "merchandise_title": "Canon EOS R5 II",
  "financial_details": {
    "cost_amount": "1544.54",
    "standard_rate": ""
  },
  "content_sections": {
    "detailed_info": "<p>Full Frame Stacked BSI CMOS [...] </p>",
    "visual_assets": ["https://webmall-2.de/wp-content/uploads/0-scaled.jpg"]
  },
  "inventory_tracking": {
    "product_identifier": "3526",
    "availability_state": "On the shelf"
  },
  "classification_tags": ["Cameras"],
  "direct_link": "https://webmall-2.informatik.uni-mannheim.de/product/canon-eos-r5-ii/"
}
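To illustrate what this heterogeneity would mean for a conventional, hand-written client (as opposed to the LLM agents in the experiment, which read the JSON directly), here is a minimal sketch that maps both shop formats onto one common shape. The field names come from the two examples above; the `FIELD_PATHS` table, the helper names, and the target keys (`name`, `price`, `availability`, `url`) are hypothetical and only serve the illustration.

```python
from typing import Any

# Trimmed copies of the two MCP records shown above.
shop1_record = {
    "label": "EOS R5 Mark II Body",
    "stock": {"status": "In stock"},
    "priceInfo": {"current": "1500.0"},
    "addresses": {"selfLink": "https://webmall-1.de/mark-ii-body/"},
}
shop2_record = {
    "merchandise_title": "Canon EOS R5 II",
    "financial_details": {"cost_amount": "1544.54"},
    "inventory_tracking": {"availability_state": "On the shelf"},
    "direct_link": "https://webmall-2.informatik.uni-mannheim.de/product/canon-eos-r5-ii/",
}

# Hypothetical per-shop mapping: common field -> key path in that shop's JSON.
FIELD_PATHS = {
    "shop1": {
        "name": ("label",),
        "price": ("priceInfo", "current"),
        "availability": ("stock", "status"),
        "url": ("addresses", "selfLink"),
    },
    "shop2": {
        "name": ("merchandise_title",),
        "price": ("financial_details", "cost_amount"),
        "availability": ("inventory_tracking", "availability_state"),
        "url": ("direct_link",),
    },
}


def get_path(record: dict, path: tuple) -> Any:
    """Follow a sequence of keys into a nested dictionary."""
    value: Any = record
    for key in path:
        value = value[key]
    return value


def normalize(shop: str, record: dict) -> dict:
    """Map a shop-specific record onto the common field names."""
    offer = {field: get_path(record, path) for field, path in FIELD_PATHS[shop].items()}
    offer["price"] = float(offer["price"])  # prices arrive as strings in both formats
    return offer


offers = [normalize("shop1", shop1_record), normalize("shop2", shop2_record)]
```

Note that even after the keys are aligned, the values still differ between shops ("In stock" versus "On the shelf"), so a client of this kind would need a second mapping for the availability vocabulary.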
------------------------------
Example Product Descriptions from NLWeb Endpoint using schema.org Terms
Agent query:
"mcp_server": "E-Store Athletes",
"tool_arguments": {
  "query": "Canon EOS R5 Mark II",
  "top_k": 10
}
Example Product Description from NLWeb Answer of Shop 1:
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "EOS R5 Mark II Body",
  "image": ["https://webmall-1.de/wp-content/uploads/2025/05/3119.jpg"],
  "description": "<p>The EOS R5 Mark […].</p>",
  "category": "Canon",
  "offers": {
    "@type": "Offer",
    "url": "https://webmall-1.de/product/eos-r5-mark-ii-body/",
    "priceCurrency": "EUR",
    "price": "1500.0",
    "availability": "https://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition"
  },
  "url": "https://webmall-1.de/product/eos-r5-mark-ii-body/",
  "identifier": {
    "@type": "PropertyValue",
    "name": "WooCommerce ID",
    "value": "1963"
  }
}
Example Product Description from NLWeb Answer of Shop 2:
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "Canon EOS R5 II",
  "image": ["https://webmall-2.de/wp-content/uploads/2025/05/3120.jpg"],
  "description": "<p>Pixel Intelligent AF, 60 Raw/4K 60 SRAW/4K 120 10-Bit […]</p>",
  "category": "Cameras",
  "offers": {
    "@type": "Offer",
    "url": "https://webmall-2.de/product/canon-eos-r5-ii/",
    "priceCurrency": "EUR",
    "price": "1544.54",
    "availability": "https://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition"
  },
  "url": "https://webmall-2.de/product/canon-eos-r5-ii/",
  "identifier": {
    "@type": "PropertyValue",
    "name": "WooCommerce ID",
    "value": "3526"
  }
}
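Because both NLWeb answers use the same schema.org terms, a single extraction routine can in principle handle every shop. The sketch below shows this for trimmed copies of the two records above. The function name `parse_product` and the output keys are hypothetical, and the code assumes `offers` is a single Offer object as in these examples (schema.org also permits a list of offers).

```python
# Trimmed copies of the two NLWeb records shown above.
shop1_product = {
    "@type": "Product",
    "name": "EOS R5 Mark II Body",
    "offers": {
        "@type": "Offer",
        "url": "https://webmall-1.de/product/eos-r5-mark-ii-body/",
        "priceCurrency": "EUR",
        "price": "1500.0",
        "availability": "https://schema.org/InStock",
    },
}
shop2_product = {
    "@type": "Product",
    "name": "Canon EOS R5 II",
    "offers": {
        "@type": "Offer",
        "url": "https://webmall-2.de/product/canon-eos-r5-ii/",
        "priceCurrency": "EUR",
        "price": "1544.54",
        "availability": "https://schema.org/InStock",
    },
}


def parse_product(product: dict) -> dict:
    """Extract the offer fields; assumes `offers` is a single Offer object."""
    offer = product["offers"]
    return {
        "name": product["name"],
        "price": float(offer["price"]),  # price is serialized as a string
        "currency": offer["priceCurrency"],
        "in_stock": offer["availability"] == "https://schema.org/InStock",
        "url": offer["url"],
    }


# The same function works for both shops; no per-shop mapping is needed.
parsed = [parse_product(p) for p in (shop1_product, shop2_product)]
cheapest = min(parsed, key=lambda offer: offer["price"])
```

Here the availability value is a shared schema.org URI rather than shop-specific wording, so the in-stock check is identical for both shops.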
We would be very happy to hear your opinions on our results and on our
NLWeb response format.
Do you have any suggestions on how we could widen the performance gap
between our agent working against the MCP APIs and the agent using the
NLWeb endpoints?
Kind regards,
Aaron Steiner, Ralph Peters and Chris Bizer