Commit 73ccb2f

fix: deprecate dataset queries for schema.gov.it. feat: add logging for data size
1 parent e283665 commit 73ccb2f

6 files changed

Lines changed: 98 additions & 8 deletions

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -53,6 +53,8 @@ Single-file implementation (`src/index.ts`) using:
 - `explore_dataset` - Get dataset details and distributions
 - `preview_distribution` - Download and preview first rows of CSV/JSON data
 
+Note: keep these tools available, but do not treat them as the default entry point for `schema.gov.it`. In this catalog, many DCAT-AP_IT datasets are semantic assets such as ontologies, controlled vocabularies, and their distributions. For `schema.gov.it`, prefer ontology, vocabulary, class/property, and SPARQL tools first; dataset tools are more useful for external catalogs or specific DCAT-AP_IT inspection tasks.
+
 **Intelligent Tools:**
 - `search_concepts` - Fuzzy keyword search (use when URI is unknown)
 - `inspect_concept` - Deep profiling (definition, hierarchy, usage, relations)

README.md

Lines changed: 3 additions & 1 deletion
@@ -38,6 +38,8 @@ The server exposes **34 tools** organized into 11 categories:
 * `explore_dataset`: Shows the details and distributions of a dataset.
 * `preview_distribution`: Downloads and shows the first rows of a CSV/JSON distribution.
 
+Note: these tools remain useful, but on `schema.gov.it` they are often secondary. The catalog mainly contains semantic assets published as DCAT-AP_IT datasets, such as ontologies, controlled vocabularies, and their distributions. To explore `schema.gov.it`, it is usually best to start from ontologies, vocabularies, classes, properties, and SPARQL queries; the dataset tools are better suited to external catalogs or to specific DCAT-AP_IT cases.
+
 ### 6. Intelligence (Advanced)
 * `search_concepts`: **Fuzzy search**. Finds concepts (e.g. "Scuola") without knowing the exact URI.
 * `inspect_concept`: **Deep Dive**. Retrieves a concept's definition, hierarchy, usage stats, and neighbors in a single call.
@@ -239,7 +241,7 @@ Once configured, you can ask the agent things like:
 * **Token Compression**: Long lists (> 5 items) are returned in a compact tabular format to save tokens.
 * **Sanitized Inputs**: All user parameters are sanitized to prevent SPARQL injection.
 * **Local Ontology**: The group 9 tools (`inspect_local_ontology`, `query_local_ontology`, `compare_local_with_remote`) use [oxigraph](https://github.com/oxigraph/oxigraph) (WASM) to load local RDF/OWL files into memory and run SPARQL. Files are cached after the first load; subsequent queries on the same file do not re-read the disk. Supported formats: `.ttl`, `.owl`, `.rdf`, `.nt`, `.jsonld`.
-* **Logging**: All calls are logged to `logs/usage_log.jsonl` for analysis and continuous improvement.
+* **Logging**: All calls are logged to `logs/usage_log.jsonl` for analysis and continuous improvement. Each entry includes the arguments, the summary, `source_data_metrics`, and `ai_data_metrics`: quantitative metrics of the data received and of the final payload passed to the model, such as the number of characters and, when detectable, rows, columns, or number of items.
 * **Transport**: The server supports both `stdio` (default, for local use) and HTTP/SSE (via `MCP_TRANSPORT=sse`, for remote/Docker use).
 
 ## License
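
The log entry shape described in the Logging bullet can be analyzed offline. The following is a minimal sketch, assuming each line of `logs/usage_log.jsonl` is one JSON object carrying the fields named in this commit. The `DataMetrics` and `UsageLogEntry` types, the `parseUsageLog` and `charRatio` helpers, and the sample lines are hypothetical and not part of the repository.

```typescript
// Hypothetical types for illustration; field names follow the log entry in this commit.
interface DataMetrics {
  chars?: number;
  kind?: string;
  items?: number;
  keys?: number;
  vars?: number;
  rows?: number;
}

interface UsageLogEntry {
  timestamp: string;
  tool: string;
  args: Record<string, unknown>;
  summary: string;
  // JSON.stringify omits properties whose value is undefined,
  // so either metrics field may be absent from a logged line.
  source_data_metrics?: DataMetrics;
  ai_data_metrics?: DataMetrics;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseUsageLog(jsonl: string): UsageLogEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as UsageLogEntry);
}

// Characters passed to the model divided by characters received from the source.
// Returns undefined when either count is missing or the source count is zero.
function charRatio(entry: UsageLogEntry): number | undefined {
  const source = entry.source_data_metrics?.chars;
  const ai = entry.ai_data_metrics?.chars;
  if (source === undefined || ai === undefined || source === 0) return undefined;
  return ai / source;
}

// Invented sample lines, not real log output.
const sampleLog = [
  JSON.stringify({
    timestamp: "2025-01-01T00:00:00.000Z",
    tool: "explore_dataset",
    args: {},
    summary: "Success, 3 rows",
    source_data_metrics: { chars: 4000, kind: "object", keys: 2 },
    ai_data_metrics: { chars: 1000, kind: "object", keys: 2 },
  }),
  JSON.stringify({
    timestamp: "2025-01-01T00:00:01.000Z",
    tool: "explore_dataset",
    args: {},
    summary: "Error: timeout",
    ai_data_metrics: { chars: 19, kind: "object", keys: 1 },
  }),
].join("\n");

const sampleRatios = parseUsageLog(sampleLog).map(charRatio);
console.log(sampleRatios);
```

The error entry has no `source_data_metrics`, so `charRatio` reports `undefined` for it rather than a misleading number.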

src/executor.ts

Lines changed: 70 additions & 5 deletions
@@ -15,13 +15,19 @@ const LOG_FILE = join(LOG_DIR, "usage_log.jsonl");
 export async function logUsage(
   toolName: string,
   args: Record<string, unknown>,
-  resultSummary: string
+  resultSummary: string,
+  options?: {
+    sourceData?: unknown;
+    aiData?: unknown;
+  }
 ): Promise<void> {
   const entry = {
     timestamp: new Date().toISOString(),
     tool: toolName,
     args,
     summary: resultSummary,
+    source_data_metrics: buildDataMetrics(options?.sourceData),
+    ai_data_metrics: buildDataMetrics(options?.aiData),
   };
   try {
     await appendFile(LOG_FILE, JSON.stringify(entry) + "\n");
@@ -51,6 +57,46 @@ export function truncateResult(text: string): { text: string; truncated: boolean
   return { text: truncated, truncated: true };
 }
 
+function buildDataMetrics(value: unknown): Record<string, unknown> | undefined {
+  if (value === undefined) {
+    return undefined;
+  }
+
+  try {
+    const json = JSON.stringify(value);
+    const metrics: Record<string, unknown> = {
+      chars: json.length,
+    };
+
+    if (Array.isArray(value)) {
+      metrics.kind = "array";
+      metrics.items = value.length;
+    } else if (value && typeof value === "object") {
+      metrics.kind = "object";
+      metrics.keys = Object.keys(value as Record<string, unknown>).length;
+
+      const sparqlLike = value as {
+        head?: { vars?: unknown[] };
+        results?: { bindings?: unknown[] };
+      };
+      if (Array.isArray(sparqlLike.head?.vars)) {
+        metrics.vars = sparqlLike.head.vars.length;
+      }
+      if (Array.isArray(sparqlLike.results?.bindings)) {
+        metrics.rows = sparqlLike.results.bindings.length;
+      }
+    } else {
+      metrics.kind = typeof value;
+    }
+
+    return metrics;
+  } catch (error: unknown) {
+    return {
+      _serialization_error: getErrorMessage(error),
+    };
+  }
+}
+
 /**
  * Central helper for executing tools with consistent error handling, logging, and truncation.
  * @param toolName - Name of the tool for logging
@@ -68,11 +114,13 @@ export async function executeTool<T>(
     console.error(`[Tool] ${toolName} completed: ${result.success ? 'SUCCESS' : 'FAILURE'}`);
 
     if (!result.success) {
-      await logUsage(toolName, args, `Error: ${result.error}`);
       let errorText = `Error: ${result.error}`;
       if (result.suggestion) {
         errorText += `\nSuggestion: ${result.suggestion}`;
       }
+      await logUsage(toolName, args, `Error: ${result.error}`, {
+        aiData: { error: result.error, suggestion: result.suggestion },
+      });
       return {
         content: [{ type: "text", text: errorText }],
         isError: true,
@@ -83,7 +131,22 @@ export async function executeTool<T>(
     const { text, truncated } = truncateResult(jsonText);
 
     const rowInfo = result.rowCount !== undefined ? `, ${result.rowCount} rows` : "";
-    await logUsage(toolName, args, `Success${rowInfo}${truncated ? " (truncated)" : ""}`);
+    const aiData = truncated
+      ? {
+          _truncated: true,
+          _message: `Result exceeded ${CHARACTER_LIMIT} characters and was truncated`,
+          chars_before_truncation: jsonText.length,
+          chars_sent_to_ai: text.length,
+        }
+      : {
+          chars_sent_to_ai: text.length,
+          payload: result.data,
+        };
+
+    await logUsage(toolName, args, `Success${rowInfo}${truncated ? " (truncated)" : ""}`, {
+      sourceData: result.sourceData,
+      aiData,
+    });
 
     if (truncated) {
       return {
@@ -104,7 +167,9 @@ export async function executeTool<T>(
   } catch (error: unknown) {
     const message = getErrorMessage(error);
     console.error(`[Tool] ${toolName} error:`, message);
-    await logUsage(toolName, args, `Error: ${message}`);
+    await logUsage(toolName, args, `Error: ${message}`, {
+      aiData: { error: message },
+    });
     return {
       content: [{ type: "text", text: `Error: ${message}` }],
       isError: true,
@@ -125,6 +190,6 @@ export async function executeSparqlTool(
     const result = await executeSparql(query);
     const rowCount = result.results?.bindings?.length ?? 0;
     const compressed = compressSparqlResult(result);
-    return { success: true, data: compressed, rowCount };
+    return { success: true, data: compressed, rowCount, sourceData: result };
   });
 }
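
To make the metrics concrete, here is a standalone sketch that mirrors the `buildDataMetrics` logic from the diff above and applies it to a few invented inputs. `sketchDataMetrics` is a hypothetical name, the inline error-message handling is a stand-in for the repository's `getErrorMessage` (whose body is not shown in this commit), and the sample values are made up.

```typescript
type Metrics = Record<string, unknown>;

// Mirrors the buildDataMetrics logic in the diff; not the repository function itself.
function sketchDataMetrics(value: unknown): Metrics | undefined {
  if (value === undefined) return undefined;
  try {
    // For functions and symbols JSON.stringify returns undefined at runtime,
    // so `.length` throws and the catch branch reports the failure.
    const json = JSON.stringify(value);
    const metrics: Metrics = { chars: json.length };

    if (Array.isArray(value)) {
      metrics.kind = "array";
      metrics.items = value.length;
    } else if (value && typeof value === "object") {
      metrics.kind = "object";
      metrics.keys = Object.keys(value).length;

      // Only the top level is inspected for a SPARQL JSON result shape.
      const sparqlLike = value as {
        head?: { vars?: unknown[] };
        results?: { bindings?: unknown[] };
      };
      const vars = sparqlLike.head?.vars;
      const bindings = sparqlLike.results?.bindings;
      if (Array.isArray(vars)) metrics.vars = vars.length;
      if (Array.isArray(bindings)) metrics.rows = bindings.length;
    } else {
      metrics.kind = typeof value;
    }
    return metrics;
  } catch (error: unknown) {
    // Stand-in for getErrorMessage(error).
    const message = error instanceof Error ? error.message : String(error);
    return { _serialization_error: message };
  }
}

// Invented SPARQL-like result: two variables, three (empty) bindings.
const flatResult = {
  head: { vars: ["concept", "label"] },
  results: { bindings: [{}, {}, {}] },
};

// Wrapper object, similar in shape to what the List Datasets tool passes as sourceData.
const wrappedResult = { dataResult: flatResult, countResult: flatResult };

// A value that cannot be serialized because it refers to itself.
const circularValue: Record<string, unknown> = {};
circularValue.self = circularValue;

const flatMetrics = sketchDataMetrics(flatResult);
const wrappedMetrics = sketchDataMetrics(wrappedResult);
const circularMetrics = sketchDataMetrics(circularValue);

console.log(flatMetrics);     // kind "object" with keys, vars, and rows
console.log(wrappedMetrics);  // kind "object" with keys only
console.log(circularMetrics); // a single _serialization_error field
```

One consequence of the top-level check: when a tool wraps several SPARQL results in one object, the logged metrics contain `chars`, `kind`, and `keys` but no `rows` or `vars`.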

src/tools/group-e.ts

Lines changed: 19 additions & 0 deletions
@@ -15,6 +15,8 @@ server.registerTool(
     title: "List Datasets",
     description: `List available Datasets (dcatapit:Dataset) in the catalog.
 
+Use this when you explicitly need DCAT-AP_IT dataset records. On schema.gov.it, these are often semantic assets such as ontologies, controlled vocabularies, and related distributions rather than classic tabular datasets, so ontology/vocabulary/SPARQL tools are usually a better starting point.
+
 **Args:**
 - limit: Maximum datasets per page (default: 20)
 - offset: Number of datasets to skip (default: 0)
@@ -75,6 +77,10 @@ server.registerTool(
           },
         },
         rowCount: count,
+        sourceData: {
+          dataResult,
+          countResult,
+        },
       };
     });
   }
@@ -86,6 +92,8 @@ server.registerTool(
     title: "Explore Dataset",
     description: `Get details of a specific Dataset including metadata and distributions.
 
+Use this for targeted DCAT-AP_IT inspection. On schema.gov.it, many datasets describe semantic assets, so this tool is usually secondary to ontology, vocabulary, class/property, and SPARQL exploration.
+
 **Args:**
 - datasetUri: URI of the dataset to explore
 
@@ -138,6 +146,10 @@ server.registerTool(
           metadata: compressSparqlResult(details),
           distributions: compressSparqlResult(distributions),
         },
+        sourceData: {
+          metadata: details,
+          distributions,
+        },
         rowCount: (details.results?.bindings?.length ?? 0) +
           (distributions.results?.bindings?.length ?? 0),
       };
@@ -151,6 +163,8 @@ server.registerTool(
     title: "Preview Distribution",
     description: `Download and preview the first rows of a distribution file.
 
+Most useful after you already identified a concrete distribution URL. On schema.gov.it, distributions often belong to semantic assets rather than classic tabular datasets.
+
 **Args:**
 - url: Download URL of the distribution (CSV or JSON)
 
@@ -207,6 +221,11 @@ server.registerTool(
         return {
           success: true,
           data: `Preview of ${url}:\n\n${preview}`,
+          sourceData: {
+            url,
+            contentType,
+            bodyPreview: text.slice(0, 4000),
+          },
         };
       } finally {
         clearTimeout(timeoutId);

src/tools/group-j.ts

Lines changed: 3 additions & 2 deletions
@@ -129,7 +129,7 @@ server.registerTool(
       const result = await executeSparql(query, safeEndpoint, injectPrefixes ?? false, 15000);
       const rowCount = result.results?.bindings?.length ?? 0;
       const compressed = compressSparqlResult(result);
-      return { success: true, data: compressed, rowCount };
+      return { success: true, data: compressed, rowCount, sourceData: result };
     });
   }
 );
@@ -219,6 +219,7 @@ server.registerTool(
       return {
         success: true,
         data: { concept: uri, alignments },
+        sourceData: result,
         rowCount: alignments.length,
       };
     });
@@ -265,7 +266,7 @@ server.registerTool(
       const result = await executeSparql(query, safeEndpoint, false, 15000);
       const rowCount = result.results?.bindings?.length ?? 0;
       const compressed = compressSparqlResult(result);
-      return { success: true, data: compressed, rowCount };
+      return { success: true, data: compressed, rowCount, sourceData: result };
     });
   }
 );

src/types.ts

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ export interface ToolSuccess<T = unknown> {
   success: true;
   data: T;
   rowCount?: number;
+  sourceData?: unknown;
 }
 
 /** Error tool result */
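
A tool handler can populate the new optional `sourceData` field roughly as sketched below. This is an illustration only: `RawSparqlResult`, `flattenBindings`, and `runExampleTool` are hypothetical names, and `flattenBindings` is a simplified stand-in, not the repository's `compressSparqlResult`. The sample binding is invented.

```typescript
// Same shape as the ToolSuccess interface in the diff above.
interface ToolSuccess<T = unknown> {
  success: true;
  data: T;
  rowCount?: number;
  sourceData?: unknown;
}

// Hypothetical minimal shape of a SPARQL JSON result.
interface RawSparqlResult {
  head: { vars: string[] };
  results: { bindings: Array<Record<string, { type: string; value: string }>> };
}

type Row = Record<string, string>;

// Simplified stand-in for a compression step: keep only the bound values.
function flattenBindings(raw: RawSparqlResult): Row[] {
  return raw.results.bindings.map((binding) => {
    const row: Row = {};
    for (const name of raw.head.vars) {
      const cell = binding[name];
      if (cell !== undefined) row[name] = cell.value;
    }
    return row;
  });
}

// Hypothetical handler: returns the compact payload plus the raw result.
function runExampleTool(raw: RawSparqlResult): ToolSuccess<Row[]> {
  return {
    success: true,
    data: flattenBindings(raw), // compact payload intended for the model
    rowCount: raw.results.bindings.length,
    sourceData: raw, // raw result; in this commit it is read when logging metrics
  };
}

const exampleRaw: RawSparqlResult = {
  head: { vars: ["label"] },
  results: { bindings: [{ label: { type: "literal", value: "Scuola" } }] },
};

const exampleResult = runExampleTool(exampleRaw);

// Compare serialized sizes of the compact payload and the raw source.
console.log(
  JSON.stringify(exampleResult.data).length,
  JSON.stringify(exampleResult.sourceData).length
);
```

Because the field is typed `unknown` and optional, existing tools that do not set it keep compiling unchanged; their log entries simply lack `source_data_metrics`.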
