Bug Description
I have a working text-to-SQL setup using `NLSQLTableQueryEngine` with `synthesize_response=False` (intentionally disabled to return raw database rows and avoid extra summarization cost/latency).
This setup worked perfectly and used reasonable token counts (~20k tokens max, coming from my custom `text_to_sql_prompt`).
Previously working code (minimal relevant part)
```python
from llama_index.core.query_engine import NLSQLTableQueryEngine

# Create the base Text-to-SQL query engine
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=tables_to_include,
    llm=llm,
    text_to_sql_prompt=prompt_template,
    synthesize_response=False,  # We need raw rows, not a summary
)
```

This engine alone behaved correctly: it generated SQL, executed it, returned raw results, and token usage stayed within expected bounds.
What changed
I wanted to add safety by validating the generated SQL against guidelines (e.g. correct PostgreSQL syntax, avoiding dangerous patterns, and so on). I wrapped the engine like this:
```python
from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.core.query_engine import (
    NLSQLTableQueryEngine,
    RetryGuidelineQueryEngine,
)

# Create the base Text-to-SQL query engine
base_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=tables_to_include,
    llm=_llm,
    text_to_sql_prompt=prompt_template,
    synthesize_response=False,
)

# Create the GuidelineEvaluator using our LLM and guidelines from config.
# This evaluator will check the generated SQL against rules semantically.
guideline_evaluator = GuidelineEvaluator(
    llm=_llm,
    guidelines=GUIDELINES,
)

# Wrap the base engine with RetryGuidelineQueryEngine for retries.
# This adds error recovery: if evaluation fails, it refines and retries.
retry_query_engine = RetryGuidelineQueryEngine(
    query_engine=base_query_engine,
    guideline_evaluator=guideline_evaluator,
    resynthesize_query=True,  # Enable query refinement using feedback
    max_retries=1,
)
```

As soon as a retry is triggered (i.e. the evaluator rejects the first SQL), the next LLM call during the rewrite phase receives an extremely large prompt: in my logs, up to ~400,000 tokens, far exceeding my 20k-token custom prompt.
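For anyone reproducing, the token blow-up can be confirmed with a token-counting callback (a minimal sketch; the tokenizer choice and the test question are assumptions, adjust them to your actual model):

```python
import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Count tokens across every LLM call; set this up BEFORE building the engines
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4").encode  # assumption: match your model
)
Settings.callback_manager = CallbackManager([token_counter])

retry_query_engine.query("placeholder question")
print(token_counter.prompt_llm_token_count)  # spikes once a retry is triggered
```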
What appears to be happening (but I'm not 100% sure)
When `synthesize_response=False`:

- The `Response` object returned by `NLSQLTableQueryEngine` contains the entire raw result set (thousands of rows) as a large string in `response.response` (see the snippet below).
- `GuidelineEvaluator` seems to send `response.response` to the LLM for evaluation.
- Since my guideline is about SQL syntax, I would expect only the generated SQL query to be evaluated.
- Instead, it looks like the evaluator is receiving and sending the full database result set (CSV-like text of all returned rows), causing massive token usage and hitting rate limits immediately.
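A quick way to see the two fields side by side (a minimal sketch; the question string is a placeholder, and `response.metadata["sql_query"]` is where the engine exposes the generated SQL, as far as I can tell):

```python
response = base_query_engine.query("placeholder question")

# The raw rows, returned as one large string when synthesize_response=False
print(len(response.response))

# The generated SQL itself, tiny by comparison
print(response.metadata["sql_query"])
```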
Expected behavior
When a guideline is about the SQL query itself (syntax, safety, etc.), the evaluator should inspect only the generated SQL (available in `response.source_nodes[0].text` or similar), not the executed result data.
Or maybe I'm doing something wrong; in that case, I'd really appreciate a solution for implementing validation of the SQL queries generated by `NLSQLTableQueryEngine`.
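In the meantime, the behavior I expected can be sketched as a manual loop that pulls the SQL out of `response.metadata` and evaluates only that (an illustrative sketch under those assumptions, not a tested fix; `question` is a placeholder):

```python
def query_with_sql_validation(question: str, max_retries: int = 1):
    """Validate only the generated SQL, never the executed rows (sketch)."""
    for _ in range(max_retries + 1):
        response = base_query_engine.query(question)
        sql = response.metadata.get("sql_query", "")

        # Evaluate just the SQL string against the guidelines
        eval_result = guideline_evaluator.evaluate(query=question, response=sql)
        if eval_result.passing:
            return response

        # Fold the evaluator's feedback into the next attempt
        question = f"{question}\n\nThe previous SQL was rejected: {eval_result.feedback}"
    return response
```

If `RetryGuidelineQueryEngine` has a supported way to evaluate the SQL instead of `response.response`, I'd happily use that instead.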
Version
0.12.49
Steps to Reproduce
Minimal reproduction (copied from bug description)
```python
from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.core.query_engine import (
    NLSQLTableQueryEngine,
    RetryGuidelineQueryEngine,
)

# Create the base Text-to-SQL query engine
base_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=tables_to_include,
    llm=_llm,
    text_to_sql_prompt=prompt_template,
    synthesize_response=False,
)

# Create the GuidelineEvaluator using our LLM and guidelines from config.
# This evaluator will check the generated SQL against rules semantically.
guideline_evaluator = GuidelineEvaluator(
    llm=_llm,
    guidelines=GUIDELINES,
)

# Wrap the base engine with RetryGuidelineQueryEngine for retries.
# This adds error recovery: if evaluation fails, it refines and retries.
retry_query_engine = RetryGuidelineQueryEngine(
    query_engine=base_query_engine,
    guideline_evaluator=guideline_evaluator,
    resynthesize_query=True,  # Enable query refinement using feedback
    max_retries=1,
)
```

That is, adding `GuidelineEvaluator` and `RetryGuidelineQueryEngine` causes extremely large prompts when the result obtained from `base_query_engine` is rewritten by `RetryGuidelineQueryEngine`.