All six MLflow env vars are required for OpenShift MLflow:
CREWAI_WEBSEARCH_AGENT_URL=https://<route> \
MLFLOW_TRACKING_URI=<uri> \
MLFLOW_EXPERIMENT_NAME=<experiment> \
MLFLOW_TRACKING_TOKEN=$(oc whoami -t) \
MLFLOW_WORKSPACE=<namespace> \
MLFLOW_TRACKING_INSECURE_TLS=true \
pytest agents/crewai/websearch_agent/tests/behavioral/ -m crewai_websearch -vCrewAI's multi-step ReAct loop makes multiple sequential LLM calls per user request (agent reasoning, tool call, observation, final answer). After the tool-use loop, CrewAI makes one final llm.call() to produce the answer (crewai/utilities/agent_utils.py:291). If the model returns an empty completion on any of these internal calls, CrewAI raises a hard ValueError("Invalid response from LLM call - None or empty.") with no retry.
The other agents in this repo are not affected:
- LangGraph uses LangChain's chat model, which has more robust response parsing and retry logic.
- Vanilla Python (OpenAI Responses) uses the OpenAI SDK directly, which raises specific API errors rather than empty responses.
The vllm-20b model endpoint occasionally returns empty completions. Because CrewAI makes more LLM round-trips per request than the other agents, it has a higher probability of hitting an empty response on at least one call. This is a model reliability issue amplified by CrewAI's architecture, not a test or tracing problem.
test_tool_selection_accuracyandtest_tool_call_has_valid_argsmay fail with HTTP 500 when the model returns empty on any internal LLM call.test_pass_at_k_tool_usageruns 8 iterations; if most hit 500s, the pass rate drops below the 0.85 threshold.- Tests that don't trigger tool use (greetings, coherence) are less affected since they require fewer LLM round-trips.