Description
Describe the bug
Looking at the evaluation code, it only supports the default UTF-8 encoding:
initial_data_df = pd.read_json(data, lines=True)
When tested with a JSONL file saved with encoding='utf-8-sig', it errors out (traceback below). Some of our clients use multilingual input data, and they report that 'utf-8-sig' is the only encoding that works for them with native promptflow. This seems to be a gap between promptflow and the eval SDK.
Traceback (most recent call last):
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 115, in _validate_and_load_data
    initial_data_df = pd.read_json(data, lines=True)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 815, in read_json
    return json_reader.read()
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1023, in read
    obj = self._get_object_parser(self._combine_lines(data_lines))
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1051, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1187, in parse
    self._parse()
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1403, in _parse
    ujson_loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  ...
    results = evaluate(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_telemetry/__init__.py", line 138, in wrapper
    result = func(*args, **kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 381, in evaluate
    raise e
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 356, in evaluate
    return _evaluate(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 395, in _evaluate
    input_data_df = _validate_and_load_data(target, data, evaluators, output_path, azure_ai_project, evaluation_name)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 117, in _validate_and_load_data
    raise ValueError(
ValueError: Failed to load data from /xxx/test_data.jsonl. Please validate it is a valid jsonl data. Error: Expected object or value.
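The failure is easy to reproduce outside the SDK, and pandas itself already offers a way around it. A minimal repro sketch (hypothetical file path and field names; relies on the documented `encoding` parameter of `pd.read_json`):

```python
import os
import tempfile

import pandas as pd

# A JSONL file saved with 'utf-8-sig' starts with a UTF-8 BOM, which makes
# the SDK's default pd.read_json(data, lines=True) call fail with
# "Expected object or value".
path = os.path.join(tempfile.mkdtemp(), "test_data.jsonl")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write('{"question": "\u4f60\u597d", "answer": "hello"}\n')

try:
    pd.read_json(path, lines=True)  # default UTF-8 read trips over the BOM
except ValueError as exc:
    print("default read failed:", exc)

# Caller-side workaround (only possible when you control the read, not when
# the SDK does it): pass encoding='utf-8-sig' so the BOM is stripped.
df = pd.read_json(path, lines=True, encoding="utf-8-sig")
print(df.shape)  # (1, 2)
```

This also confirms the bug is purely about the BOM, not about the multilingual content of the lines.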
How To Reproduce the bug
Steps to reproduce the behavior (reproduces consistently with any 'utf-8-sig' file):
1. Save a JSONL data file with encoding='utf-8-sig' (i.e. with a UTF-8 BOM).
2. Pass it as the data argument to promptflow.evals.evaluate(...).
3. evaluate raises the ValueError shown above.
Expected behavior
evaluate() loads JSONL data saved with 'utf-8-sig' encoding (as native promptflow does), or exposes a way to specify the file encoding.
Running Information (please complete the following information):
- Promptflow Package Version (pf -v): see output below
- Operating System: Linux (azureml_py38 conda environment)
- Python Version (python --version): Python 3.9.19
{
"promptflow": "1.15.0",
"promptflow-azure": "1.15.0",
"promptflow-core": "1.15.0",
"promptflow-devkit": "1.15.0",
"promptflow-evals": "0.3.2",
"promptflow-tracing": "1.15.0"
}
Executable '/anaconda/envs/azureml_py38/bin/python'
Python (Linux) 3.9.19 | packaged by conda-forge | (main, Mar 20 2024, 12:50:21)
[GCC 12.3.0]