
[BUG]prompt flow eval only supports UTF8 encoding #3670

Open
@yanggaome

Description


Describe the bug

Looking at the evaluation code, it only supports the default UTF-8 encoding:

initial_data_df = pd.read_json(data, lines=True)

https://github.com/microsoft/promptflow/blob/main/src/promptflow-evals/promptflow/evals/evaluate/_evaluate.py#L115

Tested with a JSONL file saved with encoding='utf-8-sig': it errored out, as expected (traceback below). Some of our clients use multilingual input data, and they report that 'utf-8-sig' is the only encoding that works for them with native promptflow. This seems to be a gap between promptflow and the evals SDK.

Traceback (most recent call last):
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 115, in _validate_and_load_data
    initial_data_df = pd.read_json(data, lines=True)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 815, in read_json
    return json_reader.read()
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1023, in read
    obj = self._get_object_parser(self._combine_lines(data_lines))
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1051, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1187, in parse
    self._parse()
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/pandas/io/json/_json.py", line 1403, in _parse
    ujson_loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
    results = evaluate(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_telemetry/__init__.py", line 138, in wrapper
    result = func(*args, **kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 381, in evaluate
    raise e
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 356, in evaluate
    return _evaluate(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 395, in _evaluate
    input_data_df = _validate_and_load_data(target, data, evaluators, output_path, azure_ai_project, evaluation_name)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 117, in _validate_and_load_data
    raise ValueError(
ValueError: Failed to load data from /xxx/test_data.jsonl. Please validate it is a valid jsonl data. Error: Expected object or value.

How To Reproduce the bug
Steps to reproduce the behavior (reproduces every time):
1. Save a JSONL data file with encoding='utf-8-sig' (e.g. multilingual data exported with a BOM).
2. Call evaluate(data=...) from promptflow-evals on that file.
3. _validate_and_load_data raises "ValueError: Expected object or value".
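The failure can be reproduced without promptflow at all, since it comes from the underlying pd.read_json call. A minimal sketch (the file contents are illustrative):

```python
import json
import tempfile

import pandas as pd

# Write a small JSONL file with a UTF-8 BOM ('utf-8-sig'), as multilingual
# datasets are sometimes exported.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", encoding="utf-8-sig",
                                 delete=False) as f:
    f.write(json.dumps({"question": "¿Qué hora es?"}, ensure_ascii=False) + "\n")
    path = f.name

# pd.read_json defaults to encoding='utf-8', so the BOM reaches the JSON
# parser and triggers the reported error.
try:
    pd.read_json(path, lines=True)
    print("unexpectedly succeeded")
except ValueError as e:
    print("failed as reported:", e)
```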

Expected behavior
evaluate should be able to load JSONL data in encodings beyond plain UTF-8, in particular 'utf-8-sig' (e.g. by exposing an encoding parameter or tolerating a BOM), matching what native promptflow accepts.
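One possible direction (a sketch, not the actual promptflow change — load_jsonl is a hypothetical helper): pandas' read_json already accepts an encoding argument, and Python's 'utf-8-sig' codec decodes both plain UTF-8 and UTF-8-with-BOM, so using it as the default would accept both encodings with low risk.

```python
import pandas as pd

def load_jsonl(path):
    """Load a JSONL file, tolerating an optional UTF-8 BOM.

    'utf-8-sig' strips the BOM if present and behaves like 'utf-8'
    otherwise, so plain-UTF-8 files continue to load unchanged.
    """
    return pd.read_json(path, lines=True, encoding="utf-8-sig")
```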


Running Information (please complete the following information):

  • Promptflow Package Version using pf -v: see below (promptflow 1.15.0, promptflow-evals 0.3.2)
  • Operating System: Linux
  • Python Version using python --version: Python 3.9.19

{
"promptflow": "1.15.0",
"promptflow-azure": "1.15.0",
"promptflow-core": "1.15.0",
"promptflow-devkit": "1.15.0",
"promptflow-evals": "0.3.2",
"promptflow-tracing": "1.15.0"
}

Executable '/anaconda/envs/azureml_py38/bin/python'
Python (Linux) 3.9.19 | packaged by conda-forge | (main, Mar 20 2024, 12:50:21)
[GCC 12.3.0]

