`mean_success_rate = 0.5` when `num_complete_trials = 1` in benchmark results

Hello,

While running the benchmark evaluation, I got an unexpected result in the output: for a task where `num_complete_trials` is `1`, `mean_success_rate` is reported as `0.5`.

Here is the relevant line from the evaluation results:

| Index | Task                 | num_complete_trials | mean_success_rate | mean_episode_length | total_runtime_s | num_fail_trials |
|-------|----------------------|---------------------|-------------------|---------------------|-----------------|-----------------|
| 114   | TurnOnWifiAndOpenApp | 1                   | 0.5               | 13                  | 107.6           | 0               |

My understanding is that if `num_complete_trials` is `1`, the `mean_success_rate` should logically be either `0.0` (if the single trial failed) or `1.0` (if the single trial succeeded). A value of `0.5` for a single trial seems contradictory.
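To make the arithmetic concrete, here is a minimal sketch of what I expected. It assumes that each trial is scored strictly `0` or `1` and that `mean_success_rate` is the plain average over completed trials. Both are my assumptions and not something I have confirmed in the benchmark code, and the helper names are hypothetical.

```python
from fractions import Fraction


def achievable_means(num_trials: int) -> list[float]:
    """Every mean reachable when each trial scores exactly 0 or 1 (assumed)."""
    return [float(Fraction(k, num_trials)) for k in range(num_trials + 1)]


def is_consistent_with_binary(mean: float, num_trials: int, tol: float = 1e-9) -> bool:
    """True if `mean` could be the average of `num_trials` binary outcomes."""
    return any(abs(mean - m) <= tol for m in achievable_means(num_trials))


print(achievable_means(1))                # [0.0, 1.0]
print(is_consistent_with_binary(0.5, 1))  # False
print(is_consistent_with_binary(0.5, 2))  # True
```

Under these assumptions, `0.5` is only reachable with an even number of trials. If the benchmark instead awards fractional per-trial scores (for example, partial credit on a multi-step task), then `0.5` from a single trial would be expected, but I don't know whether that is the case here.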

Could you please clarify why this might be the case? Is there a specific interpretation or calculation method I'm missing, or could this be an anomaly in the reporting?

Thank you for your time and assistance.

`mean_success_rate = 0.5` when `num_complete_trials = 1` in benchmark results #361
