Evaluate WebWand on the WebArena dataset #154

Open
@lynchee-owo

Description

Use the WebArena benchmark to evaluate WebWand.

  1. Set up the standalone WebArena environment.
  2. Configure the URLs for each website.
  3. Generate a config file for each test example and obtain the auto-login cookies for all websites.
  4. Write a script that uses WebArena's environment, based on its run.py.
  5. Save the task execution results and evaluate them.
  6. Analyze the evaluation results.
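
As a rough illustration of steps 2, 3, and 6, here is a minimal sketch, not WebArena's own tooling. It assumes the raw task list contains URL placeholders such as `__SHOPPING__`, that each task has a `task_id` field, and that each result is stored as a record with a `site` and a numeric `score`. The helper names (`render_task_configs`, `summarize_scores`), the placeholder tokens, and the localhost URLs are all hypothetical and should be adapted to the actual deployment.

```python
import json
from collections import defaultdict
from pathlib import Path

# Hypothetical placeholder-to-URL mapping (assumption): replace the hosts and
# ports with the addresses of your own standalone deployment.
SITE_URLS = {
    "__SHOPPING__": "http://localhost:7770",
    "__REDDIT__": "http://localhost:9999",
    "__GITLAB__": "http://localhost:8023",
}


def render_task_configs(raw_tasks, site_urls, out_dir):
    """Write one JSON config per task, with URL placeholders substituted."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for task in raw_tasks:
        # Substitute on the serialized task so nested fields are covered too.
        text = json.dumps(task)
        for placeholder, url in site_urls.items():
            text = text.replace(placeholder, url)
        path = out_dir / f"{task['task_id']}.json"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


def summarize_scores(results):
    """Return {site: mean score} from records like {"site": ..., "score": ...}."""
    totals = defaultdict(lambda: [0.0, 0])  # site -> [score sum, task count]
    for record in results:
        bucket = totals[record["site"]]
        bucket[0] += record["score"]
        bucket[1] += 1
    return {site: total / count for site, (total, count) in totals.items()}
```

Calling `render_task_configs(raw_tasks, SITE_URLS, "config_files")` would write files such as `config_files/0.json`. Once the runs finish, `summarize_scores` gives a per-site success rate, which makes it easier to see which websites WebWand struggles with.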
