Evaluation tutorial #412

Closed

yawenzzzz wants to merge 31 commits into main from 20251024_eval_doc
Conversation

@yawenzzzz
Collaborator

PR #408 has been merged in

# Before:
config.launch.launch(follow=False)
# After: always run with torchrun so you can optionally run distributed scripts on a single GPU
config.launch.launch(follow=False, torchrun=True)
Bug: Optional Launch Config Causes Errors

The launch field in OlmoEarthExperimentConfig is now optional, but the launch and launch_prep functions don't account for this: they access attributes or call methods on config.launch directly, which raises an AttributeError when no launch configuration is provided.

Additional Locations (1)
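A minimal sketch of the kind of guard this comment calls for. The class and field names mirror the comment's wording (OlmoEarthExperimentConfig, config.launch), but the bodies here are hypothetical stand-ins, not the project's actual definitions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaunchConfig:
    # Stand-in fields matching the snippet above.
    follow: bool = False
    torchrun: bool = True


@dataclass
class OlmoEarthExperimentConfig:
    # The field is now optional, so callers must handle None.
    launch: Optional[LaunchConfig] = None


def launch(config: OlmoEarthExperimentConfig) -> str:
    # Fail with a clear error instead of an AttributeError on config.launch.<attr>.
    if config.launch is None:
        raise ValueError("No launch configuration provided in this experiment config.")
    return f"launched (follow={config.launch.follow}, torchrun={config.launch.torchrun})"
```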


if is_running_in_beaker() and beaker_user is None:
    raise ValueError(
        # Before: "Failed to get Beaker username. Make sure you are authenticated with Beaker."
        "Failed to get Beaker username. Make sure you are authenticated with Beaker if you are not running on a local cluster."
    )

Bug: Beaker Username Check Fails

The check for a missing Beaker username when running in Beaker is ineffective. beaker_user is assigned ANONYMOUS_USER if get_beaker_username() is None, which makes the subsequent beaker_user is None condition always false. This prevents the intended ValueError from being raised for unauthenticated Beaker users.
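A sketch of the ordering fix this comment implies: test for a missing username before substituting the ANONYMOUS_USER sentinel. The helper below is hypothetical; ANONYMOUS_USER and the error message mirror the comment and snippet above, while the function name and signature are assumptions:

```python
ANONYMOUS_USER = "anonymous"


def resolve_beaker_user(username, running_in_beaker):
    """Return a usable username, raising if Beaker requires authentication.

    `username` is the raw result of get_beaker_username() (may be None).
    """
    # Check for None *before* falling back to the sentinel; otherwise the
    # `is None` condition can never be true and the error is never raised.
    if running_in_beaker and username is None:
        raise ValueError(
            "Failed to get Beaker username. Make sure you are authenticated "
            "with Beaker if you are not running on a local cluster."
        )
    return username if username is not None else ANONYMOUS_USER
```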


num_workers=0,
pooling_type=PoolingType.MEAN,
norm_stats_from_pretrained=True,
eval_interval=Duration.steps(2),

Bug: Frequent Evaluation Interval in Production

The eval_interval for "m-eurosat" is set to Duration.steps(2), which runs evaluation every 2 steps. This is extremely frequent and likely a debug value accidentally left in production code. Other tasks use 4000 or 20000 steps, so this one should probably be similar.
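A hypothetical correction along the lines the comment suggests; 4000 is one of the two values the other tasks reportedly use, and the right number for this task is an assumption:

```python
eval_interval=Duration.steps(4000),  # align with other tasks (4000 or 20000 steps)
```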


@gabrieltseng gabrieltseng mentioned this pull request Oct 30, 2025
@yawenzzzz yawenzzzz closed this Oct 31, 2025

3 participants