
Add FactScore-STEM-Geo dataset; Include CodeGenUQ in docs#409

Merged
dylanbouchard merged 10 commits into develop from factscore-stem-geo
Jun 8, 2026

@dylanbouchard dylanbouchard commented May 31, 2026

Collaborator

Description

  • Add the FactScore-STEM-Geo dataset (from Bouchard et al., 2026) to the load_example_dataset utility. This uses the wikipedia-api library to create the long-form answer key.
  • Add code generation methods to the docs site

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Dependency update

Checklist

  • Tests added or updated for all changed behavior
  • Docstrings updated for any new or modified public API
  • Type annotations added for any new or modified functions
  • ruff check and ruff format pass locally

@dylanbouchard dylanbouchard changed the title Factscore stem geo Add FactScore-STEM-Geo dataset; Include CodeGenUQ in docs Jun 1, 2026
@dylanbouchard

Collaborator Author

@mohitcek

Comment thread uqlm/utils/dataloader.py Outdated
if cols:
    df = _dataset_processing(df=df, subset_columns=cols)
if isinstance(n, int):
    df = df.iloc[:n]

Contributor

What is this slicing used for? I'm calling it out because it happens at the very end: by then you've already done the hard part of making all the HTTP calls to Wikipedia, only to throw most of the results away if you're doing .iloc[:5] or something similar, right?

Collaborator Author

True, good point. Should we just ignore the n parameter for factscore-stem-geo and let the user call .head() or .sample()? Open to ideas here.

Contributor

Yeah, either that or pass n to load_factscore_stem_geo_dataset() so that it can decide what to do with n before fetching pages.

How long does it take to load this dataset with this code? It might be worth putting a progress bar on the for loop that calls Wikipedia so the user understands what's taking so long, unless you find that it runs fast because Wikipedia and the wiki library tolerate being hit hard. I originally put factscore in HF 1) to keep the HF-centric approach and 2) to avoid issues with having to scrape on demand. I'm not arguing that this needs to be static in HF; that's just where my motivation for this comment thread comes from :D
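A minimal sketch of the two suggestions in this comment, truncating to n before any page is fetched and wrapping the fetch loop in a progress bar. The names fetch_texts, fetch_page, and fake_fetch are hypothetical and are not taken from uqlm; the real loader would call Wikipedia where this sketch calls an injected stand-in, and tqdm is assumed to be an acceptable (optional) dependency.

```python
from typing import Callable, Dict, List, Optional

try:
    from tqdm import tqdm  # optional progress bar
except ImportError:
    # Fall back to a no-op wrapper if tqdm is not installed.
    def tqdm(iterable, **kwargs):
        return iterable


def fetch_texts(
    entities: List[str],
    fetch_page: Callable[[str], Optional[str]],
    n: Optional[int] = None,
) -> Dict[str, str]:
    """Fetch article text per entity, truncating to n entities first."""
    if isinstance(n, int):
        entities = entities[:n]  # slice before any request is made
    texts: Dict[str, str] = {}
    for entity in tqdm(entities, desc="Fetching Wikipedia pages"):
        text = fetch_page(entity)
        if text:  # skip missing or empty pages
            texts[entity] = text
    return texts


# Hypothetical stand-in for the network call; it records each request.
calls: List[str] = []


def fake_fetch(title: str) -> Optional[str]:
    calls.append(title)
    return f"Article about {title}"


result = fetch_texts(["Paris", "Nile", "Quartz", "Everest"], fake_fetch, n=2)
print(calls)  # ['Paris', 'Nile']
```

Because the slice happens before the loop, only two requests are issued for n=2, rather than fetching everything and discarding rows with .iloc[:n] afterwards.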

Collaborator Author

updated as discussed!

@virenbajaj virenbajaj left a comment

Contributor

I think one function that should be private is public.
Two documentation nits. Otherwise looks good!

Comment thread uqlm/utils/dataloader.py Outdated
print(f"Loading dataset - {name}...")
if dataset_dict[name]["load_params"].get("loader") == "_load_factscore_stem_geo_dataset":
    if isinstance(n, int):
        print("Note: the 'n' parameter is not used for 'factscore-stem-geo' — all available articles will be returned.")

Contributor

nit: this note says all available articles will be returned, but this is capped at 100 articles per entity type. Can we say something like: "At most 100 longest articles per entity will be returned"?
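To illustrate the cap this nit refers to, here is a rough sketch of keeping only the longest articles for one entity type and wording the note accordingly. The helper names keep_longest and cap_note and the constant MAX_ARTICLES_PER_TYPE are assumptions made for illustration; the thread does not show how the actual loader selects articles.

```python
from typing import Dict

MAX_ARTICLES_PER_TYPE = 100  # cap mentioned in the review comment


def keep_longest(articles: Dict[str, str], k: int = MAX_ARTICLES_PER_TYPE) -> Dict[str, str]:
    """Return the k longest articles (by character count) from a title -> text map."""
    ranked = sorted(articles.items(), key=lambda item: len(item[1]), reverse=True)
    return dict(ranked[:k])


def cap_note(k: int = MAX_ARTICLES_PER_TYPE) -> str:
    """Build a user-facing note that states the cap instead of 'all articles'."""
    return f"Note: at most the {k} longest articles per entity type will be returned."


# Toy data for a single entity type; lengths are 5, 50, and 20 characters.
sample = {"A": "x" * 5, "B": "x" * 50, "C": "x" * 20}
top_two = keep_longest(sample, k=2)
print(list(top_two))  # ['B', 'C']
print(cap_note())
```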

Comment thread uqlm/utils/dataloader.py Outdated
"livecodebench", "factscore-stem-geo"

n : int, optional
    Number of rows to load from the dataset.

Contributor

nit: change to
"n : int, optional
Number of rows to load from the dataset. Ignored for "factscore-stem-geo",
which always returns all fetched articles."
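One way the suggested docstring wording could sit next to the behavior it documents is sketched below. The function resolve_n and the set N_IGNORED are hypothetical, and using warnings.warn rather than print is a swapped-in choice for the sketch, not what the PR does.

```python
import warnings
from typing import Optional

# Hypothetical registry of datasets for which 'n' has no effect.
N_IGNORED = {"factscore-stem-geo"}


def resolve_n(name: str, n: Optional[int] = None) -> Optional[int]:
    """Decide how many rows to request for a dataset.

    Parameters
    ----------
    name : str
        Dataset name, e.g. "livecodebench" or "factscore-stem-geo".
    n : int, optional
        Number of rows to load from the dataset. Ignored for
        "factscore-stem-geo", which always returns all fetched articles.

    Returns
    -------
    int or None
        The row limit to apply, or None when no limit applies.
    """
    if name in N_IGNORED and isinstance(n, int):
        # Surface the ignored argument instead of silently dropping it.
        warnings.warn(f"'n' is ignored for '{name}'.", UserWarning, stacklevel=2)
        return None
    return n


print(resolve_n("livecodebench", 5))  # 5
```

A warning can be filtered or turned into an error by callers, which a bare print cannot; whether that matters here is a judgment call for the maintainers.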

Comment thread uqlm/utils/dataloader.py Outdated
}


def get_wiki_texts_from_entities(entities: List[str]) -> dict:

Contributor

Should this be a private helper that starts with an underscore _ like _load_factscore_stem_geo_dataset()?
def _get_wiki_texts_from_entities(entities: List[str]) -> dict:

@dylanbouchard dylanbouchard merged commit 0b44b24 into develop Jun 8, 2026
25 checks passed
3 participants