Fix data schema in example evaluation script by antoine-tran · Pull Request #21 · facebookresearch/large_concept_model

antoine-tran · 2025-01-15T06:45:32Z

Why?

In the example evaluation script ("examples/evaluation/prepare_evaluation_data.py"), the processed datasets are hardcoded to use the renamed columns "prompt" and "answer". This was done to simplify the subsequent data processing steps in LCM evaluation (sentence splitting, SONAR embedding), but it is inconsistent with LLM evaluation, which needs little data processing and can work directly with the original dataset.

This PR makes the following changes to make the evaluation script more flexible:

In Step 1 (preparing the JSONL dataset split), if the user specifies the "prompt" parameters (prompt_prefix, prompt_suffix), the columns are renamed to "prompt" and "answer".
If the user does not specify these parameters, the original column names are kept.
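
The two cases above can be sketched as follows. This is a minimal illustration only, not the code in prepare_evaluation_data.py: the function name `prepare_row`, its signature, and the example column names are hypothetical, and wrapping the source text with the prefix and suffix is an assumption about how the prompt is built.

```python
from typing import Any, Dict, Optional


def prepare_row(
    row: Dict[str, Any],
    source_column: str,
    target_column: str,
    prompt_prefix: Optional[str] = None,
    prompt_suffix: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: rename columns only when prompt parameters are given."""
    # No prompt parameters: keep the original column names untouched.
    if prompt_prefix is None and prompt_suffix is None:
        return dict(row)

    # Prompt parameters given: emit the "prompt" / "answer" schema.
    # Assumption: the prefix and suffix wrap the source text.
    prompt = f"{prompt_prefix or ''}{row[source_column]}{prompt_suffix or ''}"
    out = {k: v for k, v in row.items() if k not in (source_column, target_column)}
    out["prompt"] = prompt
    out["answer"] = row[target_column]
    return out


# Illustrative column names; the real dataset columns may differ.
example = {"article": "Some text.", "highlights": "A summary."}
kept = prepare_row(example, "article", "highlights")
renamed = prepare_row(example, "article", "highlights", prompt_prefix="Summarize: ")
```

With no prompt parameters the row passes through unchanged, so downstream LLM evaluation can read the original columns; with a prefix or suffix the row carries only the "prompt" and "answer" columns plus any untouched extras.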

NOTE: There is a Python 3.12 compatibility issue related to stopes (facebookresearch/stopes#71), which also causes the current CI to fail. This PR was tested and passed on Python 3.11.

antoine-tran · 2025-01-16T06:17:26Z

Merged despite the CI failures to fix the issues

Fix data schema in example evaluation script

Tuan Tran added 16 commits January 14, 2025 14:40

update evaluation scripts

53a3f94

update error messages to be more informative

2cc2698

allow the 2 naming to co-exist

307335a

debug

8895df6

debug

b100519

fix bug in overwriting colum names

f739498

remove breakpoints

f3d1b44

fix bug in overwriting colum names

9cc6eef

remove breakpoints()

90881a6

fix Gemma predictors

7b02291

lint

1ef6efe

update CI

0df596b

update CI

0e9df23

isort

7e4c89c

Merge branch 'main' into tuan/fix_17

f53d6a7

update doc

893e22c

facebook-github-bot added the cla signed label Jan 15, 2025

antoine-tran mentioned this pull request Jan 15, 2025

[rank0]: AssertionError: Missing _source_text_column or article #17

Closed

lint

482e7d1

antoine-tran mentioned this pull request Jan 15, 2025

LCM_MSE eval fails with cnn_dailymail prepared parquet due to missing keys #19

Open

antoine-tran merged commit d640223 into main Jan 16, 2025
11 of 13 checks passed

LUIGIVAMPER pushed a commit to XiangningLin/large_concept_model that referenced this pull request Nov 25, 2025

Merge pull request facebookresearch#21 from facebookresearch/tuan/fix_17

93a9734

Fix data schema in example evaluation script

antoine-tran merged 17 commits into main from tuan/fix_17

2 participants
