Clarifications of the benchmark #5

Hi,

I am trying to run the benchmark and have the following questions:

1. Where can we find the [code_understanding_directory](https://github.com/mitdbg/Kramabench/blob/main/evaluate.py#L132)? It seems to be required for evaluating the data pipelines.
2. What are the data sources for each query? For example, the [data_sources of legal-tiny.json](https://github.com/mitdbg/Kramabench/blob/main/workload/legal-tiny.json#L8-L10) do not appear to be passed as part of the inputs at generation time. The only place I can see them used is [format_code_understanding_messages](https://github.com/mitdbg/Kramabench/blob/main/benchmark/llm_tools/gpt_interface.py#L71) in the gpt_interface file, which is called from [generate_key_functionalities_for_workload](https://github.com/mitdbg/Kramabench/blob/main/benchmark/benchmark_utils.py#L6-L26). However, when and how is [generate_key_functionalities_for_workload](https://github.com/mitdbg/Kramabench/blob/main/benchmark/benchmark_utils.py#L6-L26) used? I could not find any call sites for this method.
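
One way to double-check for call sites is a plain-text search over the repository's `.py` files. Below is a minimal sketch, assuming a local clone of the repository. `find_references` is a hypothetical helper, and because it only matches text, dynamic invocations (e.g. via `getattr` or string-based dispatch) would not be caught.

```python
import os
import sys


def find_references(root, name, exts=(".py",)):
    """Return (path, line_number, line) for every line under root that mentions name.

    Hypothetical helper: a plain substring search, not an AST-based analysis.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Only look at files with the given extensions (Python sources by default).
            if not filename.endswith(exts):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, start=1):
                    if name in line:
                        hits.append((path, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Usage: python find_refs.py /path/to/Kramabench
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    target = "generate_key_functionalities_for_workload"
    for path, lineno, line in find_references(root, target):
        # The "def ..." line itself is also reported; any other hit is a candidate call site.
        print(f"{path}:{lineno}: {line}")
```

If the only hit is the `def` line in `benchmark/benchmark_utils.py`, the function is not called anywhere in the Python sources.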

Thank you for your time. 