Skip to content

Allow supplemental API data to be passed to ColdfrontFetchProcessor#282

Open
QuanMPhm wants to merge 1 commit into
CCI-MOC:mainfrom
QuanMPhm:239/bm
Open

Allow supplemental API data to be passed to ColdfrontFetchProcessor#282
QuanMPhm wants to merge 1 commit into
CCI-MOC:mainfrom
QuanMPhm:239/bm

Conversation

@QuanMPhm

Copy link
Copy Markdown
Contributor

Closes #239. Related to https://github.com/CCI-MOC/invoicing-private-data/pull/68. This allows passing data for billable projects not currently in Coldfront Namely bare metal projects.

This will finally allow billing of Bare Metal projects

@QuanMPhm QuanMPhm requested review from knikolla and naved001 March 21, 2026 19:28
invoice.PI_FIELD: pi_name,
invoice.INSTITUTION_ID_FIELD: "N/A",
invoice.CLUSTER_NAME_FIELD: cluster_name,
invoice.IS_COURSE_FIELD: False, # (TODO) Quan Assuming supplemental data does not contain course info?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "Assuming supplemental data does not contain course info?"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joachimweyl The supplemental data's purpose is to provide data that the Coldfront API would normally have (an allocation's name, PI, whether it belongs to a course). The current supplemental data file does not currently have a column to indicate if a project belongs in a course. I wanted to ask if we want to assume projects listed in this file can be assumed to never be in courses.

@joachimweyl Adding the extra column to indicate course-membership is simple.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the supplemental data file against the original; it looks like it is out of date.
I don't know that we can never assume they are courses, but so far, none of them have been. I would say it is worth adding the column just in case.

Comment thread process_report/loader.py Outdated
Comment thread process_report/loader.py Outdated
Comment thread process_report/loader.py Outdated
Comment thread process_report/loader.py Outdated
)
in_time_range_mask = supplemental_df.apply(
lambda row: _is_in_time_range(
row[SUPPLEMENTAL_START_DATE], row[SUPPLEMENTAL_END_DATE]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the supplemental data as of now[1] has no start and end date, so this would just error out.

What is the behaviour supposed to be when no start or end date is found? Erroring out? logging? or assume the data is applicable (cc: @joachimweyl)

[1] https://github.com/CCI-MOC/invoicing-private-data/blob/main/project_api_data.csv

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log, and we should add dates. We will need to work with RH to obtain dates. The template is supposed to be gathering this but all of these projects are pre the template

@knikolla

Copy link
Copy Markdown
Contributor

@QuanMPhm quick question. Why not reuse the same logic as the ColdFront API data loading via JSON (it could be extended YAML too)?

@QuanMPhm

Copy link
Copy Markdown
Contributor Author

Why not reuse the same logic as the ColdFront API data loading via JSON (it could be extended YAML too)?

@knikolla There's nothing strictly preventing us from doing that. It would require converting the current supplemental data file from CSV to YAML. Is that fine to do?

@knikolla

Copy link
Copy Markdown
Contributor

Why not reuse the same logic as the ColdFront API data loading via JSON (it could be extended YAML too)?

@knikolla There's nothing strictly preventing us from doing that. It would require converting the current supplemental data file from CSV to YAML. Is that fine to do?

@QuanMPhm I'd rather do a one time conversion of the CSV to YAML than maintain two separate data formats. I would suggest trying to mimic the ColdFront API output as closely as possible within the JSON.

@QuanMPhm

QuanMPhm commented Mar 24, 2026

Copy link
Copy Markdown
Contributor Author

@knikolla @naved001 Here's the PR to do the one-time conversion

@QuanMPhm

QuanMPhm commented Apr 8, 2026

Copy link
Copy Markdown
Contributor Author

@knikolla @naved001 The PR has been updated to use the new data format for the supplemental API data. It greatly simplified things

Comment thread process_report/processors/coldfront_fetch_processor.py Outdated

@knikolla knikolla left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments, otherwise looks pretty good :)


if invoice_settings.keycloak_client_id:
logger.info("Loading Coldfront data from remote server")
api_data += self._fetch_coldfront_allocation_api()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now introduces the possibility of an Allocated Project ID being matched to several instances in the ColdFront data. Can you please write a test that documents the behavior in such a scenario?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From 2:30 discussion today, I'll keep the behavior as error out and write the test for it

# pydantic_settings parses complex types as JSON-encoded strings: https://pydantic.dev/docs/validation/latest/concepts/pydantic_settings/#parsing-environment-variable-values
env["COLDFRONT_API_FILEPATHS"] = json.dumps(
(
str(test_files["test_coldfront_api_data.json"]),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the file parsing code, you actually only call yaml.safe_load so you don't actually support JSON files anymore. Please reintroduce logic to check the file extension and call the appropriate loading mechanism for JSON or YAML.

@QuanMPhm QuanMPhm Apr 21, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I saw the e2e test pass, I realized yaml.safe_load() can load JSON files as well. Do you still want separate code paths for JSON and YAML loading?

json_data = '{"name": "Alice", "age": 30, "city": "New York"}'
parsed_data = yaml.safe_load(json_data)
# parsed_data is {'name': 'Alice', 'age': 30, 'city': 'New York'}

The logic for what API data files are fetched has changed slightly.
If local API data files are provided through `COLDFRONT_API_FILEPATHS`,
the processor loads all of them.
Seperately, if `keycloak_client_id` is set, also fetches from remote Coldfront server

Added unit test and updated e2e test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fill out Project Name and PI column using bm_projects.csv file in nonbillable repo

4 participants