Consolidate configurations into a `config.py` file by QuanMPhm · Pull Request #214 · CCI-MOC/invoicing

QuanMPhm · 2025-06-27T16:01:50Z

Closes #213 and closes #91. I'll submit this as a draft for now, since it's already quite voluminous, and I want people's opinions before moving forward.

@knikolla @larsks @naved001 In particular, the unit tests are currently failing because config.py tries to fetch from S3 while unauthenticated. I was considering either creating a base test class that mocks the S3 functions during setUp(), or adding a layer of abstraction to config.py so that other files can only access the configurations through specific getter functions. Do you think there's a better way?

larsks · 2025-07-10T17:14:58Z

I think mocking or dependency injection are both good ways of resolving this sort of situation. I'm driving at the moment and haven't had a chance to look at this code yet; I'll try to take a look this afternoon and see if I have any more specific recommendations.

larsks · 2025-07-10T19:03:28Z

One of the difficulties here is that you have modules that run code on import, which can really complicate testing (because it makes it much more difficult to mock out dependencies or otherwise change the logic of the code). Specifically, in config.py you run load_from_url on import:

rates_info = load_from_url()

As well as a variety of s3 related functions:

CSV_INVOICE_FILEPATH_LIST = fetch_s3_invoices(INVOICE_MONTH)
PREPAY_DEBITS_FILEPATH = util.fetch_s3(PREPAY_DEBITS_S3_FILEPATH)
OLD_PI_FILEPATH = util.fetch_s3(PI_S3_FILEPATH)
ALIAS_FILEPATH = util.fetch_s3(ALIAS_S3_FILEPATH)

If you can refactor your code so that these variables are initialized at runtime rather than at import, you'll find things easier to test.
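A minimal sketch of what "initialize at runtime" could look like, assuming a hypothetical `load_config()` function and made-up S3 keys (neither is taken from this PR). The fetcher is passed in as an argument, so importing the module does nothing and a test can supply a fake:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RuntimeConfig:
    old_pi_filepath: str
    alias_filepath: str


def load_config(fetch: Callable[[str], str]) -> RuntimeConfig:
    """Resolve remote files only when called, using the supplied fetcher."""
    return RuntimeConfig(
        old_pi_filepath=fetch("PIs/PI.csv"),  # assumed S3 key, illustration only
        alias_filepath=fetch("PIs/alias.csv"),  # assumed S3 key, illustration only
    )


def fake_fetch(remote_path: str) -> str:
    """Test double: no network, just maps a key to a local path."""
    return "/tmp/" + remote_path.rsplit("/", 1)[-1]


cfg = load_config(fake_fetch)
```

In production code the real entry point would call `load_config(util.fetch_s3)` once, after argument parsing and authentication.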

larsks · 2025-07-10T19:47:01Z

It's not part of this PR, but this code in util.py is problematic:

@functools.lru_cache
def get_invoice_bucket():
    try:
        s3_resource = boto3.resource(
            service_name="s3",
            endpoint_url=os.environ.get(
                "S3_ENDPOINT", "https://s3.us-east-005.backblazeb2.com"
            ),
            aws_access_key_id=os.environ["S3_KEY_ID"],
            aws_secret_access_key=os.environ["S3_APP_KEY"],
        )
    except KeyError:
        logger.error(
            "Error: Please set the environment variables S3_KEY_ID and S3_APP_KEY"
        )
    return s3_resource.Bucket(os.environ.get("S3_BUCKET_NAME", "nerc-invoicing"))

If the attempt to reference the S3_* environment variables raises a KeyError, you should probably exit rather than simply logging an error. Currently, you log a message and then proceed to execute...

    return s3_resource.Bucket(os.environ.get("S3_BUCKET_NAME", "nerc-invoicing"))

...which fails because s3_resource has not been defined:

UnboundLocalError: cannot access local variable 's3_resource' where it is not associated with a value
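One possible shape for the fail-fast behavior, sketched without boto3 so it runs anywhere. `MissingCredentialsError` and `read_s3_credentials` are hypothetical names, not part of util.py:

```python
import os


class MissingCredentialsError(RuntimeError):
    """Raised when required S3 environment variables are absent."""


def read_s3_credentials(env=os.environ) -> tuple[str, str]:
    """Return (key_id, app_key) or raise instead of continuing half-configured."""
    missing = [name for name in ("S3_KEY_ID", "S3_APP_KEY") if name not in env]
    if missing:
        raise MissingCredentialsError(
            "Please set the environment variables: " + ", ".join(missing)
        )
    return env["S3_KEY_ID"], env["S3_APP_KEY"]
```

The caller can then pass the returned pair to `boto3.resource(...)`, and a missing variable stops the program with a clear message instead of an UnboundLocalError further down.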

QuanMPhm · 2025-07-10T21:03:07Z

+### Miscellaneous config values TODO: Should these have their own getter functions?
+NEW_PI_CREDIT_AMOUNT = rates_info.get_value_at("New PI Credit", INVOICE_MONTH, Decimal)
+LIMIT_NEW_PI_CREDIT_TO_PARTNERS = rates_info.get_value_at(
+    "Limit New PI Credit to MGHPCC Partners", INVOICE_MONTH, bool
+)
+BU_SUBSIDY_AMOUNT = rates_info.get_value_at("BU Subsidy", INVOICE_MONTH, Decimal)


@larsks @knikolla My only question is whether these values that are fetched from nerc-rates should get their own getter functions. Having this code run on import means local testing can't be done offline. Aside from that, the code that uses these values can have them injected at runtime, so dependency injection during unit tests works fine.

In the current version of the code those are fetched from nerc-rates only if a value hasn't been provided by the CLI.

I don't exactly know what you mean by getter function, can you please describe how you would implement it?

By getter functions, I'm referring to functions like:

@functools.lru_cache
def get_old_pi_filepath() -> str:
    return OLD_PI_FILEPATH or util.fetch_s3(PI_S3_FILEPATH)

...in contrast to what my original draft proposed:

OLD_PI_FILEPATH = util.fetch_s3(PI_S3_FILEPATH)

The problem with the original draft was that config.py would try to fetch from S3 on import, making testing and dependency injection a lot more difficult. With a "getter" function, the S3 fetch runs on demand (at runtime) instead, which allows dependency injection.

With the getter function, processors/invoices can access configurations like this:

old_pi_filepath: str = field(default_factory=config.get_old_pi_filepath)

And since it's just an argument in their __init__, during unit tests, I can inject some other value as I wish.
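A small self-contained sketch of that injection, assuming a hypothetical `ExampleProcessor` and a getter that deliberately fails when called, to show that the default factory is never invoked when a value is passed in:

```python
from dataclasses import dataclass, field


def get_old_pi_filepath() -> str:
    """Hypothetical getter; the real one would fetch from S3 on demand."""
    raise RuntimeError("network access not available in tests")


@dataclass
class ExampleProcessor:
    # default_factory is only called when the caller passes no value
    old_pi_filepath: str = field(default_factory=get_old_pi_filepath)


# In a unit test, inject a local fixture path instead of hitting S3.
processor = ExampleProcessor(old_pi_filepath="tests/data/old_pi.csv")
```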

To address your original question, a getter function for the nerc-rates values would be:

NEW_PI_CREDIT_AMOUNT = None

@functools.lru_cache
def get_new_pi_credit_amount() -> Decimal:
    return NEW_PI_CREDIT_AMOUNT or rates_info.get_value_at("New PI Credit", INVOICE_MONTH, Decimal)
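One caveat with the `X or fallback` pattern for numeric settings: an explicit override of zero is falsy, so it would be silently replaced by the fetched value. A sketch of the difference, using a hypothetical `lookup_rate()` in place of `rates_info.get_value_at(...)` and a made-up rate:

```python
from decimal import Decimal
from typing import Optional

NEW_PI_CREDIT_AMOUNT: Optional[Decimal] = Decimal("0")  # explicit override of zero


def lookup_rate() -> Decimal:
    """Hypothetical stand-in for rates_info.get_value_at(...)."""
    return Decimal("1000")  # made-up value


def credit_with_or() -> Decimal:
    # Decimal("0") is falsy, so the override is ignored here.
    return NEW_PI_CREDIT_AMOUNT or lookup_rate()


def credit_with_none_check() -> Decimal:
    # Only fall back when no override was provided at all.
    if NEW_PI_CREDIT_AMOUNT is not None:
        return NEW_PI_CREDIT_AMOUNT
    return lookup_rate()
```

Whether a zero override is a realistic input here is a judgment call; the `is not None` form simply avoids having to think about it.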

QuanMPhm · 2025-07-10T21:03:45Z

@larsks @knikolla I've decided to add getter functions for configurations that require fetching from S3. This PR is now ready for review

knikolla · 2025-07-10T21:43:28Z

@larsks @knikolla I've decided to add getter functions for configurations that require fetching from S3. This PR is now ready for review

@QuanMPhm you are trying to treat everything as being the same type of "configuration", however you have different types of things which may warrant being treated with different mechanisms

strings that define paths in s3
numbers that define values that are either fetched from the CLI or nerc-rates
lists
old pis
aliases

Some can be generalized, like the first two, while others may require a specialized solution, implemented as functions in either the Invoice base class or somewhere else.

QuanMPhm · 2025-07-14T19:18:03Z

@larsks @knikolla I have decided to use Pydantic to validate configurations.

larsks · 2025-07-25T15:51:03Z

+
+@functools.lru_cache
+def get_prepaid_credits_df() -> pandas.DataFrame:
+    pandas.read_csv(PREPAY_CREDITS_FILEPATH)


This function does not return anything. All of the tests are passing, which suggests that this function isn't tested. In fact, a test coverage report shows that most of the functions in this file are not tested.
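A sketch of the kind of test that would catch this, using the standard library's `csv` module as a stand-in for `pandas.read_csv` so that it runs without extra dependencies. The column names and data are invented for illustration:

```python
import csv
import io


def read_credits_broken(source: io.StringIO):
    list(csv.DictReader(source))  # result is discarded, so None is returned


def read_credits_fixed(source: io.StringIO) -> list[dict[str, str]]:
    return list(csv.DictReader(source))


SAMPLE = "Month,Group Name,Credit\n2025-06,G1,100\n"  # made-up data

# A test only needs to look at the return value to expose the bug.
broken_result = read_credits_broken(io.StringIO(SAMPLE))
fixed_result = read_credits_fixed(io.StringIO(SAMPLE))
```

Even a single assertion on the return type of each getter would have failed for the version in this diff.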

QuanMPhm · 2025-07-25T18:48:14Z

@larsks Apologies. I forgot to actually push up my latest changes that use Pydantic.

This function does not return anything. All of the tests are passing, which suggests that this function isn't tested. In fact, a test coverage report shows that most of the functions in this file are not tested.

I will submit unit tests for this soon :P

larsks · 2025-07-25T19:40:10Z

+    # Input file paths
+    NONBILLABLE_PIS_FILEPATH: pydantic.FilePath = "pi.txt"
+    NONBILLABLE_PROJECTS_FILEPATH: pydantic.FilePath = "projects.txt"
+    NONBILLABLE_TIMED_PROECTS_FILEPATH: pydantic.FilePath = "timed_projects.txt"
+
+    PREPAY_PROJECTS_FILEPATH: pydantic.FilePath = "prepaid_projects.csv"
+    PREPAY_CREDITS_FILEPATH: pydantic.FilePath = "prepaid_credits.csv"
+    PREPAY_CONTACTS_FILEPATH: pydantic.FilePath = "prepaid_contacts.csv"


You are saying that these variables are all of type pydantic.FilePath, but then you're assigning them a string value. These two types are not equivalent, and something expecting to get a FilePath object will break if it gets a str object instead:

>>> from process_report import config
>>> cfg = config.Config()
>>> cfg.NONBILLABLE_PIS_FILEPATH.name
Traceback (most recent call last):
  File "<python-input-10>", line 1, in <module>
    cfg.NONBILLABLE_PIS_FILEPATH.name
AttributeError: 'str' object has no attribute 'name'

Either just make them strings (which is exactly what you're doing in lines 30-32 for S3 file paths), or properly initialize them:

NONBILLABLE_PIS_FILEPATH: pydantic.FilePath = pydantic.FilePath("pi.txt")

(Same issue crops up elsewhere in this file.)
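The underlying point is that an annotation alone does not convert a default value. A standard-library illustration with `dataclasses` and `pathlib.Path` (an analogy, not the Pydantic code from this PR):

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LooseConfig:
    pis_filepath: Path = "pi.txt"  # annotation says Path, value is a str


@dataclass
class StrictConfig:
    pis_filepath: Path = Path("pi.txt")  # value matches the annotation


loose = LooseConfig()
strict = StrictConfig()
```

For Pydantic specifically, my understanding is that v2 does not validate default values unless `validate_default` is enabled in the model config, which would explain why the string survives untouched; treat that as something to confirm against the Pydantic docs for the version in use.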

larsks · 2025-07-25T19:42:17Z

+    @staticmethod
+    def fetch_s3_invoices(invoice_month):
+        """Fetches usage invoices from S3 given invoice month"""
+        s3_invoice_list = list()


Typically, to initialize an empty list you just write:

s3_invoice_list = []

larsks · 2025-07-25T19:59:24Z

+    def get_csv_invoice_filepaths(self) -> list[pydantic.FilePath]:
+        if not self.CSV_INVOICE_FILEPATH_LIST:
+            self.CSV_INVOICE_FILEPATH_LIST = self.fetch_s3_invoices(self.INVOICE_MONTH)
+        return self.CSV_INVOICE_FILEPATH_LIST


If you wanted to avoid having accessor methods for some attributes and not for others, you could make these properties instead:

class Config(pydantic.BaseModel):
    _csv_invoice_filepath_list: list[str] = []

    @property
    def CSV_INVOICE_FILEPATH_LIST(self) -> list[str]:
        if not self._csv_invoice_filepath_list:
            self._csv_invoice_filepath_list = self.fetch_s3_invoices()
        return self._csv_invoice_filepath_list

And now your code can treat these values just like the static ones:

print(cfg.CSV_INVOICE_FILEPATH_LIST)
print(cfg.NONBILLABLE_PIS_FILEPATH)

(Or use functools.cached_property, which would remove the need for private backing fields.)
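A sketch of the `functools.cached_property` variant on a plain class (not a Pydantic model), with a hypothetical `fetch_s3_invoices` that counts its calls so the caching is visible:

```python
import functools


class ExampleConfig:
    def __init__(self, invoice_month: str) -> None:
        self.invoice_month = invoice_month
        self.fetch_count = 0  # only here to make the caching visible

    def fetch_s3_invoices(self) -> list[str]:
        """Hypothetical stand-in for the real S3 download."""
        self.fetch_count += 1
        return [f"/tmp/{self.invoice_month}-invoice.csv"]

    @functools.cached_property
    def csv_invoice_filepath_list(self) -> list[str]:
        # Runs once on first access; the result is stored on the instance.
        return self.fetch_s3_invoices()


cfg = ExampleConfig("2025-06")
```

Nothing is fetched when `cfg` is created; the first attribute access triggers the download and later accesses reuse the stored list.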

larsks · 2025-07-25T20:11:44Z

+
+
+# Custom configurations goes here
+CONFIG_DICT = {}


What is CONFIG_DICT for?

My idea was that CONFIG_DICT can be populated with custom configuration, e.g.:

CONFIG_DICT = {
    "UPLOAD_TO_S3": False,  # Set to False for local testing, no files uploaded,
    "NONBILLABLE_INVOICE_NAME": "custom_nonbillable",
}
config = Config.model_validate(CONFIG_DICT)

larsks · 2025-07-25T20:15:27Z

+
+    def get_old_pi_filepath(self) -> pydantic.FilePath:
+        if not self.OLD_PI_FILEPATH:
+            self.OLD_PI_FILEPATH = util.fetch_s3(self.PI_S3_FILEPATH)


Should we have better error handling around s3 access (e.g., in the case of incorrect auth, connection timeouts, etc)?

It never occurred to me to address that problem. Is there some reference code I can look at that does this S3 error handling?
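Not reference code from any particular project, but a generic sketch of one approach: wrap the fetch in a small retry helper that retries transient errors and then fails with a clear exception. `fetch_with_retry` and `S3FetchError` are hypothetical names. With boto3, the exception types to pass in would likely come from `botocore.exceptions` (for example `EndpointConnectionError` for connection problems), while authentication failures such as `NoCredentialsError` or a `ClientError` are usually better reported immediately than retried:

```python
import time
from typing import Callable, Tuple, Type


class S3FetchError(RuntimeError):
    """Raised when a download still fails after all retries."""


def fetch_with_retry(
    fetch: Callable[[str], str],
    remote_path: str,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> str:
    """Call fetch(remote_path), retrying transient errors, then fail loudly."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(remote_path)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)  # fixed pause between attempts
    raise S3FetchError(f"could not fetch {remote_path!r}") from last_error
```

Exceptions not listed in `retry_on` propagate unchanged on the first attempt, which is the behavior you would want for bad credentials.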

larsks · 2025-07-25T20:18:54Z

 @dataclass
 class ValidatePIAliasProcessor(processor.Processor):
-    alias_map: dict
+    alias_map: dict = field(default_factory=config.get_alias_map)


Generally, we expect to see type arguments, like dict[str, str], rather than an untyped dict.

larsks · 2025-07-25T20:19:15Z

+        return dataframe[mask]["Project"].to_list()
+
+    def get_alias_map(self) -> dict:
+        alias_dict = dict()


Prefer:

alias_dict = {}

larsks · 2025-07-25T20:20:24Z

+        )
+        return dataframe[mask]["Project"].to_list()
+
+    def get_alias_map(self) -> dict:


Generally, we expect to see type arguments, like dict[str, str], rather than an untyped dict.

larsks · 2025-07-25T20:48:46Z

It's not part of this PR, but note that the environment variables for holding S3 (or other AWS-service) credentials are typically AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, rather than S3_KEY_ID and S3_APP_KEY, and for specifying an S3 endpoint AWS_ENDPOINT_URL_S3.
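If the project wanted to move to the conventional names without breaking existing deployments, one option is to prefer the AWS_* variables and fall back to the current S3_* ones. A sketch with hypothetical helper names (`first_set`, `resolve_s3_settings`):

```python
import os
from typing import Mapping, Optional


def first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the value of the first variable in names that is set."""
    for name in names:
        if name in env:
            return env[name]
    return None


def resolve_s3_settings(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    # Conventional AWS names take precedence; legacy names are the fallback.
    return {
        "access_key_id": first_set(env, "AWS_ACCESS_KEY_ID", "S3_KEY_ID"),
        "secret_access_key": first_set(env, "AWS_SECRET_ACCESS_KEY", "S3_APP_KEY"),
        "endpoint_url": first_set(env, "AWS_ENDPOINT_URL_S3", "S3_ENDPOINT"),
    }
```

When only the AWS_* names are used, boto3 can pick the credentials up from the environment on its own, so the explicit `aws_access_key_id=` and `aws_secret_access_key=` arguments may no longer be needed.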

QuanMPhm · 2025-07-29T19:21:23Z

Note-to-self. Check @larsks's feedback from this repo: https://github.com/larsks/invoicing-example/tree/main

QuanMPhm · 2025-09-09T20:35:48Z

Note-to-self. A new revision is made on branch 213/config_final. The old revision is on 213/config_consolidate_pydantic

QuanMPhm · 2025-09-09T20:42:35Z

@larsks @knikolla I have made a revision of my PR. The main change has been the separation of settings and loading of information.

knikolla

Big improvement over how it was before, great work!

Some minor comments.

All invoicing configurations, such as output file names and input file paths, have been grouped into a Pydantic module in the `settings.py` file. There are no longer any CLI arguments, and all `Processor`/`Invoice` subclasses now get their unique configurations from `settings.py` instead of having them passed through `__init__()`. Test cases, especially the e2e test, have been changed accordingly.

Loading of configuration files has moved to `loader.py`. This separation between settings and loading logic should make the code more maintainable.

With how we previously handled invoicing configurations, `process_report.py` had an unwieldy list of configurations, which also forced us to initialize each processor/invoice individually, creating a lot of clutter over time. This change reduces that clutter by grouping all configurations into one file.

Invoice names are no longer configurable. They are now considered immutable business logic.

The default invoice month will now always be the previous month, regardless of the day of the month on which invoicing is run.
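For reference, the "previous month" default can be computed independently of the day of the month by stepping back one day from the first of the current month. A sketch, assuming a `YYYY-MM` string format and a hypothetical function name (the PR's actual implementation may differ):

```python
import datetime


def previous_month(today: datetime.date) -> str:
    """Return the month before `today` as YYYY-MM."""
    first_of_this_month = today.replace(day=1)
    last_of_previous = first_of_this_month - datetime.timedelta(days=1)
    return last_of_previous.strftime("%Y-%m")
```

Going through the first of the month avoids the invalid-date problem of subtracting a month directly (for example from March 31) and handles the January to December rollover.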

QuanMPhm · 2025-10-21T14:56:20Z

@larsks @knikolla Great thanks for your approval of this ponderous PR

QuanMPhm requested review from hakasapl, knikolla, larsks and naved001 June 27, 2025 16:01

QuanMPhm force-pushed the 213/config_consolidate branch from d6b61ab to 8287e3f Compare July 10, 2025 20:51

QuanMPhm marked this pull request as ready for review July 10, 2025 20:52

QuanMPhm commented Jul 10, 2025

QuanMPhm force-pushed the 213/config_consolidate branch from 8287e3f to 32c8a3a Compare July 14, 2025 18:37

larsks requested changes Jul 25, 2025

QuanMPhm force-pushed the 213/config_consolidate branch from 32c8a3a to 4bf4abf Compare July 25, 2025 18:24

larsks requested changes Jul 25, 2025

QuanMPhm force-pushed the 213/config_consolidate branch 2 times, most recently from 32c8a3a to ace6639 Compare September 9, 2025 20:40

QuanMPhm requested a review from larsks September 9, 2025 20:42

QuanMPhm force-pushed the 213/config_consolidate branch 2 times, most recently from 6b61a71 to 7e7d0c3 Compare September 10, 2025 14:04

QuanMPhm mentioned this pull request Sep 10, 2025

Migrate to using the new project.yaml in the non-billable-projects repo #232

Closed

knikolla requested changes Sep 15, 2025

Comment thread process_report/settings.py Outdated

Comment thread process_report/settings.py Outdated

Comment thread process_report/settings.py Outdated

Comment thread process_report/invoices/billable_invoice.py

Comment thread process_report/invoices/billable_invoice.py

QuanMPhm mentioned this pull request Sep 15, 2025

Change error log message to raise Error #235

Closed

QuanMPhm force-pushed the 213/config_consolidate branch from 7e7d0c3 to 72eb6bb Compare September 15, 2025 18:19

QuanMPhm requested a review from knikolla September 30, 2025 16:54

knikolla mentioned this pull request Oct 1, 2025

Kristi does Code Reviews (Week of 2025-09-29) CCI-MOC/ops-issues#1621

Closed

9 tasks

knikolla approved these changes Oct 14, 2025

larsks approved these changes Oct 17, 2025

QuanMPhm merged commit fec354d into CCI-MOC:main Oct 21, 2025
6 checks passed
