Consolidate configurations into a config.py file#214
Conversation
|
I think mocking or dependency injection are both good ways of resolving this sort of situation. I'm driving at the moment and haven't had a chance to look at this code yet; I'll try to take a look this afternoon and see if I have any more specific recommendations. |
|
One of the difficulties here is that you have modules that run code on import, which can really complicate testing (because it makes it much more difficult to mock out dependencies or otherwise change the logic of the code). Specifically, in As well as a variety of s3 related functions: If you can refactor your code so that these variables are initialized at runtime rather than at import, you'll find things easier to test. |
|
It's not part of this PR, but this code in @functools.lru_cache
def get_invoice_bucket():
try:
s3_resource = boto3.resource(
service_name="s3",
endpoint_url=os.environ.get(
"S3_ENDPOINT", "https://s3.us-east-005.backblazeb2.com"
),
aws_access_key_id=os.environ["S3_KEY_ID"],
aws_secret_access_key=os.environ["S3_APP_KEY"],
)
except KeyError:
logger.error(
"Error: Please set the environment variables S3_KEY_ID and S3_APP_KEY"
)
return s3_resource.Bucket(os.environ.get("S3_BUCKET_NAME", "nerc-invoicing"))If the attempt to reference the return s3_resource.Bucket(os.environ.get("S3_BUCKET_NAME", "nerc-invoicing"))...which fails because |
d6b61ab to
8287e3f
Compare
| ### Miscellaneous config values TODO: Should these have their own getter functions? | ||
| NEW_PI_CREDIT_AMOUNT = rates_info.get_value_at("New PI Credit", INVOICE_MONTH, Decimal) | ||
| LIMIT_NEW_PI_CREDIT_TO_PARTNERS = rates_info.get_value_at( | ||
| "Limit New PI Credit to MGHPCC Partners", INVOICE_MONTH, bool | ||
| ) | ||
| BU_SUBSIDY_AMOUNT = rates_info.get_value_at("BU Subsidy", INVOICE_MONTH, Decimal) |
There was a problem hiding this comment.
@larsks @knikolla My only question is should these values that are fetched from nerc-rates get their own getter functions? Having this code ran on import means local testing can't be done offline. Aside from that, the code that use these values can have them injected at runtime, so dependency injection during unit tests works fine.
There was a problem hiding this comment.
In the current version of the code those are fetched from nerc-rates only if a value hasn't been provided by the CLI.
I don't exactly know what you mean by getter function, can you please describe how you would implement it?
There was a problem hiding this comment.
By getter functions, I'm referring to functions like:
@functools.lru_cache
def get_old_pi_filepath() -> str:
return OLD_PI_FILEPATH or util.fetch_s3(PI_S3_FILEPATH)...in contrast to what my original draft proposed:
OLD_PI_FILEPATH = util.fetch_s3(PI_S3_FILEPATH)The problem with the original draft was that config.py would try to fetch from S3 on import, making testing and dependency injection a lot more difficult. Making a "getter" function meant the fetch S3 code can be ran on-demand (or at runtime), allowing dependency injection.
With the getter function, processors/invoices can access configurations like this:
old_pi_filepath: str = field(default_factory=config.get_old_pi_filepath)And since it's just an argument in their __init__, during unit tests, I can inject some other value as I wish.
To address your original question, a getter function for the nerc-rates values would be:
NEW_PI_CREDIT_AMOUNT = None
@functools.lru_cache
def get_new_pi_credit_amount() -> Decimal:
return NEW_PI_CREDIT_AMOUNT or rates_info.get_value_at("New PI Credit", INVOICE_MONTH, Decimal)
@QuanMPhm you are trying to treat everything as being the same type of "configuration", however you have different types of things which may warrant being treated with different mechanisms
Some can be generalized, like the first two. While others may require a specialized solution by implementing functions in either the Invoice base class or somewhere else. |
8287e3f to
32c8a3a
Compare
|
|
||
| @functools.lru_cache | ||
| def get_prepaid_credits_df() -> pandas.DataFrame: | ||
| pandas.read_csv(PREPAY_CREDITS_FILEPATH) |
There was a problem hiding this comment.
This function does not return anything. All of the tests are passing, which suggests that this function isn't tested. In fact, a test coverage report shows that most of the functions in this file are not tested.
32c8a3a to
4bf4abf
Compare
|
@larsks Apologies. I forgot to actually push up my latest changes that use
I will submit unit tests for this soon :P |
| # Input file paths | ||
| NONBILLABLE_PIS_FILEPATH: pydantic.FilePath = "pi.txt" | ||
| NONBILLABLE_PROJECTS_FILEPATH: pydantic.FilePath = "projects.txt" | ||
| NONBILLABLE_TIMED_PROECTS_FILEPATH: pydantic.FilePath = "timed_projects.txt" | ||
|
|
||
| PREPAY_PROJECTS_FILEPATH: pydantic.FilePath = "prepaid_projects.csv" | ||
| PREPAY_CREDITS_FILEPATH: pydantic.FilePath = "prepaid_credits.csv" | ||
| PREPAY_CONTACTS_FILEPATH: pydantic.FilePath = "prepaid_contacts.csv" |
There was a problem hiding this comment.
You are saying that these variables are all of type pydantic.FilePath, but then you're assigning them a string value. These two types are not equivalent, and something expecting to get a FilePath object will break if it gets a str object instead:
>>> from process_report import config
>>> cfg = config.Config()
>>> cfg.NONBILLABLE_PIS_FILEPATH.name
Traceback (most recent call last):
File "<python-input-10>", line 1, in <module>
cfg.NONBILLABLE_PIS_FILEPATH.name
AttributeError: 'str' object has no attribute 'name'
Either just make them strings (which is exactly what you're doing in lines 30-32 for S3 file paths), or properly initialize them:
NONBILLABLE_PIS_FILEPATH: pydantic.FilePath = pydantic.FilePath("pi.txt")
(Same issue crops up elsewhere in this file.)
| @staticmethod | ||
| def fetch_s3_invoices(invoice_month): | ||
| """Fetches usage invoices from S3 given invoice month""" | ||
| s3_invoice_list = list() |
There was a problem hiding this comment.
Typicall to initialize an empty list you just write:
s3_invoice_list = []
| def get_csv_invoice_filepaths(self) -> list[pydantic.FilePath]: | ||
| if not self.CSV_INVOICE_FILEPATH_LIST: | ||
| self.CSV_INVOICE_FILEPATH_LIST = self.fetch_s3_invoices(self.INVOICE_MONTH) | ||
| return self.CSV_INVOICE_FILEPATH_LIST |
There was a problem hiding this comment.
If you wanted to avoid having accessor methods for some attributes and not for others, you could make these properties instead:
class Config(pydantic.BaseModel):
_csv_invoice_filepath_list: list[str] = []
@property
def CSV_INVOICE_FILEPATH_LIST(self) -> list[str]:
if not self._csv_invoice_filepath_list:
self._csv_invoice_filepath_list = self.fetch_s3_invoices()
return self._csv_invoice_filepath_list
And now your code can treat these values just like the static ones:
print(cfg.CSV_INVOICE_FILEPATH_LIST)
print(cfg.NONBILLABLE_PIS_FILEPATH)
(Or use functools.cached_property, which would remove the need for private backing fields.)
|
|
||
|
|
||
| # Custom configurations goes here | ||
| CONFIG_DICT = {} |
There was a problem hiding this comment.
My idea was that CONFIG_DICT can be populated with custom configuration, i.e:
CONFIG_DICT = {
"UPLOAD_TO_S3": False, # Set to False for local testing, no files uploaded,
"NONBILLABLE_INVOICE_NAME": "custom_nonbillable",
}
config = Config.model_validate(CONFIG_DICT)|
|
||
| def get_old_pi_filepath(self) -> pydantic.FilePath: | ||
| if not self.OLD_PI_FILEPATH: | ||
| self.OLD_PI_FILEPATH = util.fetch_s3(self.PI_S3_FILEPATH) |
There was a problem hiding this comment.
Should we have better error handling around s3 access (e.g., in the case of incorrect auth, connection timeouts, etc)?
There was a problem hiding this comment.
That problem never occurred to me to address. Is there some reference code I can look at that does this s3 error handling?
| @dataclass | ||
| class ValidatePIAliasProcessor(processor.Processor): | ||
| alias_map: dict | ||
| alias_map: dict = field(default_factory=config.get_alias_map) |
There was a problem hiding this comment.
Generally, we expect to see type arguments, like dict[str, str], rather than an untyped dict.
| return dataframe[mask]["Project"].to_list() | ||
|
|
||
| def get_alias_map(self) -> dict: | ||
| alias_dict = dict() |
| ) | ||
| return dataframe[mask]["Project"].to_list() | ||
|
|
||
| def get_alias_map(self) -> dict: |
There was a problem hiding this comment.
Generally, we expect to see type arguments, like dict[str, str], rather than an untyped dict.
|
It's not part of this PR, but note that the environment variables for holding S3 (or other AWS-service) credentials are typically |
|
Note-to-self. Check @larsks's feedback from this repo: https://github.com/larsks/invoicing-example/tree/main |
|
Note-to-self. A new revision is made on branch |
32c8a3a to
ace6639
Compare
6b61a71 to
7e7d0c3
Compare
knikolla
left a comment
There was a problem hiding this comment.
Big improvement over how it was before, great work!
Some minor comments.
All invoicing configurations, such as output file names, input file paths, etc, have been grouped into a Pydantic module in `settings.py` file. There is no longer any CLI arguments, and all `Processor`/`Invoice` subclasses will now get their unique configurations from `settings.py`, instead of being passed through `__init__()`. Test cases, especially the e2e, have been changed accordingly. Loading of configuration files moved to `loader.py`. This separation between settings and loading logic should make the code more maintainable. With how we previously handled invoicing configurations, `process_report.py` had an unyieldingly long list of configurations, which also forced us to initialize each processor/invoice individually, creating much clutter over time. This change is made to reduce clutter by grouping all configurations into one file. Invoice names are no longer configurable. They are now considered immutable business logic The default invoice month will now always be the previous month, regardless of which date of the month invoicing is ran
7e7d0c3 to
72eb6bb
Compare
Closes #213 and closes #91. I'll submit this as a draft for now, since its quite voluminous already, and I want people's opinions before moving forward.
@knikolla @larsks @naved001 In particular, right now the unit tests are failing because
config.pytries to fetch from s3 while unauthenticated. I was considering creating a base test class that mocks the s3 functions duringsetUp(), or to add a layer of abstraction toconfig.pyso that other files can only access the configurations through specific getter functions. Do you think there's a better way?