Skip to content

DM-48690 : Implement Caching Butler Pool#165

Merged
tcjennings merged 4 commits intotickets/DM-49324/releasefrom
tickets/DM-48690/butler_pool
Apr 11, 2025
Merged

DM-48690 : Implement Caching Butler Pool#165
tcjennings merged 4 commits intotickets/DM-49324/releasefrom
tickets/DM-48690/butler_pool

Conversation

@tcjennings
Copy link
Copy Markdown
Collaborator

@tcjennings tcjennings commented Mar 5, 2025

Implements a caching butler factory for the CM service such that

  • Butler repositories specified by environment variables are instantiated at application startup, front-loading synchronous and/or IO-heavy operations and caching the results.
  • Butler instances are provided to callers as "clone" copies of cached Butlers, with collection constraints applied as necessary.
  • CM operations using Butler Python APIs acquire a Butler clone instance on demand.

@tcjennings tcjennings force-pushed the tickets/DM-48690/butler_pool branch 3 times, most recently from 2f6d285 to 5aecda3 Compare March 7, 2025 21:39
@tcjennings tcjennings force-pushed the tickets/DM-48690/butler_pool branch 2 times, most recently from eb0c9da to d55e8f9 Compare March 25, 2025 15:46
@tcjennings tcjennings changed the base branch from tickets/DM-49324/release to tickets/DM-49522/deprecate_db March 25, 2025 15:46
@tcjennings tcjennings force-pushed the tickets/DM-48690/butler_pool branch 2 times, most recently from 1d29e7c to 7f7275b Compare March 31, 2025 21:55
@tcjennings tcjennings force-pushed the tickets/DM-49522/deprecate_db branch from 715a894 to 7e1fa74 Compare April 1, 2025 15:09
Base automatically changed from tickets/DM-49522/deprecate_db to tickets/DM-49324/release April 1, 2025 15:19
@tcjennings tcjennings force-pushed the tickets/DM-48690/butler_pool branch 8 times, most recently from f0b22c8 to b69e742 Compare April 4, 2025 14:30
@tcjennings tcjennings force-pushed the tickets/DM-49324/release branch 5 times, most recently from c5cc5cd to 71b4107 Compare April 10, 2025 15:53
@tcjennings tcjennings force-pushed the tickets/DM-48690/butler_pool branch 3 times, most recently from bb57a8b to 77973c7 Compare April 10, 2025 17:10
@tcjennings tcjennings marked this pull request as ready for review April 10, 2025 18:30
@tcjennings tcjennings force-pushed the tickets/DM-48690/butler_pool branch 2 times, most recently from 93ca0ac to e8a8e7d Compare April 10, 2025 21:32
op.drop_index(op.f("ix_step_name"), table_name="step", if_exists=True)
op.drop_index(op.f("ix_campaign_spec_id"), table_name="campaign", if_exists=True)
op.drop_index(op.f("ix_campaign_spec_block_id"), table_name="campaign", if_exists=True)
op.drop_index(op.f("ix_campaign_parent_id"), table_name="campaign", if_exists=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these database schema changes included in this PR in error? They seem unrelated to the code change.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No error, they're part of a bugfix to support more successful migration downgrades but not worth their own PR.

without_datastore=True,
)
butler = await to_thread.run_sync(butler_f)
butler = BUTLER_FACTORY.get_butler(butler_repo, collections=[input_coll, campaign_input_coll])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call to get get_butler can ultimately call Butler.from_config and butler.clone(), both of which do synchronous I/O. So this probably should be run in a thread pool.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

butler.clone() does sync IO? What is the nature of this IO?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a "default data ID" feature (long story) that gets triggered when you pass the collections argument to the Butler constructor or Butler.clone(). That hits the database to find out which data IDs are associated with the collections.

The default data ID feature is the main thing that can be triggered in either place. But also the constructor can incidentally go hit the disk or network to load miscellaneous configuration files etc, and it's hard to predict exactly when that will happen because the I/O is scattered across multiple layers of the code and in some cases is triggered by dynamic loading of classes based on strings in the configuration files (or implicitly loaded configuration files).

Basically anything you ever do with the Butler should be done in a thread pool, full stop.

@tcjennings tcjennings force-pushed the tickets/DM-48690/butler_pool branch from e8a8e7d to 584b3fe Compare April 11, 2025 15:08
@tcjennings tcjennings merged commit 66b109e into tickets/DM-49324/release Apr 11, 2025
7 checks passed
@tcjennings tcjennings deleted the tickets/DM-48690/butler_pool branch April 11, 2025 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants