-
Notifications
You must be signed in to change notification settings - Fork 20
Feat: change ingester data insertion policy #1638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feat: change ingester data insertion policy #1638
Conversation
2e861d1 to
f761ab5
Compare
f761ab5 to
8f5cafe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR changes the ingester's data insertion policy from "never update" to "only update null fields", controlled by a new PRIO_DB environment variable. The implementation replaces Django's ORM bulk_create with raw SQL queries using PostgreSQL's ON CONFLICT clause with COALESCE operations to selectively update fields based on the priority policy.
Key changes:
- Added dynamic SQL query generation function
_generate_model_insert_querythat creates upsert queries with field-level update control - Replaced
bulk_createwith raw SQL execution usingcursor.executemany()inconsume_buffer - Introduced
PRIO_DBenvironment variable to toggle between database-priority and data-priority update modes
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| backend/kernelCI_app/constants/ingester.py | Adds PRIO_DB constant with documentation (contains typo in env var name) |
| backend/kernelCI_app/models.py | Adds module docstring noting explicit id column requirement for ingester |
| backend/kernelCI_app/management/commands/helpers/kcidbng_ingester.py | Implements dynamic SQL query generation and replaces ORM with raw SQL for data insertion |
| backend/kernelCI_app/tests/unitTests/commands/monitorSubmissions/kcidbng_ingester_test.py | Updates test to mock database connections instead of Django ORM methods |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
backend/kernelCI_app/management/commands/helpers/kcidbng_ingester.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/helpers/kcidbng_ingester.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/helpers/kcidbng_ingester.py
Outdated
Show resolved
Hide resolved
| out(f"Unknown table '{table_name}' passed to consume_buffer") | ||
| raise | ||
|
|
||
| updateable_model_fields, query = _generate_model_insert_query(table_name, model) |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _generate_model_insert_query function is called on every consume_buffer invocation, which means the same query is regenerated repeatedly for the same table. As noted in the PR description's "Future changes" section, consider caching the generated queries per table to avoid this overhead. The query structure is static for each model and only depends on PRIO_DB, which is set at startup.
| if isinstance(value, (dict, list)): | ||
| value = json.dumps(value) |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type check isinstance(value, (dict, list)) only handles JSON serialization for dicts and lists, but doesn't handle other types that might need special handling (e.g., datetime objects, Decimal, custom objects). Consider adding more comprehensive type handling or documenting the assumption that only dicts/lists need JSON serialization.
Now allows for overwrites of null data Closes kernelci#1552
8f5cafe to
8c15328
Compare
| INGEST_QUEUE_MAXSIZE = 5000 | ||
|
|
||
|
|
||
| PRIO_DB = is_boolean_or_string_true(os.environ.get("PRIO_DB", "True")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding PRIO_DB is bringing a lot of complexity to this pull request. If we decide to change the policy again in the future, we can send another PR.
I believe we can assume you aren't going to need it, though.
| mock_file_open.assert_called_once() | ||
|
|
||
|
|
||
| class TestGenerateInsertQuery: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow, I never read this file before, it's quite high-level and scary.
Do you think you could add a test case achieving something close to:
input = json.loads(Path("fixtures/inputs/kcidb.json").read_text())
expect_data = json.loads(Path("fixtures/outputs/tables.json").read_text())
with self.assertNumQueries(5):
ingest(input)
result_data = {
"checkouts": Checkout.objects.all().values(),
"builds": Build.objects.all().values(),
"tests": Test.objects.all().values(),
"issues": Issue.objects.all().values(),
"incidents": Incident.objects.all().values(),
}
self.assertEqual(result_data, expected_data)
Description
Changes the insertion policy of the ingester from "never update" to "only update null fields". This follows the policy adopted by kcidb, including a
PRIO_DBenv var which tells which data to prioritize.If
PRIO_DBis set toTrueit will only allow updates on null data (allownull -> other valuebut notexisting value -> other valueand notexisting_value -> null).If set to
Falseit will allow overwriting existing data as long as the new data is not null (allowexisting value -> other valueand allownull -> other valuebut notexisting value -> null).Changes
consume_bufferinstead of Django'sbulk_createHow to test
Start the ingester and insert some data to it referencing the same object, check if the policy is respected.
Example realistic data for testing can be found here, but other tests are encouraged.
Future changes
Closes #1552