-
Notifications
You must be signed in to change notification settings - Fork 20
feat: hardware denormalization for hardware listing #1614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: hardware denormalization for hardware listing #1614
Conversation
ae2fc62 to
0175d2c
Compare
backend/kernelCI_app/models.py
Outdated
| origin = models.CharField(max_length=100) | ||
| platform = models.CharField(max_length=100) | ||
| compatibles = ArrayField(models.TextField(), null=True) | ||
| start_time = models.IntegerField() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we don't have buckets anymore, keep this as datetime field
backend/kernelCI_app/models.py
Outdated
|
|
||
| class Meta: | ||
| db_table = "hardware_status" | ||
| unique_together = ("checkout_id", "origin", "platform") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have origin, platform first so indexes reaching only them will work.
also need another index on origin, platform, start_time... since we also query based on those.
|
|
||
|
|
||
| class LatestCheckout(models.Model): | ||
| checkout_id = models.TextField(primary_key=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: group the unique together fields, then the other fields (checkout_id and start_time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also not likely, but checkout_id is not a PK, the same commit can be in multiple trees...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkout_id is not a PK, the same commit can be in multiple trees
checkout_id is always a single row in the checkouts table, so the same checkout_id can't be from multiple trees, but we can have multiple checkouts for the same tree, in which case this is the point of the LatestCheckout table, to only store the latest checkout for each tree
backend/kernelCI_app/models.py
Outdated
|
|
||
|
|
||
| class PendingBuild(models.Model): | ||
| id = models.TextField(primary_key=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this id? i guess it's build_id... so name it accordingly to make it easier to correlate in the code.
I'm also thinking if we have guarantees this won't conflict across trees/origins...
backend/kernelCI_app/models.py
Outdated
|
|
||
|
|
||
| class PendingTest(models.Model): | ||
| id = models.TextField(primary_key=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, i guess it's test_id
backend/kernelCI_app/models.py
Outdated
| id = models.TextField(primary_key=True) | ||
| checkout_id = models.TextField() | ||
| status = models.CharField( | ||
| max_length=10, choices=StatusChoices.choices, blank=True, null=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this really be null or blank?
also maybe we convert this into the subset we care (pass/fail/inc), it doesn't need to be the same choice, we can use a reduced subset with single letter DB storage (1 byte instead of a string)
backend/kernelCI_app/models.py
Outdated
| compatible = ArrayField(models.TextField(), null=True) | ||
| build_id = models.TextField() | ||
| status = models.CharField( | ||
| max_length=10, choices=StatusChoices.choices, blank=True, null=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
547b86d to
c36d84d
Compare
c36d84d to
f428a90
Compare
backend/kernelCI_app/management/commands/helpers/aggregation_helpers.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/helpers/aggregation_helpers.py
Outdated
Show resolved
Hide resolved
63eb9f7 to
aee6f18
Compare
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Outdated
Show resolved
Hide resolved
| valid_checkout_ids = set( | ||
| LatestCheckout.objects.values_list("checkout_id", flat=True) | ||
| ) | ||
|
|
||
| orphaned_entries = HardwareStatus.objects.exclude( | ||
| checkout_id__in=valid_checkout_ids | ||
| ) | ||
| orphaned_count = orphaned_entries.count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this must be inside the transaction
| ) | ||
|
|
||
| total_deleted = 0 | ||
| while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, because of the slice below
| "hardware_key", | ||
| "entity_id", | ||
| "entity_type", | ||
| blank=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank=true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default for composite primary key is blank=True and apparently it cannot be changed to False, didn't found why exactly yet
https://github.com/django/django/blob/main/django/db/models/fields/composite.py#L53-L73
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answering myself:
https://github.com/django/django/blob/stable/5.2.x/django/db/models/fields/composite.py#L50
if not kwargs.setdefault("blank", True):
raise ValueError("CompositePrimaryKey must be blank.")| if status not in simplified_status_to_count: | ||
| return simplified_status_to_count[SimplifiedStatusChoices.INCONCLUSIVE] | ||
| return simplified_status_to_count[status] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how? if all status are in there. But anyway, it's better to use .get(status, SimplifiedStatusChoices.INCONCLUSIVE) than to double-lookup
| is_already_processed = to_process in existing_processed | ||
|
|
||
| if is_already_processed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why assign to variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added just for readability, but yeah, to_process in existing_processed already sounds readable enough
| valid_checkout_ids = set( | ||
| LatestCheckout.objects.values_list("checkout_id", flat=True) | ||
| ) | ||
|
|
||
| orphaned_entries = HardwareStatus.objects.exclude( | ||
| checkout_id__in=valid_checkout_ids | ||
| ) | ||
| orphaned_count = orphaned_entries.count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this must be inside the transaction
| while True: | ||
| with transaction.atomic(): | ||
| batch_ids = list( | ||
| orphaned_entries.values_list("checkout_id", flat=True)[:batch_size] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the values_list("checkout_id", flat=True) you can specify when you create the orphaned_entries, here you just slice
| test, h_key, status_record, existing_processed, new_processed_entries | ||
| ) | ||
|
|
||
| _process_build_status( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only process build if the test was processed (return boolean in _process_test_status and check before this line)
| build, | ||
| h_key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note if we include the test.id, type=TEST in the hw_key above, we'll need to compute a new hw_key for the build, ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create an issue for adding unit tests for this and the other functions added in this PR just so that we don't forget
|
|
||
|
|
||
| class LatestCheckout(models.Model): | ||
| checkout_id = models.TextField(primary_key=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkout_id is not a PK, the same commit can be in multiple trees
checkout_id is always a single row in the checkouts table, so the same checkout_id can't be from multiple trees, but we can have multiple checkouts for the same tree, in which case this is the point of the LatestCheckout table, to only store the latest checkout for each tree
.../kernelCI_app/migrations/0010_pendingtest_processedhardwarestatus_hardwarestatus_and_more.py
Outdated
Show resolved
Hide resolved
| if len(result.hardware) < 1: | ||
| return create_api_error_response( | ||
| error_message=ClientStrings.NO_HARDWARE_FOUND, | ||
| status_code=HTTPStatus.OK, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you check for this len < 1 (btw why not len == 0) with the hardwares_raw instead of letting them go through the model validation? If there is no hardware then I see that there won't be any item to validate, but it seems out of order to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just followed the current approach on hardwareView, but you're right, we can check it earlier
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Outdated
Show resolved
Hide resolved
backend/kernelCI_app/management/commands/process_pending_aggregations.py
Outdated
Show resolved
Hide resolved
acd80d1 to
eb0fbff
Compare
| h_key = get_hardware_key(test.origin, test.platform, checkout.id) | ||
| contexts.append((test, build, checkout, h_key)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we move to include entity_id/type into the get_hardware_key() we must generate 2 h_key, one for the test and another for the build, report both in contexts and keys_to_check
|
|
||
| def _get_existing_processed(keys_to_check: set[bytes]) -> set[ProcessedHardwareStatus]: | ||
| """Fetch existing processed entries from the database.""" | ||
| return set(ProcessedHardwareStatus.objects.filter(hardware_key__in=keys_to_check)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we move to hardware_key to include entity_id/type it will be a PK, so we can change this line as well and .values("hardware_key", flat=True)
| to_process = ProcessedHardwareStatus( | ||
| hardware_key=h_key, | ||
| entity_id=test.test_id, | ||
| entity_type=HardwareStatusEntityType.TEST, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we move to hardware_key to include entity_id/type, then test_h_key is already what we'll check in existing_processed, no need to create the model instance here.
| to_process = ProcessedHardwareStatus( | ||
| hardware_key=h_key, | ||
| entity_id=build.id, | ||
| entity_type=HardwareStatusEntityType.BUILD, | ||
| ) | ||
|
|
||
| if to_process in existing_processed: | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we move to hardware_key to include entity_id/type, then build_h_key is already what we'll check in existing_processed, no need to create the model instance here.
|
|
||
| existing_processed = _get_existing_processed(keys_to_check) | ||
|
|
||
| for test, build, checkout, h_key in contexts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here you will get test_h_key and build_h_key, then be careful when calling _process_*_status()
| def process_pending_batch(self, batch_size: int) -> int: | ||
| last_processed_test_id = None | ||
|
|
||
| while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should hold a transaction inside the loop... or outside (more correct), but that may hold the lock for too long, need to check how it behaves with real data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it is safer to keep the transaction inside the loop, with that we don't lose time reprocessing the previous batches if some of them fails to commit
| ready_tests: Sequence[PendingTest], | ||
| ready_builds: dict[str, Builds], | ||
| ) -> int: | ||
| with transaction.atomic(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then this transaction is in process_pending_batch and not here
backend/kernelCI_app/models.py
Outdated
| pk = models.CompositePrimaryKey( | ||
| "hardware_key", | ||
| "entity_id", | ||
| "entity_type", | ||
| ) | ||
| hardware_key = models.BinaryField( | ||
| max_length=32 | ||
| ) # this holds a sha256, thus digest_size = 32 bytes | ||
| entity_id = models.TextField() | ||
| entity_type = models.CharField( | ||
| max_length=1, choices=HardwareStatusEntityType.choices | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should include entity_id/type inside the hardware_key, currently the lookup for hardware_key will return all entities ever processed for that hardware, as the system grows and we never delete ProcessedHardwareStatus entries, the set will be large and this will hurt performance (and memory usage).
maybe we should keep a checkout_id field from a cron job we should remove those that are not related to the latest checkouts, same as we did with delete_unused_hardware_... (or even do it from inside that job)
| checkout_id: str | ||
| test_origin: str | ||
| platform: str | ||
| compatibles: Optional[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this an array of strings?
| def get_hardware_key(origin: str, platform: str, checkout_id: str) -> bytes: | ||
| """Generate a hash (hardware key) from origin, platform, and checkout ID.""" | ||
| return hashlib.sha256(f"{origin}|{platform}|{checkout_id}".encode("utf-8")).digest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add entity_id/type
eb0fbff to
c95ea60
Compare
- Add HardwareStatus model to store aggregated test/build/boot status by platform - Add LatestCheckout model to track latest checkout per origin/tree/repository - Add PendingTest model to queue tests for aggregation processing - Add ProcessedHardwareStatus model to track processed entities - Add SimplifiedStatusChoices enum for aggregated status values - Create database migration for new tables with appropriate indexes
- Add aggregation_helpers module with checkout and test aggregation logic - Integrate aggregation into kcidbng_ingester db_worker - Populate LatestCheckout table with checkout tracking - Populate PendingTest table with test records for processing - Run aggregations before buffer flush to ensure data consistency
- Add process_pending_aggregations management command for backlog processing - Implement batch processing with configurable batch size and loop mode - Add hardware status aggregation from PendingTest queue - Add get_hardware_listing_data_from_status_table query function - Query denormalized hardware_status table for improved performance
c95ea60 to
63a72f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename migration to a more meaningful name
Objective
Implement hardware status denormalization to improve query performance for hardware listing data
Problem Solved
The current hardware listing endpoint suffers from performance issues due to complex joins across multiple tables (checkouts, tests, builds). This PR introduces a denormalized approach to aggregate hardware status data.
Key Changes
process_pending_aggregationsmanagement command for batch processing of pending tests with configurable batch size and loop modeget_hardware_listing_data_from_status_tablefunction to query denormalized data instead of complex joinsHow to Test
1. Run Database Migration
cd backend poetry python manage.py migrate2. Process Test Submissions (Optional - populate test data)
Extract submissions.zip to a folder and run the ingester:
poetry python manage.py monitor_submissions --spool-dir ./{folder-name}3. Test Backlog Processing Command
Single batch processing:
Continuous loop mode:
4. Verify Hardware Status Data
Check that hardware_status table is populated in PostgreSQL using psql or other software that can communicate with the DB
5. Test Hardware Listing Endpoint
Query the hardware listing endpoint and verify improved performance and correct data aggregation.