Skip to content

Conversation

@joshheald
Copy link
Contributor

@joshheald joshheald commented Sep 4, 2025

Closes: WOOMOB-1089

Description

This PR adds a simple persistence service to save the full catalog to the database, after it's downloaded.

  1. Deletes all data from the database
  2. Inserts Products, Variations, Attributes, and Images, ignoring any conflicts
  3. If that succeeds, commits the transaction, otherwise rolls back and returns to the pre-deletion database.

Ignoring conflicts

On Large fun testing, I found that all my attempts to persist the data would fail with a primary key conflict.

You'll see that 4977 products are downloaded, but only 4976 persisted – I think that we get one of them twice.

For a large import, I think our best bet is to ignore conflicts. That gives a best-effort at loading the catalog, but if the data is duplicated we'll just keep the first one we get and continue with the import. An alternative would be to specify replace.

For images, this means that we only save one copy of each, with a direct relationship to the first product that we imported it for. Most products and variations don't have images with this implementation, but I'll fix that schema issue in a future PR. We either need a many-to-many join table, or to make copies with synthetic keys as we do for attributes.

Steps to reproduce

Check unit tests pass.

Testing information

This isn't used in the app yet. I added a button to test it – see https://github.com/woocommerce/woocommerce-ios/compare/woomob-1093-woo-poslocal-catalog-save-parsed-full-catalog-into-database-with-button for the branch with that on top.

When you persist everything, log messages tell you what was downloaded, and what was persisted. The detailed log in the message below is the most accurate, as it reads its numbers out of the db.

Screenshots

CleanShot 2025-09-04 at 14 19 49@2x

This is from Large fun testing. There genuinely are only 100 images in the data set, and only 101 on the site – they're shared between products and variations. That's not properly supported by our schema yet, but I'll fix it in a future PR.


  • I have considered if this change warrants user-facing release notes and have added them to RELEASE-NOTES.txt if necessary.

@joshheald joshheald added this to the 23.3 milestone Sep 4, 2025
@joshheald joshheald added type: task An internally driven task. status: feature-flagged Behind a feature flag. Milestone is not strongly held. labels Sep 4, 2025
@joshheald joshheald requested a review from jaclync September 4, 2025 13:28
@wpmobilebot
Copy link
Collaborator

wpmobilebot commented Sep 4, 2025

App Icon📲 You can test the changes from this Pull Request in WooCommerce iOS Prototype by scanning the QR code below to install the corresponding build.

App NameWooCommerce iOS Prototype
Build Numberpr16087-76b396b
Version23.2
Bundle IDcom.automattic.alpha.woocommerce
Commit76b396b
Installation URL1ulvea1j4dft0
Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

Copy link
Contributor

@jaclync jaclync left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and tests well! :shipit: Sounds good to update the relationship handling separately.

You'll see that 4977 products are downloaded, but only 4976 persisted – I think that we get one of them twice.

This is quite strange, I saw 4977 products in wp-admin and spent a bit of time checking what's going on by comparing the product IDs in the database and from the API / wp-admin. It turned out product ID 17300 appeared twice in both the API and wp-admin:

the end of page 5 (20 products per page) the beginning of page 6
Screenshot 2025-09-05 at 2 39 21 PM Screenshot 2025-09-05 at 2 39 38 PM

It feels like some bug in core? wp-admin fetches 20 per page and the API fetches 100 per page, might not be due to pagination. But at least we know it's not something on the client side, and we will keep potential duplicates in mind.

import class Networking.AlamofireNetwork
import class Networking.POSCatalogSyncRemote
import CocoaLumberjackSwift
import Storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: maybe can import just POSCatalogPersistenceServiceProtocol and POSCatalogPersistenceService instead of the whole Storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually for GRDBManagerProtocol, the persistence service is in Yosemite anyway so doesn't need to be imported.

I'm not sure that granular imports help us much here. AIUI, we do it from the main app to make it easier to split POS to a new app, if we ever do that. Storage is much simpler, and we implicitly depend on most of it via the storage model typealiases.

Perhaps a different question: Should GRDB be in a new module of its own? Maybe it would be best to do that now, in a new PR but before we add more to it. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that granular imports help us much here. AIUI, we do it from the main app to make it easier to split POS to a new app, if we ever do that. Storage is much simpler, and we implicitly depend on most of it via the storage model typealiases.

My understanding for granular imports is so that it's clear which (minimum) dependencies are required for the file. It's more useful when the imported module is bigger though, like Yosemite. For Storage I think it's fine, just a nit preference.

Perhaps a different question: Should GRDB be in a new module of its own? Maybe it would be best to do that now, in a new PR but before we add more to it. WDYT?

Were you thinking of keeping GRDB code in a separate module, and Storage imports it for its usage with potential abstraction in the interface with Yosemite? Was it because the GRDB code might get complex, or any other reasons? Right now, I think it's fine for GRDB to be within Storage as an implementation of the protocols. Would like to hear any strong reasons for moving it to a separate module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you thinking of keeping GRDB code in a separate module, and Storage imports it for its usage with potential abstraction in the interface with Yosemite? Was it because the GRDB code might get complex, or any other reasons? Right now, I think it's fine for GRDB to be within Storage as an implementation of the protocols. Would like to hear any strong reasons for moving it to a separate module.

No, I was thinking a new sibling module for Storage, e.g. GRDBStorage (bad name, but for discussion).

Our new persisted models don't rely on anything pre-existing in Storage. Now's the easiest time to separate them so that we don't end up with anything cutting across both types of storage.

By putting them in a separate module, we could call them Product, ProductVariation etc, at least within GRDBStorage. If we split POS into a separate app, we might be able to leave Storage behind (unlikely, but not impossible.)

I don't have super-strong arguments beyond that. Keeping it in Storage is fine... but other than the name, and saving a little effort adding a module, there's no particular reason it has to be in there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the current Storage is more like CoreDataStorage. Makes sense to me. We could put it up for discussion and fit this after the main tasks are done.

From development so far, it's not a strong need, but it might be great to have more abstraction on the interface between the (GRDB)Storage and Yosemite layers.

Comment on lines +246 to +248
private(set) var replaceAllCatalogDataCallCount = 0
private(set) var lastPersistedCatalog: POSCatalog?
private(set) var lastPersistedSiteID: Int64?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing test cases that assert on these mock persistence service variables will be added in a future PR?


// When
let service = POSCatalogFullSyncService(syncRemote: mockSyncRemote, batchSize: customBatchSize)
let service = POSCatalogFullSyncService(syncRemote: mockSyncRemote, batchSize: customBatchSize, persistenceService: mockPersistence)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we use mockPersistenceService here, instead of a separate mockPersistence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch. I think this was from some tidying up or something.

Comment on lines 124 to 126
// One or both images should be saved (depending on conflict resolution)
#expect(imageCount >= 1)
#expect(imageCount <= 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, this would be resolved when we support either many-to-many relationship with a join table or a composite primary key right? as mentioned in the description:

For images, this means that we only save one copy of each, with a direct relationship to the first product that we imported it for. Most products and variations don't have images with this implementation, but I'll fix that schema issue in a future PR. We either need a many-to-many join table, or to make copies with synthetic keys as we do for attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Right now, only one will be saved, because we use the imageID from the network as the primary key id on the table. However, we don't have a join table, so it'll only belongTo the first product which it's added for – the second won't have an image. That's the ignore conflict resolution.

We can either make it many to many with a join table, or make copies for every product with a synthetic primary key.

If we make it many to many, then one image would still be saved. If we make copies, two images would be saved.

For now I'll simplify this test to check for exactly one.

}
}

@Test func replaceAllCatalogData_removes_related_images_and_attributes() async throws {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! do we also want to add similar setup/asserts for variation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure: b1ca363

@joshheald joshheald enabled auto-merge September 5, 2025 10:17
@joshheald joshheald merged commit 7688160 into trunk Sep 5, 2025
14 checks passed
@joshheald joshheald deleted the woomob-1093-woo-poslocal-catalog-save-parsed-full-catalog-into-database branch September 5, 2025 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: feature-flagged Behind a feature flag. Milestone is not strongly held. type: task An internally driven task.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants