When ingesting adverts (after a sync), we traverse the chain from the sync'd head until we find an advert that has already been processed (ingest.go#L951).
If an advert has not already been processed, we add it to a map of adverts to process keyed by advert provider.
We later iterate over the adverts grouped by provider (ingest.go#L1015). As with any Go map iteration, the order is not deterministic.
Each iteration calls ingestWorkerLogic (ingest.go#L1034), which iterates over that provider's adverts from oldest to newest. As it ingests, it marks each advert as "processed" (ingest.go#L1171) and, as a side effect, records each advert as the latest advert ingested per publisher (ingest.go#L555).
Marking adverts as processed in this way is problematic in failure cases: adverts for other providers in the map may appear earlier in the chain, yet a later advert has already been marked as processed and recorded as the latest advert.
In short, this can cause the ingestion process to permanently skip chunks of a chain. For example, suppose ingesting adverts for one provider fails (e.g. due to a network error), while a subsequent provider's adverts that appear later in the chain are ingested successfully. The next time ingest runs, it will terminate at the latest advert recorded, which comes after adverts that have not yet been ingested.
I hope I am reading the code correctly; it is not easy to follow. I'd appreciate it if someone could confirm this is an issue.