Open
Description
The dump schema includes a date_modified timestamp and other revision metadata.
To reduce disk I/O, we could store some metadata alongside the articles, compare it against the incoming metadata when processing, and skip articles that haven't changed.
One way to do this would be to store the date_modified timestamp as the modified attribute of the article file.
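A minimal sketch of that idea, not the project's actual code: it assumes date_modified arrives as an ISO 8601 string, and the helper names parse_date_modified and write_if_changed are hypothetical. The file's modification time is set to the revision timestamp after each write, so a later run can compare the two with a single stat call and skip the write when they match.

```python
import os
from datetime import datetime
from pathlib import Path


def parse_date_modified(value: str) -> float:
    """Convert an ISO 8601 timestamp to Unix seconds (assumed dump format)."""
    # Older Python versions do not accept a trailing "Z", so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00")).timestamp()


def write_if_changed(path: Path, text: str, date_modified: str) -> bool:
    """Write the article unless the file's mtime already matches date_modified.

    Returns True if the file was written, False if it was skipped.
    """
    stamp = parse_date_modified(date_modified)
    try:
        # Compare whole seconds, since filesystems differ in mtime precision.
        if int(path.stat().st_mtime) == int(stamp):
            return False  # same revision is already on disk
    except FileNotFoundError:
        pass  # first time this article is seen
    path.write_text(text, encoding="utf-8")
    # Set both access and modification time to the revision timestamp.
    os.utime(path, (stamp, stamp))
    return True
```

Comparing for equality rather than "newer than" also rewrites the file if the dump ever carries an older revision. Whether the saved writes outweigh the extra stat calls is something only profiling can show.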
Activity
biodranik commented on Jun 26, 2023
An interesting optimization, but it may not be worth it. We need to prove its benefits first. Let's leave it at a very low priority for now.
newsch commented on Jun 26, 2023
Understood. I've been thinking about it since you mentioned it here; we'll see what the profiling shows for the workflow.