Description
Currently, checksync parses every file fresh on each run, which is slow on large codebases. There is the --fromCache mode, but it takes a complete parse set and doesn't track changes to the parsed files; it is just a snapshot of the parsed state for a given run.
Instead, checksync could do what tools like Babel do: cache the result of processing each file and reuse that cache whenever possible. If a file is deemed to have changed, its cache entry is updated; otherwise, the cached result is used.
Cache format
This requires us to parse files in a mostly configuration-agnostic manner. The settings that decide which files we parse and which we ignore would still need to be respected, but anything that only affects the reported errors should not affect the cache format.
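As a rough illustration of that separation, the sketch below caches only what was found in a file and applies error-affecting options afterwards. The `CachedTag` shape, the `ReportOptions.ignoreIds` option, and `reportErrors` are all hypothetical names invented for this example; they are not checksync's actual data model or options.

```typescript
// Hypothetical, configuration-agnostic record of one tag found in a file.
// Only facts about the file are stored; no errors and no option values.
interface CachedTag {
    id: string;
    line: number;
    recordedChecksum: string; // checksum written in the tag
    actualChecksum: string; // checksum computed from the tagged content
}

// Hypothetical option that changes which errors get reported.
interface ReportOptions {
    ignoreIds: ReadonlyArray<string>;
}

// Errors are derived from the cached tags at report time, so changing
// the options never requires the cache to be rebuilt.
function reportErrors(
    tags: ReadonlyArray<CachedTag>,
    options: ReportOptions,
): string[] {
    return tags
        .filter((tag) => options.ignoreIds.indexOf(tag.id) === -1)
        .filter((tag) => tag.recordedChecksum !== tag.actualChecksum)
        .map((tag) => `line ${tag.line}: checksum mismatch for tag "${tag.id}"`);
}
```

With this split, two runs with different options could share the same cached tags and still report different errors.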
Cache invalidation
The simplest approach would be to compare each file's modification time with the one recorded in the cache. However, if that isn't reliable, a one-way hash of the file contents could be used instead. It would slow down the first run, but as long as the hash calculation plus reuse of cached files is faster than a full parse, it would still be a win.
So, checksync would look in the cache for a parsed state of a given file, and if it is there, and it is considered "up-to-date", it would use that instead of re-parsing that file.
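The lookup described above might be sketched as follows, combining both ideas: a cheap modification-time check first, with a content hash as the fallback. Everything here is an assumption for illustration: the `CacheEntry` shape, the `getParsed` helper, the choice of SHA-256, and passing the mtime and file reader in as parameters (in practice they would likely come from `fs.statSync(file).mtimeMs` and `fs.readFileSync`).

```typescript
import {createHash} from "crypto";

// Hypothetical cache record for one file; not checksync's actual format.
interface CacheEntry<T> {
    mtimeMs: number;
    contentHash: string;
    parsed: T;
}

// A plain object keyed by file path, so it can be written to disk as JSON.
type ParseCache<T> = Record<string, CacheEntry<T>>;

function hashContents(contents: string): string {
    return createHash("sha256").update(contents).digest("hex");
}

function getParsed<T>(
    cache: ParseCache<T>,
    file: string,
    mtimeMs: number,
    readContents: () => string,
    parse: (contents: string) => T,
): T {
    const entry: CacheEntry<T> | undefined = cache[file];

    // Cheap check: same modification time, so trust the cached parse
    // without reading the file at all.
    if (entry !== undefined && entry.mtimeMs === mtimeMs) {
        return entry.parsed;
    }

    // The mtime differs or there is no entry; compare content hashes.
    const contents = readContents();
    const contentHash = hashContents(contents);
    if (entry !== undefined && entry.contentHash === contentHash) {
        // Touched but unchanged: refresh the mtime and reuse the parse.
        entry.mtimeMs = mtimeMs;
        return entry.parsed;
    }

    // New or changed file: parse it and update the cache.
    const parsed = parse(contents);
    cache[file] = {mtimeMs, contentHash, parsed};
    return parsed;
}
```

Checking the modification time before hashing means an untouched file costs only a stat call, while a file that was touched without changing (a fresh checkout, for example) costs a read and a hash but still skips the parse.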
Other considerations
#887 and cross-repo "local" tags
Having this on-disk cache approach opens the door to making #887 a reality by creating a snapshot of the parsed state of a repo that some other checksync run can reference when validating its own tags.
--fromCache and --outputCache
These options should be deleted when implementing this, since the on-disk cache would remove the need for them.
Forcing cache clearing
There should be a mechanism for ignoring or clearing the cache explicitly. Perhaps a --clearCache arg and/or an --ignoreCache arg, or a --cache=clear, --cache=ignore type pattern.
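For the `--cache=<mode>` variant, a minimal sketch of the argument handling could look like the following. The `parseCacheMode` helper and the default `use` mode are assumptions made for this example; only `clear` and `ignore` come from the proposal above, and none of this exists in checksync today.

```typescript
// "clear" and "ignore" are the proposed values; "use" is an assumed
// default meaning "read and update the cache as normal".
type CacheMode = "use" | "clear" | "ignore";

function parseCacheMode(argv: ReadonlyArray<string>): CacheMode {
    let mode: CacheMode = "use";
    for (const arg of argv) {
        const match = /^--cache=(.*)$/.exec(arg);
        if (match === null) {
            continue; // not a --cache=... argument
        }
        const value = match[1];
        if (value === "use" || value === "clear" || value === "ignore") {
            mode = value; // the last occurrence wins
        } else {
            throw new Error(`Unknown --cache value: "${value}"`);
        }
    }
    return mode;
}
```

A single option with named modes leaves room for more modes later without adding a new flag for each one.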
Error reporting
This could open the door to more expressive errors. For example, some errors may affect different lines of code. Currently, we only report the line of the tag with the error, but we may want to reference the first tag in a batch of tags, as well as the tag with the error, as in this case. This is technically possible now, but the implementation and architecture don't facilitate it. Any refactoring and redesign done to support a cache could make this easier.