Description
The extractcode doc at https://scancode-toolkit.readthedocs.io/en/stable/tutorials/how_to_extract_archives.html doesn't mention the "--ignore" option at all. It's quite an important option, both for avoiding wasted time on unnecessary files and for preventing extractcode from falling over when it encounters an invalid/corrupt archive file that isn't required.
When documenting this flag, it'd be helpful to explain the interaction between the extractcode --ignore and the scancode parameter of the same name. Specifically, I've just spent several hours adding debug statements to the source code to understand why my extractcode --ignore globs weren't working, and the piece of info that would really have helped is that the extractcode ignores do NOT apply to paths within the archives (e.g. my-archive.tar/tests/foo is extracted even if I use extractcode --ignore=*/tests/*) but only to the decision about which archives to unpack.
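
To make the distinction concrete, here is a minimal Python sketch of the behavior as I observed it. It is only a model for the docs discussion, not extractcode's actual implementation: the function names are hypothetical, and the standard-library fnmatch module stands in for whatever glob matching extractcode really uses.

```python
from fnmatch import fnmatchcase


def archives_to_extract(archive_paths, ignore_patterns):
    """Keep only archives whose OWN path matches none of the ignore patterns.

    Hypothetical helper: models the "which archives to unpack" decision.
    """
    return [
        path
        for path in archive_paths
        if not any(fnmatchcase(path, pattern) for pattern in ignore_patterns)
    ]


def extracted_members(archive_path, member_names, ignore_patterns):
    """Model of the observed behavior for a single archive.

    If the archive itself is ignored, nothing is written. Otherwise every
    member is written: the patterns are NOT consulted for member paths.
    """
    if not archives_to_extract([archive_path], ignore_patterns):
        return []
    return [f"{archive_path}/{name}" for name in member_names]


patterns = ["*/tests/*"]

# The archive path itself does not match, so all members are extracted,
# including the one under tests/.
result = extracted_members("my-archive.tar", ["tests/foo", "src/main.py"], patterns)

# An archive that lives under a tests/ directory is skipped entirely.
selected = archives_to_extract(["project/tests/fixtures.tar", "my-archive.tar"], patterns)
```

In this model the glob is only ever compared against the archive's own path, which is why a pattern aimed at a directory inside the archive has no effect.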
(Aside: I was wondering about creating an additional FR for applying extractcode ignores to individual files. It could make things a LOT faster if extractcode didn't waste time writing to-be-ignored files such as /tests/ to disk, only for them to be ignored later by scancode. If you think that's a good idea, we could create an issue for that too.)