Improve doc for extractcode --ignore option #59

Open
@ben-c8y

The extractcode documentation at https://scancode-toolkit.readthedocs.io/en/stable/tutorials/how_to_extract_archives.html doesn't mention the "--ignore" option at all. It's quite an important option, both for avoiding wasted time on unnecessary files and for preventing extractcode from falling over when it encounters an invalid or corrupt archive file that isn't required.

When documenting this flag, it would be helpful to explain the interaction between the extractcode --ignore option and the scancode option of the same name. Having just spent several hours adding debug statements to the source code to understand why my extractcode --ignore globs weren't working, the piece of information that would really have helped me is this: the extractcode ignores do NOT apply to paths within the archives (e.g. my-archive.tar/tests/foo is extracted even if I use extractcode --ignore=*/tests/*); they apply only to the decision about which archives to unpack.
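
To make the behaviour concrete, here is a toy Python model of what I observed. It is not extractcode's actual code: the function name and sample paths are made up for illustration, and I'm assuming simple fnmatch-style glob matching, which may differ from what extractcode really does.

```python
from fnmatch import fnmatchcase


def extracted_files(archive_members, ignore_globs):
    """Toy model: ignore globs are tested against archive paths only.

    archive_members maps an archive path to the list of paths inside it.
    (Hypothetical helper, not part of extractcode.)
    """
    extracted = []
    for archive, members in archive_members.items():
        # The glob decides whether the archive is unpacked at all...
        if any(fnmatchcase(archive, glob) for glob in ignore_globs):
            continue
        # ...but the members inside it are never tested against the globs.
        extracted.extend(f"{archive}/{member}" for member in members)
    return extracted


archives = {
    "project/my-archive.tar": ["src/main.py", "tests/foo"],
    "project/tests/fixtures.zip": ["data.bin"],
}
result = extracted_files(archives, ["*/tests/*"])
```

In this model, `project/tests/fixtures.zip` is skipped because the archive path itself matches the glob, while `project/my-archive.tar/tests/foo` is still extracted even though its full path would match `*/tests/*`.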

(Aside: I was wondering about creating an additional feature request for applying extractcode ignores to individual files. It could be a LOT faster if extractcode didn't waste time writing to-be-ignored files such as /tests/ to disk, only for them to be ignored later by scancode. If you think that's a good idea, we could create an issue for that too.)
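
A rough sketch of what I have in mind for that feature request, using only Python's standard `tarfile` module. The `extract_filtered` function is hypothetical and says nothing about how extractcode is structured internally; it only shows the idea of filtering members before they are written to disk.

```python
import tarfile
from fnmatch import fnmatchcase


def extract_filtered(tar_path, dest, ignore_globs):
    """Extract a tar archive, skipping members that match any ignore glob.

    Hypothetical helper for illustration only. A leading "/" is prepended
    to each member name so a glob like "*/tests/*" also matches a
    top-level "tests/" directory inside the archive.
    """
    with tarfile.open(tar_path) as tar:
        keep = [
            member
            for member in tar.getmembers()
            if not any(fnmatchcase("/" + member.name, glob) for glob in ignore_globs)
        ]
        # Only the members that survived the filter are written to disk.
        tar.extractall(dest, members=keep)
    return [member.name for member in keep]
```

Called as `extract_filtered("my-archive.tar", "out", ["*/tests/*"])`, this would never write `tests/foo` to disk.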
