
Extend list of licenses #6

Open · wants to merge 7 commits into main


ben-edna commented Dec 5, 2021

This PR extends the list of licenses using the text folder of spdx/license-list-data.

I've used our dependency-fetching tool, dfetch, since I only needed the "text" dir and adding the entire spdx/license-list-data repo seemed like overkill. This added the dfetch.yaml manifest and .dfetch_data.yaml files; I can remove them if you like.
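A minimal sketch of how the vendored "text" folder might be consumed. It assumes the folder holds one `<SPDX-ID>.txt` file per license, the flat layout used by spdx/license-list-data. The function name `load_license_texts` is hypothetical and not part of this PR:

```python
from pathlib import Path


def load_license_texts(text_dir: Path) -> dict[str, str]:
    """Map each SPDX identifier (taken from the file name) to its license text.

    Assumes a flat directory of "<SPDX-ID>.txt" files, like the "text"
    folder of spdx/license-list-data. Other file types are ignored.
    """
    texts: dict[str, str] = {}
    for path in sorted(text_dir.glob("*.txt")):
        # Path.stem drops only the final suffix: "Apache-2.0.txt" -> "Apache-2.0"
        texts[path.stem] = path.read_text(encoding="utf-8")
    return texts
```

Calling it on the vendored directory (wherever dfetch places it) yields a dict keyed by SPDX id, which could then be matched against a project's LICENSE file.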

To check that all the trove classifiers are valid and covered, I've added two tests that use packaging-classifiers to match the KNOWN_LICENSES list against all available classifiers mentioning License ::, in both directions.
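The two-way check can be sketched as a pair of set differences. This sketch assumes the known licenses can be reduced to a set of classifier strings (the real shape of KNOWN_LICENSES may differ), and it runs on a hard-coded sample instead of a live classifier list. In a real test, `all_classifiers` could come from `trove_classifiers.classifiers`, the set exported by the trove-classifiers package suggested later in this thread. The helper names are hypothetical:

```python
LICENSE_PREFIX = "License :: "


def license_classifiers(all_classifiers: set[str]) -> set[str]:
    """Keep only the trove classifiers that describe a license."""
    return {c for c in all_classifiers if c.startswith(LICENSE_PREFIX)}


def coverage_gaps(
    known: set[str], all_classifiers: set[str]
) -> tuple[set[str], set[str]]:
    """Compare known-license classifiers with the official list both ways.

    Returns (invalid, uncovered):
      invalid   -- classifiers we claim to know that do not exist officially
      uncovered -- official license classifiers we have no entry for
    """
    official = license_classifiers(all_classifiers)
    invalid = known - all_classifiers
    uncovered = official - known
    return invalid, uncovered


# Hypothetical sample standing in for the full classifier list.
sample = {
    "License :: OSI Approved :: MIT License",
    "License :: OSI Approved :: Apache Software License",
    "Programming Language :: Python :: 3",
}
known = {"License :: OSI Approved :: MIT License", "License :: Made Up License"}
invalid, uncovered = coverage_gaps(known, sample)
```

Here `invalid` flags the made-up entry and `uncovered` flags the Apache classifier. Matching on the `License :: ` prefix rather than a substring search is a choice made for this sketch, since license classifiers all start with that prefix.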

Open questions

  • Do you mind the dfetch yaml files?
  • I haven't removed the original KNOWN_LICENSES list, which leads to some duplication. Should I merge your existing list of licenses?

spoorcc commented Dec 5, 2021

Note to self: use https://pypi.org/project/trove-classifiers/ instead of packaging-classifiers.

thatch (Member) commented Dec 15, 2023

Hi Ben!

Thanks for the PR! I'm so sorry I didn't see this when it was fresh.

My hope is that this is a low-deps way to classify ~80% of real-world projects, with the remainder just returning UNKNOWN (which means "human, go figure it out"). It needs to be fairly responsive in order to be used in tools like pip-licenses to do better than just trove classifiers, and when I tried importing all of SPDX, it slowed things down too much for my liking. "Better than just trove classifiers" is pretty much the goal.
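To make the "small table plus UNKNOWN fallback" idea concrete, here is a minimal sketch. It is not this project's actual API: the mapping `CLASSIFIER_TO_SPDX` and the function `guess_license` are hypothetical. It answers only when the classifiers point at exactly one known license and otherwise returns UNKNOWN:

```python
UNKNOWN = "UNKNOWN"

# Hypothetical, deliberately small table: trove classifier -> SPDX identifier.
# Only unambiguous classifiers are listed (e.g. no bare "BSD License").
CLASSIFIER_TO_SPDX = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: ISC License (ISCL)": "ISC",
    "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)": "MPL-2.0",
}


def guess_license(classifiers: list[str]) -> str:
    """Return an SPDX id if exactly one known license matches, else UNKNOWN."""
    found = {CLASSIFIER_TO_SPDX[c] for c in classifiers if c in CLASSIFIER_TO_SPDX}
    if len(found) == 1:
        return found.pop()
    # Zero or several matches: leave it for a human to figure out.
    return UNKNOWN
```

Keeping the table small and in memory means no large license corpus is loaded at import time, which is the responsiveness concern above; anything ambiguous is deferred to a human or to a heavier tool.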

SPDX is fantastic as a repository of known licenses, I agree. But I think it's a lot of complexity aimed at getting those last few percent, which tbh you can already get with licensedcode today, especially if you're interested in fuzzier matches.

If you're interested in having a non-nexB solution for this, I'd recommend forking and calling it infer-license-full or so. I'd like to keep this small for the time being.
