@aforsythe
Collaborator

Update rawtoaces license to Apache 2.0

@aforsythe aforsythe requested review from antond-weta and lgritz April 21, 2025 19:22
@linux-foundation-easycla
linux-foundation-easycla bot commented Apr 21, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@lgritz lgritz left a comment

This looks great, exactly what we need.

See my comment about how the THIRD-PARTY file is actually including more than it needs to. In particular, I'm worried that this confusingly looks like you've incorporated libraw code into this project, and since it uses LGPL, that means that this whole project would need to be LGPL in order to be distributed. But I believe that's not actually what this project does -- if rawtoaces is merely using libraw as a dependency that is dynamically linked, it doesn't affect any of the other licensing, so we shouldn't take any chances that we give the impression that it does.

@antond-weta
Contributor

Do we also need to put a license in the data files as well?
Specifically, does the data in
data/cmf/cmf_1931.json belong to CIE, and the data in data/illuminant/iso7589_stutung_380_780_5.json belong to ISO?

@lgritz
lgritz commented Apr 21, 2025

> Do we also need to put a license in the data files as well? Specifically, does the data in data/cmf/cmf_1931.json belong to CIE, and the data in data/illuminant/iso7589_stutung_380_780_5.json belong to ISO?

If the data file is a format that has provision for some kind of "comment" or other way to unobtrusively give an attribution, then yes, if possible, it should say where it comes from and who owns it. Just like what Alex did with the source code files here, the comment can just be 2 lines: one acknowledging the rights holder, and a second with the SPDX-License-Identifier: tag that identifies the license. This makes it easy for SBOM (software bill of materials) tools, including the one we use annually to scan the ASWF projects for license issues, to quickly inventory all the files and know what mix of licenses are represented.

Some data files don't have any way to be so marked, and in that case, it's ok to omit it but maybe mention it somewhere in the project documentation.
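
As a rough illustration of why the SPDX-License-Identifier: tag helps, here is a minimal C++ sketch of what a license scanner does at its simplest: it walks a file line by line and collects whatever follows the tag. The function name `find_spdx_ids` is hypothetical, and this is not how any particular SBOM tool is implemented; real scanners also handle license expressions, block-comment terminators, and files that carry no tag at all.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of rawtoaces or any SBOM tool):
// returns the value of every "SPDX-License-Identifier:" tag found in `text`,
// in the order the tags appear.
std::vector<std::string> find_spdx_ids(const std::string& text) {
    static const std::string tag = "SPDX-License-Identifier:";
    static const char* const blanks = " \t\r";

    std::vector<std::string> ids;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        const std::string::size_type pos = line.find(tag);
        if (pos == std::string::npos) {
            continue;  // no tag on this line
        }
        // Everything after the tag, with surrounding whitespace trimmed.
        const std::string value = line.substr(pos + tag.size());
        const std::string::size_type first = value.find_first_not_of(blanks);
        if (first == std::string::npos) {
            continue;  // tag present but no identifier after it
        }
        const std::string::size_type last = value.find_last_not_of(blanks);
        ids.push_back(value.substr(first, last - first + 1));
    }
    return ids;
}
```

Run over a source file that starts with the two header lines, this would yield a single entry such as `Apache-2.0`; a file that also embeds separately licensed data would yield one entry per tag, which is what lets a tool inventory the mix of licenses.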

@antond-weta
Contributor

Yes, there is the "license" field in the json files, which is currently set to null. The data in cmf_1931.json seems to be identical to https://files.cie.co.at/CIE_xyz_1931_2deg.csv, which is distributed under "CC BY-SA 4.0" according to the metadata file https://files.cie.co.at/CIE_xyz_1931_2deg.csv_metadata.json. This probably makes the json file a derivative work, it should also be under "CC BY-SA 4.0"?

There are also some data blobs in the source files, which probably should be excluded from the Apache-2. For example, this

} s_series[54] = {
is, I believe, the same data as https://cie.co.at/datatable/components-relative-spectral-distribution-daylight

@lgritz
lgritz commented Apr 21, 2025

> This probably makes the json file a derivative work, it should also be under "CC BY-SA 4.0"?

Sounds correct to me. Look up the spdx code for that license, I'm sure it has some standard way to say it, all the CC licenses are in there somewhere. https://spdx.org/licenses/

> There are also some data blobs in the source files, which probably should be excluded from the Apache-2. For example, this

Yes, I think it is sufficient to say in a comment, right at the start of the table, "The data in this table is taken from ..." and put the same 2 lines we put at the head of these files identifying the rights holder and license. I think that should make it fairly clear that it's just referring to the table data and not the rest of the file, while still making it so that the license scanners will correctly find it.
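
A minimal sketch of what that could look like in a C++ source file follows. Everything here is illustrative: the struct, the array name `example_series`, and the numeric values are placeholders rather than the project's real table, the wording of the copyright lines is assumed, and `CC-BY-SA-4.0` is only the SPDX identifier that would apply if the data does turn out to be under CC BY-SA 4.0.

```cpp
// Copyright Contributors to the rawtoaces Project. (header wording assumed)
// SPDX-License-Identifier: Apache-2.0

#include <cstddef>

// Hypothetical row layout for a table of daylight components.
struct DaylightComponent {
    int wavelength_nm;
    double s0;
    double s1;
    double s2;
};

// The data in this table is taken from
// https://cie.co.at/datatable/components-relative-spectral-distribution-daylight
// Copyright CIE (rights holder assumed for this illustration)
// SPDX-License-Identifier: CC-BY-SA-4.0
//
// The two lines above are meant to cover only the table that follows.
// The values below are placeholders, NOT the actual CIE data.
constexpr DaylightComponent example_series[] = {
    {300, 0.0, 0.0, 0.0},
    {310, 0.0, 0.0, 0.0},
    {320, 0.0, 0.0, 0.0},
};

// Number of rows, derived from the array so it cannot drift out of sync.
constexpr std::size_t example_series_size =
    sizeof(example_series) / sizeof(example_series[0]);
```

Placing the attribution directly above the table keeps it next to the data it describes, while the file-level header at the top continues to cover the surrounding code.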

- remove the license references to third-party build-time dependencies, as their source is not included in the repo

Signed-off-by: Alex Forsythe <[email protected]>
@antond-weta
Contributor

> Yes, I think it is sufficient to say in a comment, right at the start of the table

Not a lawyer, but embedding of CC BY-SA 4.0 material may not be allowed, as per https://wiki.creativecommons.org/wiki/License_Versions#Application_of_effective_technological_measures_by_users_of_CC-licensed_works_prohibited. Distributing rawtoaces in binary form would restrict the users' access to the original data.

It won't be hard to split out the tables into separate JSON files, if needed.

That is assuming that the data is indeed under CC BY-SA 4.0. I do think we should clarify the ownership/license of each piece of data.

@lgritz
lgritz commented Apr 22, 2025

@antond-weta We can check with LF legal if you want, but my guess is that while it would be nice to have a comment with a citation for the original source of the data (1931 blah blah), you probably don't need any license attribution there because the raw data in this small table is just a widely available list of facts, not "expressive content" that would be subject to copyright.

@aforsythe
Collaborator Author

> Do we also need to put a license in the data files as well? Specifically, does the data in data/cmf/cmf_1931.json belong to CIE, and the data in data/illuminant/iso7589_stutung_380_780_5.json belong to ISO?

We'd previously talked about moving the data here : https://github.com/AcademySoftwareFoundation/rawtoaces-data

@aforsythe
Collaborator Author

> There are also some data blobs in the source files

Seems like we should reference external files and make this variable. Hard-coding this stuff into the source doesn't feel right.

@antond-weta
Contributor

Should we go ahead with this PR, and I'll make the changes removing the data blobs and switching to using https://github.com/AcademySoftwareFoundation/rawtoaces-data as a separate PR?

Contributor

@antond-weta antond-weta left a comment

Looks good

@lgritz
lgritz commented Apr 22, 2025

> Should we go ahead with this PR, and I'll make the changes removing the data blobs and switching to using https://github.com/AcademySoftwareFoundation/rawtoaces-data as a separate PR?

I think so, yes. We can continue to revise to address any concerns about the data blobs (which, in any case, are made no worse by anything in this PR, so there's no need to hold it up).

@antond-weta antond-weta merged commit 600d04d into AcademySoftwareFoundation:master Apr 22, 2025
24 checks passed
@antond-weta
Contributor

> We'd previously talked about moving the data here : https://github.com/AcademySoftwareFoundation/rawtoaces-data

@aforsythe, rawtoaces-data is currently a private repo. Does it require more work on your side, or can we make it public?

3 participants