Commit 72fb57f

fill in NEWS
1 parent f33fc4c commit 72fb57f

1 file changed

+57
-5
lines changed

NEWS

+57-5
@@ -1,10 +1,62 @@
1-
2.0 - Daniel J. Hicks
1+
# Changelog
22

3-
- Evelyn Brister manually reviewed the dataset for accuracy, focusing on fixing name and gender attribution issues. These manual fixes have been seamlessly incorporated into the release dataset. (issue #12)
3+
All notable changes to this project will be documented in this file.
44

5-
- The extraneous URL field has been removed. (issue #11)
5+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
67

7-
- Several empty, nearly empty, redundant, or undocumented columns have been removed. In particular, the only list column in the publication formats is the author data, and there are no list columns in the author formats. This means there is minimal difference in the data coverage of the CSV and Rds files.
88

9-
- Universal Numeric Fingerprints (UNF) are now used to aid versioning. Each release of the dataset will be accompanied by a spreadsheet of UNF values. Your local copy of the dataset can be validated by generating the UNF value (using the R package `UNF` or the Python library `python-unf` <https://github.com/chaselgrove/python-unf>) and comparing it to the documented values. For documentation of the underlying algorithm, the advantages of UNF, and instructions on how to format data citations using UNF, see the vignettes for the `UNF` package at <https://cran.r-project.org/package=UNF> and the Dataverse Project guidelines at <http://guides.dataverse.org/en/latest/developers/unf/index.html>. (issue #6)
9+
## [Unreleased]
10+
11+
## [2.0] - 2019-11-11
12+
13+
### Added
14+
- This NEWS file
15+
- Universal Numeric Fingerprints (UNF) are now used to support dataset validation. The file `unf.csv` gives UNF hash strings for each dataset format, size, and file format. By comparing these hash strings to working datasets, users can confirm which version of the dataset they are using.
16+
- UNF are implemented using the `UNF` package in R. <https://cran.r-project.org/web/packages/UNF/index.html>
17+
- For a brief introduction to UNF, see <https://cran.r-project.org/web/packages/UNF/vignettes/citation.html>
18+
- The following block briefly illustrates the use of UNF in practice:
19+
20+
```{r}
21+
library(UNF)
22+
23+
## UNF value for publications-philosophy of science-Rds v2.0
24+
unf_value = 'nJaKSRjMpMV1zYGoOPFRlQ=='
25+
26+
pub_level = readRDS('publications_philsci.Rds')
27+
pub_level_unf = unf(pub_level, version = 6, digits = 3, timezone = 'UTC')
28+
29+
identical(pub_level_unf$unf, unf_value)
30+
```
31+
32+
### Removed
33+
- Several redundant or (almost entirely) empty/NA columns were removed.
34+
- Redundant `URL` column; cf <https://github.com/dhicks/comp-HOPOS/issues/11>
35+
- `member`, `prefix`, `score`, `source`, `subject`, `archive`, `authenticated.orcid`, `affiliation1.name`, `affiliation2.name`, `affiliation3.name`, `affiliation4.name`, `name`, `funder`, `assertion`
36+
- Evelyn Brister manually identified and removed numerous non-article documents, such as tables of contents and book reviews.
37+
- Evelyn Brister manually identified authors who qualified as philosophers of science using the threshold criterion (i.e., 2 or more papers in a primary venue) but who primarily worked in other areas of philosophy. These authors are:
38+
- E. J. Lowe (metaphysics, phil mind, and phil lang.)
39+
- H B Acton (political philosophy)
40+
- Alasdair MacIntyre (ethics)
41+
- V. J. McGill
42+
- Jan Narveson (political theory)
43+
- Patrick Nowell-Smith (moral theory)
44+
- Daniel J O’Connor (philosophy of education)
45+
46+
### Fixed
47+
- Evelyn Brister manually reviewed names and gender attribution, fixing issues related to initialization, misspellings, and incorrect or missing gender attribution (based on presentation on faculty websites, etc.).
48+
- cf <https://github.com/dhicks/comp-HOPOS/issues/12>
49+
50+
### Changed
51+
- The "philosophy of science" dataset is now filtered by year and includes only documents published between 1930 and 2017. The first primary philosophy of science venue (the first version of *Erkenntnis*) began publication in 1930, so our approach identifies very few "philosophers of science" prior to this year.
52+
53+
54+
55+
## [1.1] - 2018-08-26
56+
### Fixed
57+
This release fixes a substantial error that appeared when combining the gender attributions with the article metadata.
58+
59+
In v1.0, problems with the join logic when combining the results of the gender attribution algorithms (in script 06) meant that ~150 rows in the gender attribution dataframe had NA for both given and family names. All ~150 then matched to NA/NA author names in the article dataframe. The result was a massive inflation in the size of the dataset, and a mean of 26 authors per paper. Anyone familiar with philosophy should recognize this is incorrect.
60+
61+
Fixing the join logic in 06 appears to have solved the problem. Author inflation has disappeared. (In script 07, authors_unfltd has the same number of rows as authors_full.) In the full dataset, about 78% of papers have just 1 author; this is about 92% in the philosophy of science dataset.
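
The row inflation described for v1.0 can be reproduced on toy data. The following is a minimal sketch in base R, not the code from script 06: the data frames, their column names (`paper`, `given`, `family`, `gender`), and the use of `merge()` are all assumptions made for illustration, since the actual join code is not shown here. It relies on the fact that `merge()` treats `NA` keys as matching one another, so every NA/NA author row pairs with every NA/NA attribution row.

```r
# Hypothetical toy data; column names are assumptions, not the project's schema.
authors <- data.frame(
  paper  = c("p1", "p2", "p3"),
  given  = c("Ada", NA, NA),
  family = c("Lovelace", NA, NA),
  stringsAsFactors = FALSE
)
gender <- data.frame(
  given  = c("Ada", NA, NA),
  family = c("Lovelace", NA, NA),
  gender = c("w", NA, NA),
  stringsAsFactors = FALSE
)

# merge() matches NA keys to each other, so the 2 NA/NA author rows
# each pair with the 2 NA/NA attribution rows (2 x 2 = 4), plus 1 real match.
inflated <- merge(authors, gender, by = c("given", "family"))
nrow(inflated)  # 5 rows from only 3 author rows

# One possible guard (an assumption, not necessarily the fix used in 06):
# drop attribution rows without a usable name before joining.
gender_named <- gender[!is.na(gender$given) & !is.na(gender$family), ]
clean <- merge(authors, gender_named, by = c("given", "family"), all.x = TRUE)
nrow(clean)     # 3 rows, one per original author row
```

With about 150 NA/NA rows on each side, the same pairing would yield on the order of 150 x 150 extra rows, which is consistent with the inflated author counts reported above. Comparing row counts before and after the join, as script 07 does with `authors_unfltd` and `authors_full`, is a cheap check for this kind of problem.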
1062
