Commit 25649ec

Merge pull request #99 from MITLibraries/TIMX-227-springshare-sources
Add new Springshare sources Libguides and Research Databases
2 parents 2f7fdaa + b998a02 commit 25649ec

24 files changed, +1210 -20 lines changed

Pipfile

+2
@@ -10,6 +10,8 @@ click = "*"
 lxml = "*"
 sentry-sdk = "*"
 smart-open = {version = "*", extras = ["s3"]}
+python-dateutil = "*"
+types-python-dateutil = "*"
 
 [dev-packages]
 bandit = "*"

Pipfile.lock

+28 -20
Generated file; diff not rendered.

@@ -0,0 +1,68 @@
# 1. Springshare Source Naming

Date: 2023-07-26

## Status

Proposed

## Context

While adding two new sources to the TIMDEX pipeline, discussion and technical constraints arose around what source names should be used.

Both data sources come from Springshare (Libguides and the AZ list of databases) and are retrieved via OAI-PMH.

At this time, a source name is a string that accompanies the records throughout the TIMDEX pipeline:
* `transmogrifier`: defined in `transmogrifier.config.SOURCES`
  * drives which transformer class to use
  * saved to the TIMDEX record as a field
  * used for the S3 key (folder structure + filename)
* `timdex-pipeline-lambdas`: defined in `lambdas.config.INDEX_ALIASES`
  * promotes a newly created index to specific aliases if configured
  * used for the S3 key (folder structure + filename)
* `timdex-index-manager`: defined in `tim.config.VALID_SOURCES`
  * prevents indexing of sources if not present in this list
  * used for the index name created in OpenSearch

Two distinct areas of consideration emerged when deciding on a source name:
* **meaningful**
  * does it suggest what the original data source is?
  * does it have value or meaning to end users of the API?
* **technically viable**
  * does it have special characters? are they allowed?
  * does it result in predictable S3 key naming conventions throughout?
  * is it an allowed OpenSearch index name?

36+
## Decision
37+
38+
The following source names were decided on:
39+
* `libguides`: the Libguides data source
40+
* oai set: `guides`
41+
* `researchdatabases`: the AZ list databases
42+
* oai set: `az`
43+
44+
### `libguides`
45+
46+
Pretty self-explanatory, satisfies both "meaningful" and "technically viable" requirements.
47+
48+
### `researchdatabases`
49+
50+
This one was a bit thornier.
51+
52+
It was suggested that `az` was not terribly helpful for understanding where the data came from, and was very unhelpful for end users.
53+
54+
The first agreed upon alternative was `research_databases`. `databases` was also floated, but could be ambiguous from the POV of an end user.
55+
56+
For a variety of reasons, attempting to keep these words distinct in the name failed: `research_databases`, `research-databases`, and `researchDatabases`. The reasons are outlined in [this Jira ticket comments](https://mitlibraries.atlassian.net/browse/TIMX-19?focusedCommentId=107019):
57+
* `research_databases`: index name not correctly parsed in `timdex-index-manager`
58+
* `research-databases`: files not saved correctly to S3 in `timdex-pipeline-lambdas`
59+
* `researchDatabases`: not a valid Opensearch index name
60+
61+
And so, the final decided upon source name was `researchdatabases`; no hyphens, underscores, or camelCasing.
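
The three failures listed above can be expressed as a small pre-flight check. This is a minimal sketch, not code from any of the TIMDEX repositories: the function name `source_name_problems` is hypothetical, the underscore and hyphen rules simply restate the pipeline-specific failures described in this ADR, and the remaining rules follow OpenSearch's documented index naming restrictions.

```python
# Characters that OpenSearch does not allow anywhere in an index name.
OPENSEARCH_FORBIDDEN = set(' ,:"*+/\\|?#<>')


def source_name_problems(name: str) -> list[str]:
    """Return the reasons a candidate source name is not viable (empty list = OK).

    Hypothetical helper: the "_" and "-" rules mirror the failures reported
    in this ADR, they are not general OpenSearch or S3 restrictions.
    """
    problems = []
    if name != name.lower():
        problems.append("uppercase letters are not valid in an OpenSearch index name")
    if name.startswith(("_", "-")):
        problems.append("an OpenSearch index name cannot start with '_' or '-'")
    if any(ch in OPENSEARCH_FORBIDDEN for ch in name):
        problems.append("contains a character OpenSearch forbids in index names")
    if "_" in name:
        problems.append("underscore: index name not parsed correctly in timdex-index-manager")
    if "-" in name:
        problems.append("hyphen: files not saved correctly to S3 in timdex-pipeline-lambdas")
    return problems


for candidate in ("research_databases", "research-databases",
                  "researchDatabases", "researchdatabases"):
    print(candidate, source_name_problems(candidate) or "OK")
```

Returning a list of reasons rather than a boolean keeps the explanation attached to the rejection, which is the information the Jira discussion had to reconstruct by trial and error.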
62+
63+
## Consequences
64+
65+
The source name `researchdatabases` reflects some compromises that must be made for sources:
66+
* if the source name is meaningful to end users, it may lose fidelity about the source origin
67+
* if the source name is technically viable, it may lose some human readability
68+
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <header>
      <identifier>oai:libguides.com:guides/175846</identifier>
      <datestamp>2023-05-31T19:49:21Z</datestamp>
      <setSpec>guides</setSpec>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
        <dc:title>Materials Science &amp; Engineering</dc:title>
        <dc:creator>Ye Li</dc:creator>
        <dc:subject>Engineering</dc:subject>
        <dc:subject>Science</dc:subject>
        <dc:description>Useful databases and other research tips for materials science.</dc:description>
        <dc:publisher>MIT Libraries</dc:publisher>
        <dc:date>2008-06-19 17:55:27</dc:date>
        <dc:identifier>https://libguides.mit.edu/materials</dc:identifier>
      </oai_dc:dc>
    </metadata>
  </record>
</records>
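
A record shaped like the fixture above can be read with namespace-aware lookups. The following is a minimal sketch using only the standard library's `xml.etree.ElementTree`; it is not the transformer added in this commit (the project depends on `lxml`), and `extract_fields`, `NAMESPACES`, and the abbreviated `SAMPLE` string are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for lookups; the prefixes here are local to this sketch.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated copy of the fixture, kept inline so the example is self-contained.
SAMPLE = """<records>
  <record xmlns="http://www.openarchives.org/OAI/2.0/">
    <header>
      <identifier>oai:libguides.com:guides/175846</identifier>
      <setSpec>guides</setSpec>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Materials Science &amp; Engineering</dc:title>
        <dc:subject>Engineering</dc:subject>
        <dc:subject>Science</dc:subject>
        <dc:identifier>https://libguides.mit.edu/materials</dc:identifier>
      </oai_dc:dc>
    </metadata>
  </record>
</records>"""


def extract_fields(record):
    """Collect a few Dublin Core values from one OAI-PMH <record> element."""

    def texts(tag):
        # Skip elements that are present but empty.
        return [
            el.text.strip()
            for el in record.iterfind(f".//dc:{tag}", NAMESPACES)
            if el.text and el.text.strip()
        ]

    return {
        "oai_identifier": record.findtext(
            "oai:header/oai:identifier", namespaces=NAMESPACES
        ),
        "titles": texts("title"),
        "subjects": texts("subject"),
        "links": texts("identifier"),
    }


root = ET.fromstring(SAMPLE)
records = [extract_fields(r) for r in root.iterfind("oai:record", NAMESPACES)]
print(records[0])
```

Note that `<records>` itself carries no namespace while each `<record>` declares the OAI default namespace, so the lookup from the root has to qualify `record` but not `records`.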
@@ -0,0 +1,24 @@
<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <header>
      <identifier>oai:libguides.com:guides/175846</identifier>
      <datestamp>2023-05-31T19:49:21Z</datestamp>
      <setSpec>guides</setSpec>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
        <dc:creator>Ye Li</dc:creator>
        <dc:subject>Engineering</dc:subject>
        <dc:subject>Science</dc:subject>
        <dc:description>Useful databases and other research tips for materials science.</dc:description>
        <dc:publisher>MIT Libraries</dc:publisher>
        <dc:date>2008-06-19T17:55:27</dc:date>
      </oai_dc:dc>
    </metadata>
  </record>
</records>
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <header>
      <identifier>oai:libguides.com:guides/175846</identifier>
      <datestamp>2023-05-31T19:49:21Z</datestamp>
      <setSpec>guides</setSpec>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
        <dc:title>Materials Science &amp; Engineering</dc:title>
        <dc:creator></dc:creator>
        <dc:subject></dc:subject>
        <dc:subject></dc:subject>
        <dc:description></dc:description>
        <dc:publisher></dc:publisher>
        <dc:date></dc:date>
        <dc:identifier>https://libguides.mit.edu/materials</dc:identifier>
      </oai_dc:dc>
    </metadata>
  </record>
</records>
@@ -0,0 +1,20 @@
<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <header>
      <identifier>oai:libguides.com:guides/175846</identifier>
      <datestamp>2023-05-31T19:49:21Z</datestamp>
      <setSpec>guides</setSpec>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
        <dc:title>Materials Science &amp; Engineering</dc:title>
        <dc:identifier>https://libguides.mit.edu/materials</dc:identifier>
      </oai_dc:dc>
    </metadata>
  </record>
</records>
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <header>
      <identifier>oai:libguides.com:guides/175846</identifier>
      <datestamp>2023-05-31T19:49:21Z</datestamp>
      <setSpec>guides</setSpec>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
        <dc:title>Materials Science &amp; Engineering</dc:title>
        <dc:creator>Ye Li</dc:creator>
        <dc:subject>Engineering</dc:subject>
        <dc:subject>Science</dc:subject>
        <dc:description>Useful databases and other research tips for materials science.</dc:description>
        <dc:publisher>MIT Libraries</dc:publisher>
        <dc:date>2008-06-19T17:55:27</dc:date>
        <dc:identifier>https://libguides.mit.edu/materials</dc:identifier>
      </oai_dc:dc>
    </metadata>
  </record>
</records>
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <header>
      <identifier>oai:libguides.com:guides/175846</identifier>
      <datestamp>2023-05-31T19:49:21Z</datestamp>
      <setSpec>guides</setSpec>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
        <dc:title>Materials Science &amp; Engineering</dc:title>
        <dc:creator>Ye Li</dc:creator>
        <dc:subject>Engineering</dc:subject>
        <dc:subject>Science</dc:subject>
        <dc:description>Useful databases and other research tips for materials science.</dc:description>
        <dc:publisher>MIT Libraries</dc:publisher>
        <dc:date>2008-06-19 17:55:27</dc:date>
        <dc:identifier>https://libguides.mit.edu/materials</dc:identifier>
      </oai_dc:dc>
    </metadata>
  </record>
</records>
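
The fixtures in this commit carry `dc:date` in three shapes: with a space separator (`2008-06-19 17:55:27`), with a `T` separator (`2008-06-19T17:55:27`), and as an empty element. The commit adds `python-dateutil` to the Pipfile, presumably for this kind of parsing; the sketch below uses the standard library's `datetime.fromisoformat` as a stand-in so that it runs without extra dependencies. The function name `normalize_date` and the choice to return `None` for unusable values are assumptions, not the project's actual behavior.

```python
from datetime import datetime
from typing import Optional


def normalize_date(value: Optional[str]) -> Optional[str]:
    """Return an ISO 8601 string for a dc:date value, or None if unusable.

    Hypothetical helper: accepts both the space-separated and the
    "T"-separated forms that appear in the fixtures.
    """
    if not value or not value.strip():
        return None  # missing or empty <dc:date>
    try:
        return datetime.fromisoformat(value.strip()).isoformat()
    except ValueError:
        return None  # not a date this sketch understands


for raw in ("2008-06-19 17:55:27", "2008-06-19T17:55:27", "", "not a date"):
    print(repr(raw), "->", normalize_date(raw))
```

`fromisoformat` is stricter than `dateutil.parser.parse`, so a real implementation built on `python-dateutil` would likely accept more input formats than this stand-in does.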
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <header>
      <identifier>oai:libguides.com:guides/175846</identifier>
      <datestamp>2023-05-31T19:49:21Z</datestamp>
      <setSpec>guides</setSpec>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
        <dc:title>Materials Science &amp; Engineering</dc:title>
        <dc:creator></dc:creator>
        <dc:subject></dc:subject>
        <dc:subject></dc:subject>
        <dc:description></dc:description>
        <dc:publisher></dc:publisher>
        <dc:date></dc:date>
        <dc:identifier>https://libguides.mit.edu/materials</dc:identifier>
      </oai_dc:dc>
    </metadata>
  </record>
</records>
@@ -0,0 +1,20 @@
<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <header>
      <identifier>oai:libguides.com:guides/175846</identifier>
      <datestamp>2023-05-31T19:49:21Z</datestamp>
      <setSpec>guides</setSpec>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
        <dc:title>Materials Science &amp; Engineering</dc:title>
        <dc:identifier>https://libguides.mit.edu/materials</dc:identifier>
      </oai_dc:dc>
    </metadata>
  </record>
</records>
