Skip to content

arpie71/hlapiStata

Repository files navigation

Stata interface for History Lab API

This Stata package offers a command to use the History Lab API in order to download data from our collections.

To install the package, type net install histlabapi , from("https://raw.githubusercontent.com/arpie71/hlapiStata/main/") at Stata's command line.

Note that for Stata versions greater or equal to 16, the command uses a Python module to parse the json output. For versions before 16, Mata is used to parse the output.

Collections

The History Lab API has documents available from 11 different collections, as detailed below. The R functions will search across all collections, unless a specific collection or set of collections is specified.

corpus title starts ends documents
frus Foreign Relations of the United States 1620 1989 311866
cia CIA CREST Collection 1941 2005 935716
clinton Clinton E-Mail 2009 2013 54149
briefing Presidential Daily Briefings 1946 1977 9680
cfpf State Department Central Foreign Policy Files 1973 1979 3214293
kissinger Kissinger Telephone Conversations 1973 1976 4552
nato NATO Archives 1949 2013 46002
un United Nations Archives 1997 2016 192541
worldbank World Bank McNamara Records 1942 2020 128254
cabinet UK Cabinet Papers 1907 1990 42539
cpdoc Azeredo da Silveira Papers 1973 1979 10279

Fields

The API can return many different fields which are described in the table below. By default, the command will only return doc_id, authored, and title.

field title
authored Document date
wikidata_ids Wikidata IDs of entities in document
entgroups Entity types appearing in document
entities Standardized entity names in document
topic_names Top 5 tokens for each topic
topic_scores Topic score for each topic listed
topic_ids ID number for each topic (used to search by topic)
title Document title
classification Original classification of document (if known)
corpus Collection document is from
doc_id Unique document identifier

Command

The command has a number of different options to conduct various types of searches on History Lab documents.

Syntax

histlabapi option ,[date(string) start(string) end(string) text(string) id(string) COLlection(string) fields(string) entitytype(string) entityvalue(string) topicvalue(string) overview(string) sort(string) limit(integer 25) orlist(string)]

Options

There are 6 broad types of queries allowed using the option parameter. Only one option can be specified at a time. For some of the query options, other parameters are available to filter the search.

id - Searches the History Lab collections for a given document ID or list of document IDs. The id parameter is required to give an ID or list of IDs to search. The only optional parameter allowed is fields.

date - Searches the History Lab collections for all documents on a given date (date()) or within a range of dates (start() and end()). The search can be limited by fields, by collection, or by limit.

text - Searches the full-text of documents in the History Lab collections. Searches can be performed on either words or phrases. The collection, date/start/end, and the fields parameters can be used with this query.

entity - Searches the History Lab collections for all documents that contain a given Wikidata ID. With this option, the entityvalue parameters must be used to identify the entity or entities to be searched. Multiple values of entities are allowed. The collection, date/start/end and the fields parameters can also be used with this query.

topic - Searches the History Lab collections for all documents that contain a given topic ID. With this option, the topicvalue parameter must be used to specify the value(s) of the topics to be searched. Note that topic IDs are not unique to a collection so this option is best used with the collection parameter. Also, as discussed below, the find_topic_id command will return topic IDs for a search term. The collection, date/start/end, and the fields parameters can also be used with this query.

overview - Returns an overview of collections, topics, and entities from History Lab. The collections option will return a summary of the different collections processed by History Lab, including the date of the oldest and most recent documents, the number of documents, and the number of pages. The topics option will return a list of all the topics across collections. Each collection has between 40 and 130 topics and the collections option can be used to limit the list of topics to a specific collection. Finally, the entity option returns a list of all entities detected in History Lab collections, as well as the number of times they are mentioned. The search can be limited to specific types through the entitytype option. There are 5 entitytypes available: PERSON, ORG, LOC, GOVT, OTHER.

Parameters

The text parameter provides words or phrases that will be used to search the History Lab collections. Multiple words or multiple phrases should be separated with a comma. The parameter can only be used with the text option.

The collection parameter restricts a date, topic, or text search to a specified collection or list of collections. Multiple collections should be separated with a comma or a space. The list of collections allowed is given above. This option can also be used with the overview search if either collection or topic is specified.

The fields parameter restricts all searches except overview to return the specified field or list of fields. Multiple fields should be separated with a comma or a space. If no fields are specified, the commands will return the authored, doc_id, and title fields only.

The overview parameter is used only with the overview option to specify the entity to return for an overview search. Acceptable values are collections, topics, or entities and only one may be specified with this option at a time.

The id parameter is used only with the id option to provide the ID or list of IDs to search. As with the collection and field parameters, a list of IDs should be separated with a comma or a space. If an invalid ID is entered, the command will not return an error. Instead, the command simply will not return any results for that ID.

The date and start/end parameters can used with the date option, or with the entity, topic, or text options in order to restrict the dates of the documents retrieved to a given day or to a range of dates. Dates should be in numeric format and use either ".","-", or "/" as separators between month, day, and year. The function first looks for a "MDY" or a "YMD" format but it will recognize a "DMY" format for days greater than 12.

The entityvalue parameter is required with an entity search and specifies the Wikidata ID of the entity to search for.

The topicvalue parameter is required with a topic search and specifies the topic ID of the topic to search for.

The entitytype parameter is used with an overview search with the entity option. There are 5 entitytypes available: PERSON, ORG, LOC, GOVT, OTHER.

The sort option is also used with an overview search with the entity option. It sorts the entities returned in Ascending or Descending order of the number of appearances.

The orlist option specifies whether a search should be an "or" search or an "and" search. An or value will search for any of the text terms, Wikidata IDs, or topic IDs specified while an and value finds all of the terms. This option is available only with a text, entity, or topic search.

The limit parameter tells the API how many results to return. The default value is 25 and there is a system maximum of 10,000.

Examples

ID search:

  • histlabapi id , id(1973LIMA07564)
  • histlabapi id , id(frus1969-76ve05p1d11) fields(doc_id,body,title)
  • histlabapi id , id(1973LIMA07564,frus1969-76ve05p1d11) fields(doc_id,body,title)

Overview search:

  • histlabapi overview, overview(collections)
  • histlabapi overview, overview(entities) sort(D)
  • histlabapi overview, overview(entities) entitytype("PERSON") sort(D)
  • histlabapi overview, overview(topics) collection(frus)

Date search:

  • histlabapi date, start(1947-01-01) end(1948-12-01) fields(doc_id,authored,title) limit(100)
  • histlabapi date , collection(cpdoc,kissinger) start(01/1/1975) end(12/01/1975) limit(100)

Entity search:

  • histlabapi entity , collection(frus) entityvalue(Q242)
  • histlabapi entity , entityvalue(Q9588) collection(kissinger)

Full-text search:

  • histlabapi text, collection(cfpf) start(01/01/1950) end(12/31/2000) text(udeac)

  • histlabapi text , collection(frus,cfpf) start(1/1/1950) end(12/31/2000) text(udeac, asean) orlist("or") limit(20)

  • histlabapi text , collection(frus,cfpf) start(1/1/1950) end(12/31/2000) text(united nations) limit(200)

Notes

We also provide two commands to look up Wikidata IDs (for an entity search) and topic IDs (for a topic search.

  • find_wiki_id _anything_ will perform a search of the database and return all entities and Wikidata IDs that partially match a given string.

  • find_topic_id _anything_ will perform a search of the database and return all topics and topic IDs that partially match a given string. The topic names are in all lower case. The command will convert search terms to lower case.

Examples

  • find_wiki_id "China" /* China Wikidata ID is Q148 */

  • find_topic_id "iraq"

About

Stata package to access History Lab API

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published