Time | Topic | Relevant Links |
---|---|---|
11:15-12:45 | Introductions, Overview of the Course, Classroom Expectations | |
12:45-2:15 | Lunch | |
2:15-3:45 | Determine the Data and Object of Study | |
7:30 | Communal Dinner |
Homework:
Time | Topic | Relevant Links |
---|---|---|
9:15-10:45 | What is Humanities Data?; Data Database Activity | |
10:45-11:15 | Break | |
11:15-12:45 | Metadata ; Build Movie Data Set Activity | |
12:45-2:15 | Lunch | |
2:15-3:45 | Project Sessions | |
3:45-4:15 | Break | |
4:15-5:45 | Lecture |
Time | Topic | Relevant Links |
---|---|---|
9:15-10:45 | Teaser Session | |
10:45-11:15 | Break | |
11:15-12:45 | Metadata, Brief Viz | |
12:45-2:15 | Lunch | |
2:15-3:45 | Tidy Data, Rebuild the Data Set | |
3:45-7:15 | Field Trip | |
7:30 | Communal Dinner |
Time | Topic | Relevant Links |
---|---|---|
9:15-10:45 | Tidy Data | ISO, LOC Genre |
10:45-11:15 | Break | |
11:15-12:45 | Data | |
12:45-2:15 | Lunch | |
2:15-3:45 | Projects | |
3:45-7:15 | Field Trip | |
8:00 | Communal Dinner |
Time | Topic | Relevant Links |
---|---|---|
9:15-10:45 | Open Refine/ Assessing Your Data | |
10:45-11:15 | Break | |
11:15-12:45 | Reflection / Slam Prep | LOC Subject Headings |
12:45-2:15 | Lunch | |
2:15-3:45 | Slam | |
3:45-4:15 | Break | |
3:45-5:45 | Field Trip | |
8:00 | Communal Dinner |
- Be kind, honest, and candid when expressing an opinion.
- Support building a collaborative environment including different styles of workshops such as working in groups.
- Courageous community in which experimenting is encouraged. Be brave!
- Signaling to the community by raising hands before talking.
- Time at end of class for questions.
- Time to experiment with tools.
- A balance between theory and practice.
- Support interdisciplinarity.
- Draw on our own examples and experiences while respecting the feedback.
- Make sure everyone is on the same page.
- All questions are good questions and encouraged!
- I can ask my neighbor for help.
- Everyone has an opportunity to complete their sentences.
Time | Topic | Relevant Links |
---|---|---|
9:15-10:45 | Intro, Overview, Text Analysis | |
10:45-11:15 | Break | |
11:15-12:45 | Voyant | Projects using Voyant |
12:45-2:15 | Lunch | |
2:15-3:45 | Projects | |
3:45-4:15 | Break | |
4:15-5:45 | Lecture |
Time | Topic | Relevant Links |
---|---|---|
9:15-10:45 | Viz - Timelines | |
10:45-11:15 | Break | |
11:15-12:45 | Mapping - StoryMap | |
12:45-2:15 | Lunch | |
2:15-3:45 | Project | |
3:45-4:15 | Break | |
4:15-5:45 | Lecture | |
5:45-7:15 | Hands On |
Time | Topic | Relevant Links |
---|---|---|
9:15-10:45 | Teaser Session | |
10:45-11:15 | Break | |
11:15-12:45 | Carto | |
12:45-2:15 | Lunch | |
2:15-7:15 | Excursions | |
7:30 | Communal Dinner |
Time | Topic | Relevant Links |
---|---|---|
9:15-10:45 | Networks | |
10:45-11:15 | Break | |
11:15-12:45 | Gephi | |
12:45-2:15 | Lunch | |
2:15-3:45 | Projects | |
3:45-4:15 | Break | |
4:15-5:45 | Panel |
Time | Topic | Relevant Links |
---|---|---|
9:15-10:45 | Your Project | |
10:45-11:15 | Break | |
11:15-12:45 | | |
12:45-2:15 | Lunch | |
2:15-3:45 | Workshop Results | |
3:45-4:15 | Break | |
4:15-5:45 | Lecture: Nancy Ide | |
8:00 | Dinner |
The following readings are optional. They are provided for future reference.
D'Ignazio, Catherine and Lauren F. Klein. "Feminist Data Visualization." IEEE.
D'Ignazio, Catherine and Rahul Bhargava. "DataBasic: Design Principles, Tools and Activities for Data Literacy Learners." The Journal of Community Informatics, 2016.
Drucker, Johanna. "Humanities Approaches to Graphical Display", 2011.
Drucker, Johanna. Graphesis: Visual Forms of Knowledge Production, 2014.
Gibbs, Frederick W. “New Forms of History: Critiquing Data and Its Representations.” The American Historian, 2016.
Muñoz, Trevor and Katie Rawson. "[Against Cleaning](http://curatingmenus.org/articles/against-cleaning/)," July 6, 2016.
Muñoz, Trevor. "Data Curation as Publishing for the Digital Humanities," 2013.
Posner, Miriam. "Humanities Data: A Necessary Contradiction," 2015.
Rosenberg, Daniel. "Data Before the Fact."
Palmer, Carole L, Nicholas M. Weber, Trevor Muñoz, Allen H. Renear. "Foundations of Data Curation: The Pedagogy and Practice of 'Purposeful Work' with Research Data", 2013.
Padilla, Thomas. "Humanities Data in the Library: Integrity, Form, Access". D-Lib, March/April 2016.
Schöch, Christof. "Big? Smart? Clean? Messy? Data in the Humanities," Journal of Digital Humanities, 2013.
Wickham, Hadley. "Tidy Data," 2014. Informal/code version here.
Lincoln, Matthew. "Best Practices for Using Google Sheets in your Data Project"
Schulz, Kathryn. “What Is Distant Reading?” New York Times. June 26, 2011.
Jockers, Matthew. Macroanalysis. Introduction.
Moretti, Franco. Graphs, Maps, Trees. Verso. 2007.
Blevins, Cameron. "Topic Modeling Martha Ballard’s Diary."
Brett, Megan R. “Topic Modeling: A Basic Introduction.” Journal of Digital Humanities. Vol 2. No. 1 Winter 2012.
Goldstone, Andrew and Ted Underwood. “What Can Topic Models of PMLA Teach Us About the History of Literary Scholarship?” Journal of Digital Humanities. Vol 2. No. 1 Winter 2012.
Meeks, Elijah and Scott Weingart. “The Digital Humanities Contribution to Topic Modeling.” Journal of Digital Humanities. Vol 2. No. 1 Winter 2012.
Rhody, Lisa. Topic Modeling and Figurative Language, 2012.
Rhody, Lisa. "Why I Dig: Feminist Approaches to Text Analysis". Debates in the Digital Humanities, 2016.
Underwood, Ted. Topic modeling made just simple enough, 2012.
"Forum: Text Analysis at Scale." Debates in the Digital Humanities, 2016.
Examples: Mining the Dispatch; Signs@40
Easley, David and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World, 2010.
Newman, Mark. Networks: An Introduction, 2010.
Weingart, Scott. “Demystifying Networks, Part I & II.” Journal of Digital Humanities. Vol 1 No. 1. Winter 2011.
Example Projects: Linked Jazz, Republic of Letters, Signs at 40, and Wikipedia
Bodenhamer, David, “Beyond GIS: Geospatial Technologies and the Future of History.” History and GIS: Epistemologies, Considerations and Reflections. Springer, 2013.
White, Richard. “What is spatial history?” February 1, 2010
Shnayder, Evgenia. “A Data Model for Spatial History.” November 15, 2010.
Crampton, Jeremy. Mapping: A Critical Introduction to Cartography and GIS. Wiley-Blackwell, 2010.
Examples: American Panorama; Anti-Eviction Mapping Project; Photogrammar; Blevins, Cameron, "Space, Nation, and the Triumph of Region: A View of the World from Houston", Journal of American History, 2014.
Ceeilia - @mcpmagalhaes
Alexandra - @alexandracotoc
Jeff - @jeffklo
Giuditta - @giudicirni
Elisabetta - @ec_giovannini
Carol - @digitaldante
Lauren - @nolauren @distantviewing
OpenRefine (formerly known as Google Refine) is a very popular tool for working with unorganized, non-normalized (what some may call "messy") data. OpenRefine accepts TSV, CSV, XLS/XLSX, JSON, XML, RDF as XML, and Google Data formats, though others may be used with extensions. It opens in your default browser window, but all of the processing takes place on your machine and your data isn't uploaded anywhere.
We will be using the movie data that we created in Google Sheets. Download the "messymovies" data as a .csv file.
- Click 'Open' in the top right to open a new OpenRefine tab
- Click 'Browse' and locate the .csv on your hard drive. Then click 'Next.'
- The Configure Parsing Options screen will ask you to confirm a few things. It has made guesses, based on the data, on the type of file, the character encoding and the character that separates columns. Take a look at the data in the top window and make sure everything looks like it's showing up correctly.
- Name the project "movies-metadata" and click 'Create Project' in the top right corner.
Take a minute to look around. Consider the structure of the data with principles of "tidy data" in mind. This will help guide what types of operations you perform on the data. Also take time to evaluate the type of information that is represented and what type of questions you might want to ask of it.
Let's take a look at our movies. Does anything stand out?
- Select the column and then 'Facet', 'Text Facet'. A new window appears on the left. Let's explore.
- Select 'count'. What do you notice? Why does this matter?
- Select 'cluster'. We have several options. Select all of the titles you want to merge and make sure to check off the 'Merge?' box for each. Next, select 'Merge Selected & Recluster' or 'Merge Selected & Close'.
- Select a redundant film. Ex. Mad Max
- Select 'Sort', 'Sort'.
- Next to rows, there now appears 'Sort'.
- Select the new 'Sort', 'Reorder rows permanently'.
- Select 'movie_title', 'Edit cells', 'Blank down'
- Select 'movie_title', 'Facet', 'Customized facets', 'Facet by blank'
- In the window on the left, select True
- Select "All" in our matching Rows, 'Edit Rows', 'Remove All Matching Rows'.
- We want to split the countries into different columns.
- Select the menu on the 'Column', 'Edit Column', 'Split Into Several Columns'.
- Since a comma is used to separate our values, we will keep ','.
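The 'cluster' step above groups spellings that are probably the same title. As a rough illustration of how a key-collision ("fingerprint") method can work, here is a simplified Python sketch. It is not OpenRefine's own code; the function names and the sample titles are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a cell value to a normalized key (simplified fingerprint)."""
    # Fold accented characters to their closest ASCII form.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and turn punctuation into spaces.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    cleaned = ascii_text.strip().lower().translate(table)
    # Drop duplicate tokens and sort them so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a key; keep groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical messy titles, for illustration only.
titles = ["Mad Max", "mad max", "Max, Mad ", "Amélie", "Amelie", "Alien"]
clusters = cluster(titles)
for group in clusters:
    print(group)
```

Each printed group is a candidate merge, much like a row in the clustering dialog. A person still decides whether the spellings really refer to the same film.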
We did the work to standardize our dates. Now, we need to tell OpenRefine that this is a date.
- Select 'Edit Cells', 'Common transforms', 'To date'
- Now we see that it adds time. This is not ideal but fine.
- We can then look at the timeline. Select 'Facet' and 'Timeline facet'
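Why does a time appear? A date-time value always has a time component, so a date with no time defaults to midnight. The Python sketch below shows the same effect. The date and the YYYY-MM-DD format are assumptions for illustration, not values from our data.

```python
from datetime import datetime


def to_date(text: str) -> datetime:
    """Parse a YYYY-MM-DD string; the missing time defaults to midnight."""
    return datetime.strptime(text.strip(), "%Y-%m-%d")


# Hypothetical release date.
released = to_date("2015-05-15")
print(released.isoformat())
```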
Let's take a look. What's going on here? Does this look ok?
Let's take a look.
- Select 'Facet', 'Numeric Facet'
Wait, what's wrong?
- 'Edit Cells', 'Common transforms', 'To Number'
Now let's facet it again. What do we learn?
- Let's edit the cell. Hover over the cell and select 'edit'.
- We also need to change the data type. Select 'number'.
Let's facet it and explore. Any issues?
- Select 'Facet', 'Text Facet'.
- Select 'Y'
- Select 'Date_Released'
- Select 'Timeline Facet' and let's adjust our range.
- Let's go back and 'Edit Cells', 'Transform'.
Here we see what is called GREL (General Refine Expression Language). We can use code to edit our data. This is very powerful! It opens up a plethora of ways to transform our data.
- In the expression box type, 'value.replace("Y", "N")'. What happened?
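To reason about what a substring replacement does before running it on a whole column, it can help to try the same idea in Python. This is a Python analogue, not GREL, and the cell values are hypothetical. The point is that replacing a substring touches every "Y" in a cell, not only cells that are exactly "Y".

```python
def replace_flag(value: str) -> str:
    """Python analogue of the expression value.replace("Y", "N")."""
    return value.replace("Y", "N")


def flip_exact(value: str) -> str:
    """Safer variant: change the cell only when it is exactly "Y"."""
    return "N" if value == "Y" else value


# Hypothetical cell values.
cells = ["Y", "N", "Yes", "New York"]
transformed = [replace_flag(cell) for cell in cells]
print(transformed)
print([flip_exact(cell) for cell in cells])
```

If a column might contain longer strings, an exact-match test like `flip_exact` avoids accidental edits.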
GitHub is a web platform for archiving collections of code and other materials into what are known as repositories. It is completely free to use for open source projects developed in public. Paid accounts are given access to private repositories that can contain personal or closed-source materials. You are currently looking at the repository named user-template. In addition to providing tools for collaborative programming, GitHub also provides a system for hosting static websites, known collectively as GitHub Pages. This tutorial will set up a basic GitHub Pages site that can serve as your digital identity. The tutorial assumes that you have already signed up for a free GitHub account. If not, go to https://github.com/ and create and validate your username.
To follow this tutorial, please visit here.
Tesseract-OCR is an open source OCR (optical character recognition) engine, originally developed by Hewlett Packard Laboratories. The standard installation of Tesseract-OCR can convert images of text in 39 different languages to plain text data. We will use the "sevenagesofwoman" data set, available here.
You visit an archive and need to capture images of text based archival collections for your research - ultimately you would like to convert these images into data that you can search, visualize, text mine, etc. Using a digital camera and/or a copier you capture photos of archival collections in the .tif / .tiff format. With these files in hand you are prepared to use Tesseract-OCR to convert your images into plain text files.
Sometimes we get page images, but what we really need is plain text. Tesseract is free OCR software available in lots of languages that can generate text from images at a large scale.
Navigate to the sevenagesofwoman folder ($ cd Desktop/project/corpora/sevenagesofwoman)
$ cd sevenagesofwoman
List files in the sevenagesofwoman folder
$ ls
Convert one tiff file to one txt file using Tesseract OCR
$ tesseract sevenagesofwoman_thebride.tiff sevenagesofwoman_thebride
Using your GUI, compare the tif file to the txt files you generated
While we’re here, why don’t we just OCR all of them in one batch?
$ for i in *.tiff; do tesseract "$i" "$i"; done
Remember our loop from the Command Line Bootcamp? This works the same way, but condenses everything to a single line using semicolons between commands.
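The same batch can be scripted in Python, which is handy if you later want to add logging or skip files that are already converted. This is a minimal sketch that assumes `tesseract` is on your PATH; the helper names are hypothetical. Unlike the loop above, which passes the image name as the output base, this version drops the image suffix so each result is named like `sevenagesofwoman_thebride.txt`.

```python
import subprocess
from pathlib import Path


def build_ocr_command(image_name, tesseract="tesseract"):
    """Return the command list for one image: tesseract <image> <output base>."""
    image = Path(image_name)
    # Tesseract adds .txt to the output base itself, so strip the image suffix.
    return [tesseract, str(image), str(image.with_suffix(""))]


def ocr_folder(folder=".", pattern="*.tiff"):
    """Run Tesseract on every matching image in a folder, in filename order."""
    for image in sorted(Path(folder).glob(pattern)):
        subprocess.run(build_ocr_command(image), check=True)


print(build_ocr_command("sevenagesofwoman_thebride.tiff"))
```

Calling `ocr_folder(".")` from inside the image folder would run the whole batch.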
FYI for Windows Users
Find tesseract.exe (it’s probably in Program Files (x86)) and drag it in. You can also try the path below and then hunt down the file if it doesn’t work
$ '/c/Program Files (x86)/Tesseract-OCR/tesseract.exe' sevenagesofwoman_thebride.tif sevenagesofwoman_thebride
Using your GUI, compare the tif file to the txt files you generated
While we’re here, why don’t we just OCR all of them in one batch?
$ for i in *.tif; do '/c/Program Files (x86)/Tesseract-OCR/tesseract.exe' "$i" "$i"; done
Move back to the BWRP books
$ cd ..
Find out how many lines and words there are in a text of your choosing using wc -l -w
$ wc -l -w bwrp_ActoEPoems.txt
Results:
1607 15242 bwrp_ActoEPoems.txt
1607 lines and 15,242 words
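For comparison, here is a rough Python equivalent of the two counts. It is a sketch, not a reimplementation of wc: lines are counted as newline characters and words as whitespace-separated chunks. The sample string is hypothetical.

```python
def count_lines_and_words(text: str):
    """Return (lines, words), roughly as wc -l -w would report them."""
    lines = text.count("\n")   # count newline characters
    words = len(text.split())  # count whitespace-separated tokens
    return lines, words


def count_file(path):
    """Read a text file and count its lines and words."""
    with open(path, encoding="utf-8") as handle:
        return count_lines_and_words(handle.read())


print(count_lines_and_words("one two\nthree\n"))
```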
Do some very basic searching with egrep — this will print the entire line it's mentioned in
$ egrep europe *txt
$ egrep Europe *txt
$ egrep America *txt
Do some very basic counting with egrep -c
$ egrep -c man *txt
$ egrep -c woman *txt
Count only whole words using egrep -cw
$ egrep -cw man *txt
The possibilities for regular expressions are endless (and sometimes difficult and always ugly), but you can also find matching patterns.
What 18th and 19th century years (or very similar four character numbers) are mentioned in these texts?
$ egrep -o '\b1[7-8][0-9][0-9]\b' *txt
The -o flag returns just the text that matches the pattern. This pattern looks for four digits in a row that start with 17 or 18; the last two can be any digits.
It's possible that some of these aren't years. They could be page numbers or amounts or anything else. Let's do a search that includes some context, but not the entire line.
$ egrep -o '.{0,50}\b1[7-8][0-9][0-9]\b.{0,50}' *txt
This context feature is interesting. What if we wanted to look at the words around 'America' to see what people are saying without getting every full line?
$ egrep -o '.{0,50}America.{0,50}' *txt
We could even move this into a separate corpus if we wanted by adding > americacontext.text to that search.
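The year search with context can also be sketched in Python with the `re` module. This is an illustration under assumptions: the sample text is made up, and the function slices a window around each match rather than reproducing egrep's matching exactly.

```python
import re

# Four digits starting with 17 or 18, bounded by word boundaries.
YEAR = re.compile(r"\b1[78][0-9]{2}\b")


def years_with_context(text, width=50):
    """Return (year, snippet) pairs with up to `width` characters on each side."""
    results = []
    for line in text.splitlines():
        for match in YEAR.finditer(line):
            start = max(match.start() - width, 0)
            end = min(match.end() + width, len(line))
            results.append((match.group(), line[start:end]))
    return results


# Hypothetical sample text.
sample = "She sailed for America in 1794.\nPage 1920 lists 12 poems from 1812 and 1813."
hits = years_with_context(sample, width=10)
for year, snippet in hits:
    print(year, "|", snippet)
```

Note that 1920 is skipped because it does not start with 17 or 18. As with egrep, a match is not guaranteed to be a year.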
Let's define Text Analysis. Are you familiar with this method? How would you define it?
For our corpus, we will explore the State of the Union addresses. The State of the Union is delivered by the President of the United States annually to a joint session of Congress. It is often a space where the President reflects on current issues and outlines goals for the nation. Therefore, it is a key document for understanding how the executive branch sees the current position of the country and its priorities. While today it is delivered orally by the President, the State of the Union was initially a written document submitted to Congress. In this lab, we will use Voyant to identify issues and priorities.
We will be using Voyant: a web-based text reading and analysis environment.
According to the Voyant website, we can do the following:
- Use it to learn how computer-assisted analysis works. Check out our examples that show you how to do real academic tasks with Voyant.
- Use it to study texts that you find on the web or texts that you have carefully edited and have on your computer.
- Use it to add functionality to your online collections, journals, blogs or web sites so others can see through your texts with analytical tools.
- Use it to add interactive evidence to your essays that you publish online. Add interactive panels right into your research essays (if they can be published online) so your readers can recapitulate your results.
- Use it to develop your own tools using our functionality and code.
We will start with Obama's final State of the Union address.
To begin, we will load in our text from here.
Let's take a look at the speech.
- What kind of file is this?
- What does the format of this file tell us about one way that Voyant needs text to be structured to process it?
We can load data into Voyant three ways.
- Use the URL
- Copy and paste the text into the box.
- Upload a file.
Now let's take a look at the kinds of text analysis used by Voyant!
Cirrus: Provides a word cloud of the most frequent terms. You can hover over the word to see the number of times it is used.
- Are these the words we expected?
- Are there any words we would have expected that aren't included?
- Are there any words we think should be removed?
Stop Words are a list of common words that are filtered out before or during text processing. You can use default stop word lists, like those included in Voyant, or create your own.
Let's say we want to remove "mdash" and add it to our stop words. Go to the bottom left panel that says "Summary Documents Phrases". Next to the question mark is another set of options that only appear once you hover over the area. The first option to the left of the question mark allows us to adjust our stop words.
Voyant's default setting auto-detects a stop word list. Select "None" and see what happens!
- Is this helpful?
Let's go back and select "English". To adjust our list, select "Edit List." Let's add "mdash".
- Are there any others we want to add?
Let's add: "that's" and "it's".
(Tip: When I'm adjusting the stop word list, I like to make a text file with my additional stop words. You'll notice Voyant only allows you to adjust your stop words once. If you try to add more, it deletes your previous custom words.)
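To see what a stop word list does to the counts behind a word cloud, here is a small Python sketch. The tokenizer, the stop word list, and the sample sentence are all hypothetical and far simpler than Voyant's own processing.

```python
import re
from collections import Counter


def top_terms(text, stop_words, n=5):
    """Count lowercase word tokens, skipping anything in the stop word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [token for token in tokens if token not in stop_words]
    return Counter(kept).most_common(n)


# Hypothetical stop words, including our additions "mdash", "that's" and "it's".
stop_words = {"the", "and", "of", "that's", "it's", "mdash"}
# Made-up sentence, not a quotation from the speech.
text = "The future of America mdash that's the America we want, and it's the future we build."
top = top_terms(text, stop_words)
print(top)
```

Keeping your custom stop words in a plain text file, as suggested above, also makes it easy to reuse them in a script like this.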
Cirrus: Now we have a new word cloud!
Terms: We see the raw count of words in a list.
Links: Provides a collocates graph, a network graph of higher-frequency terms that appear in proximity. Keywords are shown in green and collocates (words in proximity) are shown in red.
Let's click on "America." Let's take a look at the Reader in the panel to the right.
- What changed?
Now let's take a look at Trends in the panel furthest to the right. The default is Raw Frequencies.
Let's change to Relative Frequencies. (This isn't as helpful with one document but will be when we are analyzing more than one at a time.)
Now let's go back to Links and double click on "America".
- What changed?
Contexts: Puts a term in context.
Bubblelines: A visualization of the term frequency in the document.
Summary: Overview of the document. To increase the number of frequent words shown, adjust the items slider on the bottom left.
Documents: We only have one document, but it will be helpful when we look at multiple at once.
Phrases: Provides a table of repeating phrases.
Let's sort by the most common phrases. (Tip: If Voyant won't let you reset and see all the phrases, reload.)
Pair up and take a few minutes to explore.
- Interesting insights?
Let's now take a look at George Washington vs President Obama's SOTU addresses.
Download the corpus to your Downloads Folder.
Unzip the file.
Go to Voyant and select "Upload".
The speeches are named according to year.
Make sure the files are in numerical order, since this determines how Voyant loads them. Now, let's explore!
To begin, take a look at Cirrus.
- Do we want to remove any stop words? If so, why?
Let's remove them.
- Can we learn anything from this? Document Length? Vocabulary Density?
We also have a new option - Distinctive Words. Voyant uses Term Frequency-Inverse Document Frequency to weigh how important a word is in the document or corpus. Let's take a look at the terms used by Washington and Obama.
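To get a feel for how TF-IDF picks out distinctive words, here is a minimal Python sketch of one common formulation: term frequency times the log of inverse document frequency. Voyant's exact weighting may differ, and the two tiny "documents" are invented phrases, not text from the speeches.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: dict of name -> list of tokens. Returns name -> {term: score}."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))
    scores = {}
    for name, tokens in documents.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical mini-corpus keyed by year.
docs = {
    "1790": "the union and the states and the militia".split(),
    "2016": "the future and the economy and the climate".split(),
}
scores = tf_idf(docs)
for name, terms in scores.items():
    print(name, sorted(terms, key=terms.get, reverse=True)[:3])
```

Words that appear in every document, like "the" and "and", score zero. Words found in only one document rise to the top, which is the intuition behind Distinctive Words.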
- Are there any themes we can see in these speeches? By presidency?
- Do we see particular phrases?
Interested in looking at all of the State of the Union addresses? Here you go!
Want to look at film plots from Wikipedia? Here you go!
- The "all" folder includes the plots from each film nominated for an Academy Award.
- The "win" folder includes a subset, the plots from each film that won an Academy Award.
If you are interested in how to work with the State of the Union addresses with the R programming language, see my tutorial with Taylor Arnold on Programming Historian.
PS: A quick note about lemmatization is necessary. Lemmatization may be important for your study. For example, if we are interested in how often the corpus talks about "states", then we need to search for both "state" and "states". By lemmatizing, all instances of "states" become "state". Voyant does a version of this. However, we can take this a step further. What if I use the term once to mean a political boundary (ex. the state of Virginia) and a second time to mean a condition one was in (ex. in a state of happiness)? We could then use Natural Language Processing (NLP) to distinguish between these two senses. Tools for lemmatization and NLP include CleanNLP (for R) and Lexos (command line).
There is an expansive set of approaches and tools for mapping. The fields of geography and cartography offer many important methods and critiques to draw on when engaging in spatial analysis and mapping. Richard White's work is an exciting and necessary read for historians. While by no means exhaustive, I find it helpful to think about this area of inquiry as characterized by several broad categories:
- Interactive Mapping: Flexible interactive visualizations served over the web. Platforms include CARTO and MapBox. DH Projects like Photogrammar (project) and Mapping Inequality use Carto.
- Narrative Mapping: Maps used to tell a story. They tend to support primarily linear storytelling. Platforms include StoryMap.js, Odyssey, ESRI Story Maps. An example is Renewing Inequality.
- Classic Cartography and Geographical Data: The study and practice of making maps. This often involves the use of raster (set of pixels representing real world features such as an aerial photograph or land elevations) and vector (set of x,y coordinates that create points, lines, polygons, which represent features such as county borders and streets) data to create maps. This is often what people mean when they say they want a GIS. These tools are often necessary for georectifying maps. The main platforms are ESRI's ArcGIS and, as an open source alternative that also runs on Mac, QGIS.
- Spatial Analysis: While this term is sometimes applied more broadly, it is commonly used for computational analysis of spatial data. This most often takes the form of predictive modeling, such as tracking population movement or the spread of a disease across a community. Platforms such as ArcGIS have specialized in this area. Programming languages like R and Python also allow for this. Web-based interactive mapping tools like Carto continue to build this kind of functionality. For an example, see Forced Migration of Enslaved Peoples.
Today, we are going to explore narrative mapping by focusing on another Knight Foundation tool called StoryMap.
This tool requires the use of Google. After logging in with Google, we are given two map options: Classic or Gigapixel. While the Gigapixel map is definitely worth exploring, it requires hosting a set of images on a web server. It also needs relatively large files. For today, we will use Classic. So, let's make a storymap!
The storymap is comprised of a series of slides with a spatial component. It works best when the number of slides doesn't exceed ~20 and the story is relatively contained. Let's walk through the first slide and then an example.
- Slide 1 - Title
- Headline: From Selma to Montgomery
- Text: Held in 1965, three marches were organized to highlight the continuation of racial injustice. The violence and murder committed by white supremacists during the marches led to national outrage. Credited with leading to the passage of the Voting Rights Act of 1965, the events were a critical part of the long struggle for civil rights and the fight against racial oppression.
- Media: Holding Hands (Credit: Getty Images)
- Location: Selma
- Slide 2 - Murder of Jimmie Lee Jackson
- Headline: February 26, 1965: Jimmie Lee Jackson murdered by James Fowler, an Alabama State Trooper. In response, James Bevel and fellow civil rights activists organize a march from Selma to Montgomery.
- Media: [Image of Funeral](http://assets.nydailynews.com/polopoly_fs/1.2099215.1422767566!/img/httpImage/image.jpg_gen/derivatives/article_750/cold-cases.jpg) (Credit: NY Daily News)
- Location: Marion, Alabama
- Slide 3 - Bloody Sunday
- Headline: March 7, 1965: The first march quickly turned violent when state troopers attacked the unarmed marchers. The day became known as Bloody Sunday.
- Media: https://www.youtube.com/watch?v=BFhcR362RyE
- Location: Right across the bridge
- Slide 4 - A Photo Elicits National Outrage
- Headline: March 7, 1965: The event received national attention spurred by a photo of Amelia Boynton's unconscious body.
- Media: https://library.duke.edu/digitalcollections/snccdigitalgateway/selma6.jpg (Credit: SNCC Digital)
- Location: Edmund Pettus Bridge
- Slide 5 - Turn Around Tuesday
- Headline: March 9, 1965: The second march resulted in a confrontation between marchers led by Martin Luther King and state law enforcement. While the march ended peacefully, white segregationists murdered James Reeb, a white civil rights activist, later that evening. The event further spurred national uproar.
- Media: Find a piece of your choice.
- Location: Green St at Water Ave
- Slide 6 - Pick an event.
Now that we have several events on our map, we might decide we want to change the base map. We can do so by going to Options. StoryMap offers several base maps. However, you might want to add your own.
Once you are ready to share your map, select the Share button on the top right corner.
Limits of StoryMap:
- Have to pick a specific location.
- Can only use media that is supported by the tool.
- Only load one piece of media per slide.
- Best with visually rich media.
- The data is locked into the tool.
Others: Neatline, Odyssey; ESRI Story Maps
Example: FYS 100ng
CartoDB is a web-based tool for visualizing and analyzing small to medium sized spatial datasets. While ostensibly open-source software, it is quite difficult to compile yourself and much of the nice UI is actually proprietary. Fortunately, they offer a good free-tier of the service (no credit card needed; just an e-mail) which should suffice for the purposes of this tutorial.
Go to https://carto.com/signup/ and sign-up. I suggest using your @institutionname.edu account. You can then email Carto about access to free plans by going here: https://carto.com/community/ambassadors/#started. They review applications every month.
Let's get some data! Once downloaded, unzip and open.
Now, let's take a look at the data. Each row is a photo. Each column is an observation about the photo. Several aspects of this data:
- Make sure everything is typed the same. For example, "Alfred T. Palmer" should be consistent. Not "Alfred Palmer".
- If we google a cnumber, it will correspond to the LOC call number. When working with an archive/collection, consider keeping the collection's unique identifying information.
- We used "NA" when we didn't know a value. Often people just keep the cell blank. We did this so we knew what we had checked.
- You can change the data type. This can be important for different functions. Ex. Staff Photographer? True or False.
For mapping, the most important are the columns about location information. When we first had this data, we only had City, County, and State. We added longitude and latitude. Carto now has a (fickle) tool for doing this automatically.
To import data into CartoDB, drag and drop the file you want into the dashboard.
Go to New Map -> Add Datasets -> Connect Dataset.
We are going to upload the unzipped dataset: photo_dataset_all_raw.csv
Once the data is loaded, a pop-up will appear on the bottom left that will ask if you want to see the data. This will only appear once. To get back to the data, go to the top left of the screen. There you will see the Carto logo, your username, and Maps. Click on Maps -> Your datasets.
The screen will shift to a map interface. This is called "Carto Builder".
There are two main ways CartoBuilder is organized: Layers and Widgets.
Switching to the map view, we begin to see the benefits of using a tool like Carto. A reasonably nice map has been constructed out of the box from the data we imported. Zooming in and out, you'll notice that the map has discrete zoom levels. That's because the map is being created by a tile server, which serves rasterized tiles to the browser.
What's the difference between raster and vector data?
vector data model: [data models] A representation of the world using points, lines, and polygons. Vector models are useful for storing data that has discrete boundaries, such as country borders, land parcels, and streets. - ESRI GIS Dictionary
In other words, it is the actual shapes. It is stored as points (x,y).
raster data model: [data models] A representation of the world as a surface divided into a regular grid of cells. Raster models are useful for storing data that varies continuously, as in an aerial photograph, a satellite image, a surface of chemical concentrations, or an elevation surface. - ESRI GIS Dictionary
In other words, it is a picture. It is stored as pixels. You can't do analytics on raster data. Raster data is common for web-based visualizations.
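One way to make the distinction concrete is to hold a few vector points as (x, y) pairs and then turn them into a tiny raster by counting how many points fall in each grid cell. The Python sketch below does this. The coordinates, the grid extent, and the cell size are all hypothetical, and real GIS tools handle projections and much larger grids.

```python
def rasterize(points, min_lon, min_lat, cell_size, n_cols, n_rows):
    """Count how many (lon, lat) points fall in each cell of a regular grid."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for lon, lat in points:
        col = int((lon - min_lon) // cell_size)
        row = int((lat - min_lat) // cell_size)
        # Ignore points that fall outside the grid.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            grid[row][col] += 1
    return grid


# Vector data: hypothetical photo locations as (longitude, latitude) pairs.
points = [(-86.9, 32.4), (-86.3, 32.4), (-86.8, 32.45), (-87.3, 32.6)]

# Raster data: a 2 x 1 grid of one-degree cells; row 0 is the southernmost row.
grid = rasterize(points, min_lon=-88.0, min_lat=32.0, cell_size=1.0, n_cols=2, n_rows=1)
print(grid)
```

The vector version keeps each exact location. The raster version keeps only a value per cell, which is why zooming in on a raster eventually shows blocks rather than shapes.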
We can change the base map. Carto offers several options. The Voyager style offers several different colors and varying degrees of detail. For example, Positron has city and state labels while Positron (LITE) does not. Voyager is a slightly different color and also includes major highways. The color will depend on the mood you are trying to convey. Base map needs will also depend on the time period being mapped. Particularly for Photogrammar, we would not want to use the Here map collection. It would be ahistorical. Also, place names are not value neutral. For example, if one were mapping indigenous communities, which place names to use is a major issue to consider. If you want to add your own map, Mapbox is an option. This will require turning a map into raster data to then upload. One option is to use the open source QGIS. Programming Historian also offers a tutorial.
Click on Layers. By default, our data is the first layer. We will primarily work with this layer during the workshop. If we wanted to layer the map with additional data, all we would need to add is a new layer.
Let's explore our default layer. Click on "photo_dataset_all_raw" -> Style
Points: They are great, but you need to be careful when the actual location isn't precise. Since we are looking at a national scale, this is less of an issue. However, if we zoom in, this becomes an issue as we don't have the exact location of many of the photos.
- Labels can be added. We have way too much data for "Labels", but it is worth noting. If you click on it, you can see the options.
- What we can do instead is add a Pop-Up. Select "Pop-Up" and select the metadata you want to appear. In our case, I am going to select pname, year, and title. You can change the name of the title as it will appear in the pop-up here, which we want to do here instead of adjusting our original data. If you select the final "Pop-Up Header with URL" and provide a full URL, an image will appear in the header. Let's add this column to our data. Go back to the data and click "Add Column" and name it "photourl". Let's then add the photo URL "http://cdn.loc.gov/service/pnp/fsac/1a33000/1a33800/1a33850v.jpg" for CartoID 13 (fsa1992000013/PP). We can't leave any previous columns null, so click on them to add a space. (One more reason to do data adjustments outside of Carto.) Carto can be fickle! Then, we go back to Layer -> Select Layer -> Pop-Up. Let's toggle on photourl and move it to the top. It must be the first selected item. You can also select whether you want a user to see the pop-up when they "hover" or "click". We picked click because of the density of points.
Hexbin: It suggests the general area but doesn't visually suggest we know the exact location. It also helps show photo counts. One thing I don't like about Carto is their default color ramp; the lighter the color, the more photos there are. Visually, we tend to associate darkness with a higher count, so I'd switch the ramp. We can do that by going to color. One thing to keep in mind is those who may be color blind. I find this tool very helpful. You will notice that this is the same color ramp we use for Photogrammar. Let's pick this green color ramp under Style in Carto.
Heatmap: It can be great! Just not for this data set.
Legend: You can add aspects as necessary for your project. To rename the Title, we have to rename the layer.
Let's make our map points with an appropriate pop-up. Then, we can add widgets!
Widgets are a way to add additional interactivity to the map. Click on the pencil on the far left menu ("Edit Map"). Then select Widgets-> +Add New Widget.
There will be options based on your metadata. Let's choose "pname" and "year". We can then customize these two widgets.
Let's:
- Change the name of the table.
- Give each photographer a color.
Now that we have our widgets, let's publish. We can do so by clicking "Publish" in the Edit Map menu.
Copy the link and open it in a new window. What do you see? You can share this link and share your map!
If you want to embed the map, you can do so using the generated iframe.
We will use the following tutorial.