
Understanding the code co-occurrence network

Alberto Cottica edited this page Jul 13, 2017 · 9 revisions

GraphRyder helps make sense of large ethnographic datasets like OpenCare's. It does so in two main ways.

Counting codes

GraphRyder counts codes and ranks them by decreasing number of occurrences. The ranking appears in the Code View (formerly Tag View) page, as a list to the right of the graph. It tells us which ethnographic codes are most frequently associated with contributions to the conversation. All software applications for ethnographic research have this functionality.
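The counting step can be illustrated with a minimal Python sketch. It is not GraphRyder's implementation: it assumes each contribution is represented as a list of the codes annotated on it, and the function name `count_codes` and the code names in the toy data are hypothetical. Ties are broken alphabetically so that the output is deterministic.

```python
from collections import Counter


def count_codes(contributions):
    """Count how many contributions carry each code, most frequent first."""
    counts = Counter()
    for codes in contributions:
        # set() so a code applied twice to one contribution is counted once
        counts.update(set(codes))
    # Sort by decreasing count, then alphabetically to make ties deterministic
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: each contribution is the list of codes annotated on it.
contributions = [
    ["housing", "mental health"],
    ["housing", "peer support"],
    ["mental health", "peer support", "housing"],
]
print(count_codes(contributions))
# [('housing', 3), ('mental health', 2), ('peer support', 2)]
```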

Linking codes

GraphRyder also links codes into a network, the code co-occurrence network. To the best of our knowledge, no other ethnographic software does this. In this network, each code is a node, and two codes are connected by an edge if they both appear on the same contribution. Co-occurrence indicates that at least one informant has felt the need to reference both codes in the same argument. It is a "vote" for the two concepts augmenting each other in the context of the problem being studied.

The code co-occurrence network is undirected (A => B is the same as B => A) and weighted (the edge between A and B has a weight of k if A and B co-occur on k different contributions).
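A minimal Python sketch of how such a network could be built, under the same assumption as above (each contribution is a list of codes; the function name and data are hypothetical, and GraphRyder itself may store the graph quite differently). Each edge is keyed by a `frozenset`, so the pair {A, B} is the same edge as {B, A}, which makes the network undirected; the value stored for the edge is its weight k.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(contributions):
    """Return {frozenset({a, b}): k}, where k = number of contributions carrying both a and b."""
    weights = Counter()
    for codes in contributions:
        # sorted(set(...)) drops duplicate codes and makes pair generation deterministic
        for a, b in combinations(sorted(set(codes)), 2):
            # frozenset keys make the edge undirected: {a, b} == {b, a}
            weights[frozenset((a, b))] += 1
    return weights


# Hypothetical toy data: each contribution is the list of codes annotated on it.
contributions = [
    ["housing", "mental health"],
    ["housing", "peer support"],
    ["mental health", "peer support", "housing"],
]
network = cooccurrence_network(contributions)
print(network[frozenset({"peer support", "housing"})])  # 2
```

Here the housing/peer support edge has k = 2 because two of the three toy contributions carry both codes, while mental health/peer support has k = 1.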

We can think of the co-occurrence network as an association map between the concepts expressed by the codes. It carries information on how informants connect all the key concepts that have emerged from the study, as seen by the ethnographer(s). There are two parts to this information:

  1. Which codes connect to which other codes. For example, the network could be split into "islands" of codes, with no code in one island ever co-occurring with any code in another island. This would be a strong indication that the informants think there are entirely separate, mutually independent sides to the problem at hand. In a less extreme variant of the same scenario, the network could be highly modular: made of densely connected clusters of codes, with only a few edges between clusters.

  2. Which connections are strongest. A higher edge weight k indicates a stronger association between the two connected codes, at the level of the conversation as a whole. The highest-k edges are the connections that arise most frequently in the conversation.
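Both kinds of information can be read off a weighted edge list. The Python sketch below is hypothetical and not part of GraphRyder: it assumes edges are given as a dict mapping `(code_a, code_b)` to k, finds the "islands" as connected components with an iterative depth-first search, and ranks edges by weight. Codes that never co-occur with any other code have no edges and therefore do not appear in any island.

```python
from collections import defaultdict


def components(edges):
    """Group codes into connected components ('islands') with an iterative DFS."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, islands = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, island = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            island.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        islands.append(island)
    return islands


def strongest_edges(edges, n):
    """Return the n edges with the highest weight k."""
    return sorted(edges.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical weighted edges: (code_a, code_b) -> k
edges = {
    ("housing", "mental health"): 4,
    ("housing", "peer support"): 2,
    ("translation", "legal aid"): 3,
}
print(components(edges))
print(strongest_edges(edges, 2))
```

For these toy edges there are two islands, {housing, mental health, peer support} and {translation, legal aid}, and the two strongest edges are housing/mental health (k = 4) and translation/legal aid (k = 3). Detecting modularity short of full disconnection would need a community-detection algorithm, which this sketch does not attempt.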

Notice that both types of information emerge from the conversation as a whole. No individual informant has access to them (not, at least, without looking at the network). In this sense, they are genuinely a product of collective intelligence. An additional attractive property is that, since the community is unaware of this information, its contributions are not biased by it.

How to use the co-occurrence network

  1. Filter out low-k (weak) co-occurrences to discover the high-level structure of the conversation. This is done from the code co-occurrence view (formerly "tag view full"). Click on the cog icon, select a value for k and click "Go". OpenCare data, for example, have a very clear structure: everything is connected, but the conversation resolves into clusters of tightly interconnected codes, with some "bridges" connecting the clusters to one another (see figure below).
  2. Filter low-k co-occurrences back in and explore the neighbourhood of the codes you are interested in to discover "novel" associations. In OpenCare, an interesting starting point is the code police brutality, which links to translation. If you find a novel association interesting, it is worth understanding why informants made it. To do so, click on the eye icon in the code view, then on the edge you are interested in. This brings up a list of the contributions that carry both codes, and therefore connect them.

The code co-occurrence network in OpenCare, June 2017. Edges with k < 5 have been filtered out.
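The two steps above (filtering by k, then drilling into a neighbourhood and the contributions behind an edge) can be mimicked offline. The following Python sketch illustrates the idea under stated assumptions and is not GraphRyder's code: contributions are a hypothetical dict from contribution ID to codes, and `build_edges`, `filter_edges`, `neighbourhood` and `contributions_for_edge` are invented names. The threshold used in the figure corresponds to `filter_edges(edges, 5)`.

```python
from collections import Counter
from itertools import combinations


def build_edges(contributions):
    """Weighted co-occurrence edges: (a, b) with a < b -> k."""
    weights = Counter()
    for codes in contributions.values():
        for a, b in combinations(sorted(set(codes)), 2):
            weights[(a, b)] += 1
    return weights


def filter_edges(edges, min_k):
    """Keep only edges with weight k >= min_k, i.e. drop weak co-occurrences."""
    return {edge: k for edge, k in edges.items() if k >= min_k}


def neighbourhood(edges, code):
    """Codes that co-occur with `code`, strongest first."""
    neighbours = [(b if a == code else a, k) for (a, b), k in edges.items() if code in (a, b)]
    return sorted(neighbours, key=lambda item: (-item[1], item[0]))


def contributions_for_edge(contributions, a, b):
    """IDs of the contributions carrying both codes, i.e. those that create the edge."""
    return [cid for cid, codes in contributions.items() if a in codes and b in codes]


# Hypothetical toy data: contribution ID -> codes annotated on it.
contributions = {
    "c1": ["housing", "mental health"],
    "c2": ["housing", "peer support"],
    "c3": ["mental health", "peer support", "housing"],
    "c4": ["housing", "mental health"],
}
edges = build_edges(contributions)
print(filter_edges(edges, 2))
print(neighbourhood(edges, "housing"))
print(contributions_for_edge(contributions, "housing", "peer support"))  # ['c2', 'c3']
```

With this toy data, keeping only edges with k of at least 2 drops the weak mental health/peer support edge (k = 1), and the housing/peer support edge is traced back to contributions c2 and c3.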

For a more structured discussion of the code co-occurrence network in the context of digital ethnography, read this paper (especially Section 2).
