moorea_exploration/moorea_guilds.Rmd at master · owenrliu/moorea_exploration · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
---
title: "Moorea Guild CV and Concordance"
author: "Owen Liu"
date: "Tuesday, October 20, 2015"
output: html_document
---

```{r, echo= FALSE, include=FALSE}
library(dplyr)
library(doBy)
library(ggplot2)
library(DT)
library(knitr)
```

This an exploration of the Moorea fish coutns dataset, specifically into the relationship between persistence of species in specific habitats (concordance), and variation in abundance across censuses.  Concordance will be measured with Kendall's W.

First, assemble the data

```{r data}
data <- read.csv('~/github/moorea_exploration/data/MCR_LTER_Annual_Fish_Survey_20150318.csv')
names(data)

abun <- summaryBy(Count~Taxonomy, data= data, FUN=sum, na.rm=TRUE) #summarizes the data by abundance of species
abun <- abun[order(-abun[,2]),] #sorts by total abundance
top_50 <- abun[1:50,] #top 50 most abundant species
data.50 <- subset(data,data$Taxonomy %in% top_50$Taxonomy) # crop original data to top 50 species
```

We are interested in groups of species, first by family, then perhaps by trophic group.  We are interested in concordance and abundance variation across years, between sites, and between habitats.  We add zeros to the data for year/site/habitat combinations where fish of a given species were not observed.

```{r data summary by family}
data.fams <- summaryBy(Count~Family+Year+Site+Habitat,data=data.50,FUN=sum,na.rm=T)
data.fams$Family <- as.character(data.fams$Family)
data.fams$Habitat <- as.character(data.fams$Habitat)
fams <- unique(data.fams$Family)
names(data.fams)[5] <- 'Abundance'
names(data.fams)
```

```{r adding zeros,include=FALSE}
test=data.frame(Family=as.vector(sapply(fams,function(x) rep(x,(9*18)))), Year=rep(sapply(2006:2014,function(x) rep(x,18)),length(fams)),Site=rep(sort(rep(1:6,3)),9*length(fams)),Habitat=rep(c('BA','FO','FR'),6*9*length(fams)),Abundance=rep(0,18*9*length(fams)),stringsAsFactors=FALSE)
for (i in 1:nrow(data.fams)) {
  row <- data.fams[i,]
  match <- subset(test,test$Family==row$Family & test$Year==row$Year & test$Site==row$Site & test$Habitat==row$Habitat)
  test[row.names(match),'Abundance'] <- row$Abundance
}
data.fams<-test
```

Now we have the data in a more usable format, where counts are aggregated by family, but separated by year, site, and habitat (fringing reef vs. back reef vs. fore reef).  Next we have to calculate, for each family, the coefficient of variation in abundance with each year/site/habitat combination as a different data point.

```{r coefficient of variation}
cv.fams <- data.frame(Family=fams,CV=rep(NA,length(fams)))
cv.fams$CV <- sapply(fams,function(x) sd(subset(data.fams$Abundance,data.fams$Family==x))/mean(subset(data.fams$Abundance,data.fams$Family==x)))
```

Now, for Kendall's W, a measure of concordance, we rank the sites for each speices in each year according to their relative abundance. There are 18 unique sites (Sites 1-6 times 3 unique Habitats, fringing reef, fore reef, back reef) and 9 years (2006-2014).  Then, for each species, we calculate Kendall's W,

$$W = \dfrac{12S}{m^{2}(n^{3}-n)}$$

where $m$ is the number of years (9), $n$ is the number of sites (18). $S = \sum_{i=1}^{n}(R_{i} - \bar{R})^{2}$ is the sum of squared deviations of each site i's total rank across years, $R_{i} = \sum_{j=1}^{m}r_{i,j}$ from its mean rank,
$\bar{R} = 1/n\sum_{i=1}^{n}R_{i}$


```{r Kendalls W 1, echo=FALSE}
years <- unique(data.fams$Year)
kendall.m <- length(years) #m is the number of years
kendall.n <- length(unique(data.fams$Site))*length(unique(data.fams$Habitat)) # n is the number of site/habitat combinations
ranks.list = list() #for all species
for (i in 1:length(fams)) {
  ranks.mat <- matrix(nrow=kendall.n,ncol=kendall.m,
                      dimnames=list(sites=1:18,years=years)) #placeholder for individual species ranks
  fam <- fams[i]
  famdata <- subset(data.fams,data.fams$Family==fam)
  for (j in 1:kendall.m) { #for each year...
    yr.data <- famdata[famdata$Year==years[j],]
    ranks <- rank(-yr.data$Abundance)
    ranks.mat[,j] <- ranks
    }
  ranks.list[[as.character(fam)]] <- ranks.mat
}

kendallsW <- function(mat) {
  Ri <- apply(mat,1,sum)
  Rbar <- mean(Ri)
  n <- nrow(mat)
  m <- ncol(mat)
  S <- sum((Ri-Rbar)^2) #vector (length n) of sum of squared deviations for each site
  W <- (12*S)/(m^2*(n^3-n))
  return(W)
}
```

```{r ranks and cv output,echo=FALSE}
ranks.fams <- sapply(ranks.list,FUN=kendallsW)
fams.cv.ranks <- cbind(cv.fams,ranks.fams)
colnames(fams.cv.ranks) <- c('Family','CV','W')
knitr::kable(fams.cv.ranks,digits=3,row.names=FALSE)
```

```{r cv concordance plot, echo=FALSE}
fams_plot <- ggplot(fams.cv.ranks, aes(x=CV,y=W)) +
  geom_point()+geom_text(aes(label=Family))
fams_plot

```