Category: SDM: Random Forests

11/25/2015

Model Category: Machine Learning

Model Description: Random Forests (RF) is an ensemble technique that combines bootstrap aggregation (bagging) with classification or regression trees. Bagging draws random samples, with replacement, from the original dataset of predictors and response (replace=T), so each bootstrap sample can contain duplicated records. Each bootstrap sample is then used to grow one tree. A tree repeatedly splits the data into two subsets, choosing the split that makes the resulting subsets as homogeneous as possible: minimum total within-subset variance for regression trees, or minimum class impurity (for example, the Gini index) for classification trees. In an ordinary tree, every predictor is considered as a candidate at each split. In random forests, only a randomly selected subset of the predictors is considered at each split, and the best split among those candidates is used. In this way, the RF technique draws n bagged samples from the original dataset and grows n trees, each of which randomly samples m predictors at every node (the mtry argument in the randomForest package). Once all the trees are grown and a new value needs to be predicted, the values from all the regression trees are averaged or, in the case of classification trees, each tree casts a vote and the majority class becomes the prediction. This is how RF acts as an ensemble technique. If probabilities of species occurrence are desired, the proportion of classification trees voting for presence is used as the probability.
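The three ingredients described above (a bootstrap sample per tree, a random draw at each split, and averaging of tree votes) can be sketched in base R. This is only an illustration under stated assumptions, not the internals of the randomForest package: the sample size, the number of trees, and the simulated votes are all hypothetical, and the random draw at each split is shown as a draw of candidate predictors, as in Breiman (2001).

```r
set.seed(42)                      # fixed seed so the sketch is repeatable

n      <- 200                     # hypothetical number of training records
n_tree <- 500                     # hypothetical number of trees

# 1. Bagging: each tree is grown on n row indices drawn with replacement,
#    so some records appear more than once and others not at all.
boot_rows <- sample(n, size = n, replace = TRUE)
in_bag    <- length(unique(boot_rows)) / n   # share of distinct records drawn

# 2. Random draw at a split: only a few candidate predictors are considered.
#    floor(sqrt(p)) is the randomForest default (mtry) for classification.
predictor_names <- c("bio1", "bio5", "bio12", "bio16", "bio17", "bio6", "bio7", "bio8")
m_try      <- floor(sqrt(length(predictor_names)))
candidates <- sample(predictor_names, m_try)

# 3. Voting: simulated 0/1 votes from n_tree trees for three new sites
#    (rows = sites, columns = trees). The true presence rates are made up.
votes <- matrix(rbinom(3 * n_tree, size = 1,
                       prob = rep(c(0.1, 0.5, 0.9), times = n_tree)),
                nrow = 3)
prob_presence <- rowMeans(votes)              # proportion of trees voting "present"
majority      <- as.integer(prob_presence > 0.5)
```

With a reasonably large sample, in_bag comes out near 0.63, which is why roughly a third of the records are left "out of bag" for each tree.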

Model Assumptions: Random forests shares the common assumptions that the samples are representative of the species being modeled and that the samples are independent. It makes no assumptions about the statistical distribution of the data.

Model Response Data: The model can use presence/absence, presence/pseudo-absence, and abundance data. Presence/absence data use classification trees, while abundance data use regression trees.

Model Explanatory Data: The model can handle categorical and continuous predictors.
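In the randomForest package, the class of the response column decides which kind of forest is grown: a factor response gives a classification forest and a numeric response gives a regression forest. The sketch below shows this on simulated data with one continuous and one categorical predictor. The data frame sim, its columns, and the simulated relationship are hypothetical and are not part of the lab data used in the example further down.

```r
library(randomForest)

set.seed(1)
n <- 300

# Simulated (hypothetical) sites: one continuous and one categorical predictor
sim <- data.frame(
  temp  = rnorm(n, mean = 20, sd = 5),
  biome = factor(sample(c("forest", "savanna", "desert"), n, replace = TRUE))
)

# Made-up truth: presence is more likely at warm forest sites
p_true        <- plogis(0.3 * (sim$temp - 20) + ifelse(sim$biome == "forest", 1, -1))
sim$pa        <- factor(rbinom(n, 1, p_true), levels = c(0, 1))  # presence/absence
sim$abundance <- rpois(n, lambda = 5 * p_true)                   # counts

# Factor response -> classification forest; numeric response -> regression forest
rf_class <- randomForest(pa ~ temp + biome, data = sim, ntree = 200)
rf_reg   <- randomForest(abundance ~ temp + biome, data = sim, ntree = 200)

rf_class$type   # "classification"
rf_reg$type     # "regression"

# Proportion of trees voting for presence (class "1") at the first five sites
prob <- predict(rf_class, newdata = sim[1:5, ], type = "prob")[, "1"]
```

If a presence/absence column is stored as numeric 0/1, wrapping it in factor() before fitting is one way to request a classification forest.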

Model Links and Use with R: CRAN website: https://cran.r-project.org/web/packages/randomForest/index.html

Helpful videos: YouTube user mathematicalmonk: https://www.youtube.com/watch?v=o7iDkcpOr_g and YouTube user edureka!: https://www.youtube.com/watch?v=IJgR7n-VqSo

Helpful presentation slides by Dr. Adele Cutler of Utah State University: http://www.math.usu.edu/adele/RandomForests/Ovronnaz.pdf

Example Papers:
Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32. http://doi.org/10.1023/A:1010933404324

Elith, J., & Graham, C. H. (2009). Do they? How do they? WHY do they differ? On finding reasons for differing performances of species distribution models. Ecography, 32(1), 66-77. http://doi.org/10.1111/j.1600-0587.2008.05505.x

Prasad, A. M., Iverson, L. R., & Liaw, A. (2006). Newer classification and regression tree techniques: Bagging and random forests for ecological prediction. Ecosystems, 9(2), 181-199. http://doi.org/10.1007/s10021-005-0054-1

Example with R:

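library(randomForest)     # package to run the model
library(dismo)            # example predictor rasters and Bradypus records
library(raster)           # stack(), predict(), and plotting of raster layers
library(sp)
# rgdal and maptools were archived on CRAN in 2023 and are not called in this example
#library(rgdal)
#library(maptools)
#library(rJava)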

# Set working directory 
setwd("C:/Users/nagelki4/Dropbox/Permanent/Grad School/Classes/FOR 870 Spatial Ecology/Labs/Lab 11")

Bring in the presence and background data.

load("modeldata.RData")

# Get all predictor variables for this example in the dismo package
files <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''), pattern='grd', full.names=TRUE ) 
predictors <- stack(files)

Load the full presence dataset.

file <- paste(system.file(package="dismo"), "/ex/bradypus.csv", sep="")
bradypus <- read.table(file,  header=TRUE,  sep=",")
bradypus <- bradypus[,-1]

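Predict presence probabilities with the Random Forests model, first using all the predictors except biome, and then using only three predictors. Note that if pb is stored as a numeric 0/1 column in modeldata, randomForest fits a regression forest (and warns about it), so the mapped values are averaged tree predictions between 0 and 1 rather than vote proportions.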

# Using all predictors except biome
rf <- randomForest(pb ~ bio1 + bio5 + bio12 + bio16 + bio17 + bio6 + bio7 + bio8, data = modeldata)
p <- predict(predictors, rf) 
plot(p, main="Random Forests Prediction Utilizing all Predictor Variables")
points(bradypus, col="black", cex=.5, pch=20)

# Using only three predictors
rf2 <- randomForest(pb ~ bio1 + bio5 + bio12, data = modeldata)
p2 <- predict(predictors, rf2)
plot(p2, main="Random Forests Prediction Utilizing Three Predictor Variables")
points(bradypus, col="black", cex=.5, pch=20)

Spatial Ecology & R
(search via "category" below)

Species Distribution Model: RANDOM FORESTS

Spatial Ecology @ MSU
