Species Distribution Model: RANDOM FORESTS
Model Category: Machine Learning
Model Description: Random Forests (RF) is an ensemble technique that combines bootstrap aggregation (bagging) with classification or regression trees. Bagging draws observations uniformly at random, with replacement (replace=T), from the original dataset of predictors and response, so each bootstrap sample can contain duplicated observations. Each bootstrap sample is then used to grow one tree. A tree recursively splits its data into two subsets, choosing the split that minimizes the total within-subset variance (regression trees) or class impurity (classification trees). In a standard tree, every predictor is considered as a candidate at every split. In random forests, however, only a randomly selected subset of the predictors is considered at each node, and the best split is chosen from among them. In this way, the RF technique draws n bagged samples from the original dataset and grows n trees, each of which randomly samples i predictors as split candidates at every node. Once all the trees are grown and a new value needs to be predicted, the values calculated by all the regression trees are averaged or, in the case of classification trees, each tree casts a vote and the majority class is chosen. This is how RF acts as an ensemble technique. If probabilities of species occurrence are desired, they are calculated as the fraction of trees whose (binary) vote is for presence.
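The mechanics described above can be sketched in R with the randomForest package. This is a minimal illustration on simulated data: the data frame `dat`, its columns `temp`, `precip`, `elev`, and `pa`, and the settings `ntree = 200` and `mtry = 2` are all assumptions made for the demonstration. In this package, `ntree` plays the role of n (the number of bagged trees) and `mtry` the role of i (the number of predictors sampled at each node).

```r
library(randomForest)

set.seed(42)

# Simulated (hypothetical) data: three continuous predictors and a binary response
n_obs <- 300
dat <- data.frame(
  temp   = rnorm(n_obs, mean = 15, sd = 5),
  precip = rnorm(n_obs, mean = 1000, sd = 300),
  elev   = runif(n_obs, min = 0, max = 2000)
)
# Arbitrary rule for the demo: presence is more likely at warm, wet sites
p_true <- plogis(0.3 * (dat$temp - 15) + 0.004 * (dat$precip - 1000))
dat$pa <- factor(rbinom(n_obs, size = 1, prob = p_true), levels = c(0, 1))

# Bagging step on its own: one bootstrap sample drawn with replacement
boot_rows <- sample(n_obs, size = n_obs, replace = TRUE)
n_unique  <- length(unique(boot_rows))  # smaller than n_obs because rows repeat

# Grow 200 trees; 2 of the 3 predictors are sampled as split candidates per node
rf <- randomForest(pa ~ temp + precip + elev, data = dat, ntree = 200, mtry = 2)

# Ensemble prediction for two new (hypothetical) sites
new_sites <- data.frame(temp = c(10, 22), precip = c(700, 1400), elev = c(500, 500))
prob_presence <- predict(rf, newdata = new_sites, type = "prob")[, "1"]

# The same probabilities recovered by hand from the individual tree votes
tree_votes    <- predict(rf, newdata = new_sites, predict.all = TRUE)$individual
vote_fraction <- rowMeans(tree_votes == "1")
```

Comparing `prob_presence` with `vote_fraction` shows that the reported probability is simply the share of trees that voted for presence.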
Model Assumptions: Random forests share the assumptions common to most species distribution models: the samples are representative of the species being modeled, and the samples are independent. There are no assumptions about the distribution of the data.
Model Response Data: The model can use presence/absence, presence/pseudo-absence, and abundance data. Presence/absence data use classification trees, while abundance data use regression trees.
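In the randomForest package, the type of tree follows from the class of the response: a factor response grows classification trees and a numeric response grows regression trees. The sketch below shows this on simulated data; the object names `env`, `pa`, and `abundance` are hypothetical.

```r
library(randomForest)

set.seed(1)

# Simulated (hypothetical) survey data
n_obs <- 200
env <- data.frame(temp = rnorm(n_obs, 15, 5), precip = rnorm(n_obs, 1000, 300))
abundance <- rpois(n_obs, lambda = exp(0.15 * (env$temp - 15)))  # counts
pa <- factor(as.integer(abundance > 0), levels = c(0, 1))        # presence/absence

# Factor response: classification trees
rf_class <- randomForest(x = env, y = pa, ntree = 100)

# Numeric response: regression trees
rf_reg <- randomForest(x = env, y = abundance, ntree = 100)

rf_class$type
rf_reg$type
```

Note that presence/absence stored as numeric 0/1 rather than as a factor would be treated as a regression problem, so converting it with `factor()` first is the safer habit.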
Model Explanatory Data: The model can handle categorical and continuous predictors.
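As a brief sketch of mixed predictor types, the example below passes one continuous and one categorical predictor to randomForest. It assumes the categorical predictor is stored as a factor; the variable names `temp` and `landcover` and the simulated data are hypothetical.

```r
library(randomForest)

set.seed(7)

n_obs <- 200
dat <- data.frame(
  temp      = rnorm(n_obs, 15, 5),  # continuous predictor
  landcover = factor(sample(c("forest", "grass", "urban"), n_obs, replace = TRUE))  # categorical predictor
)
# Arbitrary rule for the demo: presence is more likely at warm forest sites
p_true <- plogis(0.2 * (dat$temp - 15) + ifelse(dat$landcover == "forest", 1.5, -0.5))
dat$pa <- factor(rbinom(n_obs, 1, p_true), levels = c(0, 1))

rf_mixed <- randomForest(pa ~ temp + landcover, data = dat, ntree = 100)

# New data should carry the same factor levels as the training data
new_sites <- data.frame(
  temp      = c(12, 20),
  landcover = factor(c("urban", "forest"), levels = levels(dat$landcover))
)
p_new <- predict(rf_mixed, newdata = new_sites, type = "prob")
```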
Model Links and Use with R: randomForest package on CRAN: https://cran.r-project.org/web/packages/randomForest/index.html
Helpful presentation slides by Dr. Adele Cutler of Utah State University: http://www.math.usu.edu/adele/RandomForests/Ovronnaz.pdf
Elith, J., & Graham, C. H. (2009). Do they? How do they? WHY do they differ? On finding reasons for differing performances of species distribution models. Ecography, 32(1), 66-77. http://doi.org/10.1111/j.1600-0587.2008.05505.x
Prasad, A. M., Iverson, L. R., & Liaw, A. (2006). Newer classification and regression tree techniques: Bagging and random forests for ecological prediction. Ecosystems, 9(2), 181-199. http://doi.org/10.1007/s10021-005-0054-1
Example with R:
Bring in the presence and background data.
Load the full presence dataset.
Predict presence probabilities with a Random Forests model, first using all the predictors except biome and then using only three predictors.
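A rough sketch of this workflow, using a simulated stand-in for the presence and background data, might look like the following. Everything about the data is an assumption: the data frame `sdm_data`, the presence/background indicator `pb`, and the predictors `bio1`, `bio5`, `bio12`, and `bio16` are hypothetical names, and the choice of three predictors is arbitrary. Only the exclusion of `biome` comes from the steps above.

```r
library(randomForest)

set.seed(123)

# Hypothetical stand-in for the combined presence (pb = 1) and background (pb = 0) data
n_obs <- 400
sdm_data <- data.frame(
  bio1  = rnorm(n_obs, 200, 50),
  bio5  = rnorm(n_obs, 300, 40),
  bio12 = rnorm(n_obs, 1500, 500),
  bio16 = rnorm(n_obs, 600, 200),
  biome = factor(sample(1:5, n_obs, replace = TRUE))
)
# Arbitrary rule for the demo linking presence to two of the predictors
p_true <- plogis(0.02 * (sdm_data$bio1 - 200) + 0.002 * (sdm_data$bio12 - 1500))
sdm_data$pb <- factor(rbinom(n_obs, 1, p_true), levels = c(0, 1))

# Model 1: all predictors except biome
predictors_all <- setdiff(names(sdm_data), c("pb", "biome"))
rf_all <- randomForest(x = sdm_data[, predictors_all], y = sdm_data$pb, ntree = 500)

# Model 2: only three predictors (an arbitrary choice for this sketch)
rf_three <- randomForest(pb ~ bio1 + bio5 + bio12, data = sdm_data, ntree = 500)

# Predicted probability of presence = share of trees voting for class "1"
prob_all   <- predict(rf_all, newdata = sdm_data[, predictors_all], type = "prob")[, "1"]
prob_three <- predict(rf_three, newdata = sdm_data, type = "prob")[, "1"]

# Out-of-bag error rates after the final tree, for a quick model comparison
oob_all   <- rf_all$err.rate[rf_all$ntree, "OOB"]
oob_three <- rf_three$err.rate[rf_three$ntree, "OOB"]
```

Here the models predict back onto the training rows only to keep the sketch short. In a real analysis the same `predict()` calls would typically be applied to new sites or to environmental layers covering the study area.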
Spatial Ecology @ MSU
R code compiled by the Zarnetske Spatial & Community Ecology Lab and students in MSU's Spatial Ecology graduate course (FOR870/FW870).