Journal article
Modelling species presence-only data with random forests
Roozbeh Valavi, Jane Elith, José J Lahoz-Monfort, Gurutzeta Guillera-Arroita
Cold Spring Harbor Laboratory
Abstract
The Random Forest (RF) algorithm is an ensemble of classification or regression trees, and is a widely used and high-performing machine learning technique. It is increasingly used for species distribution modelling (SDM). Many researchers use implementations of RF in the R programming language with default parameters to analyse species presence-only data together with background samples. However, there is good evidence that RF with default parameters does not perform well with such species “presence-background” data. This is often attributed to the typical disparity between the number of presence and background samples, also known as class imbalance, and several solutions have been pr..
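To make the class imbalance mentioned in the abstract concrete, the following is a minimal Python sketch. It is not the authors' code or method: the data are simulated, the sample sizes (100 presences, 10,000 background points) are arbitrary assumptions, and the helper names `make_presence_background`, `class_counts`, `imbalance_ratio`, and `bootstrap_sample` are hypothetical. It only illustrates how few presences end up in a plain bootstrap sample of the kind each tree in a random forest is typically grown on.

```python
import random


def make_presence_background(n_presence, n_background, seed=0):
    """Simulate (covariate, label) pairs; label 1 = presence, 0 = background.

    The Gaussian covariate values are purely illustrative assumptions.
    """
    rng = random.Random(seed)
    presences = [(rng.gauss(1.0, 1.0), 1) for _ in range(n_presence)]
    background = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_background)]
    return presences + background


def class_counts(data):
    """Count how many records carry each label."""
    counts = {0: 0, 1: 0}
    for _, label in data:
        counts[label] += 1
    return counts


def imbalance_ratio(data):
    """Number of background samples per presence sample."""
    counts = class_counts(data)
    return counts[0] / counts[1]


def bootstrap_sample(data, seed=0):
    """Draw len(data) records with replacement, ignoring class labels."""
    rng = random.Random(seed)
    return rng.choices(data, k=len(data))


data = make_presence_background(n_presence=100, n_background=10_000)
counts = class_counts(data)
ratio = imbalance_ratio(data)
print(counts, ratio)  # {0: 10000, 1: 100} 100.0

# A plain bootstrap sample keeps roughly the same skew, so presences
# make up only about 1% of the records any single tree would see.
boot = bootstrap_sample(data, seed=1)
boot_counts = class_counts(boot)
print(boot_counts[1] / len(boot))
```

Under these assumed sample sizes the background class outnumbers presences 100 to 1, and the exact presence share in the bootstrap sample varies with the seed.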