Journal article

Modelling species presence-only data with random forests

Roozbeh Valavi, Jane Elith, José J Lahoz-Monfort, Gurutzeta Guillera-Arroita

Cold Spring Harbor Laboratory

Abstract

The Random Forest (RF) algorithm is an ensemble of classification or regression trees, and is a widely used and high-performing machine learning technique. It is increasingly used for species distribution modelling (SDM). Many researchers use implementations of RF in the R programming language with default parameters to analyse species presence-only data together with background samples. However, there is good evidence that RF with default parameters does not perform well with such species “presence-background” data. This is often attributed to the typical disparity between the number of presence and background samples, also known as class imbalance, and several solutions have been pr..
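To make the notion of class imbalance concrete, the following minimal sketch computes the background-to-presence ratio and a set of inverse-frequency class weights for a presence-background sample. It is a generic illustration, not the authors' method: the sample sizes (50 presences, 10,000 background points) and the helper name `class_weights` are assumptions chosen for the example, and inverse-frequency weighting is only one common way of describing the disparity.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Assumed sample sizes: 1 = presence, 0 = background.
labels = [1] * 50 + [0] * 10000

counts = Counter(labels)
imbalance_ratio = counts[0] / counts[1]  # background points per presence
weights = class_weights(labels)

print(f"background per presence: {imbalance_ratio}")
print(f"class weights: {weights}")
```

With these weights each class contributes the same total weight, which shows how strongly the few presence records would need to be up-weighted to offset the much larger background sample.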
