What is a Random Forest?
Definition
Random Forest is a machine learning technique that improves on decision trees by applying bagging: it trains many decision trees, each on a dataset sampled with replacement from the training data (bootstrap sampling), and determines the final prediction by majority vote among the trees. Most implementations additionally consider only a random subset of features at each split, which further decorrelates the trees.
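To make the mechanism concrete, here is a minimal from-scratch sketch of bagged decision trees, assuming scikit-learn and NumPy are available; the dataset, tree count, and variable names are illustrative choices, not part of any fixed recipe.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_trees = 50
trees = []
for _ in range(n_trees):
    # Bootstrap sampling: draw n rows with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Majority vote across the per-tree predictions.
all_preds = np.stack([t.predict(X_test) for t in trees])  # (n_trees, n_samples)
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)
print("ensemble accuracy:", (majority == y_test).mean())
```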
Description
Random forests are a classical machine learning technique commonly used for classification problems, and they are often the first method to consider when a classification model is required. As weak learners, individual decision trees overfit severely and are highly sensitive to the training data: small changes in the data can produce very different trees. Bagging counteracts this in two ways: the rows left out of each bootstrap sample (the out-of-bag samples) provide a built-in validation estimate, similar in spirit to cross-validation, and majority voting averages out the variance of the individual trees.
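As a concrete illustration of that built-in validation estimate, scikit-learn's RandomForestClassifier exposes it through the oob_score option; the dataset below is an arbitrary choice for the sketch.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
# Each tree is scored on the training rows its bootstrap sample left out,
# so we get a validation-style accuracy estimate without a held-out set.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)
print("OOB accuracy estimate:", forest.oob_score_)
```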
However, ensembling sacrifices much of the interpretability that makes a single decision tree attractive, and because a random forest is not a fundamentally new class of model, one should not expect large performance gains beyond what bagging itself delivers. For example, building several random forests and re-ensembling them is unlikely to yield meaningful additional improvement, as the sketch below suggests.
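A quick way to see the diminishing returns is to compare one forest against a re-ensembled group of forests with the same total number of trees. This sketch uses scikit-learn's VotingClassifier; the synthetic dataset and ensemble sizes are arbitrary illustrative choices, and exact scores will vary with data and seeds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# One forest of 300 trees vs. a majority vote over three forests of 100 trees.
one_forest = RandomForestClassifier(n_estimators=300, random_state=0)
forest_of_forests = VotingClassifier(
    [(f"rf{i}", RandomForestClassifier(n_estimators=100, random_state=i))
     for i in range(3)],
    voting="hard",
)
for name, model in [("single forest", one_forest),
                    ("ensemble of forests", forest_of_forests)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```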
Why “Random”
The "random" refers to the randomness injected during training: sampling with replacement means each tree sees a different bootstrap sample of the data, and in most implementations each split additionally considers only a random subset of the features, so the resulting decision-tree structures inevitably vary widely.
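In scikit-learn terms, these two sources of randomness correspond to the bootstrap and max_features parameters. The sketch below fits a small forest on an arbitrary dataset and shows that the individual trees indeed end up with different structures.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,       # per-tree bootstrap samples of the rows
    max_features="sqrt",  # random subset of features at every split
    random_state=0,
).fit(X, y)

# The fitted trees differ in shape, e.g. in their depths.
depths = [t.get_depth() for t in forest.estimators_]
print("distinct tree depths:", sorted(set(depths)))
```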
Why “Forest”
Just as multiple trees (木) together form a grove (林), in graph theory a disjoint collection of tree graphs is called a forest; an ensemble of many decision trees is a forest in the same sense.
