RandomForest vs Bagging

I’m a bit confused by the summary table presented at the end of the notebook regarding the BaggingRegressor and RandomForestRegressor.

If the default is to have no feature subsampling, then isn’t the distinction between random forest and bagging redundant, since the whole point of a random forest is to add feature subsampling at each split in the tree?

Apologies for my confusion here, but I’d really appreciate some explanation.

Many thanks!


In a regression setting, the default parameters mean that there is no subsampling of the features (all features are used, as presented in Breiman's original paper). Therefore, a RandomForestRegressor behaves the same as a BaggingRegressor by default.
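To make this concrete, here is a minimal sketch (assuming scikit-learn with its default settings) showing the two configurations side by side. With `max_features=1.0`, which is the regression default for RandomForestRegressor, every split considers all features, just like a bagged ensemble of unrestricted decision trees:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

# Synthetic regression data, for illustration only
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# BaggingRegressor: bootstrap samples, each tree sees all features
bagging = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)

# RandomForestRegressor: max_features=1.0 is the regression default,
# so each split also considers all features
forest = RandomForestRegressor(
    n_estimators=50, max_features=1.0, random_state=0
).fit(X, y)

print(forest.max_features)  # 1.0, i.e. no feature subsampling
```

Note that the two fitted models are conceptually equivalent here, though their predictions may differ slightly because the internal random-number usage is not shared between the two classes.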

However, it is a good idea to tune this parameter in practice, since using all features is not necessarily the best choice in terms of computation time or statistical performance.
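For instance, a small grid search over the fraction of features considered at each split could look like this (a sketch; the candidate values are arbitrary choices for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=10, noise=10, random_state=0)

# Try several fractions of features per split; 1.0 recovers plain bagging
param_grid = {"max_features": [0.3, 0.5, 0.8, 1.0]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```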

That makes sense, thank you! :slight_smile: