RandomForest vs Bagging

I’m a bit confused by the summary table presented at the end of the notebook regarding the BaggingRegressor and RandomForestRegressor.

If the default is to have no feature subsampling, then isn’t the distinction between random forest and bagging redundant, since the whole point of a random forest is to add feature subsampling at each split in the tree?

Apologies for my confusion here, but I’d really appreciate some explanation.

Many thanks!


In a regression setting, the default parameters mean that there is no subsampling of the features (all features are used, as presented in Breiman's original paper). Therefore, a RandomForestRegressor behaves the same as a BaggingRegressor by default.
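To make this concrete, here is a minimal sketch (assuming scikit-learn with its default settings) showing the two configurations side by side. With `max_features=1.0`, which is the regression default for RandomForestRegressor, every split considers all features, just like a bagged ensemble of unrestricted decision trees:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

# Synthetic regression data, for illustration only
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# BaggingRegressor: bootstrap samples, each tree sees all features
bagging = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)

# RandomForestRegressor: max_features=1.0 is the regression default,
# so each split also considers all features
forest = RandomForestRegressor(
    n_estimators=50, max_features=1.0, random_state=0
).fit(X, y)

print(forest.max_features)  # 1.0, i.e. no feature subsampling
```

Note that the two fitted models are conceptually equivalent here, though their predictions may differ slightly because the internal random-number usage is not shared between the two classes.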

However, it is a good idea to tune this parameter in practice, since using all features is not necessarily the best choice in terms of computation time or statistical performance.
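For instance, a small grid search over the fraction of features considered at each split could look like this (a sketch; the candidate values are arbitrary choices for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=10, noise=10, random_state=0)

# Try several fractions of features per split; 1.0 recovers plain bagging
param_grid = {"max_features": [0.3, 0.5, 0.8, 1.0]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```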

That makes sense, thank you! :slight_smile: