GridSearch and ExtraTrees

Hi,

1/ When applying the grid search, the number of trees is set using randint(10, 30), but I’m missing two things here.
First, why do we choose it randomly with randint instead of manually setting an arbitrary value we want?
Second, since we are doing a grid search, why do we feed only one value for the number-of-trees parameter instead of giving several values, as we did for the other parameters?

2/ About extra-trees: we choose the split totally at random, but I’m a bit confused about what is done exactly. Does it mean the trees “choose” their rule to classify data randomly? What exactly is the split made on? I don’t really get what the training and fitting are if the process is totally random (of course, I’m not familiar enough with all these notions, so for extra-trees I really have difficulties understanding what happens).

Thank you again for all your help !

Geoffrey

Looking at the code, I have the following cell:

from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV

# `bagging`, `data_train` and `target_train` are defined earlier in the notebook.
# The randint(...) entries are distributions to sample from, not fixed values.
param_grid = {
    "n_estimators": randint(10, 30),
    "max_samples": [0.5, 0.8, 1.0],
    "max_features": [0.5, 0.8, 1.0],
    # in scikit-learn >= 1.2 this parameter is spelled "estimator__max_depth"
    "base_estimator__max_depth": randint(3, 10),
}
# n_iter=20: evaluate 20 sampled parameter combinations, not the full grid
search = RandomizedSearchCV(
    bagging, param_grid, n_iter=20, scoring="neg_mean_absolute_error"
)
_ = search.fit(data_train, target_train)

The reason is that we used a RandomizedSearchCV and not a GridSearchCV. With RandomizedSearchCV, randint(10, 30) does not fix a single value: it defines a distribution from which a value for n_estimators is sampled at each of the n_iter iterations. You are right that with GridSearchCV we would instead give an explicit list of candidate values.
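
To make this concrete, here is a minimal sketch (outside the notebook, with hypothetical candidate values, and assuming the same bagging estimator as in the cell above) contrasting the two approaches:

from scipy.stats import randint
from sklearn.model_selection import GridSearchCV

# randint(10, 30) is a frozen distribution: RandomizedSearchCV draws a
# candidate n_estimators from it at each of its n_iter iterations.
print(randint(10, 30).rvs(size=5, random_state=0))  # five integers in [10, 30)

# The GridSearchCV equivalent needs explicit lists and tries every
# combination (here 3 * 3 * 3 = 27 candidates per cross-validation split):
param_grid = {
    "n_estimators": [10, 20, 30],
    "max_samples": [0.5, 0.8, 1.0],
    "max_features": [0.5, 0.8, 1.0],
}
grid = GridSearchCV(bagging, param_grid, scoring="neg_mean_absolute_error")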

Basically, you don’t try to find the split that best separates the two classes: you take it at random. But do not forget that this procedure is repeated for each feature, and the best random split among this pool is selected. I agree that it is quite counterintuitive that it even works :slight_smile:
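
If it helps, here is a minimal NumPy sketch of that split rule (my own illustration, not scikit-learn’s actual implementation), using variance reduction as the regression criterion:

import numpy as np

def extra_tree_split(X, y, rng):
    # One random threshold per feature; keep the best of these candidates.
    best_feature, best_threshold, best_score = None, None, -np.inf
    for feature in range(X.shape[1]):
        lo, hi = X[:, feature].min(), X[:, feature].max()
        if lo == hi:                      # constant feature, nothing to split
            continue
        threshold = rng.uniform(lo, hi)   # the "totally random" cut point
        mask = X[:, feature] <= threshold
        left, right = y[mask], y[~mask]
        # score the candidate by the reduction in variance
        score = y.var() - (len(left) * left.var() + len(right) * right.var()) / len(y)
        if score > best_score:
            best_feature, best_threshold, best_score = feature, threshold, score
    return best_feature, best_threshold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 is informative
print(extra_tree_split(X, y, rng))             # usually picks feature 0

So training still fits the data: the thresholds are random, but only the random candidates that separate the data best are kept, node after node, and averaging many such trees smooths out the extra randomness.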

In a way, this procedure adds more randomization on top of the feature and sample randomization of random forests. You can also find a bit more detail in the original paper, which provides an interesting bias/variance analysis: https://link.springer.com/content/pdf/10.1007/s10994-006-6226-1.pdf
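
If you want to experiment with it, scikit-learn exposes this ensemble directly. A minimal sketch, assuming the same data_train / target_train as in the cell above:

from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Same API as a random forest, but each tree draws its split thresholds
# at random instead of searching for the optimal one per feature.
extra_trees = ExtraTreesRegressor(n_estimators=100, random_state=0)
extra_trees.fit(data_train, target_train)

# For comparison, the less-randomized counterpart:
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(data_train, target_train)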


Ok, it’s clearer now, thank you again!