Range of values for thresholds

Hello,

For each variable, we try different thresholds and keep the one that gives the best partition. But how is the range of values for these thresholds chosen?

Thank you in advance for the answer

Aurélien

I would assume the threshold range for each feature lies between min(feature) and max(feature).

But if the values of a feature range from zero to a million, for example, will the algorithm go through all the thresholds one by one?

No, the trick is to iterate only over the values between 0 and 1 million that are actually present in the dataset.
So, if there is a lot of data, there are potentially a lot of splits to iterate over, which can make the search quite slow.
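
To make this concrete, here is a minimal sketch of how the candidate thresholds could be enumerated for a single feature. The helper name `candidate_thresholds` is made up for this illustration, and taking the midpoint between consecutive distinct values is an assumption of the sketch, not a description of a specific library's internals.

```python
def candidate_thresholds(values):
    """Return one candidate threshold between each pair of consecutive
    distinct values observed for a feature (hypothetical helper)."""
    # Only values present in the data matter, so duplicates are dropped.
    distinct = sorted(set(values))
    # Assumption: use the midpoint between neighbouring distinct values.
    return [(low + high) / 2 for low, high in zip(distinct, distinct[1:])]


# The feature spans 0 to 1,000,000 but only has 4 distinct values,
# so there are only 3 thresholds to evaluate, not a million.
feature = [0, 3, 3, 250_000, 1_000_000, 3]
thresholds = candidate_thresholds(feature)
print(thresholds)  # [1.5, 125001.5, 625000.0]
```

The number of candidates grows with the number of distinct values, not with the width of the range, which is why the search slows down on large datasets with many distinct values.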

In the next module, you will see histogram gradient boosting, which discretizes the feature values into a small number of bins and thus reduces the number of splits to evaluate.
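
As a rough illustration of why binning helps, the sketch below compares the number of candidate splits with and without discretization. The helper `quantile_bin_edges`, the quantile-based binning strategy, and the value `n_bins=255` are all assumptions chosen for this example; the actual implementation covered in the next module may differ in its details.

```python
import random


def quantile_bin_edges(values, n_bins):
    """Return up to n_bins - 1 bin edges taken at evenly spaced ranks
    of the sorted values (hypothetical helper, quantile-style binning)."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for k in range(1, n_bins):
        edge = ordered[(k * n) // n_bins]
        # Skip repeated edges so each candidate split is distinct.
        if not edges or edge != edges[-1]:
            edges.append(edge)
    return edges


rng = random.Random(0)
feature = [rng.uniform(0, 1_000_000) for _ in range(10_000)]

n_distinct = len(set(feature))
edges = quantile_bin_edges(feature, n_bins=255)

print(n_distinct - 1, "candidate splits without binning")
print(len(edges), "candidate splits after binning")
```

After binning, the number of splits to evaluate is bounded by the number of bins, no matter how many samples or distinct values the feature has.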
