KBinsDiscretizer in Speeding-up gradient-boosting lesson => I do not understand your sentence

echidne · 30 June 2021 08:58

You wrote:

We see that the discretizer transforms the original data into an integer. This integer represents the bin index when the distribution by quantile is performed. We can check the number of bins per feature.

But data_trans is:

array([[249.,  39., 231., ...,  83., 162.,  30.],
       [248.,  19., 203., ...,  28., 161.,  30.],
       [242.,  49., 249., ..., 125., 160.,  29.],
       ...,
       [ 17.,  15., 126., ...,  49., 200.,  82.],
       [ 23.,  16., 136., ...,  29., 200.,  77.],
       [ 53.,  14., 130., ...,  93., 199.,  81.]])

So I see that data_trans is an array of floats, not integers!

glemaitre58 · 30 June 2021 17:19

The data type is float, but the numbers are integral.

echidne · 30 June 2021 17:39

I think you mean that the ordinal encoding returns the bin identifiers as integer values, but the result of the transformation is an array of floats.

type(249.) is float

returns

True

glemaitre58 · 30 June 2021 17:53

x = 249.
x.is_integer()

returns

True

But we can improve the description.
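To make the distinction concrete at the array level, here is a minimal sketch. It assumes NumPy is available and uses a small made-up array as a stand-in for data_trans (the values are illustrative, not the real output of the lesson). It checks that the dtype is float while every entry has an integral value, so a cast to an integer dtype loses nothing.

```python
import numpy as np

# Toy stand-in for data_trans: bin indices stored as floats.
# The values are illustrative only, not the lesson's actual output.
data_trans = np.array([[249.0, 39.0, 231.0],
                       [17.0, 15.0, 126.0]])

# The container's data type is a floating-point type.
print(data_trans.dtype)

# Every entry has a zero fractional part, i.e. an integral value.
all_integral = bool(np.all(np.mod(data_trans, 1) == 0))
print(all_integral)

# Because the values are integral, casting to an integer dtype is lossless.
as_int = data_trans.astype(np.int64)
lossless = bool(np.array_equal(as_int, data_trans))
print(lossless)
```

If the integer representation is preferred for display, the cast with astype shown above is one option; whether the lesson should do so is a matter of wording rather than correctness.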

echidne · 30 June 2021 18:57

Definition: is_integer() returns True if the float instance is finite with integral value, and False otherwise.

So x is not an integer but a float …

glemaitre58 · 30 June 2021 19:19

This is pretty much the definition that I gave previously:

The data type is float, but the numbers are integral.

echidne · 30 June 2021 19:58

I think the problem between the two sentences is the difference between a number and a value.

When you write x = 249., the number x is a float (since in Python 249. is the same as 249.0), but its value is 249, so the value is an integer.

type(249.) => output: float
type(249) => output: int
249. == 249 => output: True
249. is 249 => output: False
249. is 249.0 => output: True
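The same type-versus-value distinction can be checked in plain Python. This is a small sketch that deliberately uses == and is_integer() rather than is: the is operator compares object identity, and its result on numeric literals is a CPython implementation detail (recent Python versions emit a SyntaxWarning for it), so it says little about the values themselves.

```python
x = 249.0

# The type of the object is float...
is_float_type = type(x) is float

# ...but its value is integral and compares equal to the int 249.
has_integral_value = x.is_integer()
equals_int = x == 249

# Equal numbers hash identically, so 249.0 and 249 are the same dict/set key.
same_hash = hash(x) == hash(249)
found_in_set = x in {249}

# Converting to int keeps the value and only changes the type.
as_int = int(x)

print(is_float_type, has_integral_value, equals_int, same_hash, found_in_set, as_int)
```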

ArturoAmorQ · 31 January 2022 15:53

Solved in Pull Request #508, "Fix wording in HGBDT notebook" by glemaitre (INRIA/scikit-learn-mooc on GitHub).