KBinsDiscretizer in Speeding-up gradient-boosting lesson => I do not understand your sentence

You wrote:

We see that the discretizer transforms the original data into an integer. This integer represents the bin index when the distribution by quantile is performed. We can check the number of bins per feature.

But data_trans is:

array([[249.,  39., 231., ...,  83., 162.,  30.],
       [248.,  19., 203., ...,  28., 161.,  30.],
       [242.,  49., 249., ..., 125., 160.,  29.],
       ...,
       [ 17.,  15., 126., ...,  49., 200.,  82.],
       [ 23.,  16., 136., ...,  29., 200.,  77.],
       [ 53.,  14., 130., ...,  93., 199.,  81.]])

So I see that data_trans is an array of float numbers, not an integer!

The data type is float but the numbers are integral.

I think you mean that the encoding as ordinal returns the bin identifiers as integer values but the result of the transformation is an array of float numbers.
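To make the distinction concrete, here is a minimal sketch using a few of the values printed above. It assumes NumPy is available and only copies the visible entries of data_trans (the elided columns are left out, not guessed); the names sample, all_integral, as_int and lossless are purely illustrative.

```python
import numpy as np

# A few of the values visible in the data_trans output above; the elided
# columns ("...") are omitted rather than filled in.
sample = np.array([
    [249.0, 39.0, 231.0, 83.0, 162.0, 30.0],
    [248.0, 19.0, 203.0, 28.0, 161.0, 30.0],
    [242.0, 49.0, 249.0, 125.0, 160.0, 29.0],
])

# The array's data type is floating point...
print(sample.dtype)

# ...but every stored value is a whole number (no fractional part).
all_integral = bool(np.all(sample == np.floor(sample)))
print(all_integral)

# Casting to an integer dtype therefore loses no information.
as_int = sample.astype(np.int64)
lossless = bool(np.array_equal(as_int, sample))
print(lossless)
```

In other words, "float" describes how the values are stored, while "integer" describes which values happen to be stored.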

type(249.) is float

returns

True

and

x = 249.
x.is_integer()

returns

True

But we can improve the description.

Definition: is_integer() returns True if the float instance is finite with integral value, and False otherwise.
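As a small illustration of that definition, the sketch below calls float.is_integer() on a whole-valued float, a fractional float, and the two non-finite cases. The variable names are only for illustration.

```python
import math

# A whole-valued float, a fractional float, and two non-finite floats.
values = [249.0, 249.5, math.inf, math.nan]

# is_integer() is True only for finite floats with no fractional part.
results = [v.is_integer() for v in values]
print(results)

# The type stays float in every case, whatever is_integer() returns.
all_floats = all(type(v) is float for v in values)
print(all_floats)
```

So is_integer() answers a question about the value, not about the type.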

So x is not an integer but a float …

This is pretty much the definition that I gave previously:

The data type is float but the numbers are integral.

I think the confusion between the two sentences comes from the difference between the type of a number and its value.

When you write x = 249., the number x is a float (since in Python 249. == 249.0), but its value is 249, so the value is an integer.

type(249.) => output: float
type(249) => output: int
249. == 249 => output: True
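249. is 249 => output: False
249. is 249.0 => output: True (is compares object identity rather than value, so this result depends on the interpreter and should not be relied on)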
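The comparisons above can also be written as a short script. This is only a sketch with illustrative variable names; it uses variables instead of literals, since recent CPython versions warn when is is applied to a literal.

```python
x = 249.0   # a float whose value is a whole number
y = 249     # an int with the same value

# Value comparison: the two numbers are equal.
same_value = (x == y)
print(same_value)

# Type comparison: they are still objects of different types.
same_type = type(x) is type(y)
print(same_type)

# Identity comparison: two distinct objects, so "is" is False.
same_object = x is y
print(same_object)

# Converting the float to int keeps the value and changes only the type.
converted = int(x)
print(converted, type(converted).__name__)
```

For numbers, == is the comparison that matches the intent here; is only says whether two names refer to the same object.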

Solved in "Fix wording in HGBDT notebook" by glemaitre, Pull Request #508, INRIA/scikit-learn-mooc on GitHub.