Data types

The comments say:

the only two types in the dataset are integer and object…

However, the output is:

array([dtype('int64'), dtype('O')], dtype=object)

So it seems that there are three types? How should I read this output?

Is there another way to select the numerical columns than listing them manually, as in:

numerical_columns = ["age", "capital-gain", "capital-loss", "hours-per-week"]

Thanks


array([dtype('int64'), dtype('O')], dtype=object)

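The dtypes that we refer to are [dtype('int64'), dtype('O')]. They are stored in a NumPy array that itself has an object dtype, which is what the trailing dtype=object reports. So the dataset contains only two types; the third one describes the container, not a column.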

It is easier to convert the outer array into a list so that its own dtype does not get in the way. For instance:

In [5]: np.array([np.dtype('int64'), np.dtype('O')], dtype=object).tolist()
Out[5]: [dtype('int64'), dtype('O')]

Note that dtype('O') is the object dtype.
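
To make the distinction concrete, here is a small sketch that separates the dtype of the container from the dtypes it holds. The variable names are only illustrative:

```python
import numpy as np

# An object array holding two dtype instances, as in the output above.
dtypes = np.array([np.dtype("int64"), np.dtype("O")], dtype=object)

# The dtype of the container itself: this is the trailing "dtype=object".
container_dtype = dtypes.dtype

# The actual elements: the two dtypes found in the dataset.
element_dtypes = dtypes.tolist()

print(container_dtype)      # object
print(len(element_dtypes))  # 2
print(element_dtypes)       # [dtype('int64'), dtype('O')]
```
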

Is there another way to select the numerical columns than listing them manually

Yes, you will see it later in the course. We can use make_column_selector, which selects or excludes columns based on their dtype.
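
As a rough preview, a minimal sketch of that approach could look like the following. The small DataFrame is made up for illustration (it is not the course dataset), and the string column is given an explicit object dtype so that it matches the output discussed above:

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical toy data: two integer columns and one object column.
df = pd.DataFrame({
    "age": pd.Series([25, 38, 28], dtype="int64"),
    "hours-per-week": pd.Series([40, 50, 40], dtype="int64"),
    "education": pd.Series(["11th", "HS-grad", "Assoc-acdm"], dtype=object),
})

# Each selector is a callable that takes a DataFrame
# and returns the list of matching column names.
numerical_selector = make_column_selector(dtype_exclude=object)
categorical_selector = make_column_selector(dtype_include=object)

numerical_columns = numerical_selector(df)
categorical_columns = categorical_selector(df)

print(numerical_columns)    # ['age', 'hours-per-week']
print(categorical_columns)  # ['education']
```

Selecting by dtype avoids maintaining a hand-written list, but it relies on the dtypes being meaningful: an integer-coded category would be picked up as numerical.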
