LabelEncoder is a transformer that encodes target labels as integers. Its usage is:
In [1]: from sklearn.preprocessing import LabelEncoder
In [2]: LabelEncoder().fit_transform(["cat", "dog", "dog", "cat", "cat"])
Out[2]: array([0, 1, 1, 0, 0])
It transforms non-numerical labels into numerical labels in a classification setting.
In practice, you rarely need to use this transformer yourself: all scikit-learn classifiers use it internally to encode and decode the labels, and they return non-numerical targets if that was your original input.
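To illustrate that point, here is a minimal sketch (with made-up toy data) showing that a classifier accepts string labels directly and returns strings from predict, so no manual LabelEncoder step is needed:

```python
from sklearn.linear_model import LogisticRegression

# Toy data: string labels, no manual encoding.
X = [[0.0], [0.1], [1.0], [1.1]]
y = ["cat", "cat", "dog", "dog"]

clf = LogisticRegression().fit(X, y)
print(clf.classes_)                    # the original string classes are preserved
print(clf.predict([[0.05], [1.05]]))  # predictions come back as strings
```

The encoding and decoding happen inside `fit` and `predict`, which is why the transformer is rarely needed by hand.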
In the first version of the MOOC, we had examples. I don't remember if we removed them to simplify the learning process. In a very condensed way, the strategies are the following:
- use an imputer (cf. API Reference — scikit-learn 1.0.2 documentation) to replace missing values with another value. For categorical features, it only makes sense to treat a missing value as a category of its own, which is later encoded via OneHotEncoder or OrdinalEncoder. For numerical features, there are multiple options, and in practice it is difficult to do better than a simple imputation.
- use a predictor that natively handles missing values (e.g. HistGradientBoostingClassifier or HistGradientBoostingRegressor).