The following statements were taken from the lecture HANDLING CATEGORICAL DATA:
Then, we can send the raw dataset straight to the pipeline. Indeed, we do not need to make any manual preprocessing (calling the transform or fit_transform methods) as it will be handled when calling the predict method.
The notebook then proceeds to call the fit method to train an ML model on an unscaled, unpreprocessed training set:
_ = model.fit(data_train, target_train)
Why can we do that? Shouldn’t we pre-process the data before fitting a model?
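To make the question concrete, here is a minimal sketch of the mechanism the quoted passage seems to describe. It assumes that the pipeline's fit method calls fit_transform on each preprocessing step before fitting the final model, and that predict calls transform on each step before predicting. The classes ToyOneHotEncoder, ToyMajorityPerColumn and ToyPipeline are hypothetical stand-ins written for illustration, not scikit-learn classes, and the data is made up.

```python
class ToyOneHotEncoder:
    """Hypothetical preprocessing step (not scikit-learn's encoder)."""

    def fit(self, rows):
        # Learn the sorted list of categories seen in the training data.
        self.categories_ = sorted(set(rows))
        return self

    def transform(self, rows):
        # Encode each value as a 0/1 vector; unseen values become all zeros.
        return [[int(r == c) for c in self.categories_] for r in rows]

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)


class ToyMajorityPerColumn:
    """Hypothetical final model: most frequent target per active column."""

    def fit(self, X, y):
        votes = {}
        for features, target in zip(X, y):
            votes.setdefault(features.index(1), []).append(target)
        self.label_ = {col: max(set(t), key=t.count) for col, t in votes.items()}
        return self

    def predict(self, X):
        return [self.label_.get(f.index(1)) if 1 in f else None for f in X]


class ToyPipeline:
    """Hypothetical pipeline: preprocessing steps followed by one final model."""

    def __init__(self, *steps):
        self.preprocessors = list(steps[:-1])
        self.estimator = steps[-1]

    def fit(self, X, y):
        # Preprocessing happens here, so the caller can pass raw data.
        for step in self.preprocessors:
            X = step.fit_transform(X)
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        # Only transform (no refit) is applied to new data before predicting.
        for step in self.preprocessors:
            X = step.transform(X)
        return self.estimator.predict(X)


# Made-up raw categorical data, passed to the pipeline without manual encoding.
data_train = ["Private", "Private", "State-gov", "State-gov"]
target_train = ["<=50K", "<=50K", ">50K", ">50K"]

model = ToyPipeline(ToyOneHotEncoder(), ToyMajorityPerColumn())
_ = model.fit(data_train, target_train)
predictions = model.predict(["State-gov", "Private"])
print(predictions)
```

If that assumption holds, the data is still preprocessed before the model sees it; the work is simply done inside the pipeline's fit and predict calls instead of by hand.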