Good morning,
I need some explanation about this remark in the scikit-learn documentation on `power_transform`: "A common mistake is to apply it to the entire data before splitting into training and test sets. This will bias the model evaluation because information would have leaked from the test set to the training set."
Does it mean we should fit Box-Cox, for instance, only on the training data? And what if we want to make the target distribution more Gaussian: should we also do that only on the training data? That does not seem correct to me.
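To make my question concrete, here is a sketch of what I understand the remark to suggest (the data here is made up for illustration): the transformer is fitted on the training split only, and the test split is transformed with the parameters learned from training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(size=(100, 1))  # skewed, strictly positive toy data

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

pt = PowerTransformer(method="box-cox")
pt.fit(X_train)                   # lambda estimated from training data only
X_train_t = pt.transform(X_train)
X_test_t = pt.transform(X_test)   # test set reuses the training lambda
```

Is this the correct pattern, as opposed to calling `power_transform(X)` on the full dataset before splitting?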