Hi all,
I was wondering what the potential drawbacks (if any) are of scaling the test data with a transformer fitted on the train data.
Usually the train set is much bigger than the test set, so its estimated statistics (mean/std) are probably closer to the population values than the test set's own statistics would be. However, if the split is 50/50, could it be that we are not scaling the test data properly (i.e., the statistics estimated on the train half introduce some bias)? What are the implications?
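To make the setup concrete, here is a minimal NumPy sketch of what I mean. It assumes a single synthetic, normally distributed feature (mean 5, std 2) and a 50/50 split, and it standardizes by hand, which is roughly what a standard scaler such as scikit-learn's `StandardScaler` does. The helper names `fit_scaler` and `transform` are just for illustration.

```python
import numpy as np

def fit_scaler(x):
    """Estimate per-feature mean and std (ddof=0) from x."""
    return x.mean(axis=0), x.std(axis=0)

def transform(x, mean, std):
    """Standardize x using statistics estimated on another sample."""
    return (x - mean) / std

rng = np.random.default_rng(0)
# Hypothetical population: one feature with mean 5 and std 2.
data = rng.normal(loc=5.0, scale=2.0, size=(1000, 1))

# 50/50 split into train and test halves.
x_train, x_test = data[:500], data[500:]

# Fit on train only, then apply the same statistics to both halves.
mean, std = fit_scaler(x_train)
train_scaled = transform(x_train, mean, std)
test_scaled = transform(x_test, mean, std)

# The train half is exactly zero-mean / unit-std by construction;
# the test half is only approximately so, because the train statistics
# (and the test sample itself) carry sampling noise.
print("train mean/std estimate:", mean.ravel(), std.ravel())
print("scaled test mean:", test_scaled.mean(axis=0))
print("scaled test std:", test_scaled.std(axis=0))
```

My question is essentially whether that gap between the scaled test statistics and 0/1 matters, and how it changes as the test share grows.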
Thanks,
Matteo