About data preprocessing

Hello. Thank you for the good exercise.
I have a question about data preprocessing.

from sklearn.compose import make_column_selector as selector

categorical_columns_selector = selector(dtype_include=object)
categorical_columns = categorical_columns_selector(data)

I wonder whether it is guaranteed that data_test is not internally "cheated" when data, rather than data_train, is passed as the argument.

Thank you.

I am not sure I understand your question. Can you elaborate on what you mean by “cheated”?

Be aware that the two lines of code you provided are only used to separate numerical and categorical features based on their dtype, and that the same processing is applied to both the training and testing sets.
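
To make this concrete, here is a minimal sketch (not the course's own code) showing that the selector returns the same column names whether it is called on the full data or only on the training split. The toy DataFrame and its values are hypothetical stand-ins for the course dataset, and the split uses train_test_split with an arbitrary random_state.

```python
import pandas as pd
from sklearn.compose import make_column_selector as selector
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the course dataset.
data = pd.DataFrame(
    {
        "age": [25, 38, 28, 44, 18, 34],
        "workclass": ["Private", "Private", "Local-gov", "Private", "?", "Private"],
        "hours-per-week": [40, 50, 40, 40, 30, 30],
        "education": ["11th", "HS-grad", "Assoc-acdm", "Some-college", "Some-college", "10th"],
    }
)
# Force object dtype for the text columns so the example behaves the same
# regardless of how the installed pandas version infers string columns.
data = data.astype({"workclass": object, "education": object})

data_train, data_test = train_test_split(data, random_state=0)

categorical_columns_selector = selector(dtype_include=object)

# The selection depends only on each column's dtype, not on the stored values.
columns_from_full = categorical_columns_selector(data)
columns_from_train = categorical_columns_selector(data_train)

print(columns_from_full)
print(columns_from_train)
```

Both calls should list the same columns, since the split only removes rows and leaves the dtypes unchanged. Nothing is fitted or learned from the values at this step, so no statistics from data_test can reach the model through it.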

I believe he misunderstands the concept of data leakage. In this case there is no transformation at all, just a column selection (a filter) based on dtype.

I take this opportunity to say it’s an amazing course! Congrats.