Calling Data and Target from the Census

Zahra_R · 23 June 2021 04:25

At the end of lecture one, when target_test and data_test are called to check the accuracy, the code is not actually different from the original data and target that were used to train the model. So why does it give a different answer?

glemaitre58 · 23 June 2021 07:16

I am not sure I understand. We are using two distinct sets to train and test:

model.fit(data_train, target_train)
model.score(data_test, target_test)

Could you be more explicit about which output you want to compare with?

Zahra_R · 23 June 2021 07:59

At first we define:
target = adult_census[target_name]
data = adult_census.drop(columns=[target_name, ])
and we go on with the task.

To check the accuracy, we define target_test and data_test exactly as above. Is it supposed to be this way? I thought the data used for testing must be different from the data used for training.
Or have I terribly misunderstood the whole concept?

glemaitre58 · 23 June 2021 08:33

We do a train-test split in between:

data_train, data_test, target_train, target_test = train_test_split(
    data, target)

This means we have two independent sets: a training set and a testing set. When fitting, we never use information from the testing set; we keep it only for scoring.
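
To make this concrete, here is a minimal sketch that checks that the two sets returned by train_test_split share no rows. It does not load the real adult_census file: the small `toy` DataFrame, its columns, and the LogisticRegression model are assumptions made only so the snippet runs on its own, and the lecture's model may differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for adult_census (hypothetical values, not the real data).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "age": rng.integers(18, 80, size=200),
    "hours-per-week": rng.integers(10, 60, size=200),
})
toy["class"] = (toy["age"] + toy["hours-per-week"] > 90).astype(int)

# Same pattern as in the lecture: separate the target column from the features.
target_name = "class"
target = toy[target_name]
data = toy.drop(columns=[target_name])

# Split the rows into a training part and a testing part (default: 75% / 25%).
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=42)

# The row labels of the two parts do not overlap.
shared_rows = data_train.index.intersection(data_test.index)
print(len(data_train), len(data_test), len(shared_rows))

# Fit on the training rows only, then score on each part separately.
model = LogisticRegression(max_iter=1000)
model.fit(data_train, target_train)
train_accuracy = model.score(data_train, target_train)
test_accuracy = model.score(data_test, target_test)
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

Since data_test holds rows the model never saw during fit, its score can differ from the score on data_train, even though both come from the same original data and target.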

Zahra_R · 23 June 2021 09:50

Thanks! I think I’ll get the hang of it as we go on.