ValueError and accuracy_score

hi,
when instantiating a DummyClassifier like so:

class_to_predict = ">50K"

and

dum_clf_over = DummyClassifier(
    strategy = "constant",
    constant = class_to_predict
)

the following error gets raised:

ValueError: The constant target value must be present in the training data. You provided constant=>50K. Possible values are: [' <=50K', ' >50K'].

after calling

dum_clf_over.fit(X_train, y_train)

Any hints on what I’m getting wrong here?

Also, I wonder about the arguments you’ve passed in the solution when calculating accuracy:

score = high_revenue_clf.score(data_numeric_test, target_test)

I understood that .score() takes predictions ("y_pred") and compares them to the true labels ("y_test"), so I am surprised to see that you pass "X_test" and "y_test".

Could you please clarify why you pass "data_numeric_test" instead of some kind of "y_pred"?

In my understanding, the following code should work as well:

from sklearn.metrics import accuracy_score

pred_clf = dum_clf_over.predict(X_test)
accuracy_score(y_true=y_test, y_pred=pred_clf)

But it seems ".score()" somehow saves ".predict()" under the hood?

Many thanks in advance!

The error says that you provided the string ">50K", which is not one of the available classes: [' <=50K', ' >50K'].

So you should use one of those two class labels and make sure to respect the whitespace in the class names (note the leading space).
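
For example, a minimal fix, reusing the variable names from your snippet, would be:

from sklearn.dummy import DummyClassifier

# note the leading whitespace: the label in the training data is " >50K", not ">50K"
class_to_predict = " >50K"

dum_clf_over = DummyClassifier(strategy="constant", constant=class_to_predict)
dum_clf_over.fit(X_train, y_train)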

The .score() method is a helper for doing exactly the following:

from sklearn.metrics import accuracy_score

def score(clf, X, y_true):
    # predict from the features X, then compare the predictions to the true labels
    y_pred = clf.predict(X)
    return accuracy_score(y_true, y_pred)

So .score() takes as input the data X and the true target y. It computes the predictions from X with the predict method under the hood, then computes the score from the target y and those predictions, and returns it.

"Saves" is not quite the right term, since the predictions are not stored anywhere, but .score() does indeed compute them before calling the scoring function.
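
As a quick sanity check (a minimal sketch reusing the variable names from your question), both calls should give the same number:

from sklearn.metrics import accuracy_score

acc_via_score = dum_clf_over.score(X_test, y_test)  # predicts internally, then scores
acc_via_metric = accuracy_score(y_test, dum_clf_over.predict(X_test))
assert acc_via_score == acc_via_metric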

Thank you!
In fact, I had not taken into account the leading whitespace in " >50K".

Thank you also for the clarification regarding score: I had initially been confused about the difference between calling the .score() method and calling accuracy_score(y_true, y_pred).
