Wp quiz 02

fhi62 · 30 May 2021 16:43

Hi,
My code for a KFold test is not working. I have been trying for a while without finding the issue…

from sklearn.dummy import DummyClassifier

most_freq_donator_clf = DummyClassifier(strategy="most_frequent")
most_freq_donator_clf.fit(data_train, target_train)
score = most_freq_donator_clf.score(data_test, target_test)
print(f"Accuracy of a model predicting the most frequent class: {score:.3f}")


import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=10)


model = DummyClassifier(strategy="most_frequent")

 
for train_index , test_index in kf.split(data):
     
    data_train , data_test = data[train_index], data[test_index]
    target_train , target_test = target[train_index], target[test_index]
    model.fit(data_train,target_train)
    pred_values = model.predict(data_test)
    acc = accuracy_score(pred_values , target_test)
    acc_score.append(acc)
     
avg_acc_score = sum(acc_score)/k
 
print('accuracy of each fold - {}'.format(acc_score))
print('Avg accuracy : {}'.format(avg_acc_score))

glemaitre58 · 30 May 2021 18:46

Could you also provide the code that loads the data so that we can reproduce the problem? It would also be nice to give the full error traceback when there is one.

The signature of this function is accuracy_score(true, predicted), not the other way around.

I slightly modified your code since some variables were missing:

In [18]: from sklearn.dummy import DummyClassifier
    ...: from sklearn.datasets import load_iris
    ...: from sklearn.metrics import accuracy_score
    ...: data, target = load_iris(return_X_y=True)
    ...: 
    ...: import numpy as np
    ...: from sklearn.model_selection import KFold
    ...: 
    ...: k = 10
    ...: kf = KFold(n_splits=k, shuffle=True)
    ...: 
    ...: 
    ...: model = DummyClassifier(strategy="most_frequent")
    ...: 
    ...: acc_score = []
    ...: for train_index , test_index in kf.split(data):
    ...: 
    ...:     data_train , data_test = data[train_index], data[test_index]
    ...:     target_train , target_test = target[train_index], target[test_index]
    ...:     model.fit(data_train,target_train)
    ...:     pred_values = model.predict(data_test)
    ...:     acc = accuracy_score(target_test, pred_values)
    ...:     acc_score.append(acc)
    ...: 
    ...: avg_acc_score = sum(acc_score)/k
    ...: 
    ...: print('accuracy of each fold - {}'.format(acc_score))
    ...: print('Avg accuracy : {}'.format(avg_acc_score))

However, you could use cross_val_score or cross_validate instead:

In [17]: cross_val_score(model, data, target, cv=kf)
Out[17]: 
array([0.2       , 0.26666667, 0.33333333, 0.2       , 0.13333333,
       0.13333333, 0.33333333, 0.26666667, 0.26666667, 0.2       ])
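For completeness, here is a minimal sketch of the cross_validate variant mentioned above. It reuses the iris data from the snippet; random_state=0 is an arbitrary value added here only so that the folds are reproducible, and the variable names cv_results and test_scores are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import KFold, cross_validate

data, target = load_iris(return_X_y=True)

# random_state=0 is an arbitrary choice, added only for reproducibility
kf = KFold(n_splits=10, shuffle=True, random_state=0)
model = DummyClassifier(strategy="most_frequent")

# cross_validate returns a dict with fit/score times and the test scores
cv_results = cross_validate(model, data, target, cv=kf)
test_scores = cv_results["test_score"]

print(f"Accuracy of each fold: {test_scores}")
print(f"Average accuracy: {test_scores.mean():.3f}")
```

Compared to cross_val_score, cross_validate also reports the fit and score times for each fold, which can be handy when comparing models.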

Jawakar · 17 June 2021 11:50

I’m also facing an issue. Here is my code:

from sklearn.dummy import DummyClassifier
dummy = DummyClassifier("most_frequent")
dummy.fit(X_train, y_train)
dummy.score(X_test, y_test)from sklearn.model_selection import KFold
k = 10
kf = KFold(n_splits=k, shuffle=True)
def get_score(model, X_train, X_test, y_train, y_test):
    model.fit(X_train, X_test)
    return model.score(y_train, y_test)dummy_scr = []
for train_index, test_index in kf.split(data):
    X_train, X_test = data[train_index], data[test_index]
    y_train, y_test =  target[train_index], target[test_index]
    dummy_scr.append(get_score(dummy, X_train, X_test, y_train, y_test))

Err

glemaitre58 · 17 June 2021 11:57

This line is really weird. Can you check your snippet?

Since data is a dataframe and you want to select rows by integer position and not by label, you need to write it as:

X_train, X_test = data.iloc[train_index, :], data.iloc[test_index, :]
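To illustrate the difference, here is a small sketch on a hypothetical toy dataframe (the column names and values are made up for the example). Plain square brackets with an array of integers are interpreted as column labels, while .iloc selects rows by position:

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataframe, only for illustration
data = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1.0, 2.0, 3.0, 4.0]})
train_index = np.array([0, 2, 3])

# data[train_index] looks for columns labelled 0, 2 and 3, which do not exist
try:
    data[train_index]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# .iloc selects the rows at integer positions 0, 2 and 3
X_train = data.iloc[train_index, :]

print(lookup_failed)
print(X_train)
```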

Jawakar · 17 June 2021 12:50

After the changes you suggested I get a new error. Please help me fix it.

from sklearn.dummy import DummyClassifier
from sklearn.model_selection import KFold

dummy = DummyClassifier("most_frequent")

k = 10
kf = KFold(n_splits=k, shuffle=True)

def get_score(model, X_train, X_test, y_train, y_test):
    model.fit(X_train, X_test)
    return model.score(y_train, y_test)

dummy_scr = []
for train_index, test_index in kf.split(data):
    X_train, X_test = data.iloc[train_index,:], data.iloc[test_index,:]
    y_train, y_test =  target.iloc[train_index,:], target.iloc[test_index,:]
    dummy_scr.append(get_score(dummy, X_train, X_test, y_train, y_test))

Err2

glemaitre58 · 19 June 2021 07:31

Here target is a series (1D), so you should not index it with two indexers or slices:

y_train = target.iloc[train_index]
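Putting the pieces together, a possible corrected version of the loop could look like the sketch below. It makes a few assumptions: the iris dataset stands in for your data (loaded as a dataframe and a series), random_state=0 is an arbitrary value for reproducibility, and strategy is passed as a keyword since recent scikit-learn versions expect keyword arguments there. Note also that in get_score the arguments to fit and score look swapped in your snippet: fit should receive the training data and training target, and score the test data and test target.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import KFold

# Assumption: iris stands in for the real data;
# data is a DataFrame and target is a Series
data, target = load_iris(return_X_y=True, as_frame=True)

dummy = DummyClassifier(strategy="most_frequent")

k = 10
# random_state=0 is an arbitrary choice, added only for reproducibility
kf = KFold(n_splits=k, shuffle=True, random_state=0)

def get_score(model, X_train, X_test, y_train, y_test):
    # Fit on the training features and labels, score on the test ones
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)

dummy_scr = []
for train_index, test_index in kf.split(data):
    # DataFrame (2D): positional row selection, all columns
    X_train, X_test = data.iloc[train_index, :], data.iloc[test_index, :]
    # Series (1D): a single positional indexer
    y_train, y_test = target.iloc[train_index], target.iloc[test_index]
    dummy_scr.append(get_score(dummy, X_train, X_test, y_train, y_test))

print(dummy_scr)
```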