Model.score() raises a dimension-mismatch error

Hi folks,
I cloned the INRIA/scikit-learn-mooc GitHub repository in my Colab notebook to use its datasets. All the code I rewrote works well except the final cell, which raises:

ValueError: query data dimension must match training data dimension

Can someone shed some light on this, please?

Please provide the code snippet that produces the error and the full traceback, so that we can tell which package raises this error (I don’t think that scikit-learn raises this specific message). It might be an inconsistency between the shapes of X and y.
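For reference, an inconsistency of this kind can be reproduced by fitting a KNeighborsClassifier on 4 features and then scoring it on 5. The sketch below uses synthetic NumPy arrays as stand-ins (they are not the MOOC datasets). Depending on the installed scikit-learn version, the failure surfaces either as the KD-tree message quoted above or as an explicit feature-count check, so the sketch only relies on a ValueError being raised.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-ins for the MOOC data (assumption: not the real CSV files).
rng = np.random.RandomState(0)
X_train = rng.randint(0, 100, size=(50, 4))    # 4 features at fit time
y_train = rng.randint(0, 2, size=50)

X_test_ok = rng.randint(0, 100, size=(10, 4))   # same 4 features: fine
X_test_bad = rng.randint(0, 100, size=(10, 5))  # one extra column: fails
y_test = rng.randint(0, 2, size=10)

model = KNeighborsClassifier()
model.fit(X_train, y_train)

print(model.score(X_test_ok, y_test))  # works because the column counts agree

try:
    model.score(X_test_bad, y_test)
except ValueError as exc:
    # Older releases report the KD-tree message; newer ones report the
    # number of features the estimator expects.
    print("ValueError:", exc)
```

Note that the shape that matters here is the number of columns of the training X versus the test X, not the length of y.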

Hi @glemaitre58. Please let me know if more of the code or the traceback is needed.

!git clone 'https://github.com/INRIA/scikit-learn-mooc.git'
import pandas as pd

adult_census_test = pd.read_csv('/content/scikit-learn-mooc/datasets/adult-census-numeric-test.csv')
adult_census_test.head()
   age  capital-gain  capital-loss  hours-per-week  class
0   20             0             0              35  <=50K
1   53             0             0              72   >50K
2   41             0             0              50   >50K
3   20             0             0              40  <=50K
4   25             0             0              40  <=50K
target_name = "class"
target_test = adult_census_test[target_name]
data_test = adult_census_test.drop(columns=[target_name, ])
print(f'The testing dataset contains {data_test.shape[0]} samples and'
      f' {data_test.shape[1]} features')
The testing dataset contains 9769 samples and 4 features
accuracy = model.score(data_test_numeric, target_test)
model_name = model.__class__.__name__
print(f'The test accuracy using a {model_name} is '
      f'{accuracy:.3f}')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-26a50ac37f3e> in <module>()
----> 1 accuracy = model.score(data_test_numeric, target_test)
      2 model_name = model.__class__.__name__
      3 
      4 print(f'The test accuracy using a {model_name} is '
      5 f'{accuracy:.3f}')

11 frames
/usr/local/lib/python3.7/dist-packages/sklearn/neighbors/_base.py in _tree_query_parallel_helper(tree, *args, **kwargs)
    545     under PyPy.
    546     """
--> 547     return tree.query(*args, **kwargs)
    548 
    549 

sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._kd_tree.BinaryTree.query()

ValueError: query data dimension must match training data dimension

After slightly adapting your code so that I can run it on Colab, I do not get any error.
You can go through the steps one by one and point out where your code differs.

I suspect that the data used to fit the model and the data passed to score() are not the same (for instance, they do not have the same columns).

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

adult_census_train = pd.read_csv(
    '/content/scikit-learn-mooc/datasets/adult-census-numeric.csv'
)

adult_census_test = pd.read_csv(
    '/content/scikit-learn-mooc/datasets/adult-census-numeric-test.csv'
)

target_name = "class"
target_train = adult_census_train[target_name]
data_train = adult_census_train.drop(columns=[target_name, ])
target_test = adult_census_test[target_name]
data_test = adult_census_test.drop(columns=[target_name, ])

model = KNeighborsClassifier()
model.fit(data_train, target_train)
accuracy = model.score(data_test, target_test)

model_name = model.__class__.__name__
print(f'The test accuracy using a {model_name} is '
      f'{accuracy:.3f}')

You're right. It works. Thank you!