M1 wrap up Quiz Q6 program error

Hi,
I got an error running my program.
Can somebody help? I suppose I cannot post my code here.

“The mean cross-validation accuracy is: nan +/- nan
/opt/conda/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:615: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last): File “/opt/conda/lib/python3.9/site-packages/sklearn/model_selection/_validation.py”, line 598, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File “/opt/conda/lib/python3.9/site-packages/sklearn/pipeline.py”, line 341, in fit
Xt = self._fit(X, y, **fit_params_steps)…”

Can you provide the full snippet of code to understand why it fails?
Also, pass error_score="raise" to cross_validate(...) or cross_val_score(...) to get the full traceback, and post that as well.
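To illustrate the suggestion, here is a minimal sketch of what error_score="raise" changes. The toy dataframe, the deliberately wrong column name "missing", and the variable names are made up for the example and are not taken from the quiz.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, only there to make the sketch runnable.
toy_data = pd.DataFrame({
    "a": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [5.0, 3.0, 4.0, 1.0, 2.0, 0.0],
})
toy_target = pd.Series([0, 1, 0, 1, 0, 1])

# Deliberate mistake: "missing" is not a column of toy_data,
# so fitting the ColumnTransformer fails.
broken_preprocessor = ColumnTransformer(
    [("scaler", StandardScaler(), ["a", "missing"])]
)
broken_model = make_pipeline(broken_preprocessor, LogisticRegression())

caught_error = None
try:
    # With error_score="raise", the first failing fit re-raises the
    # original exception instead of being recorded as a nan score.
    cross_validate(broken_model, toy_data, toy_target, cv=2, error_score="raise")
except ValueError as exc:
    caught_error = exc
    print(f"{type(exc).__name__}: {exc}")
```

Without the try/except, the notebook would stop on the first failing fold and show the complete traceback, which is usually easier to read than one warning per fold.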

Not sure that was allowed:

from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_validate


ames_housing = pd.read_csv("../datasets/house_prices.csv", na_values="?")
ames_housing = ames_housing.drop(columns="Id")

target_name = "SalePrice"
data, target = ames_housing.drop(columns=target_name), ames_housing[target_name]
target = (target > 200_000).astype(int)

# select the numerical columns
numerical_columns = ["LotFrontage", "LotArea", "MasVnrArea", "BsmtFinSF1", "BsmtFinSF2",
  "BsmtUnfSF", "TotalBsmtSF", "1stFlrSF", "2ndFlrSF", "LowQualFinSF",
  "GrLivArea", "BedroomAbvGr", "KitchenAbvGr", "TotRmsAbvGrd", "Fireplaces",
  "GarageCars", "GarageArea", "WoodDeckSF", "OpenPorchSF", "EnclosedPorch",
  "3SsnPorch", "ScreenPorch", "PoolArea", "MiscVal"]

data_numeric = data[numerical_columns]
# select the categorical columns

categorical_columns = data.drop(columns=["LotFrontage", "LotArea", "MasVnrArea", "BsmtFinSF1", "BsmtFinSF2", "BsmtUnfSF", "TotalBsmtSF", "1stFlrSF", "2ndFlrSF", "LowQualFinSF", "GrLivArea", "BedroomAbvGr", "KitchenAbvGr", "TotRmsAbvGrd", "Fireplaces", "GarageCars", "GarageArea", "WoodDeckSF", "OpenPorchSF", "EnclosedPorch", "3SsnPorch", "ScreenPorch", "PoolArea", "MiscVal"])

#prepare numeric data replacing empty by most frequent

#Imp_most=SimpleImputer(missing_values=np.nan, strategy='most_frequent')
#data_numeric=Imp_most.fit_transform(data_numeric)
scaler_imputer_transformer = make_pipeline(StandardScaler(), SimpleImputer(missing_values=np.nan, strategy='most_frequent'))
categorical_preprocessor = OneHotEncoder(handle_unknown="ignore")

#preprocessor = ColumnTransformer([('one-hot-encoder', categorical_preprocessor, categorical_columns),('standard-scaler', numerical_preprocessor, numerical_columns)])
preprocessor = ColumnTransformer([('one-hot-encoder', categorical_preprocessor, categorical_columns),('standard-scaler', scaler_imputer_transformer, numerical_columns)])
# split into train and test sets (useful?)

data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=42)

#treat with logistic regression

model = make_pipeline(preprocessor, LogisticRegression(max_iter=500))
#test phase before calculating details


from sklearn.model_selection import train_test_split

data_train, data_test, target_train, target_test = train_test_split(
    data_numeric, target, random_state=42)



# do crossvalidation
cv_results = cross_validate(model, data, target, cv=5)
scores = cv_results["test_score"]
print("The mean cross-validation accuracy is: "
      f"{scores.mean():.3f} +/- {scores.std():.3f}")

Can you provide the full traceback as well?

you are super fast :slight_smile:

The mean cross-validation accuracy is: nan +/- nan
/opt/conda/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:615: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 598, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/opt/conda/lib/python3.9/site-packages/sklearn/pipeline.py", line 341, in fit
    Xt = self._fit(X, y, **fit_params_steps)
  File "/opt/conda/lib/python3.9/site-packages/sklearn/pipeline.py", line 303, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "/opt/conda/lib/python3.9/site-packages/joblib/memory.py", line 352, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/sklearn/pipeline.py", line 754, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/opt/conda/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 505, in fit_transform
    self._validate_remainder(X)
  File "/opt/conda/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 324, in _validate_remainder
    self._has_str_cols = any(_determine_key_type(cols) == 'str'
  File "/opt/conda/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 324, in <genexpr>
    self._has_str_cols = any(_determine_key_type(cols) == 'str'
  File "/opt/conda/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 268, in _determine_key_type
    raise ValueError(err_msg)
ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed

  warnings.warn("Estimator fit failed. The score on this train-test"

The error is the following:

ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed

It is raised by the ColumnTransformer. It means that the column specification passed to it is wrong. Looking at your code, we have:

numerical_columns = ["LotFrontage", "LotArea", "MasVnrArea", "BsmtFinSF1", "BsmtFinSF2",
  "BsmtUnfSF", "TotalBsmtSF", "1stFlrSF", "2ndFlrSF", "LowQualFinSF",
  "GrLivArea", "BedroomAbvGr", "KitchenAbvGr", "TotRmsAbvGrd", "Fireplaces",
  "GarageCars", "GarageArea", "WoodDeckSF", "OpenPorchSF", "EnclosedPorch",
  "3SsnPorch", "ScreenPorch", "PoolArea", "MiscVal"]

categorical_columns = data.drop(columns=["LotFrontage", "LotArea", "MasVnrArea", "BsmtFinSF1",
 "BsmtFinSF2", "BsmtUnfSF", "TotalBsmtSF", "1stFlrSF", "2ndFlrSF", "LowQualFinSF", "GrLivArea", 
"BedroomAbvGr", "KitchenAbvGr", "TotRmsAbvGrd", "Fireplaces", "GarageCars", "GarageArea",
 "WoodDeckSF", "OpenPorchSF", "EnclosedPorch", "3SsnPorch", "ScreenPorch", "PoolArea", 
"MiscVal"])

While numerical_columns is a list of column names, categorical_columns is not: data.drop(columns=[...]) returns a dataframe, not the names of the categorical columns.
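One possible way to get the names rather than the data is to take .columns of the dropped dataframe. This is only a sketch on a made-up toy dataframe (the values are invented and it reuses just a few column names from the dataset); the linked thread may propose a different approach.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for the housing data, with invented values.
toy_data = pd.DataFrame({
    "LotArea": [1000, 2000, 3000, 4000],
    "GrLivArea": [100, 150, 120, 180],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
    "LotShape": ["Reg", "Reg", "IR1", "IR1"],
})
numerical_columns = ["LotArea", "GrLivArea"]

# drop(...) alone returns a DataFrame holding the remaining data ...
remaining_data = toy_data.drop(columns=numerical_columns)
# ... while .columns.tolist() turns it into a plain list of names.
categorical_columns = remaining_data.columns.tolist()

preprocessor = ColumnTransformer([
    ("one-hot-encoder", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    ("standard-scaler", StandardScaler(), numerical_columns),
])
transformed = preprocessor.fit_transform(toy_data)

print(type(remaining_data).__name__)
print(categorical_columns)
print(transformed.shape)
```

Checking type(categorical_columns) before building the ColumnTransformer is a quick way to catch this kind of mix-up.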

I would suggest the solution proposed there: How did you build the non-numerical columns? - #9 by aigle81


Thanks a lot, I will adjust.
I also changed the sequence of operations to obtain this:
Image1

Is it OK?

It looks meaningful

Thanks a lot. I just realized this is tougher than I thought, but going deeper brings a lot of value.

image

I keep getting a similar error:
"No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed"

Hi,
there's an error in your construction of the preprocessor.
You use “data_numeric” but you meant to use “numerical_features”, didn't you? :wink:

I changed numeric_features to data_numeric because that was the label I had used earlier for the numerical columns.

@nanfuka you need to pass the name of the columns and not the filtered data. To be explicit:

data = pd.read_csv(...)
categorical_columns = ["Col_A", "Col_B"]
data_categorical = data[categorical_columns]

You need to provide categorical_columns (the column names) to the ColumnTransformer, not data_categorical (the data itself).
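As an aside, scikit-learn also provides make_column_selector, which picks the column names by dtype at fit time, so no filtered dataframe is involved at all. This is an alternative that nobody proposed in this thread, shown on a made-up dataframe; selecting by dtype may not match the column list a given exercise expects.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data reusing the Col_A / Col_B names from above.
toy_data = pd.DataFrame({
    "Col_A": ["x", "y", "x", "z"],
    "Col_B": ["u", "u", "v", "v"],
    "Col_C": [1.0, 2.0, 3.0, 4.0],
})

# Each selector is a callable that returns a list of column names
# when it is given a dataframe.
select_numerical = make_column_selector(dtype_include=np.number)
select_categorical = make_column_selector(dtype_exclude=np.number)

preprocessor = ColumnTransformer([
    ("one-hot-encoder", OneHotEncoder(handle_unknown="ignore"), select_categorical),
    ("standard-scaler", StandardScaler(), select_numerical),
])
transformed = preprocessor.fit_transform(toy_data)

print(select_categorical(toy_data))
print(select_numerical(toy_data))
print(transformed.shape)
```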

Noted