Wrap up Quiz 6

I assume this question also follows on from Question 4, where the 'GarageCars' variable needs to be removed.

Here is my code:

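import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler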

alphas=np.logspace(-1, 3, num=30)
ridge = make_pipeline(StandardScaler(), SimpleImputer(), RidgeCV(alphas=alphas, store_cv_values=True))

from sklearn.model_selection import ShuffleSplit

cv = ShuffleSplit(n_splits=5, random_state=1)
model6_cv = cross_validate(ridge, data_r, target, scoring='neg_mean_squared_error', return_train_score=True, cv=cv, return_estimator=True, n_jobs=-1)

coef6 = [estimator[-1].coef_ for estimator in model6_cv['estimator']]

coef6 = pd.DataFrame(coef6, columns='numerical_features3')

When I build the DataFrame, I get the error below, but I cannot find where the problem is.

AssertionError                            Traceback (most recent call last)
/opt/conda/lib/python3.9/site-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    567     try:
--> 568         columns = _validate_or_indexify_columns(content, columns)
    569         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)

/opt/conda/lib/python3.9/site-packages/pandas/core/internals/construction.py in _validate_or_indexify_columns(content, columns)
    691             # caller's responsibility to check for this...
--> 692             raise AssertionError(
    693                 f"{len(columns)} columns passed, passed data had "

AssertionError: 19 columns passed, passed data had 23 columns

The above exception was the direct cause of the following exception:

    ValueError                                Traceback (most recent call last)
    <ipython-input-38-31fed112852d> in <module>
    ----> 1 coef6 = pd.DataFrame(coef6, columns='numerical_features3')
          2 coef6

    /opt/conda/lib/python3.9/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
        568                     if is_named_tuple(data[0]) and columns is None:
        569                         columns = data[0]._fields
    --> 570                     arrays, columns = to_arrays(data, columns, dtype=dtype)
        571                     columns = ensure_index(columns)
        572 

    /opt/conda/lib/python3.9/site-packages/pandas/core/internals/construction.py in to_arrays(data, columns, coerce_float, dtype)
        550         # last ditch effort
        551         data = [tuple(x) for x in data]
    --> 552         return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
        553 
        554 

    /opt/conda/lib/python3.9/site-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
        569         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
        570     except AssertionError as e:
    --> 571         raise ValueError(e) from e
        572     return result, columns
        573 

    ValueError: 19 columns passed, passed data had 23 columns

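I think one of your problems is that you pass the string 'numerical_features3' as the columns. That string has 19 characters, which is likely where the "19 columns passed" in the error message comes from, while each coefficient array has 23 entries: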

coef6 = pd.DataFrame(coef6, columns='numerical_features3')

You should very likely pass the numerical_features3 variable instead:

coef6 = pd.DataFrame(coef6, columns=numerical_features3)
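
To illustrate, here is a minimal sketch with made-up data. The names coefs and feature_names are stand-ins I invented for your coef6 list and numerical_features3 variable, and the numbers are random rather than real Ridge coefficients. It checks that the number of names matches the number of coefficients before building the DataFrame:

```python
import numpy as np
import pandas as pd

# Stand-in for the per-fold coefficient arrays (5 folds, 4 features).
# In the thread these come from estimator[-1].coef_ for each fitted pipeline.
rng = np.random.default_rng(0)
coefs = [rng.normal(size=4) for _ in range(5)]

# Hypothetical feature names; in the thread this would be numerical_features3.
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]

# A string is a sequence of characters, so its length is its character count.
print(len("numerical_features3"))  # 19

# Check the lengths up front so a mismatch gives a readable message.
n_coefs = len(coefs[0])
if len(feature_names) != n_coefs:
    raise ValueError(
        f"{len(feature_names)} feature names but {n_coefs} coefficients per fold"
    )

# One row per fold, one column per feature.
coef_df = pd.DataFrame(coefs, columns=feature_names)
print(coef_df.shape)  # (5, 4)
```

If numerical_features3 still has a different length from the coefficient arrays (23 in your traceback), the same kind of error will come back. In that case it may be worth checking that data_r contains exactly the columns listed in numerical_features3.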