Hello,
I am trying to complete the 01 wrap-up quiz.
I am currently on question 6, but I cannot work out why I am getting the error: ‘Found input variables with inconsistent numbers of samples: [24, 14, 60]’
This occurs when I try to split the numerical_features data and the target data into train and test sets, or when I try to run the cross-validation.
I cannot work out what I am missing or doing wrong in the code.
Please help.
Code:
import pandas as pd
ames_housing = pd.read_csv("../datasets/house_prices.csv", na_values="?")
ames_housing = ames_housing.drop(columns="Id")
target_name = "SalePrice"
data = ames_housing.drop(columns=target_name)
target = ames_housing[target_name]
goal = (target > 200_000).astype(int)
numerical_features = [
    "LotFrontage", "LotArea", "MasVnrArea", "BsmtFinSF1", "BsmtFinSF2",
    "BsmtUnfSF", "TotalBsmtSF", "1stFlrSF", "2ndFlrSF", "LowQualFinSF",
    "GrLivArea", "BedroomAbvGr", "KitchenAbvGr", "TotRmsAbvGrd", "Fireplaces",
    "GarageCars", "GarageArea", "WoodDeckSF", "OpenPorchSF", "EnclosedPorch",
    "3SsnPorch", "ScreenPorch", "PoolArea", "MiscVal",
]
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
data_train, data_test, target_train, target_test = train_test_split(
    numerical_features, target, random_state=42, test_size=0.25)
model = make_pipeline(
    StandardScaler(), SimpleImputer(strategy="mean"), LogisticRegression())
model
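
For reference, here is a minimal sketch of what may be going on. It assumes the cause is that `train_test_split` receives the list of 24 column names (`numerical_features`) rather than the matching columns of `data`, so scikit-learn compares a "sample count" of 24 against the number of rows in `target`. The small random DataFrame and the three-column feature list below are hypothetical stand-ins for the real Ames data, used only so the example runs without the CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Ames data: 100 rows, 3 numeric columns.
rng = np.random.default_rng(0)
numerical_features = ["LotArea", "GrLivArea", "GarageArea"]
data = pd.DataFrame(rng.normal(size=(100, 3)), columns=numerical_features)
target = pd.Series(rng.integers(0, 2, size=100), name="goal")

# Passing the list of names (length 3) together with the target (length 100)
# triggers the same kind of "inconsistent numbers of samples" error.
try:
    train_test_split(numerical_features, target, random_state=42)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Selecting the columns from the DataFrame gives inputs of matching length.
data_train, data_test, target_train, target_test = train_test_split(
    data[numerical_features], target, random_state=42, test_size=0.25
)
print(data_train.shape, data_test.shape)
```

If this is indeed the cause, the same change would apply to the cross-validation call: pass `data[numerical_features]` instead of `numerical_features`.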