Types of algorithms and when to use them:
Continuous variables -> regression analysis. Categorical variables/data -> classification algorithms or clustering algorithms.
Algorithm = Input (Independent Variables) -> Model -> Output
How to Build a Model: 1) Get the data set 2) Clean the data 3) Split the data 4) Train the model 5) Iterate until the model is optimized 6) Test the model 7) Make predictions on new data with the optimized model
To test your model, use most of the data (about 80%) to train it and hold out the remaining portion (about 20%) to test its accuracy.
import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in newer versions
y = np.arange(0,5)
x = np.arange(0,10).reshape(5,2)
# 60% training data, 40% testing data
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.40)
print(x_train)
print(x_test)
print(y_train)
print(y_test)
# Same 60/40 split, but with a fixed seed
# random_state fixes the shuffle seed, so the exact same train/test split is reproduced on every run (useful for later validation)
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.40, random_state = 0)
print(x_train)
print(x_test)
print(y_train)
print(y_test)
Cross Validation - Train and evaluate the model on several different splits (folds) of the data, then average the scores across the folds
from sklearn.model_selection import KFold  # modern replacement for sklearn.cross_validation
x = np.arange(16).reshape(8,2)
y = np.arange(8)
print(x)
print(y)
kf = KFold(n_splits=4)
print(kf.get_n_splits(x))
print(len(x))
for train_index, test_index in kf.split(x):
    print('TRAIN:', train_index, 'TEST:', test_index)
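As a sketch of the same idea, scikit-learn can also run the fit/score loop for you and average the folds; cross_val_score below uses a LinearRegression on the toy arrays above purely for illustration.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
# Fit and score the model on each of the 4 folds (default scoring for regressors is R^2)
scores = cross_val_score(LinearRegression(), x, y, cv=4)
print(scores)
print(scores.mean())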
Most common Regression Techniques: 1) Multiple Regression 2) Lasso Regression (L1 penalty, which can shrink coefficients all the way to zero) 3) Ridge Regression (L2 penalty, which shrinks coefficients toward zero)
Linear Regression: r = Pearson correlation coefficient between the two variables (value from -1 to 1, strong negative to strong positive correlation). R^2 = coefficient of determination, how close the data points are to the least-squares line (value from 0 to 1, not close to very close). For simple linear regression, R^2 equals r squared.
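A quick numeric sketch of the distinction, using made-up data: squaring r gives the same number as R^2 for simple linear regression.
import numpy as np
from scipy import stats
from sklearn.metrics import r2_score
# Toy data, invented for illustration
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
r, _ = stats.pearsonr(x, y)                  # r ranges from -1 to 1
fit = stats.linregress(x, y)
y_hat = fit.slope*x + fit.intercept
print(r**2, r2_score(y, y_hat))              # the two values match for simple linear regression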
%matplotlib inline
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('heightweight.csv')
df.head(10)
df.info()
df.describe()
df.isnull().sum()
df.sex.nunique()
df.sex.value_counts()
# Creating dummy variables for the column 'Sex' because 'Sex' is an object
df_dummies = pd.get_dummies(df,drop_first=True)
# Only creates dummy variables for 'Object' Class
# drop_first = True drops one column of dummy variables because the other column is implied
df_dummies.head()
df_dummies.corr()
# Gives you the correlation between the different variables
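Since seaborn is already imported, one optional sketch for reading the correlation matrix at a glance as a heatmap:
sns.heatmap(df_dummies.corr(), annot=True)
plt.show()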
# Histogram of heights (30 bins)
plt.hist(df.heightIn,30)
# Scatter plot of weight vs. height
plt.scatter(df.weightLb,df.heightIn)
# Inspect the rows with weight above 171 lb (potential outliers)
df[df['weightLb']>171]
from scipy import stats
estheight = stats.linregress(df.weightLb,df.heightIn)
estheight
def predict(x):
    return estheight.slope*x + estheight.intercept
# Creating a new column called 'Predicted Height' from the least-squares line
df['Predicted Height'] = predict(df.weightLb)
df['Predicted Height']
df.head()
df['Height Error'] = abs(df['heightIn'] - df['Predicted Height'])
df.head()
from sklearn.metrics import mean_squared_error, r2_score
(mean_squared_error(df.heightIn,df['Predicted Height']))**0.5  # RMSE, in inches
r2_score(df.heightIn,df['Predicted Height'])
df = pd.read_excel('cars.xls')
df.head()
df.info()
df.nunique()
df.describe()
df.corr()
# Model inputs (independent variables)
x = df[['Mileage','Cylinder','Liter','Cruise']]
# Target to predict: price
y = df[['Price']]
print(x.head())
y.head()
import statsmodels.api as sm
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
# Add a constant column so the OLS model fits a y-intercept
x1 = sm.add_constant(x)
x1.head()
# statsmodels Ordinary Least Squares: define the model, then fit it
est = sm.OLS(y,x1)
est_price = est.fit()
est_price.summary()
# Use the built-in predict function in statsmodels to get the predicted prices
y_pred = est_price.predict(x1)
type(y_pred)
x1['Predicted Price'] = y_pred
x1.head()
from sklearn.metrics import mean_squared_error, r2_score
(mean_squared_error(df['Price'],x1['Predicted Price']))**0.5
r2_score(df['Price'],x1['Predicted Price'])
# Drop 'Liter' and refit; without refitting, the predictions (and therefore
# the error metrics) would not change
x2 = x1.drop(['Liter','Predicted Price'],axis=1)
x2.head()
est2 = sm.OLS(y,x2).fit()
y_pred2 = est2.predict(x2)
print(mean_squared_error(df['Price'],y_pred2)**0.5)
r2_score(df['Price'],y_pred2)
from sklearn.model_selection import train_test_split
# Drop the predicted column so it does not leak into the model's features
X_train, X_test, y_train, y_test = train_test_split(x1.drop('Predicted Price',axis=1),y,train_size=0.8)
X_train.shape, y_train.shape, X_test.shape, y_test.shape
Steps: 1) Define the Algorithm 2) Fit the Algorithm 3) Predict from the Algorithm
from sklearn import linear_model
# Linear Regression
reg = linear_model.LinearRegression()
regmodel = reg.fit(X_train,y_train)
y_predtest = regmodel.predict(X_test)
from sklearn.metrics import mean_squared_error, r2_score
mean_squared_error(y_test, y_predtest)**0.5  # RMSE
r2_score(y_test,y_predtest)
print(regmodel.intercept_)
print(regmodel.coef_)
from sklearn.linear_model import Ridge,Lasso
# Create Model using Ridge Regression
ridgereg = Ridge()
ridgereg.fit(X_train,y_train)
# Get Predicted Values using Ridge Regression Model
y_pred_ridge = ridgereg.predict(X_test)
# Test Predicted Values using Ridge Regression Model
print(mean_squared_error(y_test,y_pred_ridge)**0.5)
r2_score(y_test,y_pred_ridge)
# Create Model using Lasso Regression
lassoreg = Lasso()
lassoreg.fit(X_train,y_train)
# Get Predicted Values using Lasso Regression Model
y_pred_lasso = lassoreg.predict(X_test)
# Test Predicted Values using Lasso Regression Model
print(mean_squared_error(y_test,y_pred_lasso)**0.5)
r2_score(y_test,y_pred_lasso)
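Both Ridge() and Lasso() above use the default regularization strength alpha=1.0. A sketch of letting scikit-learn pick alpha by cross-validation instead; the alpha grid here is an arbitrary example, not a recommendation.
from sklearn.linear_model import RidgeCV, LassoCV
# Try a few candidate alphas and keep the one that cross-validates best
ridgecv = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X_train, y_train)
print(ridgecv.alpha_)
lassocv = LassoCV(alphas=[0.1, 1.0, 10.0]).fit(X_train, np.ravel(y_train))  # LassoCV expects a 1-D target
print(lassocv.alpha_)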
# Correlation Matrix
# Select Best Variables
# Remove Text Variables
# Shooting for an error of less than $100k
df = pd.read_csv('kc_house_data.csv')
df.head()
df['age'] = 2017 - df['yr_built']
df['renovated?'] = df['yr_renovated'].apply(lambda i: 1 if i>0 else 0)
df['basement?'] = df['sqft_basement'].apply(lambda i: 1 if i>0 else 0)
df['greaterThan15?'] = df['sqft_living'] - df['sqft_living15']
# Note: linearly rescaling one feature only rescales its coefficient;
# the OLS fit quality is unchanged
df['bathrooms'] = df['bathrooms']*1000
x = pd.get_dummies(df,columns=['zipcode'],drop_first=True)
# x = pd.get_dummies(df,columns=['lat'],drop_first=True)
# 'lat' and 'long' are continuous, so get_dummies would create one column per
# unique value (thousands of columns), which is why the output stalls
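One workaround to sketch: bin the continuous coordinates with pd.cut first, so get_dummies only sees a handful of categories instead of thousands of unique floats. The x_binned name and the choice of 10 bins are just for illustration.
df['lat_bin'] = pd.cut(df['lat'], bins=10)    # 10 bins is an arbitrary choice
df['long_bin'] = pd.cut(df['long'], bins=10)
x_binned = pd.get_dummies(df, columns=['zipcode','lat_bin','long_bin'], drop_first=True)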
# Eliminating Outlier Housing Prices
high = np.percentile(x.price,90)
low = np.percentile(x.price,10)
x_no_outs = x[(x.price < high) & (x.price > low)]
x_no_outs.head()
x_no_outs.info()
x_no_outs.corr()
# Target to predict: price
y = x_no_outs[['price']]
# Model inputs: drop the id/date columns, the target, and 'age'
x_no_outs.drop(['date','id','price','age'],axis=1,inplace=True)
# Other candidate columns to drop: 'yr_built','condition','waterfront'
x_no_outs = sm.add_constant(x_no_outs)
x_no_outs.head()
# statsmodels Ordinary Least Squares: define the model, then fit it
est = sm.OLS(y,x_no_outs)
est_price = est.fit()
est_price.summary()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x_no_outs,y,train_size=0.8)
X_train.shape, y_train.shape, X_test.shape, y_test.shape
from sklearn import linear_model
# Linear Regression
reg = linear_model.LinearRegression()
regmodel = reg.fit(X_train,y_train)
y_predtest = regmodel.predict(X_test)
from sklearn.metrics import mean_squared_error, r2_score
mean_squared_error(y_test, y_predtest)**0.5
r2_score(y_test,y_predtest)
from sklearn.linear_model import Ridge,Lasso
# Create Model using Ridge Regression
ridgereg = Ridge()
ridgereg.fit(X_train,y_train)
# Get Predicted Values using Ridge Regression Model
y_pred_ridge = ridgereg.predict(X_test)
# Test Predicted Values using Ridge Regression Model
print(mean_squared_error(y_test,y_pred_ridge)**0.5)
r2_score(y_test,y_pred_ridge)
# Create Model using Lasso Regression
lassoreg = Lasso()
lassoreg.fit(X_train,y_train)
# Get Predicted Values using Lasso Regression Model
y_pred_lasso = lassoreg.predict(X_test)
# Test Predicted Values using Lasso Regression Model
print(mean_squared_error(y_test,y_pred_lasso)**0.5)
r2_score(y_test,y_pred_lasso)
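To wrap up, a compact sketch that refits all three regressors on the same housing split and prints their test RMSE and R^2 side by side:
# Compare Linear, Ridge, and Lasso on the identical train/test split
for name, model in [('Linear', linear_model.LinearRegression()), ('Ridge', Ridge()), ('Lasso', Lasso())]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, mean_squared_error(y_test, pred)**0.5, r2_score(y_test, pred))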