Machine Learning Programming Workshop

2.1E Linear Regression in Machine Learning

Prepared By: Cheong Shiu Hong (FTFNCE)



In [1]:
import numpy as np # Linear Algebra
import pandas as pd # Data Frames
import matplotlib.pyplot as plt # Visualization
from mpl_toolkits.mplot3d import axes3d # 3D Visualization
import ipywidgets as widgets # Interactivity
from IPython.display import display # Display Widgets
In [2]:
%matplotlib notebook


5) Linear Regression with the Boston Housing Dataset


Import the Boston Housing Dataset from the Scikit-Learn Library

Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so running this notebook as-is assumes an older scikit-learn version.

In [3]:
import sklearn.datasets as datasets
import time # To Track Time
In [4]:
boston = datasets.load_boston()


Let's Check Out the Dataset

In [5]:
boston.keys()
Out[5]:
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
In [6]:
print("Feature Names:\n", boston['feature_names'])
Feature Names:
 ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']

Shape of X and Y

In [7]:
boston['data'].shape, boston['target'].shape
Out[7]:
((506, 13), (506,))


Features

In [8]:
df = pd.DataFrame(boston['data'], columns=boston['feature_names'])
df.head()
Out[8]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33

Label

In [9]:
pd.Series(boston['target']).head()
Out[9]:
0    24.0
1    21.6
2    34.7
3    33.4
4    36.2
dtype: float64


Split the Data into Training and Validation Sets

In [10]:
X = boston['data'][:450]
Y = boston['target'][:450]
X_val = boston['data'][450:]
Y_val = boston['target'][450:]
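
Here the first 450 rows are used for training and the remaining 56 for validation. The split is sequential rather than shuffled; as an alternative (not part of the original workshop), scikit-learn's train_test_split produces a randomized split:

from sklearn.model_selection import train_test_split

# A minimal sketch, not from the original notebook: a shuffled split instead
# of taking the last 56 rows as the validation set.
X_tr, X_va, Y_tr, Y_va = train_test_split(
    boston['data'], boston['target'], test_size=56, random_state=0)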


Define the Model

In [11]:
def model(theta, X):
    return np.dot(theta, X) # Vectorized prediction: theta (n_params,) dot X (n_params, n_samples) -> (n_samples,)
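
As a quick shape check (the values below are illustrative, not from the workshop): theta holds one entry per parameter, the design matrix holds one row per parameter and one column per sample, so the dot product returns one prediction per sample.

# Illustrative shape check with hypothetical values
demo_theta = np.zeros(14)               # 13 feature weights + 1 intercept
demo_X = np.ones((14, 5))               # design matrix for 5 samples
print(model(demo_theta, demo_X).shape)  # -> (5,)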


Define the Training Algorithm

In [12]:
def train(x, y, learning_rate=3e-6, iterations=1, first=False):
    global theta, prev_theta
    prev_theta = theta

    X = np.vstack([np.ones(y.shape[0]), x]) # Prepend a row of ones for the intercept term

    for _ in range(iterations):
        # Model
        pred = model(theta, X)

        # Gradient computation
        error = pred - y
        cost = np.mean(error**2)
        dcost_dtheta = np.mean(X * error, 1) # Gradient of the MSE (the constant factor of 2 is absorbed into the learning rate)
        theta = theta - (dcost_dtheta * learning_rate) # Gradient descent step

    return cost, dcost_dtheta
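
Since this is ordinary least squares, a closed-form solution also exists. The sketch below is not part of the original workshop; it uses np.linalg.lstsq as an independent check on what gradient descent should converge towards.

# A minimal sketch, not in the original notebook: closed-form least squares
# on the same design matrix, for comparison with gradient descent.
Xb = np.vstack([np.ones(Y.shape[0]), X.T])           # (14, 450) design matrix with bias row
theta_ls, *_ = np.linalg.lstsq(Xb.T, Y, rcond=None)  # minimizes ||Xb.T @ theta - Y||^2
print(np.mean((model(theta_ls, Xb) - Y)**2))         # training MSE of the exact solution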


Initialize a Random Parameter Vector 'theta' to be Trained

Since there are 13 features, we need 14 trainable parameters (13 feature weights + 1 intercept).

In [13]:
theta = np.random.randn(X.shape[1]+1)
print(theta.shape)
(14,)


Train the Parameters for 25 Epochs of 30,000 Iterations Each

In [14]:
epochs = 25

total_time = time.time()
start = time.time()

for i in range(1, epochs+1):
    lr = 5e-6 if i <= 15 else 2e-6
    cost, dcost_dtheta = train(X.T, Y, learning_rate=lr, iterations=30000)
    print('Epoch {} - Cost: {:.3f}\nTime: {:.2f}s\n'.format(i, cost, time.time()-start))
    start = time.time()

print('Total Time Taken: {:.2f}s'.format(time.time()-total_time))
Epoch 1 - Cost: 46.683
Time: 1.37s

Epoch 2 - Cost: 41.318
Time: 1.39s

Epoch 3 - Cost: 38.236
Time: 1.38s

Epoch 4 - Cost: 35.830
Time: 1.40s

Epoch 5 - Cost: 33.889
Time: 1.40s

Epoch 6 - Cost: 32.318
Time: 1.42s

Epoch 7 - Cost: 31.044
Time: 1.39s

Epoch 8 - Cost: 30.010
Time: 1.36s

Epoch 9 - Cost: 29.172
Time: 1.36s

Epoch 10 - Cost: 28.491
Time: 1.35s

Epoch 11 - Cost: 27.939
Time: 1.37s

Epoch 12 - Cost: 27.490
Time: 1.37s

Epoch 13 - Cost: 27.126
Time: 1.37s

Epoch 14 - Cost: 26.829
Time: 1.36s

Epoch 15 - Cost: 26.589
Time: 1.35s

Epoch 16 - Cost: 26.505
Time: 1.35s

Epoch 17 - Cost: 26.429
Time: 1.35s

Epoch 18 - Cost: 26.358
Time: 1.36s

Epoch 19 - Cost: 26.293
Time: 1.37s

Epoch 20 - Cost: 26.234
Time: 1.36s

Epoch 21 - Cost: 26.179
Time: 1.36s

Epoch 22 - Cost: 26.128
Time: 1.37s

Epoch 23 - Cost: 26.081
Time: 1.37s

Epoch 24 - Cost: 26.038
Time: 1.37s

Epoch 25 - Cost: 25.999
Time: 1.37s

Total Time Taken: 34.29s


Evaluate the Performance of the Model

In [15]:
Xs = np.vstack([np.ones(Y.shape[0]), X.T])
modelpred = model(theta, Xs)
np.mean((modelpred-Y)**2) # Training Mean Squared Error
Out[15]:
25.9987258139818
In [16]:
Xs = np.vstack([np.ones(Y_val.shape[0]), X_val.T])
modelpred = model(theta, Xs)
np.mean((modelpred-Y_val)**2) # Validation Mean Squared Error
Out[16]:
15.756312117683022


6) Linear Regression with Sci-Kit Learn


Scikit-Learn is a Powerful Python Library with Many Built-In Machine Learning Algorithms

Import Scikit-Learn's LinearRegression Estimator from sklearn.linear_model

In [17]:
from sklearn.linear_model import LinearRegression


Instantiate Linear Regression Object

In [18]:
model = LinearRegression() # Note: this rebinds 'model', which previously referred to our hand-written model function


Fit Model to Data

In [19]:
model.fit(X, Y)
Out[19]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)


Evaluate the R² Score of the Fitted Model

In [20]:
model.score(X, Y)
Out[20]:
0.741527173293562
In [21]:
model.score(X_val, Y_val)
Out[21]:
0.37565166960619634
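
LinearRegression.score returns the coefficient of determination R², i.e. 1 − SS_res/SS_tot. A minimal sketch (not from the original notebook) reproducing the training score by hand:

# Reproduce model.score(X, Y) by hand: R^2 = 1 - SS_res / SS_tot
pred = model.predict(X)
ss_res = np.sum((Y - pred)**2)
ss_tot = np.sum((Y - Y.mean())**2)
print(1 - ss_res / ss_tot)  # should match model.score(X, Y)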


Evaluate the MSE of the Fitted Model

In [22]:
skpred = model.predict(X)
np.mean((skpred-Y)**2)
Out[22]:
23.33864094108222
In [23]:
skpred = model.predict(X_val)
np.mean((skpred-Y_val)**2)
Out[23]:
11.407003268828058


Bootstrap Aggregating (Bagging)

Bagging fits many models, each on a bootstrap resample of the training data, and averages their predictions to reduce variance.

In [24]:
num_bags = 250
bag_size = 150
In [25]:
bags = []
for i in range(num_bags):
    idx = np.random.choice(np.arange(X.shape[0]), bag_size) # Sample row indices with replacement (bootstrap sample)
    bags.append([X[idx], Y[idx]])
In [26]:
models = []
for bag in bags:
    models.append(LinearRegression())
    models[-1].fit(bag[0], bag[1])
In [27]:
skpreds = []
for model in models:
    skpreds.append(model.predict(X))
avg_preds = np.array(skpreds).mean(0)
np.mean((avg_preds-Y)**2)
Out[27]:
23.383261612426924
In [28]:
skpreds = []
for model in models:
    skpreds.append(model.predict(X_val))
avg_preds = np.array(skpreds).mean(0)
np.mean((avg_preds-Y_val)**2)
Out[28]:
11.053799301528954
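
Scikit-learn also bundles this resample-and-average procedure into a single estimator, BaggingRegressor. A minimal sketch (not part of the original workshop) with the same bag settings:

from sklearn.ensemble import BaggingRegressor

# Equivalent bagging in one estimator; bootstrap resampling is the default.
bagged = BaggingRegressor(LinearRegression(), n_estimators=num_bags, max_samples=bag_size)
bagged.fit(X, Y)
print(np.mean((bagged.predict(X_val) - Y_val)**2))  # validation MSE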


7) What other algorithms can we use?


Ridge Regression

Ridge regression adds an L2 penalty on the coefficient magnitudes to the least-squares objective, shrinking the weights towards zero.

In [29]:
from sklearn.linear_model import Ridge
In [30]:
ridge_model = Ridge()
In [31]:
ridge_model.fit(X, Y)
Out[31]:
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)
In [32]:
ridge_pred = ridge_model.predict(X)
np.mean((ridge_pred - Y)**2)
Out[32]:
23.499611434235288
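
The alpha parameter sets the strength of the L2 penalty (the fit above uses the default alpha=1.0). A minimal sketch, with illustrative alpha values not taken from the workshop, comparing a few settings on the validation split:

# Compare a few illustrative penalty strengths on the held-out split.
for alpha in (0.1, 1.0, 10.0):
    m = Ridge(alpha=alpha).fit(X, Y)
    print(alpha, np.mean((m.predict(X_val) - Y_val)**2))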

Lasso Regression

Lasso regression uses an L1 penalty on the coefficients instead of an L2 penalty.

In [33]:
from sklearn.linear_model import Lasso
In [34]:
lasso_model = Lasso()
In [35]:
lasso_model.fit(X, Y)
Out[35]:
Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
In [36]:
lasso_pred = lasso_model.predict(X) 
np.mean((lasso_pred - Y)**2)
Out[36]:
27.85918052610639
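
Unlike the L2 penalty, the L1 penalty can drive coefficients exactly to zero, so lasso doubles as a rough feature selector. A minimal sketch (not part of the original workshop) inspecting the fitted coefficients:

# See which features the L1 penalty has zeroed out.
print(pd.Series(lasso_model.coef_, index=boston['feature_names']))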

Support Vector Machine

In [37]:
from sklearn.svm import SVR
In [38]:
SVM = SVR()
In [39]:
SVM.fit(X, Y)
C:\Users\cheon\Anaconda3\lib\site-packages\sklearn\svm\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)
Out[39]:
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
  gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
  tol=0.001, verbose=False)
In [40]:
SVM_pred = SVM.predict(X)
np.mean((SVM_pred - Y)**2)
Out[40]:
76.78542653583092
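
The RBF-kernel SVR is sensitive to the scale of the features (hence the gamma warning above), which is one reason its training MSE is so much higher than the linear models'. A minimal sketch (not part of the original workshop) standardizing the features first:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize the features before the RBF-kernel SVR; this usually helps
# substantially on unscaled data like this.
scaled_svm = make_pipeline(StandardScaler(), SVR(gamma='scale'))
scaled_svm.fit(X, Y)
print(np.mean((scaled_svm.predict(X) - Y)**2))  # training MSE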

Decision Trees

In [41]:
from sklearn.tree import DecisionTreeRegressor
In [42]:
dec_tree = DecisionTreeRegressor()
In [43]:
dec_tree.fit(X, Y)
Out[43]:
DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')
In [44]:
dec_tree_pred = dec_tree.predict(X)
np.mean((dec_tree_pred - Y)**2)
Out[44]:
0.0
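
A training MSE of exactly 0 just means the unconstrained tree has memorized the training set; it says nothing about generalization. A minimal sketch (not part of the original workshop; max_depth=4 is an illustrative choice) checking the validation split and a depth-limited tree:

# Judge the tree on the held-out split, and compare a depth-limited tree.
print(np.mean((dec_tree.predict(X_val) - Y_val)**2))
shallow_tree = DecisionTreeRegressor(max_depth=4).fit(X, Y)
print(np.mean((shallow_tree.predict(X_val) - Y_val)**2))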

Ensemble Algorithms: Gradient Boosting

Gradient boosting builds an ensemble of shallow trees, each new tree fitted to the residual errors of the ensemble so far.

In [45]:
from sklearn.ensemble import GradientBoostingRegressor
In [46]:
GBR = GradientBoostingRegressor()
In [47]:
GBR.fit(X, Y)
Out[47]:
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=100, n_iter_no_change=None, presort='auto',
             random_state=None, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
In [48]:
GBR_pred = GBR.predict(X)
np.mean((GBR_pred - Y)**2)
Out[48]:
1.8658266399537913
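
As with the decision tree, the low training MSE should be read alongside the held-out split. A minimal sketch (not part of the original workshop):

# Validation MSE for the boosted ensemble.
print(np.mean((GBR.predict(X_val) - Y_val)**2))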