Simple Linear Regression

Data Science
Machine Learning
Artificial Intelligence
Author: Rafiq Islam

Published: August 29, 2024

Simple Linear Regression

A linear regression with multiple predictors, also called input variables, features, independent variables, explanatory variables, regressors, or covariates (many names), often takes the form

\[ y=f(\mathbf{x})+\epsilon =\boldsymbol{\beta}^\top\mathbf{x}+\epsilon \]

where \(\boldsymbol{\beta} \in \mathbb{R}^d\) is the vector of regression parameters (constants that we aim to estimate) and \(\epsilon \sim \mathcal{N}(0,1)\) is a normally distributed error term, independent of \(\mathbf{x}\), also called white noise.

In the simple case of a single predictor \(x\), the model becomes:

\[ y=f(x)+\epsilon=\beta_0+\beta_1 x+\epsilon \]

Therefore, in our model we need to estimate the parameters \(\beta_0,\beta_1\). The true relationship between the explanatory variable and the dependent variable is \(y=f(x)+\epsilon\), while our working (fitted) model is \(\hat{y}=\hat{f}(x)=\hat{\beta}_0+\hat{\beta}_1 x\). Each prediction therefore incurs an error \(\epsilon_i=y_i-\hat{y}_i\), where \(y_i\) is the true value and \(\hat{y}_i\) is the predicted value; these errors are assumed to be normally distributed with mean 0 and variance 1. Suppose we have \(n\) observations \((x_1,y_1),\dots,(x_n,y_n)\). To get the best estimates of the parameters \(\beta_0,\beta_1\), we minimize the total squared error. So, we define the residual sum of squares (RSS) as:

\[\begin{align} RSS &=\epsilon_1^2+\epsilon_2^2+\cdots+\epsilon_{n}^2\\ &= \sum_{i=1}^{n}(y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)^2\\ \ell(\hat{\beta}_0,\hat{\beta}_1)&=\sum_{i=1}^{n}(y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)^2 \end{align}\]

Using multivariate calculus, we compute the partial derivatives of \(\ell\):

\[\begin{align} \frac{\partial \ell}{\partial \hat{\beta}_0}&=\sum_{i=1}^{n} 2(y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)(-1)\\ \frac{\partial \ell}{\partial \hat{\beta}_1}&= \sum_{i=1}^{n} 2(y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)(-x_i) \end{align}\]

Setting the partial derivatives to zero, we solve for \(\hat{\beta}_0,\hat{\beta}_1\) as follows

\[\begin{align*} \frac{\partial \ell}{\partial \hat{\beta}_0}&=0\\ \implies \sum_{i=1}^{n} y_i-n \hat{\beta}_0-\hat{\beta}_1\left(\sum_{i=1}^{n} x_i\right)&=0\\ \implies \hat{\beta}_0&=\bar{y}-\hat{\beta}_1\bar{x} \end{align*}\]

and,

\[\begin{align*} \frac{\partial \ell}{\partial \hat{\beta}_1}&=0\\ \implies \sum_{i=1}^{n} 2(y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)(-x_i)&=0\\ \implies \sum_{i=1}^{n} (y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)x_i&=0\\ \implies \sum_{i=1}^{n} x_iy_i-\hat{\beta}_0\left(\sum_{i=1}^{n} x_i\right)-\hat{\beta}_1\left(\sum_{i=1}^{n} x_i^2\right)&=0\\ \implies \sum_{i=1}^{n} x_iy_i-\left(\bar{y}-\hat{\beta}_1\bar{x}\right)\left(\sum_{i=1}^{n} x_i\right)-\hat{\beta}_1\left(\sum_{i=1}^{n} x_i^2\right)&=0\\ \implies \sum_{i=1}^{n} x_iy_i-\bar{y}\left(\sum_{i=1}^{n} x_i\right)+\hat{\beta}_1\bar{x}\left(\sum_{i=1}^{n} x_i\right)-\hat{\beta}_1\left(\sum_{i=1}^{n} x_i^2\right)&=0\\ \implies \sum_{i=1}^{n} x_iy_i-\bar{y}\left(\sum_{i=1}^{n} x_i\right)-\hat{\beta}_1\left(\sum_{i=1}^{n}x_i^2-\bar{x}\sum_{i=1}^{n}x_i\right)&=0\\ \implies \sum_{i=1}^{n} x_iy_i-n\bar{x}\bar{y}-\hat{\beta}_1\left(\sum_{i=1}^{n}x_i^2-n\bar{x}^2\right)&=0\\ \implies \hat{\beta}_1&=\frac{\sum_{i=1}^{n} x_iy_i-n\bar{x}\bar{y}}{\sum_{i=1}^{n}x_i^2-n\bar{x}^2}\\ \implies \hat{\beta}_1&=\frac{\sum_{i=1}^{n} x_iy_i-\bar{y}\sum_{i=1}^{n}x_i-\bar{x}\sum_{i=1}^{n}y_i+n\bar{x}\bar{y}}{\sum_{i=1}^{n}x_i^2-2\bar{x}\sum_{i=1}^{n}x_i+n\bar{x}^2}\\ \implies \hat{\beta}_1&=\frac{\sum_{i=1}^{n}\left(x_iy_i-x_i\bar{y}-\bar{x}y_i+\bar{x}\bar{y}\right)}{\sum_{i=1}^{n}\left(x_i^2-2\bar{x}x_i+\bar{x}^2\right)}\\ \implies \hat{\beta}_1&=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \end{align*}\]

where we used \(\sum_{i=1}^{n}x_i=n\bar{x}\) and \(\sum_{i=1}^{n}y_i=n\bar{y}\).

Therefore, we have the following

\[\begin{align*} \hat{\beta}_0&=\bar{y}-\hat{\beta}_1\bar{x}\\ \hat{\beta}_1&=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \end{align*}\]
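
To make these formulas concrete, here is a minimal NumPy sketch (not part of the original derivation; the data and variable names are purely illustrative) that computes \(\hat{\beta}_0\) and \(\hat{\beta}_1\) directly:

import numpy as np

# Purely illustrative data; any paired observations would do
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 4.4, 6.7, 8.4, 10.6])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least-squares estimates derived above
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)

These are exactly the least-squares estimates that scikit-learn's LinearRegression computes for a single feature.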

Simple linear regression (SLR) is applicable to a data set with a single feature and a continuous response variable.

import numpy as np 
import matplotlib.pyplot as plt 
from sklearn.linear_model import LinearRegression

Assumptions of Linear Regression

  • Linearity: The relationship between the feature set and the target variable has to be linear.
  • Homoscedasticity: The variance of the residuals has to be constant.
  • Independence: All the observations are independent of each other.
  • Normality: The residuals (error terms) are normally distributed.
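
As a quick, informal check of the homoscedasticity and normality assumptions, one can inspect the residuals of a fitted model. The following is a minimal sketch, not from the original post; the data here are generated only for illustration:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Illustrative data, generated only for this check
rng = np.random.default_rng(0)
X_chk = rng.random(100).reshape(-1, 1)
y_chk = 2 * X_chk.ravel() + 0.5 + rng.standard_normal(100)

model = LinearRegression().fit(X_chk, y_chk)
fitted = model.predict(X_chk)
residuals = y_chk - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(fitted, residuals, alpha=0.6)
ax1.axhline(0, color='k', lw=1)
ax1.set_xlabel('Fitted values')
ax1.set_ylabel('Residuals')   # roughly constant spread suggests homoscedasticity
ax2.hist(residuals, bins=20)
ax2.set_xlabel('Residuals')   # roughly bell-shaped suggests normal errors
plt.show()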

Synthetic Data

To implement the algorithm, we need some synthetic data. To generate it, we use the linear equation \(y(x)=2x+\frac{1}{2}+\xi\), where \(\xi\sim \mathcal{N}(0,1)\).

X=np.random.random(100)
y=2*X+0.5+np.random.randn(100)

Note that we used two random number generators, np.random.random(n) and np.random.randn(n). The first one generates \(n\) values drawn uniformly from the interval \([0,1)\), and the second one generates \(n\) values from the standard normal distribution with mean 0 and standard deviation 1.
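
To see the difference, one can draw a few values from each generator (a tiny illustrative snippet; the exact numbers vary from run to run):

# Uniform draws from [0, 1)
print(np.random.random(3))
# Standard normal draws (mean 0, standard deviation 1); these can be negative
print(np.random.randn(3))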

plt.figure(figsize=(9,6))
plt.scatter(X,y)
plt.xlabel('$X$')
plt.ylabel('y')
plt.gca().set_facecolor('#f4f4f4') 
plt.gcf().patch.set_facecolor('#f4f4f4')
plt.show()

Model

We want to fit a simple linear regression to the above data.

slr=LinearRegression()

Now, to fit our data \(X\) and \(y\), we need to reshape the input variable, because if we look at \(X\):

X
array([0.56856587, 0.17423288, 0.40129224, 0.03280717, 0.54098864,
       0.29660473, 0.66391506, 0.89033492, 0.17885744, 0.95990687,
       0.36462402, 0.40674152, 0.14675139, 0.87909539, 0.63080773,
       0.53839877, 0.12473846, 0.11900568, 0.07201608, 0.58065377,
       0.04431626, 0.0072257 , 0.37659324, 0.49757598, 0.28649567,
       0.33284351, 0.57301211, 0.62663095, 0.50147347, 0.04433713,
       0.26319543, 0.61344242, 0.67052889, 0.89647799, 0.85831712,
       0.17178016, 0.18087074, 0.65129641, 0.72596824, 0.30622122,
       0.75513251, 0.16522543, 0.61771188, 0.18175136, 0.0647351 ,
       0.88276012, 0.37657094, 0.06991887, 0.86900206, 0.87705882,
       0.95791386, 0.35986784, 0.19088845, 0.80896819, 0.69386082,
       0.30152154, 0.15326753, 0.18509181, 0.9961451 , 0.14013671,
       0.19277641, 0.24059626, 0.53998499, 0.32534802, 0.79087255,
       0.13104557, 0.28326053, 0.56381408, 0.20079243, 0.32677786,
       0.93752833, 0.95799509, 0.73057342, 0.19006122, 0.13442495,
       0.8295378 , 0.47808489, 0.15775223, 0.78753582, 0.33932299,
       0.73967636, 0.74865527, 0.94241147, 0.578305  , 0.8819345 ,
       0.41292441, 0.36738979, 0.6988793 , 0.41269004, 0.51400896,
       0.32262575, 0.94121051, 0.58636257, 0.23706789, 0.78174534,
       0.24518401, 0.18770689, 0.74447288, 0.36082694, 0.24436498])

It is a one-dimensional array (a vector), but the slr object expects the input in two-dimensional format: a matrix with one row per observation and one column per feature.

X=X.reshape(-1,1)
X[:10]
array([[0.56856587],
       [0.17423288],
       [0.40129224],
       [0.03280717],
       [0.54098864],
       [0.29660473],
       [0.66391506],
       [0.89033492],
       [0.17885744],
       [0.95990687]])

Now we fit the data to our model

slr.fit(X,y)
slr.predict([[2],[3]])
array([4.43623082, 6.34522069])

We predicted at \(X=2\) and \(X=3\); the corresponding \(y\) values in the output above are close to the true values from the model \(y=2x+\frac{1}{2}\), namely \(4.5\) and \(6.5\).
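
As a sanity check, a small illustrative snippet (assuming slr has been fitted as above; X_new is a hypothetical name) compares the predictions with the noiseless true line:

X_new = np.array([[2], [3]])
print(slr.predict(X_new))        # model predictions
print(2 * X_new.ravel() + 0.5)   # true values without noise: [4.5 6.5]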

intercept = round(slr.intercept_,4)
slope = slr.coef_

Now our model parameters are: intercept \(\beta_0=\) 0.6183 and slope \(\beta_1=\) array([1.90898987]). Note that slr.coef_ returns one coefficient per feature, so here the slope is the single element of that array.
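
These fitted values can also be recovered from the closed-form formulas derived earlier. Below is a minimal check, assuming the X and y arrays from above (beta0_manual and beta1_manual are hypothetical names; the exact numbers differ between runs):

x_flat = X.ravel()
beta1_manual = np.sum((x_flat - x_flat.mean()) * (y - y.mean())) / np.sum((x_flat - x_flat.mean()) ** 2)
beta0_manual = y.mean() - beta1_manual * x_flat.mean()
print(beta0_manual, beta1_manual)   # should match slr.intercept_ and slr.coef_[0]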

plt.figure(figsize=(9,6))
plt.scatter(X,y, alpha=0.7,label="Sample Data")
plt.plot(np.linspace(0,1,100),
    slr.predict(np.linspace(0,1,100).reshape(-1,1)),
    'k',
    label=r'Model $\hat{f}$'
)
plt.plot(np.linspace(0,1,100),
    2*np.linspace(0,1,100)+0.5,
    'r--',
    label='$f$'
)
plt.xlabel('$X$')
plt.ylabel('y')
plt.legend(fontsize=10)
plt.gca().set_facecolor('#f4f4f4') 
plt.gcf().patch.set_facecolor('#f4f4f4')
plt.show()

So the fitted line \(\hat{f}\) closely tracks the true line \(f\), despite the noise in the data.

Up next: multiple linear regression.

