Machine Learning Algorithm — Linear Regression

Chetna Shahi
Oct 13, 2021


ML is the process of training a model on the best possible parameters to capture the relationship between a set of features and a target. There are three components to an ML model:

  1. Model: how to model the relationship between the target and the features, e.g. a linear equation, a weighted sum, a decision tree, etc.
  2. Cost function: measures how good the model parameters are, i.e. how well or poorly the model is performing. It is computed from the target and the predictions.
  3. Optimizer: changes the parameters of the model to minimize the cost so that the model better fits the data (see the sketch after this list).
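As a rough sketch of how these three pieces fit together, here is a tiny gradient-descent loop for a linear model; the data, learning rate, and iteration count are all assumptions for illustration:

```python
import numpy as np

# Synthetic data: target roughly follows y = 3x + 2 plus noise (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 1, size=100)

# 1. Model: a linear equation with parameters w (weight) and b (bias)
def predict(x, w, b):
    return w * x + b

# 2. Cost function: mean squared error between target and prediction
def cost(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# 3. Optimizer: gradient descent nudges w and b to reduce the cost
w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    y_pred = predict(x, w, b)
    dw = -2 * np.mean(x * (y - y_pred))   # d(cost)/dw
    db = -2 * np.mean(y - y_pred)         # d(cost)/db
    w -= lr * dw
    b -= lr * db

print(w, b, cost(y, predict(x, w, b)))    # w ends up near 3, b near 2
```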

Convert categorical columns to numerical:

  1. Use 0/1 when the column has binary values.
  2. Use one-hot encoding when the column has more than two categories.
  3. If the column has a natural order (e.g. first rank, second rank, ...), it can be converted to numbers (1, 2, 3, ...) that preserve the order. These are called ordinals. (A short encoding sketch follows this list.)
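A minimal sketch of all three conversions, assuming a small hypothetical pandas DataFrame with a binary, a multi-category, and an ordered column:

```python
import pandas as pd

# Hypothetical data with three kinds of categorical columns
df = pd.DataFrame({
    "smoker": ["yes", "no", "yes"],            # binary
    "region": ["north", "south", "east"],      # more than two categories
    "rank":   ["first", "third", "second"],    # natural order
})

# 1. Binary column -> 0/1
df["smoker"] = df["smoker"].map({"no": 0, "yes": 1})

# 2. Multi-category column -> one-hot encoded columns
df = pd.get_dummies(df, columns=["region"])

# 3. Ordered column -> ordinal numbers that preserve the order
df["rank"] = df["rank"].map({"first": 1, "second": 2, "third": 3})

print(df)
```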

Linear Regression:

It finds a linear relationship between the target and one or more input features.

y = wx + b, where w and b are the parameters of the model: x is the input feature, y is the target, w is the weight, and b is the bias.

We keep modifying w and b until we find the best-fit line. The best-fit line has the least possible error (the distance between the points and the regression line).
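For example, scikit-learn's LinearRegression can find w and b for a handful of points (the numbers below are made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: a single feature x and a target y
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

model = LinearRegression().fit(x, y)
print("w =", model.coef_[0], "b =", model.intercept_)    # weight and bias of the best-fit line
print("prediction for x=6:", model.predict([[6.0]])[0])  # y = w*6 + b
```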

Assumptions of Linear Model:

  1. Linearity: The relationship between X and the mean of Y is linear.
  2. Homoscedasticity: The variance of residual is the same for any value of X.
  3. Independence: Observations are independent of each other.
  4. Normality: For any fixed value of X, Y is normally distributed.
  5. No Autocorrelation: the residuals should be independent of each other; autocorrelation occurs when residuals are correlated with one another. (A quick residual check is sketched after this list.)
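As a rough sketch of how some of these assumptions can be checked, the snippet below fits an OLS model with statsmodels on synthetic data (assumed for illustration) and inspects the residuals; a Durbin-Watson statistic near 2 suggests no autocorrelation:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Synthetic data assumed for illustration
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.5 * X[:, 0] + 1.0 + rng.normal(0, 1, size=200)

# Fit OLS and inspect the residuals
results = sm.OLS(y, sm.add_constant(X)).fit()
residuals = results.resid

# Linearity / homoscedasticity are usually checked with a residual-vs-fitted plot;
# here we just confirm the residuals are centred around zero
print("residual mean:", residuals.mean())

# No autocorrelation: Durbin-Watson close to 2 suggests independent residuals
print("Durbin-Watson:", durbin_watson(residuals))
```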

Correlation indicates the strength (the greater the absolute value of the correlation coefficient, the stronger the relationship) and the direction (the sign of the coefficient gives the direction of the relationship) of the relationship between two or more variables. Its value ranges from -1 to +1.
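For instance, the Pearson correlation coefficient can be computed with NumPy (the numbers below are made up for illustration):

```python
import numpy as np

hours_studied = np.array([1, 2, 3, 4, 5, 6])
exam_score    = np.array([52, 55, 61, 64, 70, 73])

# Pearson correlation coefficient: value in [-1, +1];
# the sign gives the direction, the magnitude gives the strength
r = np.corrcoef(hours_studied, exam_score)[0, 1]
print(r)   # close to +1 -> strong positive relationship
```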

(Figure: correlation strength and direction. Image courtesy: Jovian)

Causation: causation indicates that one event is the result of the occurrence of another event. If two variables are correlated, it doesn't always imply that one variable is causing the other, or vice versa. For example, smoking and alcoholism are correlated, but that doesn't imply that one causes the other.

Loss/Cost Function:

The difference between the target and the prediction gives the error. Each error is squared so that positive and negative errors don't cancel each other out; averaging the squared errors gives the mean squared error (MSE), and taking its square root gives the root mean squared error (RMSE).
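A minimal sketch of computing MSE and RMSE by hand, with assumed toy values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 9.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.6])

errors = y_true - y_pred          # positive and negative errors
mse = np.mean(errors ** 2)        # squaring stops them cancelling out
rmse = np.sqrt(mse)               # square root brings it back to the target's scale
print(mse, rmse)
```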

Optimizer:

We need to modify w and b to minimize the loss and find the best-fit line. There are two methods to achieve this:

Ordinary Least Squares (OLS): computes the best parameter values directly using matrix operations. Best suited to smaller datasets.

Stochastic Gradient Descent (SGD): computes the best parameter values iteratively, starting from random values of w and b and gradually improving them using derivatives. Best suited to larger datasets.
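A rough comparison of the two approaches using scikit-learn on synthetic data (LinearRegression solves the least-squares problem directly, SGDRegressor uses stochastic gradient descent; the dataset and hyperparameters are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic data assumed for illustration
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = 4 * X[:, 0] + 3 + rng.normal(0, 1, size=500)

# OLS: closed-form solution via matrix operations (good for smaller datasets)
ols = LinearRegression().fit(X, y)

# SGD: iterative updates from random initial parameters (scales to larger datasets);
# gradient descent usually needs scaled features to converge well
X_scaled = StandardScaler().fit_transform(X)
sgd = SGDRegressor(max_iter=1000, random_state=0).fit(X_scaled, y)

print("OLS w, b:", ols.coef_[0], ols.intercept_)
print("SGD w, b (scaled features):", sgd.coef_[0], sgd.intercept_[0])
```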

Metrics for Model Evaluation:

R-squared / Adjusted R-squared: ranges from 0 to 1, with 1 meaning minimal error between the predictions and the target and 0 meaning maximum error. It is a relative measure of how well the predictions fit the actual values, but it doesn't account for over-fitting.

When there are multiple independent variables, the model can fit the training set very well yet perform poorly on the test set. Hence, adjusted R-squared is used: it penalizes additional independent variables and adjusts the metric to guard against overfitting.

Mean Squared Error (MSE) / Root Mean Squared Error (RMSE): an absolute measure of goodness of fit, i.e. how much the predicted values deviate from the actual values. The MSE can be very large, so we often look at the RMSE instead. The RMSE is also easier to interpret because taking the square root brings the error back to the same scale as the prediction error.

Mean Absolute Error (MAE): the same idea as MSE except that it uses the absolute error. MSE penalizes large prediction errors more heavily by squaring them, while MAE treats all errors equally.

R-squared / Adjusted R-squared is used to explain the model to others because it expresses the output variability in percentage terms, while MSE/RMSE/MAE are used to compare the performance of different regression models. If large prediction errors should be penalized more heavily, use MSE/RMSE; if all errors should count equally, use MAE.
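All of these metrics are available in scikit-learn; a small sketch with assumed values:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.6, 10.5])

r2   = r2_score(y_true, y_pred)            # relative measure, closer to 1 is better
mse  = mean_squared_error(y_true, y_pred)  # absolute measure, penalizes big errors
rmse = np.sqrt(mse)                        # same scale as the target
mae  = mean_absolute_error(y_true, y_pred) # treats all errors equally
print(r2, mse, rmse, mae)
```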

Model Improvements:

Feature Scaling: when we have to justify the predictions of our model, our first instinct is to compare the weights of all the features. But the ranges of the features are not the same; for some it could be 0–100, for others 10,000–20,000, so it isn't right to compare the weights of different columns to identify which features are important. Therefore, we bring all features onto a comparable scale by standardizing them (one way to do this is sketched below).
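The usual standardization formula is x' = (x - mean) / standard deviation, which gives each feature zero mean and unit variance. A minimal sketch with scikit-learn's StandardScaler (the feature values are assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (values assumed for illustration)
X = np.array([
    [25.0, 12000.0],
    [47.0, 18500.0],
    [31.0, 15200.0],
])

# Standardize each column: subtract its mean, divide by its standard deviation
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)   # both columns are now on a comparable scale
```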

Null Hypothesis and p-value:

The null hypothesis is the initial claim, specified using previous research or knowledge. In regression, the null hypothesis for a coefficient is typically that the predictor has no effect on the target.

Low p-value: reject the null hypothesis; changes in the predictor are associated with changes in the target.

High p-value: fail to reject the null hypothesis; changes in the predictor are not associated with changes in the target.
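As a rough illustration, statsmodels reports a p-value for each coefficient of an OLS fit; in the synthetic example below (data assumed), one predictor drives the target and the other does not:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)          # predictor that drives the target
x2 = rng.normal(size=200)          # predictor unrelated to the target
y = 5 * x1 + rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2]))
results = sm.OLS(y, X).fit()
print(results.pvalues)   # low p-value for x1 (reject null), high p-value for x2
```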

Over-fitting & Under-fitting:

Over-fitting is when the model learns the training data so well that it also learns the noise, and as a result it doesn't generalize to new data correctly. The training data shows very minimal error, but the test data shows a higher error rate. It can be avoided by using a linear algorithm if we have linear data, or by constraining parameters such as the maximum depth if we are using decision trees.

Under-fitting is when the model cannot capture the underlying pattern in the data. It usually happens when we have too little data to train the model. It can be avoided by taking more data and by reducing features through feature selection.
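A quick way to see both effects is to vary a decision tree's maximum depth and compare training and test scores (synthetic data and depths assumed for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 4, 20):   # too shallow, reasonable, too deep
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))

# depth 1:  low train and test scores            -> under-fitting
# depth 20: high train score, lower test score   -> over-fitting
```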
