Simplified Linear Regression in Python

Stavros Niafas
6 min read · Aug 10, 2020


Photo by Jakub Kriz on Unsplash

In the area of Machine Learning, one of the first algorithms someone comes across is Linear Regression. Linear Regression belongs to the category of supervised learning algorithms, where we consider a number of observations X accompanied by the same number of target values Y, and we try to model the relationship between the input and output features.

Related problems can be categorised into Regression and Classification. In this post I will cover, in a few simple steps, how we can approach and implement Linear Regression in Python, where we will try to predict samples with a continuous output.

Basic Function

Linear regression is called linear because the simplest model involves a linear combination of the input variables; in other words, the model is a linear function of its parameters.
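The formula image from the original post is not rendered in this export; the simplest single-variable form, and its generalisation to D input variables, can be written as (a standard reconstruction, not the original figure):

```latex
\hat{y} = w x + b, \qquad \hat{y}(\mathbf{x}) = w_0 + \sum_{j=1}^{D} w_j x_j
```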

At the same time, Linear Regression comes with a strong linearity assumption about the dependent variable, which is what makes us turn to more complex models, such as Neural Networks, for other kinds of problems.

Implementation

In this example, a simple model learns to draw a straight line that fits the distributed data. Learning from the data is a repetitive process: in every learning cycle the model is assessed with respect to the optimisation of its training parameters and the minimisation of the residual error of each prediction.

Dataset

Construct a toy dataset with the numpy library, and plot a random line.

Create data + Plot a line
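The embedded gist is not rendered in this export; a minimal sketch of what the data-creation step might look like (the slope, intercept, noise level, and seed here are illustrative assumptions, not necessarily the original's values):

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy dataset: points scattered around a straight line
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(0, 2, size=x.shape)

# A random initial line, to be fitted later
w = rng.random()
b = rng.random()

plt.scatter(x, y, label="data")
plt.plot(x, w * x + b, color="purple", label="random line")
plt.legend()
plt.show()
```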

In order to fit the line to the data, we need to measure the residual error between the line and the dataset; in simpler terms, the distance from the purple line to the actual data.

Loss Function

To quantify the distance, we need a performance metric, in other words a Loss/Cost function. This metric will also measure how well or poorly our model is learning the data. As our problem is linear regression, meaning that the values we are trying to predict are continuous, we are going to use Mean Squared Error (MSE). Of course, other loss functions can be used in a linear regression problem, such as Mean Absolute Error (MAE) or Huber Loss, but for a toy example we can keep it simple.

Mean Squared Error (MSE)
MSE: 685.9313
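The MSE gist itself is not embedded in this export; computing it is only a few lines with numpy (the function name is an assumption):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error between targets and predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Small worked example: errors of 0.5, 0.5 and 1.0
print(mse([1.0, 2.0, 3.0], [1.5, 2.5, 2.0]))  # → 0.5
```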

Calculating the loss function is the first step in keeping track of the model's performance. Now our goal is to minimise it, somehow.

Looking closer at the Loss function, we can see that it depends on w and b. In order to observe their impact on the training process, we will plot the MSE while keeping one of the two values constant, interchangeably for both w and b.

Track w performance — Keep b constant
W and b Loss — Initial values vs Best

The above figures depict the change in loss between the initial values and the minimum (best) values of b and w.

The minimum was found by simply calculating the loss over a hardcoded range of values. However, we need a smarter way to navigate from the initial position to the best position (lowest minimum) while optimising b and w simultaneously.

Gradient Descent

We can tackle this problem with the Gradient Descent algorithm.
Gradient descent computes the gradient with respect to each of the coefficients b and w, which is the slope at the current position. Given the slope, we know which direction to follow in order to reduce (minimise) the cost function.

Partial derivatives
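The equation image is not shown in this export; for the MSE loss over N samples with predictions ŷᵢ = w·xᵢ + b, the partial derivatives (matching the compute_derivatives code further down) are:

```latex
\frac{\partial L}{\partial w} = -\frac{2}{N}\sum_{i=1}^{N} x_i\,(y_i - \hat{y}_i),
\qquad
\frac{\partial L}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)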

At each step, the weight vector is moved in the direction of the greatest rate of decrease of the error function [1]. In other words, we update the previous values of w and b with new ones, following a defined strategy.

Update
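The update-rule image is not shown here; with learning rate a, it is the standard gradient descent step:

```latex
w_{\text{new}} = w - a\,\frac{\partial L}{\partial w},
\qquad
b_{\text{new}} = b - a\,\frac{\partial L}{\partial b}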

Here a hyperparameter a, the learning rate, is introduced: a factor that defines how big or small the update towards minimising the error will be [2]. A very small learning rate might need a large number of epochs for the model to converge, while a very high one will constantly overshoot the minimum and might never let the model converge at all.

After updating w and b with the new values, we calculate the MSE of the new prediction. These steps are part of an iterative process that continues until the loss function stops decreasing, hopefully at a (local) minimum where it converges. Each learning cycle is called an epoch, and the process is referred to as training.

# Make a new prediction
def predict(x):
    return w * x + b
Intuitive plot for Learning Rate

Gradient Descent is usually depicted as a landscape on which we always seek the lowest minimum.

http://www.bdhammel.com/learning-rates/

Now that we have drilled down into the problem, we can develop the algorithm that optimises w and b through their partial derivatives.

Compute partial derivatives

The optimisation/training process ends when the MSE eventually stops decreasing.

Learning Process:

- Initialise w, b with random values
- For a range of epochs:
  - Predict a new line
  - Compute partial derivatives (slope)
  - Update w_new, b_new
  - Evaluate with the Loss Function (MSE)

Initialise with random values

# Random initialisation
w = np.random.random()
b = np.random.random()

# Compute derivatives
def compute_derivatives(x, y):
    dw = 0
    db = 0
    N = len(x)
    for i in range(N):
        x_i = x[i]
        y_i = y[i]
        y_hat = predict(x_i)
        dw += -(2/N) * x_i * (y_i - y_hat)
        db += -(2/N) * (y_i - y_hat)
    return dw, db

# Update with new values
def update(x, y, a=0.0002):
    dw, db = compute_derivatives(x, y)
    # Update previous w, b
    new_w = w - (a*dw)
    new_b = b - (a*db)
    return new_w, new_b

Now that we have formed the training algorithm, let's put it all together in a Linear_Regression class to fully automate the process.
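The class itself lives in the linked repository; a minimal sketch of how such a Linear_Regression class might look (the method names, defaults, and structure here are assumptions, and the gradients are computed in vectorised form rather than with the explicit loop above):

```python
import numpy as np

class Linear_Regression:
    """Minimal single-variable linear regression trained with gradient descent."""

    def __init__(self, a=0.0002, epochs=1000):
        self.a = a                      # learning rate
        self.epochs = epochs
        self.w = np.random.random()     # random initialisation
        self.b = np.random.random()

    def predict(self, x):
        return self.w * x + self.b

    def mse(self, x, y):
        return np.mean((y - self.predict(x)) ** 2)

    def fit(self, x, y):
        N = len(x)
        for _ in range(self.epochs):
            y_hat = self.predict(x)
            # Partial derivatives of the MSE loss
            dw = -(2 / N) * np.sum(x * (y - y_hat))
            db = -(2 / N) * np.sum(y - y_hat)
            # Gradient descent update
            self.w -= self.a * dw
            self.b -= self.a * db
        return self
```

Usage would then be a two-liner such as `model = Linear_Regression().fit(x, y)` followed by `model.predict(x)`.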

And that’s it. The full implementation of this article can be found in my github repository. In later posts the problem will be approached with Tensorflow’s API, along with a Classification implementation. Feel free to comment on any oversights.

Many thanks to Thanos Tagaris[3] for the amazing repository and work.

[1] Christopher Bishop, Pattern Recognition and Machine Learning,
Springer 2007

[2] https://machinelearningmastery.com/linear-regression-for-machine-learning/

[3] https://github.com/djib2011

Written by Stavros Niafas

ML engineer interested in the broader domain of AI, focused in data-centric AI techniques, active learning, MLOps and computer vision.
