# Squared Error Cost Function

30 days have passed and I’m still practising ML, however haphazardly. Now I can stop counting and focus on keeping a rhythm.

I’ve gotten as far as linear regression and just learnt about the squared error cost function.

### Recap

In a machine learning scenario, we’re given a dataset with *m* values of corresponding inputs *x* and outputs *y*.

The goal is to find a function *f(x) = wx + b* that best “fits” (or best predicts) the dataset.

*w* and *b* are known as parameters of the function.
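As a quick sketch in Python (the values of *w* and *b* here are arbitrary guesses, just for illustration):

```python
# Linear model: f(x) = w * x + b
def f(x, w, b):
    return w * x + b

# With guessed parameters w = 2 and b = 1,
# an input of x = 3 predicts f(3) = 2 * 3 + 1 = 7
print(f(3, w=2, b=1))  # prints 7
```

Different choices of *w* and *b* give different straight lines; the model’s job is to find the line that matches the data.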

If the parameters are guessed correctly, the prediction of the function *f(x)* will always be close to or equal to the actual output *y* in the dataset.

In essence, the machine tries to guess what’s going on in the dataset through trial and error.

For a given input *x(i)*, the model tries certain values of *w* and *b*, and compares the prediction *y-hat(i)* to the expected output from the dataset *y(i)*.

By doing this over all *m* values, we can say the model fits the dataset if there is consistently little difference between the prediction and the output.

The model aggregates the errors by squaring them and dividing by the number of values in the dataset. This is known as the cost function *J*.

The best combination of parameters generates a cost *J* that approaches zero.
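A minimal sketch of the cost calculation (the dataset below is invented for illustration; some courses also divide by *2m* instead of *m* to simplify later calculus, but the idea is the same):

```python
def cost(xs, ys, w, b):
    # Squared error cost J: sum the squared differences between
    # each prediction f(x) = w * x + b and the actual output y,
    # then divide by the number of examples m.
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / m

# A perfectly linear dataset: y = 2x + 1
xs = [1, 2, 3]
ys = [3, 5, 7]

print(cost(xs, ys, w=2, b=1))  # the true parameters give zero cost
print(cost(xs, ys, w=1, b=0))  # wrong parameters give a higher cost
```

Trying many (*w*, *b*) pairs and keeping the one with the lowest cost is exactly the trial-and-error process described above.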

### Summary

Linear regression models intelligence by observing past data.

The better a model predicts past data, the better it can predict future data.

With the squared error cost function, the machine tries different parameters for the model and calculates how wrong each guess is.

The right model is the least wrong model.