Line of Best Fit

Lately, I have been studying machine learning, specifically linear regression. In linear regression, part of the model-building procedure is to obtain the line of best fit that describes the relationship between the features and the labels.

I thought I'd build a simple program that reads data from a .csv file, calculates the equation of the line of best fit for that data, and uses the equation to plot the line.

The data for this program has been obtained from here.

Let X = the independent variable and y = the dependent variable.
The following are the formulas I used to help me calculate the equation of the line of best fit and the value of R-squared:

Standard Deviation
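
As a reference point, one common form of the standard deviation is sketched here. It assumes the population version (dividing by $n$ rather than $n - 1$), which is an assumption and not necessarily the exact form this program uses. Here $x_i$ are the values of X, $\bar{x}$ is their mean, and $n$ is the number of data points; the same expression with $y_i$ and $\bar{y}$ gives $\sigma_y$.

$$
\sigma_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}
$$

The choice between $n$ and $n - 1$ does not affect the slope of the line, as long as the same choice is made for both X and y.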

Correlation Coefficient
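
A standard way to write the (Pearson) correlation coefficient is sketched here, assuming $\sigma_X$ and $\sigma_y$ are the population standard deviations of X and y (dividing by $n$), with $\bar{x}$ and $\bar{y}$ the means of X and y. The symbols are illustrative and may not match the notation of the original formula exactly.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n \, \sigma_X \, \sigma_y}
$$

The value of $r$ always lies between $-1$ and $1$; values near either extreme indicate a strong linear relationship.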

After obtaining the line of best fit, the "closeness" of this line to the actual data can be determined with the coefficient of determination. The following is the formula of this value:

Coefficient of Determination (R-Squared)
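
One common definition of the coefficient of determination is sketched here. It assumes $\hat{y}_i$ denotes the value predicted by the line of best fit for $x_i$ and $\bar{y}$ denotes the mean of y; this notation is an assumption, not taken from the program.

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

For a least-squares line with a single independent variable, this value equals the square of the correlation coefficient, $R^2 = r^2$.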

Our final equation should be in the form $y = b_0 + b_1 X$, where $b_0$ is the y-intercept and $b_1$ is the slope.

Assuming that we have obtained the standard deviation (sigma) of X as well as y and the correlation coefficient (r), we can calculate the values of b0 and b1 as follows:
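
The usual relations are $b_1 = r \cdot \sigma_y / \sigma_X$ and $b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{x}$ and $\bar{y}$ are the means of X and y. The following is a minimal sketch of how these steps could fit together in Python. It is not the program's actual code: the function name `best_fit` is hypothetical, and the sketch assumes the population standard deviation (dividing by n) and that `b0` is the intercept and `b1` the slope.

```python
import math


def best_fit(xs, ys):
    """Return (b0, b1, r_squared) for the line y = b0 + b1 * x.

    Hypothetical helper for illustration; assumes population
    standard deviations (dividing by n).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Population standard deviations of X and y.
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("X and y must each contain at least two distinct values")

    # Pearson correlation coefficient.
    cov_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = cov_sum / (n * sigma_x * sigma_y)

    # Slope and intercept from r and the standard deviations.
    b1 = r * sigma_y / sigma_x
    b0 = mean_y - b1 * mean_x

    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot

    return b0, b1, r_squared
```

Calling `best_fit([1, 2, 3, 4], [3, 5, 7, 9])` on perfectly linear data should recover an intercept of 1, a slope of 2, and an R-squared of 1, up to floating-point rounding.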

The following is the output of the program when run against the data from the 'data.csv' file:

The title of the plot shows the equation of the line of best fit along with the value of R-squared.