The math theories themselves are not that hard to learn; tis the
In addition to the gradient descent method mentioned on 30th June, the linear regression model actually has a closed-form solution: a formula that gives the parameters directly.
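To make that concrete, here is a minimal sketch of the closed-form approach via the normal equation θ = (XᵀX)⁻¹Xᵀy. The toy data and variable names are my own illustration, not from any particular library:

```python
import numpy as np

# Toy data: roughly y = 2x + 1 with a little noise (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^{-1} X^T y, solved without an explicit inverse
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [intercept, slope], roughly [1, 2] for this data
```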
In an earlier section, we used the mean squared error (MSE) as the loss, L(y, ŷ) = (1/n) Σ(y_i - ŷ_i)^2. It measures the average squared difference between predicted values (ŷ_i) and actual target values (y_i) across n data points.
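Written out in code, the MSE is just the averaged squared residual. A small NumPy sketch with made-up predictions:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of (y_i - ŷ_i)^2 over n data points."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # 0.02
```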
The least squares method aims to find the model parameters (θ) that minimize the overall MSE, i.e. the configuration of parameters that gives the smallest possible sum of squared differences between predictions and actual values.
Averaging the squared errors (the 1/n factor) does not change where the minimum of the loss function lies: multiplying the loss by a positive constant rescales it but leaves the minimizing parameters the same. The un-averaged term Σ(y_i - ŷ_i)^2 is also called the residual sum of squares (RSS). It represents the total squared difference between the predicted values and the actual target values, and is simply another way to measure the model's overall error, with RSS = n · MSE.
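A quick numerical sketch of that relationship, with toy numbers of my own:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([0.9, 2.1, 2.8, 4.3])

residuals = y_true - y_pred
rss = np.sum(residuals ** 2)   # residual sum of squares
mse = rss / len(y_true)        # averaging rescales the loss but not its minimizer

print(rss, mse)  # rss == n * mse, so parameters minimizing one minimize the other
```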
Linear regression is a powerful tool for modeling relationships between variables. However, to ensure reliable results, it is crucial to consider some underlying assumptions:
Linear Relationship: the core assumption that the relationship between the independent variable (X) and the dependent variable (Y) is linear. In other words, the expected change in Y is constant for a unit change in X.
Independence of Errors: errors (residuals, the difference between actual and predicted values) for each data point should be independent of each other. This means the error for one data point doesn't influence the error for another.
Homoscedasticity: variance of the errors (how spread out they are) should be constant across all levels of X. In simpler terms, the spread of the data points around the regression line should be consistent regardless of the X value.
No Autocorrelation: errors should not be correlated with each other. This means the error for one data point doesn't have a predictable relationship with the errors of previous or subsequent data points.
Normality of Errors: while not strictly necessary, assuming the errors are normally distributed can be beneficial for certain statistical tests used in linear regression analysis.
It is important to understand that these assumptions do not necessarily have to be perfectly met in every situation. However, substantial violations can lead to inaccurate or misleading results.
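One practical way to get a rough feel for whether these assumptions hold is to fit the model and inspect the residuals. The sketch below is only an illustration on synthetic data, using informal checks rather than formal hypothesis tests:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data that satisfies the assumptions by construction
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=x.size)

# Fit a straight line and compute the residuals
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Informal checks: mean near 0, roughly constant spread, no strong lag-1 correlation
print("mean of residuals:", residuals.mean())
print("spread (std), first half vs second half:",
      residuals[:50].std(), residuals[50:].std())
print("lag-1 autocorrelation:", np.corrcoef(residuals[:-1], residuals[1:])[0, 1])
```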
Linear regression relies on certain assumptions about the error terms (the differences between predicted and actual values). One of them is that the errors follow a normal distribution, also known as a Gaussian distribution. This is visualized in the plot from Wikipedia below, which shows a bell-shaped curve.
In a normal distribution, the most frequent errors fall around the average error (mean), with fewer errors occurring farther away in either direction. The standard deviation determines the spread of the distribution – a larger standard deviation indicates a wider spread of errors. Assuming normality simplifies the statistical analysis behind linear regression. Many common statistical tests used in regression rely on the normality of errors for accurate p-values and confidence intervals.
Perfect normality of errors is not always essential for linear regression to produce useful results, but substantial deviations from normality can affect the reliability of statistical tests used in the analysis.
The formula for the normal distribution is P(x; μ, σ) = (1 / (σ√(2π))) * e^(-(x - μ)^2 / (2σ^2)). P(x; μ, σ) represents the probability density of a value x under the distribution. μ (mu) is the mean of the distribution, indicating the center of the bell-shaped curve. σ (sigma) is the standard deviation, which controls the spread of the distribution.
The general formula looks complicated, but it can be simplified by setting the mean to zero and the standard deviation to 1, giving the standard normal density P(x; 0, 1) = (1 / √(2π)) * e^(-x^2 / 2).
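Translated directly into code, with the σ in the normalization included, the density looks like this. It is a small sketch of my own, checked against the standard normal value at x = 0, which is 1/√(2π) ≈ 0.3989:

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density: (1 / (sigma * sqrt(2π))) * exp(-(x - mu)^2 / (2 sigma^2))."""
    coef = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coef * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

print(normal_pdf(0.0))             # ≈ 0.3989 for the standard normal
print(normal_pdf(0.0, sigma=2.0))  # larger sigma => wider, flatter curve (≈ 0.1995)
```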
Linear regression makes predictions by fitting a straight line to the data points. The line in the plot below minimizes the squared difference between the predicted values and the actual target values (y-axis).
Linear regression assumes that the errors are normally distributed around zero. This means most errors will be close to zero, with fewer errors farther away in either direction. The plot below shows a bell-shaped curve, which is a typical visualization of a normal distribution.
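To see that bell shape numerically rather than in a plot, one can bin the residuals from a fitted line. This is purely an illustrative sketch on synthetic data of my own:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 500)
y = 1.5 * x + 4.0 + rng.normal(0, 2.0, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Bin the residuals: counts should peak near zero and taper off symmetrically,
# the numerical signature of a bell-shaped (normal) error distribution.
counts, edges = np.histogram(residuals, bins=9)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"[{left:6.2f}, {right:6.2f}): {'#' * (count // 5)}")  # each '#' ≈ 5 residuals
```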
Linear regression finds the line that minimizes the total squared error (the RSS); as noted above, this is the same line that minimizes the mean squared error, but it does not directly minimize the average signed error.
The assumption of normality is important because it allows us to use statistical tests to assess the significance of the model's parameters and the overall fit of the line. These tests rely on the normal distribution of errors to provide accurate p-values and confidence intervals.
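For the tests themselves, a library such as statsmodels reports p-values and confidence intervals for the fitted coefficients. This is a hedged sketch using its OLS API on made-up data; the variable names and numbers are illustrative only:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 1.5, size=x.size)

# Add an intercept column and fit ordinary least squares
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()

# The summary includes coefficient estimates, p-values, and confidence intervals,
# whose validity leans on the normality-of-errors assumption discussed above.
print(model.summary())
```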