What is the difference between Linear Regression and Non-Linear Regression?
This is a question I found myself asking recently, and in an attempt to fully understand the answer, I am going to try to articulate it below.
What is regression?
Regression is a statistical method that attempts to determine the strength of the relationship between a dependent variable and one or more independent variables.
What is a linear regression?
Linear regression always uses a linear equation, Y = a + bx, where x is the explanatory (independent) variable, Y is the dependent variable, a is the intercept, and b is the slope coefficient.
In multiple linear regression, additional terms are added (Y = a + b1x1 + b2x2 + ... + bnxn), but the model is still linear in its parameters.
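To make this concrete, here is a minimal sketch of fitting a multiple linear regression with scikit-learn. The data below is made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: two explanatory variables, one dependent variable
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 8.1, 10.0])

# Fit Y = a + b1*x1 + b2*x2 -- linear in the parameters a, b1, b2
model = LinearRegression()
model.fit(X, y)

print("intercept (a):", model.intercept_)
print("coefficients (b1, b2):", model.coef_)
print("prediction for x1=6, x2=6:", model.predict([[6.0, 6.0]]))
```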
What is a non-linear regression?
If the model equation does not follow the Y = a + bx form (or its multi-variable extension), then the relationship between the dependent and independent variables is not linear. There are many different forms of non-linear models. A random forest regression is considered a non-linear model. Random forest models are ensemble learning methods for regression which grow a forest of regression trees and then average their outcomes. The resulting model cannot be written down as a single equation.
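As a rough sketch of what that looks like in practice (assuming scikit-learn's RandomForestRegressor and some toy data), the fitted model is a collection of trees whose predictions get averaged, not an equation you could write out:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy non-linear relationship: y roughly follows sin(x) with some noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# Grow a forest of regression trees and average their predictions
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

print("prediction at x=2.5:", forest.predict([[2.5]]))
```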
In regression trees, the splitting decision is based on minimizing the Residual Sum of Squares (RSS). The variable and split point that give the greatest reduction in RSS are chosen for the root node. The tree splitting takes a top-down greedy approach, meaning the algorithm makes the best split at the current step rather than looking ahead to splits that might give better results at future nodes.
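To illustrate the greedy, RSS-minimizing split, here is a small hypothetical helper (not from any library) that scans candidate thresholds for a single variable and keeps the one with the lowest total RSS:

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on x that minimizes the combined RSS of the two child nodes."""
    best_threshold, best_rss = None, np.inf
    for threshold in np.unique(x):
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue
        # RSS of a node = sum of squared deviations from that node's mean
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_threshold, best_rss = threshold, rss
    return best_threshold, best_rss

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 0.9, 1.0, 4.8, 5.2, 5.1])
print(best_split(x, y))  # splits around x = 3, where y jumps
```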
A quick way to remember the key difference: linear equations will produce lines and non-linear equations will produce curves. This is not a completely accurate statement, because there are ways to produce curves with a linear equation, but as a loose generalization it helps me understand the distinction conceptually.
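Polynomial regression is a good example of that exception: it fits a curve, yet the model is still linear in its coefficients. A minimal sketch, again assuming scikit-learn and made-up quadratic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Toy quadratic data: the relationship between x and y is curved
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 2 + 0.5 * x.ravel() + 1.5 * x.ravel() ** 2

# Expand x into [1, x, x^2]; the model remains linear in its coefficients
X_poly = PolynomialFeatures(degree=2).fit_transform(x)
model = LinearRegression().fit(X_poly, y)

print("coefficients:", model.coef_)    # roughly [0, 0.5, 1.5]
print("intercept:", model.intercept_)  # roughly 2
```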