Linear regression refers to finding linear models that best approximate the relationship between a set of variables \(x_1, \dotsc, x_m\) and another set of variables \(y_1, \dotsc, y_n\), based on data samples.
In multiple linear regression, we seek a linear model that best describes how a single variable \(y\) depends on several other variables \(x_1, \dotsc, x_m\).
Definition: Multiple Linear Regression
Let \((x_1^{(1)}, \dotsc, x_m^{(1)}, y^{(1)}), \dotsc, (x_1^{(p)}, \dotsc, x_m^{(p)}, y^{(p)})\) be a set of data points.
A multiple linear regression model is a linear function \(f: \mathbb{R}^m \to \mathbb{R}\) of the form \(f(x_1, \dotsc, x_m) = \beta_0 + \beta_1 x_1 + \dotsb + \beta_m x_m\), where \(\beta_0, \beta_1, \dotsc, \beta_m \in \mathbb{R}\).
For computational reasons, we exclusively work with a modified matrix representation
\(f(x_1, \dotsc, x_m) = \begin{bmatrix} 1, x_1, \dotsc, x_m \end{bmatrix} \boldsymbol{\beta}\),
where \(\boldsymbol{\beta} = \begin{bmatrix} \beta_0, \beta_1, \dotsc, \beta_m \end{bmatrix}^{\mathsf{T}} \in \mathbb{R}^{m+1}\).
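As a minimal numerical sketch of this representation (the data and coefficient values below are hypothetical), the intercept \(\beta_0\) is absorbed into \(\boldsymbol{\beta}\) by prepending a column of ones to the sample matrix, so the model evaluates as a single matrix-vector product:

```python
import numpy as np

# Hypothetical sample data: p = 4 observations of m = 2 variables.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])  # shape (p, m)

# Prepend a column of ones so beta_0 joins the coefficient vector.
A = np.hstack([np.ones((X.shape[0], 1)), X])  # shape (p, m+1)

beta = np.array([0.5, 1.0, -2.0])  # [beta_0, beta_1, beta_2], hypothetical

# f evaluated at every sample at once: each row of A times beta.
y_hat = A @ beta
print(y_hat)
```

Each entry of `y_hat` equals \(\beta_0 + \beta_1 x_1^{(k)} + \beta_2 x_2^{(k)}\) for the corresponding sample.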
Definition: Residual
The residual of a sample \((x_1^{(k)}, \dotsc, x_m^{(k)}, y^{(k)})\) under \(f\) is a real scalar field \(r_k: \mathbb{R}^{m+1} \to \mathbb{R}\) defined as follows: \(r_k(\boldsymbol{\beta}) = y^{(k)} - f(x_1^{(k)}, \dotsc, x_m^{(k)})\).
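With the matrix representation, all \(p\) residuals can be computed at once as \(\boldsymbol{y} - \boldsymbol{A}\boldsymbol{\beta}\). A brief sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data: p = 3 samples of a single variable (m = 1).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # design matrix, shape (p, m+1)
y = np.array([2.0, 3.0, 5.0])

beta = np.array([1.0, 1.0])  # candidate [beta_0, beta_1], hypothetical

# r_k(beta) = y^(k) - f(x^(k)); stacked over k this is y - A @ beta.
r = y - A @ beta
print(r)
```

The \(k\)-th entry of `r` is exactly \(r_k(\boldsymbol{\beta})\).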
Let \(g: \mathbb{R}^{m+1} \to \mathbb{R}\) denote the sum of squared residuals, \(g(\boldsymbol{\beta}) = \sum_{k=1}^{p} r_k(\boldsymbol{\beta})^2 = \lVert \boldsymbol{y} - \boldsymbol{A}\boldsymbol{\beta} \rVert^2\), where \(\boldsymbol{A} \in \mathbb{R}^{p \times (m+1)}\) is the matrix whose \(k\)-th row is \(\begin{bmatrix} 1, x_1^{(k)}, \dotsc, x_m^{(k)} \end{bmatrix}\) and \(\boldsymbol{y} = \begin{bmatrix} y^{(1)}, \dotsc, y^{(p)} \end{bmatrix}^{\mathsf{T}}\).
Since \(g\) has a local minimum at \(\boldsymbol{\beta}\), all of its directional derivatives there are zero, so \(\nabla g(\boldsymbol{\beta}) = 2\boldsymbol{A}^{\mathsf{T}}(\boldsymbol{A}\boldsymbol{\beta} - \boldsymbol{y})\) must be zero; this yields the normal equations \(\boldsymbol{A}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{\beta} = \boldsymbol{A}^{\mathsf{T}}\boldsymbol{y}\).
If \(\boldsymbol{A}\) has full rank with \(p \ge m+1\) and \(\boldsymbol{A}^{\mathsf{T}}\boldsymbol{A} \boldsymbol{\beta} = \boldsymbol{A}^{\mathsf{T}}\boldsymbol{y}\), then \(g\) has a global minimum at \(\boldsymbol{\beta}\).
Since \(p \ge m+1\) and since \(\boldsymbol{A}\) has full rank, we know that \(\boldsymbol{A}^{\mathsf{T}}\boldsymbol{A}\) is positive definite, and so the Hessian \(H_g(\boldsymbol{\beta}) = 2\boldsymbol{A}^{\mathsf{T}}\boldsymbol{A}\) is positive definite. Therefore, \(g\) has a local minimum at \(\boldsymbol{\beta}\). Moreover, because \(g\) is a quadratic function with positive definite Hessian, it is strictly convex, so this local minimum is in fact the global minimum.
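The result above can be checked numerically: solving the normal equations \(\boldsymbol{A}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{\beta} = \boldsymbol{A}^{\mathsf{T}}\boldsymbol{y}\) should make the gradient of \(g\) vanish and agree with a standard least-squares solver. A sketch with hypothetical data (\(p = 5 \ge m + 1 = 2\), full-rank \(\boldsymbol{A}\)):

```python
import numpy as np

# Hypothetical full-rank design matrix (intercept column plus one variable).
A = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Solve the normal equations A^T A beta = A^T y directly.
beta = np.linalg.solve(A.T @ A, A.T @ y)

# The gradient of g at the solution, 2 A^T (A beta - y), should vanish.
grad = 2 * A.T @ (A @ beta - y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta, grad)
```

In practice one prefers `np.linalg.lstsq` (or a QR factorization) over forming \(\boldsymbol{A}^{\mathsf{T}}\boldsymbol{A}\) explicitly, since the normal equations square the condition number of \(\boldsymbol{A}\); the direct solve here is only to illustrate the theorem.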