2.9 Ordinary Least Squares
The method of least squares is an alternative to interpolation for fitting a function to a set of points. Unlike interpolation, it does not require the fitted function to intersect each point. The method of least squares is probably best known for its use in statistical regression, but it is used in many contexts unrelated to statistics. The method encompasses many techniques. We present a fairly general approach called ordinary least squares.
2.9.1 Example
Suppose researchers gather 10 data points (x[k], y[k]) related to some phenomenon. We interpolate a ninth-order polynomial based upon the data. See Exhibits 2.10 and 2.11.


Because the polynomial is forced to intercept every point, it weaves up and down. In some applications, data may reflect random errors or other sources of “noise.” Forcing a curve to pass through each point causes its shape to reflect such noise as much as any underlying process that generated the data. We say the interpolated function is overfit to the data. As an alternative, we may fit a curve to data without requiring that it intercept each point. A quadratic polynomial fit in this manner to the data of Exhibit 2.10 is illustrated in Exhibit 2.12.

The polynomial of Exhibit 2.12 was constructed with the method of ordinary least squares. The form of the polynomial was specified as
$$f(x) = \beta_1 + \beta_2 x + \beta_3 x^2 \tag{2.94}$$
and the constants β1, β2, and β3 were determined in such a manner as to minimize the sum of squares
$$\sum_{k=1}^{10} \left( y^{[k]} - f\!\left( x^{[k]} \right) \right)^2 \tag{2.95}$$
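To make the contrast concrete, here is a small Python sketch along these lines. The data are hypothetical stand-ins, since the values of Exhibit 2.10 are not reproduced here; `numpy.polyfit` is used for both fits, as it performs a least-squares polynomial fit of the requested degree.

```python
import numpy as np

# Hypothetical stand-in for the 10 points of Exhibit 2.10 (the
# exhibit's actual values are not reproduced here): a quadratic
# trend plus random "noise".
rng = np.random.default_rng(seed=1)
x = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=10)

# A ninth-order polynomial passes through all 10 points exactly,
# so its shape reflects the noise as much as the trend (overfitting).
interp = np.polyfit(x, y, deg=9)

# A quadratic fit by least squares minimizes the sum of squared
# residuals [2.95] without passing through every point.
quad = np.polyfit(x, y, deg=2)

print("quadratic coefficients (highest power first):", quad)
```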
2.9.2 Ordinary Least Squares Methodology
Consider $l$ points $(x^{[k]}, y^{[k]})$ where $x^{[k]} \in \mathbb{R}^n$ and $y^{[k]} \in \mathbb{R}$. We wish to fit a function $f : \mathbb{R}^n \to \mathbb{R}$ of form
$$f(x) = \sum_{j=1}^{m} \beta_j f_j(x) \tag{2.96}$$
to the data in such a manner as to minimize the sum of squares
$$\sum_{k=1}^{l} \left( y^{[k]} - f\!\left( x^{[k]} \right) \right)^2 \tag{2.97}$$
As with the interpolation methodology of Section 2.4, the functions $f_j : \mathbb{R}^n \to \mathbb{R}$ can take any form.
can take any form. Unlike the interpolation methodology, we require that the number m of functions be less than the number l of points.
Let’s express our problem with matrices. Define
$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{pmatrix} \tag{2.98}$$
This is unknown. It is what we want to solve for. Define f as the l × m matrix comprising the values of each function $f_j$ evaluated at each point $x^{[k]}$:
$$f = \begin{pmatrix} f_1(x^{[1]}) & f_2(x^{[1]}) & \cdots & f_m(x^{[1]}) \\ f_1(x^{[2]}) & f_2(x^{[2]}) & \cdots & f_m(x^{[2]}) \\ \vdots & \vdots & & \vdots \\ f_1(x^{[l]}) & f_2(x^{[l]}) & \cdots & f_m(x^{[l]}) \end{pmatrix} \tag{2.99}$$
Define the vector
$$y = \begin{pmatrix} y^{[1]} \\ y^{[2]} \\ \vdots \\ y^{[l]} \end{pmatrix} \tag{2.100}$$
Both the matrix f and vector y are constants. They are known. We express sum-of-squares formula [2.97] as
$$(y - f\beta)'(y - f\beta) \tag{2.101}$$

$$= y'y - 2y'f\beta + \beta'f'f\beta \tag{2.102}$$

This is a quadratic polynomial in β of the form [2.91] with c = f′f. It can be shown that, for any matrix h that has linearly independent columns, the product h′h is positive definite. This is analogous to the fact that the square of any nonzero real number is a positive number. Accordingly, as long as f has linearly independent columns, our sum-of-squares formula [2.97] has a unique minimum. By [2.92], this occurs for
$$\beta = (f'f)^{-1} f'y \tag{2.103}$$
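A direct implementation of [2.103] is straightforward. The sketch below, with names of my own choosing, solves the normal equations f′fβ = f′y rather than forming the inverse explicitly; for nearly collinear columns, an SVD-based routine such as `numpy.linalg.lstsq` is numerically preferable.

```python
import numpy as np

def ols_beta(F, y):
    """Ordinary least squares per [2.103]: beta = (F'F)^(-1) F'y.

    F -- l x m design matrix [2.99]; its columns must be linearly
         independent, with m less than l
    y -- length-l vector of observed values [2.100]
    """
    # Solve the normal equations F'F beta = F'y; this avoids
    # explicitly inverting F'F.
    return np.linalg.solve(F.T @ F, F.T @ y)
```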
2.9.3 Example (Continued)
Continuing with our example of fitting a quadratic polynomial to the data of Exhibit 2.10, we seek a polynomial
$$f(x) = \beta_1 f_1(x) + \beta_2 f_2(x) + \beta_3 f_3(x) \tag{2.104}$$
where
- $f_1(x) = 1$,
- $f_2(x) = x$,
- $f_3(x) = x^2$.
Let
$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} \tag{2.105}$$
We have
$$f = \begin{pmatrix} 1 & x^{[1]} & (x^{[1]})^2 \\ \vdots & \vdots & \vdots \\ 1 & x^{[10]} & (x^{[10]})^2 \end{pmatrix} \tag{2.106}$$

and
$$y = \begin{pmatrix} y^{[1]} \\ y^{[2]} \\ \vdots \\ y^{[10]} \end{pmatrix} \tag{2.107}$$
Applying [2.103], we obtain
[2.108]
and our quadratic polynomial is
[2.109]
which was graphed in Exhibit 2.12.
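In code, the example's quadratic fit amounts to building the 10 × 3 matrix [2.106] from the basis functions and applying [2.103]. Again the data below are hypothetical stand-ins, since Exhibit 2.10's values are not shown here.

```python
import numpy as np

# Hypothetical data standing in for Exhibit 2.10.
x = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * x - 3.0 * x**2 \
    + np.random.default_rng(1).normal(scale=0.1, size=10)

# Design matrix [2.106]: columns are f1(x) = 1, f2(x) = x,
# f3(x) = x^2 evaluated at each x[k].
F = np.column_stack([np.ones_like(x), x, x**2])

# [2.103]: beta = (F'F)^(-1) F'y, computed via the normal equations.
beta = np.linalg.solve(F.T @ F, F.T @ y)
print("f(x) = %.4f + %.4f x + %.4f x^2" % tuple(beta))
```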
Exercises
Use ordinary least squares to fit a linear polynomial
$$f(x) = \beta_1 + \beta_2 x \tag{2.110}$$
to the five points indicated in Exhibit 2.13.

Use ordinary least squares to fit a function of the form
[2.111]
to the five points indicated in Exhibit 2.14.

Prove that, if the number $m$ of functions $f_j$ equals the number $l$ of points $(x^{[k]}, y^{[k]})$, then the least squares solution [2.103] reduces to the interpolation solution [2.49]. In this regard, ordinary least squares is a generalization of ordinary interpolation.
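The following sketch is not a proof, but it checks the claim numerically for one square case: with m = l = 3 and an invertible design matrix, the least-squares fit reproduces every y[k] exactly.

```python
import numpy as np

# Three points and three basis functions {1, x, x^2}, so m = l = 3.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 2.0])
F = np.column_stack([np.ones_like(x), x, x**2])  # square, invertible

beta = np.linalg.solve(F.T @ F, F.T @ y)  # least squares [2.103]
print(np.allclose(F @ beta, y))           # True: the fit interpolates
```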