4.3 Estimation

Statistics is employed for various purposes, but we are primarily interested in using it to estimate parameters of distributions. That is the topic of this section.

4.3.1 Samples

Many people have an intuitive understanding of samples that does not conform to the technical definition, which is quite formal. We shall use samples extensively in this book, so it is worth embracing the formality of the technical definition.

Suppose we make observations, obtaining a body of data {x[1], x[2], … , x[m]}. We perceive the data as “randomly generated.” Depending upon our application, we may apply some statistical formula to the data. The question that concerns us now is: how do we justify whatever statistical formula we use? The answer is probability theory. We construct a probabilistic model for our data and use it to justify the formula.

Consider a random vector X. A realization of X is a vector x in the range of X. We might treat our data {x[1], x[2], … , x[m]} as a set of m realizations of the single random vector X, but this model is not useful, since it says nothing about how the m values relate to one another probabilistically. A more useful model is to consider a set of m independent random vectors {X[1], X[2], … , X[m]}, each with the same distribution as X. We say the X[k] are IID (independent and identically distributed). We treat each value x[k] in our data as a realization of the corresponding random vector X[k]. We call the set of random vectors {X[1], X[2], … , X[m]} a sample. We call m the sample size and the set of values {x[1], x[2], … , x[m]} a realization of the sample.
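
The distinction between a sample and a realization of a sample can be illustrated with a short simulation. The following is only a sketch under stated assumptions: X is taken to be one-dimensional and standard normal, the sample size m = 5 is arbitrary, and the names draw_realization and sample_mean are hypothetical helpers introduced for illustration. The sample {X[1], X[2], … , X[m]} corresponds to the random mechanism itself. Each call to draw_realization returns one realization {x[1], x[2], … , x[m]} of it.

```python
import random


def draw_realization(m, rng):
    """Return one realization [x[1], ..., x[m]] of a sample of size m.

    Each value is drawn independently from the same distribution, so the
    underlying random variables are IID. The standard normal distribution
    is an assumption made purely for illustration.
    """
    return [rng.gauss(0.0, 1.0) for _ in range(m)]


def sample_mean(values):
    """An example of a statistical formula applied to a realization."""
    return sum(values) / len(values)


rng = random.Random(0)  # fixed seed so the run is reproducible
m = 5                   # sample size (arbitrary choice)

# Two realizations of the same sample: same model, different values.
first = draw_realization(m, rng)
second = draw_realization(m, rng)

print("first realization: ", first)
print("second realization:", second)
print("sample means:      ", sample_mean(first), sample_mean(second))
```

Applying the same formula to the two realizations gives two different numbers. Under this model the formula's output is itself a realization of a random variable, and that is what allows probability theory to be used in justifying the formula.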