###### 5.9.4 Stratified Sampling

Consider the crude Monte Carlo estimator

[5.89]

for some quantity ψ = *E*[*f*(***U***)], ***U*** ~ *U*_{n}((0,1)^{n}). Stratify the region (0,1)^{n} into *w* disjoint subregions:

[5.90]

Define *w* random vectors ***X***_{j} ~ *U*_{n}(Ω_{j}). Then *f*(***U***) can be represented as a mixture, in the sense of Section 3.11, of the *f*(***X***_{j}) using weights *p*_{j} = *Pr*(***U*** ∈ Ω_{j}). Consequently

[5.91]

[5.92]
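The mixture representation can be checked numerically. The sketch below is illustrative only: the integrand `f`, the stratum boundaries in `edges`, and the seed are assumptions that do not come from the text. It draws from *U*(0,1), sorts the draws into three subintervals, and compares the crude sample mean with the weighted sum of the within-stratum sample means, using the observed stratum frequencies as weights.

```python
import numpy as np

def f(u):
    # Hypothetical integrand, chosen only for illustration.
    return np.exp(u)

rng = np.random.default_rng(0)
edges = np.array([0.0, 0.2, 0.7, 1.0])    # three hypothetical strata of (0,1)
u = rng.random(10_000)                     # draws of U ~ U(0,1)
y = f(u)

# Index (0, 1 or 2) of the stratum containing each draw.
idx = np.searchsorted(edges, u, side="right") - 1

crude = y.mean()                                                # crude Monte Carlo estimate
weights = np.array([(idx == j).mean() for j in range(3)])       # empirical p_j
cond_means = np.array([y[idx == j].mean() for j in range(3)])   # estimates of E[f(X_j)]
mixture = float(weights @ cond_means)                           # probability-weighted mean
```

With the empirical weights, `crude` and `mixture` agree up to floating-point rounding, which is the sample analogue of the mixture identity. The empirical weights themselves only approximate the true probabilities 0.2, 0.5, and 0.3.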

With the method of stratified sampling, we apply a separate Monte Carlo estimator for each expectation *E*[*f*(***X***_{j})] and use the probabilities *p*_{j} to take the probability-weighted mean. The result is an unbiased estimator

[5.93]

where *m*_{j} is the sample size employed on subregion Ω_{j}. The standard error for stratified sampling is:

[5.94]

where σ_{j} is the standard deviation of *f*(***X***_{j}). This formula suggests that, when partitioning (0,1)^{n}, we do so in a manner that minimizes the terms *p*_{j}σ_{j}. Consider Exhibit 5.15. This depicts a function *f* on an interval (0,1), which has been partitioned into three subintervals Ω_{1}, Ω_{2}, and Ω_{3}. The variability of *f* over the entire interval (0,1) is greater than its variability over any subinterval. We expect each standard deviation σ_{j} of *f*(***X***_{j}) to be less than the standard deviation σ of *f*(***U***). This will help minimize the terms *p*_{j}σ_{j}.

Exhibit 5.15: Over the entire interval (0,1), *f* takes on a greater range of values than it does over any of the individual subintervals Ω_{1}, Ω_{2}, or Ω_{3}.
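A one-dimensional sketch may make the comparison concrete. It assumes the stratified estimator is the probability-weighted sum of per-stratum sample means, and that its squared standard error is the sum of the terms *p*_{j}²σ_{j}²/*m*_{j}, with each σ_{j} replaced by a sample standard deviation. The integrand, the equal-width strata, the sample sizes, and the function names are hypothetical choices made for illustration.

```python
import numpy as np

def stratified_estimate(f, edges, sizes, rng):
    """Stratified estimate of E[f(U)], U ~ U(0,1), with its estimated standard error."""
    estimate, variance = 0.0, 0.0
    for a, b, m_j in zip(edges[:-1], edges[1:], sizes):
        p_j = b - a                           # Pr(U falls in stratum j)
        y = f(rng.uniform(a, b, size=m_j))    # f evaluated at draws of X_j
        estimate += p_j * y.mean()
        variance += p_j**2 * y.var(ddof=1) / m_j
    return estimate, np.sqrt(variance)

def crude_estimate(f, m, rng):
    """Crude Monte Carlo estimate of E[f(U)] with its estimated standard error."""
    y = f(rng.random(m))
    return y.mean(), y.std(ddof=1) / np.sqrt(m)

rng = np.random.default_rng(1)
f = lambda u: u**2                            # hypothetical integrand; E[f(U)] = 1/3
edges = [0.0, 1/3, 2/3, 1.0]                  # three equal-width strata
strat, strat_se = stratified_estimate(f, edges, [100, 100, 100], rng)
crude, crude_se = crude_estimate(f, 300, rng)
```

For this monotone integrand, each stratum sees only part of the range of `f`, so `strat_se` comes out well below `crude_se` at the same total sample size of 300.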

Formula [5.94] also suggests that we maximize each sample size *m*_{j}. If the total sample size *m* = *m*_{1} + *m*_{2} + … + *m*_{w} is fixed, we can increase one term *m*_{j} only at the expense of the others. An optimal choice of sample sizes is to set, for each *j*,

[5.95]
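As a sketch of how such an allocation might be computed, the function below assumes the optimal choice makes each *m*_{j} proportional to *p*_{j}σ_{j}, which is the allocation that minimizes the standard error above for fixed *m*. The function name `allocate`, the rounding rule, and the example inputs are hypothetical.

```python
import numpy as np

def allocate(m, p, sigma):
    """Split a total sample size m across strata in proportion to p_j * sigma_j."""
    weights = np.asarray(p, dtype=float) * np.asarray(sigma, dtype=float)
    raw = m * weights / weights.sum()         # ideal, generally non-integer sizes
    sizes = np.floor(raw).astype(int)
    # Give the samples lost to rounding down to the strata with the largest remainders.
    shortfall = m - sizes.sum()
    order = np.argsort(raw - sizes)[::-1]
    sizes[order[:shortfall]] += 1
    return sizes

# Hypothetical probabilities and standard deviations for three strata.
sizes = allocate(1000, p=[0.25, 0.5, 0.25], sigma=[0.03, 0.10, 0.16])
```

The returned sizes always sum to `m`. A stratum whose standard deviation is zero receives no samples under this rule, so in practice a small minimum per stratum may be preferable.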

The preceding optimization techniques depend upon the quantities σ_{j}, which typically will not be known. Accordingly, optimizing a stratified sampling analysis is often a matter of trial and error. A simple solution is to employ a preliminary Monte Carlo analysis to estimate the quantities σ_{j}.

###### Exercises

Consider the definite integral

[5.96]

Use the following steps to estimate the integral with stratified sampling:

- Apply to the integral a change of variables

  [5.97]

  to obtain an integral of form

  [5.98]

- Stratify the new region of integration (0,1)^{2} into three subregions:
  - Ω_{1} = {***u*** : *u*_{1} ≤ .5 and *u*_{2} ≤ .5},
  - Ω_{2} = {***u*** : (*u*_{1} ≤ .5 and *u*_{2} > .5) or (*u*_{1} > .5 and *u*_{2} ≤ .5)},
  - Ω_{3} = {***u*** : *u*_{1} > .5 and *u*_{2} > .5}.
- Sketch the three subregions.
- Explain in your own words why these subregions are a reasonable choice for stratified sampling.
- Define ***Y***_{j} ~ *U*_{2}(Ω_{j}) for *j* = 1, 2, 3. Estimate the mean μ_{j} and standard deviation σ_{j} of *f*(***Y***_{j}) for each *j* as follows:
  - Generate 50 *U*_{2}(Ω_{j}) pseudorandom vectors for each *j*.
  - For each *j*, calculate the sample mean and sample standard deviation of the resulting values of *f*.
- Based upon the estimated standard deviations, apply [5.95] to determine a suitable sample size *m*_{j} to be used in each subregion Ω_{j}. Assume *m* = 1000.
- Compare your three results *m*_{1}, *m*_{2}, and *m*_{3} with the expected number of pseudorandom vectors ***u***^{[k]} that would fall in each of Ω_{1}, Ω_{2}, and Ω_{3} if stratified sampling were not used and crude Monte Carlo estimator [5.89] were used with the same total sample size of *m* = 1000.
- Based upon your values *m*_{j} from item (f), specify an estimator for [5.98] of form [5.93].
- Based upon your estimated means and standard deviations from item (e), estimate the standard error of your estimator as well as the standard error of the corresponding crude Monte Carlo estimator. (Hint: For the crude Monte Carlo estimator, treat the random variable *f*(***U***) as a mixture, in the sense of Section 3.11, of three distributions.)
- Based upon your results from item (i), how much would you need to increase the sample size for the crude Monte Carlo estimator in order for it to have the same standard error as your stratified sampling estimator from item (h)?
- Apply your estimator from item (h) to estimate [5.98].
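The workflow in the steps above can be sketched in code. This is a rough outline under stated assumptions, not a worked solution: the integrand `f` is a hypothetical stand-in (the transformed integrand of [5.98] is not reproduced here), the helper `sample_region` and the seed are illustrative, and sample sizes are taken proportional to *p*_{j}σ_{j} with simple rounding, so they total approximately rather than exactly 1000.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(u):
    # Hypothetical stand-in for the transformed integrand; u has shape (size, 2).
    return np.exp(-(u[:, 0] ** 2 + u[:, 1] ** 2))

def sample_region(j, size, rng):
    """Draw `size` points uniformly from subregion j (1, 2 or 3) of the unit square."""
    u = 0.5 * rng.random((size, 2))           # uniform on the lower-left quarter
    if j == 1:
        return u
    if j == 3:
        return u + 0.5                        # shift to the upper-right quarter
    # Region 2 is the union of the two off-diagonal quarters, which have equal area.
    upper_left = rng.random(size) < 0.5
    u[upper_left, 1] += 0.5
    u[~upper_left, 0] += 0.5
    return u

p = np.array([0.25, 0.50, 0.25])              # areas of the three subregions

# Pilot run: 50 points per subregion to estimate each mean and standard deviation.
pilot = [f(sample_region(j, 50, rng)) for j in (1, 2, 3)]
mu = np.array([y.mean() for y in pilot])
sigma = np.array([y.std(ddof=1) for y in pilot])

# Sample sizes proportional to p_j * sigma_j, totalling about m = 1000.
m = 1000
sizes = np.maximum(1, np.rint(m * p * sigma / (p * sigma).sum())).astype(int)

# Standard errors implied by the pilot estimates.
strat_se = np.sqrt(np.sum(p**2 * sigma**2 / sizes))
mix_var = np.sum(p * (sigma**2 + mu**2)) - np.sum(p * mu) ** 2   # variance of the mixture
crude_se = np.sqrt(mix_var / m)

# Stratified estimate using the chosen sample sizes.
estimate = sum(p_j * f(sample_region(j, m_j, rng)).mean()
               for j, p_j, m_j in zip((1, 2, 3), p, sizes))
```

Because a crude estimator's standard error scales with the inverse square root of its sample size, the ratio `(crude_se / strat_se) ** 2` indicates roughly how many times larger the crude sample would need to be to match the stratified standard error.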