• Document: Statistics 522: Sampling and Survey Techniques. Topic 4
  • Uploaded: 2019-03-24 06:43:37


Statistics 522: Sampling and Survey Techniques
Topic 4

Topic Overview
This topic will cover
• Stratified Sampling
• Sampling Allocation

What
• We divide the population into H subpopulations called strata.
• Each sampling unit is in exactly one stratum.
• We draw independent probability samples from each stratum.
• We pool the information to obtain population estimates.

Why
One or more of
• We want to avoid taking a really bad sample.
  – We could take separate SRSs of females and males.
• We want data of known precision for subgroups.
  – Suppose females and males are not equally represented in the population but we want the same precision for each estimate.
• A stratified sample may be more convenient to administer and cheaper.
  – Different methods can be used for different strata.
• Stratified sampling can lead to estimates with smaller standard errors compared with an SRS with the same total number of observations.

Example 4.1
• In Example 2.4 on page 31, we took an SRS of size 300 from the population of 3078 US counties in the Census of Agriculture data set.
• We wanted to estimate the total acreage devoted to agriculture.
• Consider stratifying by region of the US.

Regions
• We believe that the agricultural acres may differ by region.
• Consider a stratified sample of size 300 where the sample size in each region is proportional to the number of counties in that region.
• Other stratification allocations can also be used.

The strata
Region           Population   Sample
Northeast               220       21
North Central          1054      103
South                  1382      135
West                    422       41
Total                  3078      300

The samples
• Take an SRS from each stratum.
• Find the mean and the variance for the data from each stratum.
• Use the SRS methods to estimate the total number of acres for each stratum.
• Add the stratum totals to get the population estimate.
• The variances also add, because the stratum samples are independent.

Design Effect
• The design effect is the ratio of the variance for the stratified sample to the variance for the corresponding (n = 300) SRS.
• For this problem, the estimate of the design effect is 0.75 (page 98).
• This means that a stratified sample of about (0.75)(300) = 225 would give the same variance as an SRS of 300.

Questions about the Construction of Strata
• If there is a choice, which stratification variable should be used? A stratification variable is a characteristic used to subdivide the population into strata. For example, would an age and sex stratification be preferred to a stratification by occupational groups?
• How should strata be demarcated? If the stratification uses age groups, what age intervals should be used to set up the strata?
• How many strata should there be? How many age groups should there be, if age is a stratification variable?

Choices of methods within strata
• A sampling design and a sample size must be specified in each stratum. Often the same type of sampling design is applied in all of the strata.
• An estimator must be specified for each stratum. Often this choice is also made uniformly for all strata.

Notation
• Strata are labeled 1 to H.
• $N_h$ is the population size for stratum h.
• $N = N_1 + N_2 + \cdots + N_H$
• Stratified random sampling is the simplest form of stratified sampling; we take an SRS of size $n_h$ from each stratum.

Population quantities
• $y_{h,j}$ is the value of the jth unit in stratum h.
• $t_h$ is the total for stratum h.
• $t$ is the population total.
• $\bar{y}_{h,U}$ is the population mean in stratum h.
• $\bar{y}_U$ is the overall population mean.
• $S_h^2$ is the population variance in stratum h (uses $N_h - 1$ in the denominator).

Sample quantities
• $\bar{y}_h$ is the sample mean for stratum h.
• $\hat{t}_h$ is the sample estimate of the total for stratum h.
• $s_h^2$ is the sample variance in stratum h.

Estimates of population parameters
$\hat{t}_{str} = \sum_{h=1}^{H} \hat{t}_h$
$\bar{y}_{str} = \hat{t}_{str} / N$
Note that $\hat{t}_h$ estimates $t_h$.

Properties
• $\hat{t}_{str}$ and $\bar{y}_{str}$ are unbiased because each $\hat{t}_h$ is unbiased; equivalently, because each $\bar{y}_h$ is unbiased.

Variance
• The variance of $\hat{t}_{str}$ is the sum of the variances of the $\hat{t}_h$.
• The variance of $\hat{t}_h$ is obtained using the methods for SRSs.
• It is the product of three terms:
  – $\mathrm{fpc}_h = (1 - n_h/N_h)$
  – $N_h^2$ (we multiply the mea
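
The proportional allocation of Example 4.1 and the stratified estimate of a total can be sketched in a few lines of Python. This is a minimal illustration, not the notes' own code. The function names, the largest-remainder rounding rule, and the toy data are assumptions made for the example. The variance uses the standard SRS estimator for a stratum total, (1 − n_h/N_h) · N_h² · s_h²/n_h; the third factor is cut off in the text above, so treat it as an assumption about what the notes intend.

```python
from statistics import mean, variance


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to N_h.

    Rounding uses the largest-remainder rule (an assumption; the notes
    do not say how the sample sizes in the table were rounded).
    """
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round down first
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def stratified_total(stratum_sizes, samples):
    """Return (t_hat_str, estimated variance of t_hat_str).

    samples maps each stratum label to the list of observed y values from
    an SRS in that stratum; each stratum needs at least two observations.
    """
    t_hat = 0.0
    var_hat = 0.0
    for h, y in samples.items():
        N_h, n_h = stratum_sizes[h], len(y)
        t_hat += N_h * mean(y)                 # t_hat_h = N_h * ybar_h
        fpc = 1 - n_h / N_h                    # finite population correction
        var_hat += fpc * N_h**2 * variance(y) / n_h
    return t_hat, var_hat


# County counts by region, taken from the strata table in Example 4.1.
COUNTIES = {"Northeast": 220, "North Central": 1054, "South": 1382, "West": 422}
print(proportional_allocation(COUNTIES, 300))
# → {'Northeast': 21, 'North Central': 103, 'South': 135, 'West': 41}

# Hypothetical toy data (not from the Census of Agriculture).
toy_sizes = {"A": 10, "B": 20}
toy_samples = {"A": [1, 2, 3], "B": [10, 10, 10, 10]}
print(stratified_total(toy_sizes, toy_samples))
```

The allocation reproduces the Sample column of the strata table. In the toy data, stratum B has no within-stratum variation, so the whole estimated variance comes from stratum A.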
