Math 443: The Mathematics and Statistics of Surveys

Assignment 3 



Due Date: October 8.
Reading: Lohr, Chapter 4, and J. Neyman's paper.

 
Problems:

 
  1. Lohr, Chapter 4, Problem 2, part b.  Compare the mean and standard error for the stratified sample with those from the SRS in Problem 11 of Chapter 2.  You do not need to plot the data.

  2. Lohr, Chapter 4, Problem 12.

  3. Addition to Chapter 4, Problem 12:  Since you are based in Massachusetts, it is probably reasonable to assume that it costs more for you to survey farms in the West than in the Northeast.  Assume that each farm in the West costs $70 to survey, each farm in the Northeast costs $20 to survey, and farms in the other regions cost $40 to survey.  Using the cost function in Equation 4.12 of Lohr and the estimated variances in Example 4.1, determine the optimal allocation of the sample under these costs and a budget of $15,000.  Assume fixed costs equal zero.

  4. Lohr, Chapter 4, Problem 13.  Use the allocation from Problem 12, not the allocation you determined in the additional problem above.  See the directions below for choosing stratified samples in Stata.  Turn in your Stata output so that I can check the results.  Also, tell me what weights and fpc you used for each stratum.

  5. Lohr, Chapter 4, Problem 16, parts a, b, and c.  Do not do part d, as it requires material from Chapter 3.

  6. In three or four sentences, summarize the main points of Neyman's paper.
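
    For reference, the allocation that minimizes the variance of the stratified estimator under a linear cost function has a standard closed form.  The notation in this sketch is assumed here rather than taken from the assignment: C is the total budget, c_0 the fixed cost, c_h the cost per sampled unit in stratum h, N_h the stratum population size, and S_h the stratum standard deviation (in practice replaced by an estimate).

```latex
C = c_0 + \sum_{h=1}^{H} c_h n_h,
\qquad
n_h = (C - c_0)\,
      \frac{N_h S_h / \sqrt{c_h}}
           {\sum_{l=1}^{H} N_l S_l \sqrt{c_l}} .
```

    With zero fixed costs, c_0 = 0.  The resulting n_h are generally not whole numbers and must be rounded so that the total cost stays within the budget.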

     
     
     
     
     

    Relevant Stata commands:

    Pick any number between 1 and 123456789.  Type set seed yournumber, where yournumber is the random number you picked.  This tells Stata where to start when generating random numbers.  By setting a different seed each time you start Stata, you can be sure that you won't sample the same units over and over again. Write your seed value in your answer to the HW problem so that we can reproduce the sample later if we need to.

    To draw a stratified sample with simple random sampling in each stratum, we have to partition the population into the strata.  We then sample randomly in each stratum.  For Lohr's Problem 13, I created four populations (one for each stratum) from the agpop data set: agpopW (West stratum), agpopS (South stratum), agpopNC (North Central stratum), and agpopNE (Northeast stratum).  To take a stratified sample, first decide what fraction, n_h / N_h, will be sampled in each stratum h.  Then, follow the steps below:

    Load in the data for the West region (type use agpopW).  Use sample perc_w to pick a random sample of  (perc_w)  percent of the N_west units in agpopW.  For example, to sample 5% of the units in the West, type sample 5.  Save this as a new data set on your home drive (type save temporary).

    Now, type clear.  Load in the data for the NE region (type use agpopNE).  Use sample perc_ne to pick a sample of  (perc_ne)  percent of the N_ne units in agpopNE.  Now, type append using temporary.  This adds on the sampled units in the West to the end of the data set containing the sampled units in the NE.  Save this data set as temporary again by typing save temporary, replace.  The replace option tells Stata to overwrite the old copy of the file temporary.

    Repeat the above process for the NC (agpopNC) and S (agpopS) regions.  The last data set that you save will be a stratified, simple random sample from the population.
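
    The sampling steps above can be collected into a short do-file.  This is a sketch only: the seed and the four sampling percentages are placeholders chosen for illustration, not the allocation from Problem 12, and it assumes the four stratum files are in the current working directory.

```stata
* Placeholder seed; pick your own and record it in your answer.
set seed 24601

* West: draw the sample and start the temporary file.
use agpopW, clear
* Placeholder percentage: keep a 5% simple random sample of the West stratum.
sample 5
save temporary, replace

* Northeast: draw the sample, then add the West units already saved.
use agpopNE, clear
sample 5
append using temporary
save temporary, replace

* North Central.
use agpopNC, clear
sample 5
append using temporary
save temporary, replace

* South: after this append, temporary holds the full stratified sample.
use agpopS, clear
sample 5
append using temporary
save temporary, replace
```

    Because sample takes a percentage, the realized n_h may differ slightly from your target because of rounding.  Tabulating the stratum variable afterward (for example, tabulate region, assuming the stratum variable is named region) shows the sample size actually drawn in each stratum.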
     

    To analyze data from a stratified, simple random sample, you need to create a vector of weights.  The weight for the units in each stratum is N_h/n_h.  Type generate wts = 0 just to get a variable wts in the data set.  Now, for all the units in region W, you want to change the weight to N_west/n_west.  To do this, look at the editor to see which observation numbers correspond to the units in the West (for example, let's say observations 64 through 120 are the units in the West).  Then, type replace wts = N_west/n_west in 64/120, substituting the actual numbers for N_west and n_west.  This tells Stata to replace the 0 in wts with N_west/n_west for observations 64 to 120.  Use similar commands to specify the weights for each of the other regions.

    You also need a vector of finite population correction factors.  This is easy.  Simply type generate fpcf = 1/wts.
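
    As a sketch of the weight and fpc steps, suppose (hypothetically) that observations 64 through 120 are the 57 sampled West units and that the West stratum contains 570 units in the population.  These numbers are made up for illustration; substitute your own N_h, n_h, and observation ranges.

```stata
* Start with a placeholder weight of 0 for every unit.
generate wts = 0

* Hypothetical West stratum: N_west = 570, n_west = 57, observations 64-120.
replace wts = 570/57 in 64/120

* Repeat the replace command for the other three regions first, then
* compute the sampling fraction n_h/N_h that serves as the fpc.
generate fpcf = 1/wts
```

    An alternative that avoids looking up observation numbers is to condition on the stratum variable, for example replace wts = 570/57 if region == "W".  That form assumes region is a string variable coded "W" for the West, so check the coding in your data first.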

    Finally, to use svytotal or svymean, simply type svytotal varname [weight = wts], fpc(fpcf) strata(region).  The strata option tells Stata what the stratum indicators are.  Be sure to type these in, as there will be no automatic selection of weights and fpc as there was in the last homework (when I saved the correct weights and fpc so that you automatically used them when typing svymean or svytotal).
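
    The svytotal and svymean syntax above is that of the Stata release in use when this handout was written.  In more recent Stata releases, the design is declared once with svyset and the estimation commands take the svy: prefix.  A rough equivalent, assuming the variables wts, fpcf, and region described above and with varname as a placeholder for the analysis variable, might look like:

```stata
* Declare the design: sampling weights, strata, and sampling fractions as the fpc.
svyset [pweight = wts], strata(region) fpc(fpcf)

* Estimate the population total and mean of the analysis variable.
svy: total varname
svy: mean varname
```

    Either way, the weights, strata, and fpc must be supplied explicitly; nothing is stored with the data set for this assignment.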

     Stata handout.

     
     
       



Jerome.P.Reiter

Sat Sep 25 20:29:13 EDT 1999