Key Concepts Study Tool: Chapter 19


1. Three Types of Non-Response in Social Surveys

  1. Household non-response: when an entire household does not complete a questionnaire. This is difficult to deal with since there is no information available whatsoever.
  2. Person non-response: when an interview is obtained from at least one household member, but not from one or more others in that household. It results from the unwillingness, inability, or unavailability of a chosen respondent to answer the survey questions. It is dealt with through editing and imputation of values.
  3. Item non-response: when a respondent completes only part of a questionnaire, leaving blanks for some information.

2. Four Possible Reasons for Missing Values

  1. X2 is “missing completely at random” (MCAR), meaning the non-response on this question is entirely independent of patterns in either Y or X1 (we say that the reason for “missingness” is contained in the error term e).
  2. X2 is “missing at random” (MAR) but dependent on the explanatory variable X1, so that for certain values of X1, X2 is more likely to be missing.
  3. X2 is “missing at random” but dependent on the focal outcome, and certain values of Y increase the probability that X2 will be missing.
  4. X2 is a “non-ignorable missing” (NIM) value. That is, the probability that X2 is missing depends on the value of X2 itself, so X2 is often missing when it takes certain values.
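The four mechanisms can be made concrete with a small simulation. The sketch below assumes a hypothetical linear model Y = 1 + 0.5*X1 + 0.8*X2 + e and made-up missingness probabilities (the function names are illustrative only). It deletes values of X2 under each mechanism and compares the mean of the X2 values that remain with the full-data mean. In this particular setup, only the MCAR case leaves that mean essentially unchanged; the other three mechanisms remove high values of X2 more often and pull the observed mean downward.

```python
import random

def simulate(n=10_000, seed=1):
    """Draw (y, x1, x2) from a hypothetical model Y = 1 + 0.5*X1 + 0.8*X2 + e."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = 0.5 * x1 + rng.gauss(0, 1)          # X2 is correlated with X1
        y = 1 + 0.5 * x1 + 0.8 * x2 + rng.gauss(0, 1)
        rows.append((y, x1, x2))
    return rows

def prob_missing(mechanism, y, x1, x2):
    """Probability that X2 is missing for one observation (illustrative values)."""
    if mechanism == "MCAR":        # unrelated to Y, X1, or X2
        return 0.3
    if mechanism == "MAR_X1":      # depends on the explanatory variable X1
        return 0.6 if x1 > 0 else 0.1
    if mechanism == "MAR_Y":       # depends on the focal outcome Y
        return 0.6 if y > 1 else 0.1
    if mechanism == "NIM":         # depends on the value of X2 itself
        return 0.6 if x2 > 0 else 0.1
    raise ValueError(f"unknown mechanism: {mechanism}")

def observed_mean_x2(rows, mechanism, seed=2):
    """Mean of X2 over the observations where X2 survives the missingness process."""
    rng = random.Random(seed)
    kept = [x2 for y, x1, x2 in rows if rng.random() >= prob_missing(mechanism, y, x1, x2)]
    return sum(kept) / len(kept)

rows = simulate()
true_mean = sum(x2 for _, _, x2 in rows) / len(rows)
for mech in ("MCAR", "MAR_X1", "MAR_Y", "NIM"):
    print(f"{mech:6s} observed mean of X2: {observed_mean_x2(rows, mech):+.3f} "
          f"(full data: {true_mean:+.3f})")
```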

3. What to Do about Missing Data? Do Nothing.

  • List-wise deletion: Delete all observations with missing data.
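  • Pair-wise deletion: Use all available data for each calculation. Each mean, variance, and covariance is computed from every observation that has valid values on the variable(s) involved, so observations with missing X1 or X2 values are still included in the calculations that do not need the missing value and therefore still contribute to the coefficient estimates.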
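A minimal sketch of the two deletion strategies, using a small hypothetical data set in which None marks a missing value (the helper names are illustrative). List-wise deletion keeps only the complete rows, while pair-wise deletion computes each correlation from every row that has valid values on the two variables involved, so the sample size changes from one calculation to the next.

```python
import math

# Hypothetical toy data; None marks a missing value.
data = {
    "y":  [3.1, 2.4, 4.0, 3.3, 2.9, 3.8],
    "x1": [1.0, None, 2.0, 1.5, 1.2, None],
    "x2": [0.5, 0.7, None, 0.9, 0.4, 1.1],
}

def listwise(data):
    """List-wise deletion: keep only observations with no missing value on any variable."""
    n = len(data["y"])
    keep = [i for i in range(n) if all(col[i] is not None for col in data.values())]
    return {name: [col[i] for i in keep] for name, col in data.items()}

def corr(a, b):
    """Pearson correlation of two equal-length lists of numbers."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def pairwise_corr(data, u, v):
    """Pair-wise deletion: use every observation with valid values on both u and v."""
    pairs = [(a, b) for a, b in zip(data[u], data[v]) if a is not None and b is not None]
    return corr([a for a, _ in pairs], [b for _, b in pairs]), len(pairs)

complete = listwise(data)
print("list-wise n =", len(complete["y"]))
for u, v in [("y", "x1"), ("y", "x2"), ("x1", "x2")]:
    r, n = pairwise_corr(data, u, v)
    print(f"pair-wise r({u}, {v}) = {r:+.3f} using n = {n}")
```

With these toy values, list-wise deletion keeps 3 of the 6 observations, while the three pair-wise correlations use 4, 5, and 3 observations respectively.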

4. What to Do about Missing Data? Do Something.

  • Best Guess Imputation: The researcher fills in missing values in a quasi-subjective manner, based on knowledge obtained from other variables.
  • Zero Imputation: Replaces missing values with zero.
  • Mean Substitution: Replaces missing values with either the arithmetic average (for continuous data) or the mode, i.e. the most frequent value (for categorical data), of the variable, based on values from valid observations.
  • Hot Deck: A value from another observation in the same data set, typically a similar respondent, is used as a “donor” to replace the missing value.
  • Cold Deck: A missing value is derived from a source other than the same variable in the current survey. Values from a covariate, or from a previous survey, are often used to impute the missing value.
  • Regression Imputation: A preliminary regression is run on the observations with valid values, with the problematic variable as the focal outcome. The regression yields a model for predicting the missing values, and each missing value is filled in with its predicted value.
  • Regression Imputation with Random Error Term: Similar to regression imputation, except that a random error term is added to each predicted value, introducing an element of uncertainty into the imputed estimate.
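Several of these single-imputation methods can be sketched in a few lines of Python. This is an illustrative sketch on hypothetical data, not a reference implementation. The hot deck draws a random donor from the same (hypothetical) adjustment group. The cold deck copies values from an assumed external source such as an earlier survey wave. The regression variants use a single fully observed predictor with ordinary least squares. Best guess imputation is omitted because it relies on the researcher's judgment.

```python
import random
import statistics

def zero_impute(values):
    """Zero imputation: replace every missing value (None) with 0."""
    return [0 if v is None else v for v in values]

def mean_substitute(values, categorical=False):
    """Mean substitution: the mean of the valid values (continuous data)
    or their mode (categorical data) replaces every missing value."""
    valid = [v for v in values if v is not None]
    fill = statistics.mode(valid) if categorical else statistics.mean(valid)
    return [fill if v is None else v for v in values]

def hot_deck(values, groups, seed=0):
    """Hot deck: copy the value of a randomly chosen donor from the same data set.
    Donors are drawn from the same hypothetical adjustment group (e.g. region);
    assumes every group with a missing value has at least one donor."""
    rng = random.Random(seed)
    donors = {}
    for v, g in zip(values, groups):
        if v is not None:
            donors.setdefault(g, []).append(v)
    return [rng.choice(donors[g]) if v is None else v for v, g in zip(values, groups)]

def cold_deck(values, external):
    """Cold deck: take replacement values from an external source, here an
    assumed earlier survey wave with the same respondents in the same order."""
    return [e if v is None else v for v, e in zip(values, external)]

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; also returns the residual SD."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    a = my - b * mx
    sd = (sum((v - a - b * u) ** 2 for u, v in zip(x, y)) / (len(x) - 2)) ** 0.5
    return a, b, sd

def regression_impute(target, predictor, stochastic=False, seed=0):
    """Regression imputation: regress target on a fully observed predictor using the
    rows where target is observed, then fill each gap with the predicted value.
    With stochastic=True, a random N(0, residual SD) error is added to each prediction."""
    rng = random.Random(seed)
    obs = [(p, t) for p, t in zip(predictor, target) if t is not None]
    a, b, sd = fit_line([p for p, _ in obs], [t for _, t in obs])
    filled = []
    for p, t in zip(predictor, target):
        if t is None:
            t = a + b * p + (rng.gauss(0, sd) if stochastic else 0.0)
        filled.append(t)
    return filled

# Hypothetical example: income (in $1000s) with two gaps, schooling fully observed.
income   = [42.0, None, 55.0, 38.0, None, 61.0]
previous = [40.0, 50.0, 53.0, 37.0, 47.0, 60.0]   # assumed earlier survey wave
educ     = [12, 16, 16, 10, 14, 18]
region   = ["N", "N", "S", "S", "N", "S"]

print("zero:      ", zero_impute(income))
print("mean:      ", mean_substitute(income))
print("hot deck:  ", hot_deck(income, region))
print("cold deck: ", cold_deck(income, previous))
print("regression:", regression_impute(income, educ))
print("stochastic:", regression_impute(income, educ, stochastic=True))
```

With these toy values, mean substitution fills both gaps with 49.0, and regression imputation predicts about 54.9 and 49.0 from years of schooling; the stochastic version adds a random draw to each of those predictions.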

5. What to Do about Missing Data? Do Multiple Things.

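  • Multiple Imputation: The most mathematically abstract and complex method of imputation, but also the most accurate and consistent. Each missing value is imputed M times with random variation, yielding M completed data sets; each data set is analyzed separately and the M sets of results are combined. Because the variation across the M data sets “builds in” a level of uncertainty, the method preserves, to some degree, the integrity and accuracy of the standard errors and model fit statistics.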
  • Unfortunately, multiple imputation is complicated to use and computationally intensive. Moreover, each run of multiple imputation produces different estimates: because the imputed values are based on quasi-random draws, different values are selected for each of the M data sets every time, producing different results when combined.
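A simplified sketch of the multiple-imputation workflow follows. It assumes hypothetical data in which X2 is missing at random given X1. Each of the M completed data sets is created by stochastic regression imputation, the mean of X2 is estimated in each, and the M estimates are pooled with Rubin's rules. The pooled point estimate is the average of the M estimates. The total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. As a simplification, the regression is fitted once rather than having its parameters redrawn for every imputation, as a full ("proper") multiple imputation would do, so this version somewhat understates the between-imputation variance. Running it with two different seeds illustrates the point above: the pooled estimate changes slightly from run to run unless the random seed is fixed.

```python
import random
import statistics

def simulate(n=2000, seed=11):
    """Hypothetical data: X1 fully observed; X2 missing more often when X1 > 0 (MAR)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [0.8 * u + rng.gauss(0, 1) for u in x1]
    x2_obs = [None if rng.random() < (0.6 if u > 0 else 0.1) else v for u, v in zip(x1, x2)]
    return x1, x2_obs

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; also returns the residual SD."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    a = my - b * mx
    sd = (sum((v - a - b * u) ** 2 for u, v in zip(x, y)) / (len(x) - 2)) ** 0.5
    return a, b, sd

def multiple_imputation_mean(x1, x2_obs, m=20, seed=None):
    """Estimate the mean of X2 from m stochastically imputed data sets and pool the
    m results with Rubin's rules. Simplification: the regression is fitted once
    instead of redrawing its parameters for every imputation."""
    rng = random.Random(seed)
    obs = [(u, v) for u, v in zip(x1, x2_obs) if v is not None]
    a, b, sd = fit_line([u for u, _ in obs], [v for _, v in obs])
    estimates, variances = [], []
    for _ in range(m):
        completed = [v if v is not None else a + b * u + rng.gauss(0, sd)
                     for u, v in zip(x1, x2_obs)]
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))  # squared SE
    q_bar = statistics.mean(estimates)            # pooled point estimate
    within = statistics.mean(variances)           # average within-imputation variance
    between = statistics.variance(estimates)      # between-imputation variance
    total = within + (1 + 1 / m) * between        # Rubin's total variance
    return q_bar, total ** 0.5

x1, x2_obs = simulate()
print("complete-case mean:", statistics.mean([v for v in x2_obs if v is not None]))
print("MI (seed 1):", multiple_imputation_mean(x1, x2_obs, seed=1))
print("MI (seed 2):", multiple_imputation_mean(x1, x2_obs, seed=2))
```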