Chapter 9: Heteroskedasticity

Detection, consequences, and robust inference

NoteLearning Objectives

By the end of this chapter you will be able to:

  • Define heteroskedasticity and explain why it matters for inference
  • Detect heteroskedasticity using graphical methods, the Breusch-Pagan test, and the White test
  • Explain why OLS remains unbiased but standard errors become invalid under heteroskedasticity
  • Compute and interpret heteroskedasticity-robust standard errors using sandwich and estimatr
  • Apply Weighted Least Squares (WLS) as an efficient estimator when the error variance structure is known

1 What Is Heteroskedasticity?

Homoskedasticity (SLR.5) requires \(\text{Var}(u \mid x) = \sigma^2\) — constant variance across all values of \(x\).

Heteroskedasticity occurs when this fails:

\[ \text{Var}(u_i \mid x_i) = \sigma_i^2 \quad \text{(varies across observations)} \]

This is common in cross-sectional data. For example: - Housing prices: larger houses have more variable prices - Wages: variation in wages tends to be higher among high-education workers - Firm-level data: large firms have more volatile outcomes than small firms

1.1 Consequences of Heteroskedasticity

Property Under Homoskedasticity Under Heteroskedasticity
OLS unbiased? Yes (SLR.1–4 sufficient) Yes — unbiasedness unchanged
OLS consistent? Yes Yes — consistency unchanged
OLS BLUE? Yes (Gauss-Markov) No — WLS is more efficient
Standard SE formula valid? Yes No — SEs are biased
t and F tests valid? Yes No — tests are wrong

The key point: heteroskedasticity does not bias the coefficients, but it invalidates the standard errors — and therefore all hypothesis tests and confidence intervals.


2 Visualising Heteroskedasticity

The most informative diagnostic is a plot of residuals against fitted values (or against \(x\)).

fit_hprice <- lm(price ~ sqrft + bdrms + lotsize, data = hprice1)

augment(fit_hprice) |>
  ggplot(aes(.fitted, .resid)) +
  geom_point(alpha = 0.6, colour = "#2c7be5") +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "#e63946") +
  geom_smooth(method = "loess", se = FALSE, colour = "#00c9a7", linewidth = 0.8) +
  labs(x = "Fitted values", y = "Residuals",
       title = "Residuals vs. Fitted: Housing Price Model",
       subtitle = "A fanning pattern indicates heteroskedasticity")
Figure 1: Residual plot for the housing price model. Fan shape suggests heteroskedasticity.
augment(fit_hprice) |>
  mutate(sqrt_abs_resid = sqrt(abs(.std.resid))) |>
  ggplot(aes(.fitted, sqrt_abs_resid)) +
  geom_point(alpha = 0.6, colour = "#2c7be5") +
  geom_smooth(method = "loess", se = FALSE, colour = "#e63946", linewidth = 0.8) +
  labs(x = "Fitted values", y = expression(sqrt("|Standardised residual|")),
       title = "Scale-Location Plot",
       subtitle = "An upward trend confirms increasing variance")
Figure 2: Scale-location plot: sqrt(|residuals|) against fitted values.

The clear upward trend confirms that variance increases with house price — a classic case of heteroskedasticity in housing data.


3 Formal Tests for Heteroskedasticity

3.1 Breusch-Pagan Test

The Breusch-Pagan (BP) test is a formal Lagrange Multiplier test of the null hypothesis that errors are homoskedastic.

Setup. Suppose the true error variance is a linear function of auxiliary variables \(\mathbf{z}_i = (z_{i1}, \ldots, z_{iq})'\):

\[H_0: \sigma_i^2 = \sigma^2 \quad (\text{constant for all } i)\] \[H_1: \sigma_i^2 = \sigma^2 + \alpha_1 z_{i1} + \cdots + \alpha_q z_{iq}\]

Often \(\mathbf{z}_i = \mathbf{x}_i\) (the original regressors), but the test is valid for any choice.

Procedure:

  1. Estimate the original model and obtain \(\hat{u}_i^2\)
  2. Regress \(\hat{u}_i^2\) on \(z_{i1}, \ldots, z_{iq}\) and record \(R^2_{\hat{u}^2}\)
  3. The test statistic \(LM = nR^2_{\hat{u}^2} \xrightarrow{d} \chi^2(q)\) under \(H_0\)

A significant result rejects the null of homoskedasticity.

bptest(fit_hprice)

    studentized Breusch-Pagan test

data:  fit_hprice
BP = 14, df = 3, p-value = 0.003
# Log specification often reduces heteroskedasticity
fit_log <- lm(log(price) ~ log(sqrft) + bdrms + log(lotsize), data = hprice1)
bptest(fit_log)

    studentized Breusch-Pagan test

data:  fit_log
BP = 4.2, df = 3, p-value = 0.2

The BP test strongly rejects homoskedasticity for the level model. The log-log specification reduces (but does not eliminate) heteroskedasticity — log transformations often stabilise variance.

3.2 White Test

The White (1980) test is more general than the BP test. The full White test regresses \(\hat{u}_i^2\) on all regressors, their squares, and all cross-products:

\[\hat{u}_i^2 = \alpha_0 + \sum_j \alpha_j x_{ij} + \sum_j \alpha_{jj} x_{ij}^2 + \sum_{j < \ell} \alpha_{j\ell} x_{ij} x_{i\ell} + v_i\]

With \(k\) regressors, there are \(k + k + \binom{k}{2} = k(k+3)/2\) auxiliary regressors, so the \(\chi^2\) degrees of freedom grow quickly.

The special form (using fitted values) avoids this proliferation. Note that \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_k x_{ki}\) is a linear combination of the regressors, and \(\hat{y}_i^2\) is a linear combination of all squares and cross-products. So regressing \(\hat{u}_i^2\) on \(\hat{y}_i\) and \(\hat{y}_i^2\) captures a projection onto the same space as the full White test, but with only 2 auxiliary variables instead of \(k(k+3)/2\).

# White test (special form): regress u^2 on yhat and yhat^2
bptest(fit_hprice, ~ fitted(fit_hprice) + I(fitted(fit_hprice)^2))

    studentized Breusch-Pagan test

data:  fit_hprice
BP = 16, df = 2, p-value = 0.0003

The White test detects heteroskedasticity that the BP test misses when the variance depends non-linearly on the regressors.

TipBP vs. White
  • BP test: tests for heteroskedasticity that is a linear function of the regressors. Easy to interpret.
  • White test: tests for any form of heteroskedasticity, including non-linear. More general but harder to diagnose.
  • Both are large-sample tests justified by asymptotic theory (Chapter 8). The p-value comes from \(\chi^2\) critical values.

4 The Solution: Robust Standard Errors

4.1 The Sandwich Estimator: Derivation

Since Chapter 8 established that OLS is consistent under weak conditions, we can keep OLS for estimation. The fix for inference is to replace the biased standard SE formula with heteroskedasticity-robust standard errors.

The need arises from the general variance formula derived in Chapter 8. Under heteroskedasticity, let \(\Omega = \text{Var}(\mathbf{u} \mid \mathbf{X}) = \text{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)\) (the true error variances, each allowed to differ). Then:

\[\text{Var}(\hat{\boldsymbol{\beta}} \mid \mathbf{X}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \boldsymbol{\Omega} \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\]

The “filling” of the sandwich is:

\[\mathbf{X}'\boldsymbol{\Omega}\mathbf{X} = \sum_{i=1}^n \sigma_i^2 \mathbf{x}_i\mathbf{x}_i'\]

Since \(\sigma_i^2\) is unknown, we replace it with \(\hat{u}_i^2\) (the squared OLS residual for observation \(i\)). This gives the HC0 sandwich estimator:

\[\widehat{\text{Var}}_{\text{HC0}}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1} \left(\sum_{i=1}^n \hat{u}_i^2 \mathbf{x}_i\mathbf{x}_i'\right) (\mathbf{X}'\mathbf{X})^{-1}\]

Why is \(\hat{u}_i^2\) a valid replacement for \(\sigma_i^2\)? By consistency of OLS, \(\hat{u}_i \xrightarrow{p} u_i\), so \(\hat{u}_i^2 \xrightarrow{p} u_i^2\) and \(E[u_i^2 \mid \mathbf{x}_i] = \sigma_i^2\). The substitution is justified asymptotically by the WLLN applied to \(\sum \hat{u}_i^2 \mathbf{x}_i\mathbf{x}_i' / n \xrightarrow{p} E[\sigma_i^2 \mathbf{x}_i\mathbf{x}_i']\).

Contrast with the standard formula. Under homoskedasticity (\(\sigma_i^2 = \sigma^2\) for all \(i\)):

\[\mathbf{X}'\boldsymbol{\Omega}\mathbf{X} = \sigma^2 \mathbf{X}'\mathbf{X} \implies \text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\]

The standard formula is a special case of the sandwich when the diagonal of \(\boldsymbol{\Omega}\) is constant. When \(\sigma_i^2\) varies, the standard formula gets the “filling” wrong — typically understating the variance when high-\(x\) observations have large errors.

4.2 HC Variants

Variant Filling Recommended for
HC0 \(\hat{u}_i^2\) Large \(n\)
HC1 \(\frac{n}{n-k-1}\hat{u}_i^2\) Default in Stata (robust)
HC3 \(\frac{\hat{u}_i^2}{(1-h_{ii})^2}\) Best for small/moderate \(n\)

\(h_{ii} = [\mathbf{P}_X]_{ii}\) is the leverage of observation \(i\) — how much \(i\) influences its own fitted value. HC3 inflates the residuals for high-leverage observations, which tend to have understated residuals (they “pull” the fit toward themselves).

4.3 Method 1: sandwich + lmtest

# Standard OLS SEs
coeftest(fit_hprice)

t test of coefficients:

              Estimate Std. Error t value          Pr(>|t|)    
(Intercept) -21.770308  29.475042   -0.74            0.4622    
sqrft         0.122778   0.013237    9.28 0.000000000000017 ***
bdrms        13.852522   9.010145    1.54            0.1279    
lotsize       0.002068   0.000642    3.22            0.0018 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# HC3 robust SEs (default, recommended for small samples)
coeftest(fit_hprice, vcov = vcovHC(fit_hprice, type = "HC3"))

t test of coefficients:

             Estimate Std. Error t value Pr(>|t|)   
(Intercept) -21.77031   41.03269   -0.53   0.5971   
sqrft         0.12278    0.04073    3.01   0.0034 **
bdrms        13.85252   11.56179    1.20   0.2342   
lotsize       0.00207    0.00715    0.29   0.7731   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

4.4 Method 2: estimatr::lm_robust()

lm_robust() computes robust SEs directly and integrates with modelsummary:

fit_robust <- lm_robust(price ~ sqrft + bdrms + lotsize,
                        data = hprice1, se_type = "HC3")
tidy(fit_robust)

4.5 Side-by-Side Comparison

modelsummary(
  list(
    "OLS (conventional SEs)" = fit_hprice,
    "OLS (HC3 robust SEs)"   = fit_robust
  ),
  stars   = TRUE,
  gof_map = c("nobs", "r.squared"),
  title   = "Effect of Robust SEs on Housing Price Estimates"
)

OLS vs. Robust SEs for housing price model.

OLS (conventional SEs) OLS (HC3 robust SEs)
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept) -21.770 -21.770
(29.475) (41.033)
sqrft 0.123*** 0.123**
(0.013) (0.041)
bdrms 13.853 13.853
(9.010) (11.562)
lotsize 0.002** 0.002
(0.001) (0.007)
Num.Obs. 88 88
R2 0.672 0.672

Notice: the coefficient estimates are identical — only the standard errors (and thus t-statistics and p-values) change. With robust SEs, inference is valid regardless of the form of heteroskedasticity.

ImportantWhich HC Variant?

The sandwich package provides several estimators:

Type Formula Recommended when
HC0 \(\hat{u}_i^2\) Large \(n\)
HC1 \(\frac{n}{n-k-1} \hat{u}_i^2\) Default in Stata
HC3 \(\frac{\hat{u}_i^2}{(1 - h_{ii})^2}\) Best for small/moderate \(n\)

Use HC3 by default — it corrects for leverage and performs well in finite samples. Stata’s robust option uses HC1.


5 Weighted Least Squares (WLS)

When you know (or can estimate) the form of the error variance, WLS is more efficient than OLS with robust SEs.

5.1 Derivation: Why WLS Restores Homoskedasticity

Suppose \(\text{Var}(u_i \mid \mathbf{x}_i) = \sigma^2 h_i\) where \(h_i > 0\) is known. Divide the regression equation for observation \(i\) by \(\sqrt{h_i}\):

\[\frac{y_i}{\sqrt{h_i}} = \frac{\beta_0}{\sqrt{h_i}} + \beta_1 \frac{x_{1i}}{\sqrt{h_i}} + \cdots + \beta_k \frac{x_{ki}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}\]

Define \(y_i^* = y_i/\sqrt{h_i}\), \(x_{ji}^* = x_{ji}/\sqrt{h_i}\), and \(u_i^* = u_i/\sqrt{h_i}\). The transformed error:

\[\text{Var}(u_i^* \mid \mathbf{x}_i) = \frac{\text{Var}(u_i \mid \mathbf{x}_i)}{h_i} = \frac{\sigma^2 h_i}{h_i} = \sigma^2\]

The transformed model \(y_i^* = \beta_0 x_{0i}^* + \beta_1 x_{1i}^* + \cdots + u_i^*\) (where \(x_{0i}^* = 1/\sqrt{h_i}\), not a constant) satisfies homoskedasticity. OLS on the transformed data is therefore BLUE by Gauss-Markov — this is WLS.

WLS minimises:

\[\sum_{i=1}^n \frac{1}{h_i} (y_i - \beta_0 - \beta_1 x_{1i} - \cdots)^2 = \sum_{i=1}^n w_i (y_i - \mathbf{x}_i'\boldsymbol{\beta})^2, \qquad w_i = 1/h_i\]

In R, this is just lm() with a weights argument:

# Suppose variance is proportional to sqrft (larger homes => more variable prices)
fit_wls <- lm(price ~ sqrft + bdrms + lotsize,
              data = hprice1,
              weights = 1 / hprice1$sqrft)

modelsummary(
  list("OLS" = fit_hprice, "WLS (1/sqrft)" = fit_wls),
  stars   = TRUE,
  gof_map = c("nobs", "r.squared"),
  title   = "OLS vs. WLS for Housing Prices"
)
OLS vs. WLS for Housing Prices
OLS WLS (1/sqrft)
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept) -21.770 4.199
(29.475) (29.698)
sqrft 0.123*** 0.118***
(0.013) (0.014)
bdrms 13.853 10.607
(9.010) (8.659)
lotsize 0.002** 0.002**
(0.001) (0.001)
Num.Obs. 88 88
R2 0.672 0.591
TipOLS with Robust SEs vs. WLS
  • OLS + robust SEs: Always valid for inference; does not improve efficiency. The preferred choice when the variance structure is unknown.
  • WLS: More efficient (smaller SEs) when the variance structure is correctly specified. Misspecifying the weights can make things worse.

In practice, OLS with robust SEs is the default recommendation for cross-sectional data. WLS is useful when the variance structure is theoretically motivated (e.g., grouped data where \(h_i\) is the group size).


6 Tutorials

Tutorial 9.1 Using wooldridge::wage1, regress wage on educ, exper, and tenure.

  1. Plot the residuals against fitted values. Does the pattern suggest heteroskedasticity?
  2. Conduct a Breusch-Pagan test. What do you conclude?
  3. Report estimates with conventional and HC3 robust standard errors side by side. Which coefficients change meaningfully?
fit_wage <- lm(wage ~ educ + exper + tenure, data = wage1)

# a) Residual plot
augment(fit_wage) |>
  ggplot(aes(.fitted, .resid)) +
  geom_point(alpha = 0.5, colour = "#2c7be5") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_smooth(method = "loess", se = FALSE, colour = "#e63946") +
  labs(x = "Fitted wage", y = "Residuals", title = "Residuals vs. Fitted: Wage Model")

# b) BP test
bptest(fit_wage)

    studentized Breusch-Pagan test

data:  fit_wage
BP = 43, df = 3, p-value = 0.000000002
# c) Comparison
fit_wage_r <- lm_robust(wage ~ educ + exper + tenure, data = wage1, se_type = "HC3")
modelsummary(
  list("OLS (conventional)" = fit_wage, "OLS (HC3 robust)" = fit_wage_r),
  stars = TRUE, gof_map = c("nobs", "r.squared")
)
OLS (conventional) OLS (HC3 robust)
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept) -2.873*** -2.873***
(0.729) (0.819)
educ 0.599*** 0.599***
(0.051) (0.062)
exper 0.022+ 0.022*
(0.012) (0.011)
tenure 0.169*** 0.169***
(0.022) (0.030)
Num.Obs. 526 526
R2 0.306 0.306

The residual plot typically shows a fan shape — higher earners have more variable wages, a classic pattern. The BP test should be significant. Robust SEs are often larger for educ, which matters most for policy interpretation.

Tutorial 9.2 Show that the log wage specification reduces (but may not eliminate) heteroskedasticity. Regress log(wage) on educ, exper, and tenure, then apply the BP test and compare residual plots with the level-wage model.

wage1 <- wage1 |> mutate(lwage = log(wage))
fit_lwage <- lm(lwage ~ educ + exper + tenure, data = wage1)

bptest(fit_wage)   # level model

    studentized Breusch-Pagan test

data:  fit_wage
BP = 43, df = 3, p-value = 0.000000002
bptest(fit_lwage)  # log model

    studentized Breusch-Pagan test

data:  fit_lwage
BP = 11, df = 3, p-value = 0.01
# Residual plots side by side
p1 <- augment(fit_wage) |>
  ggplot(aes(.fitted, .resid)) +
  geom_point(alpha = 0.4) + geom_smooth(method = "loess", se = FALSE, colour = "#e63946") +
  labs(title = "Level wage", x = "Fitted", y = "Residuals")

p2 <- augment(fit_lwage) |>
  ggplot(aes(.fitted, .resid)) +
  geom_point(alpha = 0.4) + geom_smooth(method = "loess", se = FALSE, colour = "#e63946") +
  labs(title = "Log wage", x = "Fitted", y = "Residuals")

library(patchwork)
p1 + p2

The log transformation substantially reduces heteroskedasticity. The BP test is often no longer significant (or much less so) for the log wage model. This is one reason why log specifications are so common in labour economics.

Tutorial 9.3 Using wooldridge::hprice1, estimate the log-price model: log(price) ~ log(sqrft) + bdrms + log(lotsize).

  1. Apply the BP test. Is heteroskedasticity still present?
  2. Compare OLS with HC3 robust standard errors for this log-log model.
  3. Which approach do you prefer, and why? Consider both the BP test result and the change in inference.
fit_log2 <- lm(log(price) ~ log(sqrft) + bdrms + log(lotsize), data = hprice1)

# a) BP test
bptest(fit_log2)

    studentized Breusch-Pagan test

data:  fit_log2
BP = 4.2, df = 3, p-value = 0.2
# b) Comparison
fit_log2_r <- lm_robust(log(price) ~ log(sqrft) + bdrms + log(lotsize),
                        data = hprice1, se_type = "HC3")

modelsummary(
  list("OLS (conventional)" = fit_log2, "OLS (HC3 robust)" = fit_log2_r),
  stars = TRUE, gof_map = c("nobs", "r.squared"),
  title = "Log-Price Model: Conventional vs. Robust SEs"
)
Log-Price Model: Conventional vs. Robust SEs
OLS (conventional) OLS (HC3 robust)
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept) -1.297* -1.297
(0.651) (0.850)
log(sqrft) 0.700*** 0.700***
(0.093) (0.121)
bdrms 0.037 0.037
(0.028) (0.036)
log(lotsize) 0.168*** 0.168**
(0.038) (0.053)
Num.Obs. 88 88
R2 0.643 0.643

The BP test result for the log model is typically much weaker than for the level model. If heteroskedasticity is no longer significant, conventional SEs are adequate. However, as a matter of robustness practice, many applied economists report HC3 SEs regardless — the cost is minimal and the protection is real.