How to Do a Goodness of Fit Test
A goodness of fit test is a statistical procedure used to determine whether a set of observed categorical data matches an expected distribution. Researchers and analysts apply this test when they want to assess if sample frequencies conform to a theoretical model, such as a uniform distribution, a binomial distribution, or any hypothesized probability pattern. By comparing observed counts with expected counts, the test quantifies the discrepancy and helps decide whether any difference is likely due to random sampling error or indicates a real mismatch.
Understanding the Goodness of Fit Test
At its core, the goodness of fit test evaluates the null hypothesis that the observed data follow a specified distribution. The alternative hypothesis states that the data do not follow that distribution. The most common implementation uses the χ² (chi‑square) statistic, which sums the squared differences between observed and expected frequencies, each divided by the expected frequency:
\[ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \]
where \(O_i\) is the observed count for category \(i\) and \(E_i\) is the expected count under the null hypothesis.
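The statistic is simple to compute directly. Here is a minimal sketch in Python with made-up observed counts and hypothesized proportions (the numbers are illustrative, not from the article's example):

```python
# Hypothetical observed counts for a 4-category variable,
# with equal hypothesized proportions under the null.
observed = [18, 22, 30, 30]
proportions = [0.25, 0.25, 0.25, 0.25]

n = sum(observed)                      # total sample size N
expected = [n * p for p in proportions]  # E_i = N * p_i

# Chi-square statistic: squared deviations scaled by expected counts.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)
```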
Key assumptions for the chi‑square goodness of fit test include:
- The data are independent observations.
- Each category’s expected frequency is at least 5 (some texts allow a minimum of 1 if no more than 20% of cells fall below 5).
- The categories are mutually exclusive and exhaustive.
If these conditions are not met, consider combining sparse categories or using an exact test (e.g., Fisher’s exact test for small tables).
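Combining sparse categories can be automated. The sketch below folds any cell whose expected count is too small into an adjacent cell; this left-to-right merge is one reasonable strategy, not a standard algorithm:

```python
def combine_sparse(observed, expected, min_expected=5.0):
    """Merge each sparse category into an adjacent one until
    every expected count reaches min_expected."""
    obs, exp = list(observed), list(expected)
    i = 0
    while i < len(exp):
        if exp[i] < min_expected and len(exp) > 1:
            # Merge into the next category, or the previous one at the end.
            j = i + 1 if i + 1 < len(exp) else i - 1
            obs[j] += obs[i]
            exp[j] += exp[i]
            del obs[i]
            del exp[i]
            i = 0  # restart the scan after a merge
        else:
            i += 1
    return obs, exp
```

For example, `combine_sparse([3, 4, 20, 30], [2.0, 3.0, 25.0, 27.0])` folds the first sparse cell into its neighbour, leaving all expected counts at or above 5.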
When to Use a Goodness of Fit Test
You would choose this test in scenarios such as:
- Checking whether a die is fair by comparing roll frequencies to the expected uniform distribution.
- Testing if genetic trait ratios in offspring match Mendelian expectations.
- Evaluating whether survey responses across demographic groups follow a presumed proportion.
- Validating that incoming call volumes per hour follow a Poisson distribution.
Steps to Perform a Goodness of Fit Test
Follow these systematic steps to conduct the test correctly:
1. State the Hypotheses
   - Null hypothesis (H₀): The observed frequencies follow the expected distribution.
   - Alternative hypothesis (H₁): The observed frequencies do not follow the expected distribution.

2. Choose a Significance Level (α)
   Common choices are 0.05, 0.01, or 0.10. This threshold determines the critical χ² value.

3. Calculate Expected Frequencies
   Multiply the total sample size \(N\) by the hypothesized proportion for each category:
   \[ E_i = N \times p_i \]
   where \(p_i\) is the expected proportion for category \(i\).

4. Verify Assumptions
   Ensure each \(E_i \ge 5\). If not, combine adjacent categories until the condition holds.

5. Compute the χ² Statistic
   Use the formula above, summing over all categories.

6. Determine Degrees of Freedom (df)
   \[ df = (\text{number of categories}) - 1 - (\text{number of estimated parameters}) \]
   For a simple goodness of fit test with no parameters estimated from the data, df = k – 1, where k is the number of categories.

7. Find the Critical Value or p‑Value
   - Compare the computed χ² to the critical value from the χ² distribution table with the appropriate df and α.
   - Alternatively, calculate the p‑value: the probability of obtaining a χ² as extreme as or more extreme than the observed value under H₀.

8. Make a Decision
   - If χ² > critical value or p‑value < α, reject H₀ (evidence of poor fit).
   - If χ² ≤ critical value or p‑value ≥ α, fail to reject H₀ (no evidence against the hypothesized distribution).

9. Interpret the Results in Context
   Explain what rejecting or not rejecting H₀ means for the practical question at hand (e.g., “The die appears biased” or “There is no reason to doubt the Mendelian ratio”).
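The whole procedure can be wrapped in a small function. This is a minimal sketch using SciPy's chi-square distribution for the p-value; the function name and return shape are choices made here, not a standard API:

```python
from scipy.stats import chi2 as chi2_dist

def goodness_of_fit(observed, proportions, alpha=0.05):
    """Chi-square goodness-of-fit test for observed counts against
    hypothesized category proportions. Returns (statistic, df, p-value,
    reject-H0 decision)."""
    n = sum(observed)
    expected = [n * p for p in proportions]

    # Assumption check: every expected count should be at least 5.
    if any(e < 5 for e in expected):
        raise ValueError("Expected count below 5; combine categories first.")

    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1          # no parameters estimated from the data
    p_value = chi2_dist.sf(stat, df)  # upper-tail probability under H0
    reject = bool(p_value < alpha)
    return stat, df, p_value, reject
```

Running it on the die-roll data below, `goodness_of_fit([8, 12, 9, 11, 10, 10], [1/6] * 6)`, reproduces the hand calculation: χ² = 1.0 on 5 degrees of freedom, and H₀ is not rejected.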
Example Calculation
Suppose you roll a six‑sided die 60 times and record the following outcomes:
| Face | Observed (O) |
|---|---|
| 1 | 8 |
| 2 | 12 |
| 3 | 9 |
| 4 | 11 |
| 5 | 10 |
| 6 | 10 |
You want to test whether the die is fair (uniform distribution).
1. Hypotheses
   - H₀: Each face has probability 1/6.
   - H₁: At least one face deviates from 1/6.

2. Significance Level
   α = 0.05.

3. Expected Frequencies
   Total rolls N = 60. Expected for each face: \(E_i = 60 \times \frac{1}{6} = 10\).

4. Assumption Check
   All \(E_i = 10 \ge 5\); condition satisfied.

5. Compute χ²
   \[
   \begin{aligned}
   \chi^2 &= \frac{(8-10)^2}{10} + \frac{(12-10)^2}{10} + \frac{(9-10)^2}{10} \\
   &\quad + \frac{(11-10)^2}{10} + \frac{(10-10)^2}{10} + \frac{(10-10)^2}{10} \\
   &= \frac{4}{10} + \frac{4}{10} + \frac{1}{10} + \frac{1}{10} + 0 + 0 \\
   &= 0.4 + 0.4 + 0.1 + 0.1 = 1.0
   \end{aligned}
   \]

6. Degrees of Freedom
   df = 6 – 1 = 5.

7. Critical Value
   From the χ² table, χ²₀.₀₅,₅ = 11.07.

8. Decision
   Since the calculated χ² (1.0) is less than the critical value (11.07), we fail to reject H₀.

9. Interpretation
   There is not enough evidence to conclude that the die is biased. The observed frequencies are reasonably close to what we would expect from a fair die.
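The same result can be checked in one call with SciPy's `chisquare` function, which defaults to equal expected frequencies when none are supplied:

```python
from scipy.stats import chisquare

observed = [8, 12, 9, 11, 10, 10]

# With no f_exp argument, chisquare assumes uniform expected frequencies.
result = chisquare(observed)
print(result.statistic, result.pvalue)  # statistic ≈ 1.0, p ≈ 0.963
```

The p-value of about 0.96 is far above α = 0.05, matching the table-based decision to fail to reject H₀.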
Beyond the Basics: Considerations and Extensions
While the goodness-of-fit test is a powerful tool, several considerations are crucial for accurate application and interpretation. Firstly, the test is sensitive to sample size. With very large sample sizes, even small deviations from the expected distribution can lead to statistically significant results, even if those deviations are practically unimportant. Conversely, small sample sizes may lack the power to detect meaningful differences.
Secondly, the choice of expected distribution is paramount. The test assesses how well the observed data fit the specified expected distribution. If the hypothesized distribution is incorrect, the test results will be misleading. Careful consideration should be given to the theoretical basis for the expected distribution.
Furthermore, the goodness-of-fit test does not tell you why a distribution is a poor fit. It only indicates that a discrepancy exists. Further investigation, potentially involving other statistical tests or domain expertise, is often necessary to understand the nature of the deviation. For example, if testing for a normal distribution, a Q-Q plot can visually reveal departures from normality.
Extensions of the goodness-of-fit test exist to address more complex scenarios. Pearson's chi-squared test, used here, assumes that expected values are sufficiently large. For data with small expected values, Yates' correction for continuity can be applied, although its use is debated. Likelihood ratio tests offer an alternative approach, particularly when dealing with nested models (where one model is a special case of another). Kolmogorov–Smirnov tests are also available, and are particularly useful for comparing an observed sample to a known continuous distribution.
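For continuous data, the Kolmogorov–Smirnov test plays the role the chi-square test plays for categorical data. A minimal sketch with simulated data (the sample here is generated for illustration):

```python
import numpy as np
from scipy.stats import kstest

# Simulated sample; in practice this would be your observed data.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

# Compare the sample against a fully specified standard normal distribution.
# Note: if you estimate the mean and sd from the same data, the plain KS
# p-value is no longer valid (Lilliefors' correction addresses that case).
stat, p_value = kstest(sample, "norm")
print(stat, p_value)
```

A large p-value here means no evidence against normality, mirroring the fail-to-reject logic of the chi-square test.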
Software Implementation
Performing a goodness-of-fit test manually, as demonstrated in the example, can be tedious and prone to error, especially with larger datasets. Statistical software packages like R, Python (with libraries like SciPy), SPSS, and SAS provide built-in functions to automate the process. These functions typically require the observed frequencies and the expected frequencies (or the hypothesized distribution) as input and output the χ² statistic, degrees of freedom, p-value, and often a visual representation of the results. Using software ensures accuracy and efficiency, allowing researchers to focus on interpreting the findings rather than performing laborious calculations.
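As one example of what software handles for you, SciPy's `chisquare` accepts a `ddof` argument for the degrees-of-freedom correction when parameters are estimated from the data. The sketch below tests the Poisson scenario mentioned earlier, with hypothetical hourly call counts invented for illustration:

```python
import numpy as np
from scipy.stats import chisquare, poisson

# Hypothetical data: number of hours (out of 100) in which
# 0, 1, 2, 3, or 4+ calls arrived.
observed = np.array([20, 30, 25, 15, 10])
n = observed.sum()

# Estimate the Poisson rate from the data (mean calls per hour).
counts = np.arange(len(observed))
lam = (counts * observed).sum() / n

# Expected frequencies under Poisson(lam); the last cell pools 4 or more.
probs = poisson.pmf(counts[:-1], lam)
probs = np.append(probs, 1 - probs.sum())
expected = n * probs

# ddof=1 accounts for the one parameter (lam) estimated from the data,
# so df = k - 1 - 1 = 3 here.
stat, p_value = chisquare(observed, f_exp=expected, ddof=1)
```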
In conclusion, the chi-squared goodness-of-fit test is a versatile and widely used statistical method for assessing the compatibility of observed data with a hypothesized distribution. By carefully considering the assumptions, interpreting the results in context, and leveraging available software tools, researchers can effectively utilize this test to gain valuable insights from their data and draw meaningful conclusions. However, it’s crucial to remember that statistical significance does not always equate to practical significance, and a thorough understanding of the underlying data and research question is essential for responsible interpretation.