DigitHelm

Chi-Square Calculator

Perform chi-square goodness-of-fit and independence tests with p-value, critical value, effect size (Cramér's V, Cohen's w), Yates' correction, standardised residuals, and chi-square distribution curve. Supports contingency tables from 2×2 up to 5×5, with real-world presets.

What Is the Chi-Square Calculator?

The chi-square (χ²) test is one of the most widely used non-parametric statistical tests for categorical data. It answers a deceptively simple question: are the frequencies you actually observed consistent with what you expected?

Karl Pearson introduced the chi-square goodness-of-fit test in 1900, one of the founding moments of modern mathematical statistics, by testing whether Weldon's dice data were consistent with a fair die. The same formula remains the workhorse of categorical data analysis in genetics, medicine, social science, and quality control today.

This calculator supports two test types. The goodness-of-fit test compares a single set of observed counts against theoretically expected counts. The independence test determines whether two categorical variables in a contingency table are associated or independent of each other.

Formula

Chi-Square Statistic

χ² = Σ (O − E)² / E
where:
O = observed frequency in each category
E = expected frequency under the null hypothesis
Σ = sum over all categories
χ² = chi-square statistic (always non-negative)
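
As a minimal sketch of the formula above, the statistic can be computed in a few lines of Python. The function name `chi_square_statistic` and the coin-toss counts are illustrative assumptions, not part of the calculator.

```python
def chi_square_statistic(observed, expected):
    """Return the Pearson chi-square statistic: the sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up example: a coin tossed 100 times gives 60 heads and 40 tails,
# while a fair coin predicts 50 of each.
stat = chi_square_statistic([60, 40], [50, 50])
print(round(stat, 3))  # 4.0
```

Each category contributes (10)²/50 = 2.0 here, so the two contributions sum to 4.0.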

Degrees of Freedom

Goodness-of-Fit: df = k − 1 (k = number of categories)
Independence: df = (r − 1)(c − 1) (r = rows, c = columns)

Expected Cell Frequencies (Independence Test)

E_ij = (row_i total × col_j total) / n
where n = grand total of all cells
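
The expected-count formula can be sketched as follows, assuming the table is given as a list of rows. The helper name `expected_frequencies` and the 2×2 counts are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Made-up 2x2 table: rows = exposed / not exposed,
# columns = outcome present / outcome absent.
observed = [[30, 20],
            [10, 40]]
expected = expected_frequencies(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)  # [[20.0, 30.0], [20.0, 30.0]]
print(df)        # 1
```

Both rows total 50 and the columns total 40 and 60, so every cell in a column gets the same expected count.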

Effect Size Measures

Cohen's w = √(χ² / n) (goodness-of-fit)
Cramér's V = √(χ² / (n × min(r−1, c−1))) (independence)
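
Both effect sizes follow directly from χ² and n. The sketch below uses hypothetical function names; the first call reuses the fair-die example from this page (χ² = 1.0, n = 60), and the second uses made-up values for a 3×4 table.

```python
import math


def cohens_w(chi2, n):
    """Cohen's w for a goodness-of-fit test."""
    return math.sqrt(chi2 / n)


def cramers_v(chi2, n, rows, cols):
    """Cramér's V for an r x c independence test."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))


print(round(cohens_w(1.0, 60), 3))           # 0.129
print(round(cramers_v(12.0, 300, 3, 4), 3))  # 0.141
```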

Yates' Continuity Correction (df = 1 only)

χ²_Yates = Σ (|O − E| − 0.5)² / E
Applied when df = 1 to improve accuracy with small samples
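
A minimal sketch of the corrected statistic, written exactly as in the formula above. The function name and the coin-toss counts are illustrative assumptions; some implementations additionally clamp |O − E| − 0.5 at zero, which this sketch does not do.

```python
def yates_chi_square(observed, expected):
    """Yates-corrected chi-square: the sum of (|O - E| - 0.5)^2 / E (df = 1 only)."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Made-up example: 60 heads and 40 tails against a 50/50 expectation.
corrected = yates_chi_square([60, 40], [50, 50])
print(round(corrected, 3))  # 3.61
```

The uncorrected statistic for the same counts is 4.0, so the correction lowers the value, as expected.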

Critical Values (upper tail, common α levels)

df    α = 0.10    α = 0.05    α = 0.01    α = 0.001
 1       2.706       3.841       6.635       10.828
 2       4.605       5.991       9.210       13.816
 3       6.251       7.815      11.345       16.266
 4       7.779       9.488      13.277       18.467
 5       9.236      11.070      15.086       20.515
 6      10.645      12.592      16.812       22.458
 8      13.362      15.507      20.090       26.125
10      15.987      18.307      23.209       29.588
15      22.307      24.996      30.578       37.697
20      28.412      31.410      37.566       45.315
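
For df or α values not in the table, critical values and p-values can be approximated numerically. The sketch below is one possible approach using only the Python standard library: a series expansion of the regularised incomplete gamma function for the upper-tail probability, and bisection for the critical value. The names `chi2_sf` and `chi2_critical` are hypothetical, and this is not necessarily how the calculator computes its results.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularised lower incomplete gamma function P(a, z):
    # P(a, z) = z^a e^(-z) / Gamma(a) * sum_k z^k / (a (a+1) ... (a+k))
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    p_lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - p_lower)


def chi2_critical(alpha, df):
    """Critical value c with P(X >= c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # expand until the upper bound is past c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(f"{chi2_critical(0.05, 5):.3f}")  # 11.070
print(f"{chi2_sf(1.0, 5):.4f}")         # 0.9626
```

The series is adequate for the moderate χ² values typical of small tables; for very large statistics a continued-fraction form or a statistics library would be a safer choice.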

How to Use

  1. Choose test type: Select Goodness of Fit (one distribution vs. expected) or Independence (two categorical variables in a contingency table).
  2. Set significance level: Choose α = 0.10, 0.05 (most common), 0.01, or 0.001. This controls how strong the evidence must be before you reject the null hypothesis.
  3. Enter your data: For GOF, fill in category labels, observed counts, and expected counts (or enable auto-equal). For Independence, set table dimensions, enter row/column labels, and fill in observed cell counts.
  4. Try a preset: Load a classic example (Fair Die, Mendel's Peas, Smoking × Cancer, or Party × Region) to see how the calculator works with real data.
  5. Calculate: Press Enter or click Calculate. Read the χ² statistic, p-value, critical value, decision, per-category contributions, chi-square distribution curve, and effect size.

Example Calculation

A die is rolled 60 times. Each face should appear exactly 10 times if the die is fair. The observed counts are:

Observed: 8, 9, 12, 11, 10, 10
Expected: 10, 10, 10, 10, 10, 10
Step 1. Compute contributions:
Face 1: (8−10)²/10 = 4/10 = 0.400
Face 2: (9−10)²/10 = 1/10 = 0.100
Face 3: (12−10)²/10 = 4/10 = 0.400
Face 4: (11−10)²/10 = 1/10 = 0.100
Face 5: (10−10)²/10 = 0/10 = 0.000
Face 6: (10−10)²/10 = 0/10 = 0.000
Step 2. Sum contributions:
χ² = 0.400 + 0.100 + 0.400 + 0.100 + 0 + 0 = 1.000
Step 3. Degrees of freedom:
df = 6 − 1 = 5
Step 4. Compare to critical value at α = 0.05:
χ²_crit(5, 0.05) = 11.070
1.000 < 11.070 → Fail to reject H₀ (die appears fair)
p-value ≈ 0.9626 (no meaningful evidence against a fair die)
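
The four steps above can be reproduced with a short script. This is a sketch only: the critical value 11.070 is copied from the table on this page rather than computed, and the variable names are illustrative.

```python
observed = [8, 9, 12, 11, 10, 10]
expected = [10] * 6

# Step 1: per-face contributions (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Step 2: sum the contributions
chi2 = sum(contributions)

# Step 3: degrees of freedom
df = len(observed) - 1

# Step 4: compare with the tabulated critical value for df = 5, alpha = 0.05
critical = 11.070
reject = chi2 > critical

print([round(c, 3) for c in contributions])  # [0.4, 0.1, 0.4, 0.1, 0.0, 0.0]
print(round(chi2, 3), df, reject)            # 1.0 5 False
```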

Why this result makes sense

A χ² of 1.0 with 5 df is extremely low: it falls well below even the α = 0.10 critical value of 9.236. The variation you see (faces 3 and 4 appearing slightly more often) is entirely within normal random fluctuation for 60 rolls. You would need χ² > 11.07 to call the die unfair at the 5% level.

Understanding Chi-Square

Goodness-of-Fit vs Independence: Which Test Do You Need?

Both tests use the same χ² formula, but they answer different questions and have different data structures:

  • Goodness-of-Fit (GOF): you have a single categorical variable and want to know whether its observed distribution matches a theoretical distribution (fair die, Mendelian ratios, uniform distribution, etc.). You need one row of observed counts and one row of expected counts.
  • Independence test: you have two categorical variables and want to know whether they are associated or independent. You arrange counts in an r×c contingency table. Expected values are computed automatically from row and column totals.
  • The independence test can be thought of as testing whether knowing one variable changes your prediction about the other, for example, whether knowing a patient smoked changes the probability they developed lung cancer.
  • Both tests share the same decision rule: if χ² exceeds the critical value at your chosen significance level (or equivalently, if p < α), reject the null hypothesis.

Assumptions: When Can You Use Chi-Square?

  • Count data: the entries in your table must be frequencies (counts of observations), not percentages, proportions, means, or other transformed values.
  • Independent observations: each subject or item contributes to exactly one cell. Repeated-measures data or matched pairs violate this assumption.
  • Expected cell frequency ≥ 5: this is the most commonly cited rule of thumb. When expected values fall below 5, the chi-square approximation becomes unreliable. The calculator flags these cells. Solutions include combining categories, collecting more data, or using Fisher's exact test.
  • Mutually exclusive, exhaustive categories: every observation must belong to exactly one category, and the categories must cover all possibilities.
  • Random sampling: the data should represent a random sample from the population of interest.

The rule of 5: when it can be relaxed

Some statisticians argue the rule of "expected ≥ 5" is overly conservative. With larger tables, it is generally acceptable if most cells have E ≥ 5 and none falls below 1. For 2×2 tables with small samples, Yates' correction (applied automatically here) reduces the test statistic and improves accuracy. Fisher's exact test remains the gold standard when any expected count is below 5.
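
The strict and relaxed rules can be sketched as a simple check on the expected counts. The text only says "most cells", so the 20% limit below is an assumption borrowed from Cochran's commonly cited guideline; the function name and the example counts are also hypothetical.

```python
def check_expected_counts(expected, max_share_below_5=0.20):
    """Check a table of expected counts against the strict and relaxed rules."""
    cells = [e for row in expected for e in row]
    below_5 = [e for e in cells if e < 5]
    below_1 = [e for e in cells if e < 1]
    return {
        # Strict rule: every expected count is at least 5.
        "strict_ok": not below_5,
        # Relaxed rule (assumed 20% limit): no cell below 1 and
        # only a small share of cells below 5.
        "relaxed_ok": not below_1
        and len(below_5) / len(cells) <= max_share_below_5,
        "cells_below_5": len(below_5),
    }


# Made-up expected counts for a 2x3 table; one of six cells is below 5.
result = check_expected_counts([[12.0, 8.0, 4.0],
                                [18.0, 12.0, 6.0]])
print(result)
# {'strict_ok': False, 'relaxed_ok': True, 'cells_below_5': 1}
```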

Effect Size: What Significance Alone Doesn't Tell You

A statistically significant result only tells you that the observed pattern is unlikely under H₀; it says nothing about the magnitude of the difference. With large samples, even trivial differences become significant. Always report an effect size alongside the p-value:

Cohen's w / Cramér's V    Effect size    Interpretation
w < 0.10 / V < 0.10       Negligible     The observed frequencies differ from expected by a trivial amount
0.10 ≤ w < 0.30           Small          Detectable difference; real but of limited practical importance
0.30 ≤ w < 0.50           Medium         Moderate association; likely meaningful in most applied contexts
w ≥ 0.50                  Large          Substantial difference or association; practically important

  • Cohen's w (for GOF): √(χ²/n). Benchmarks: 0.10 = small, 0.30 = medium, 0.50 = large. Independent of table dimensions.
  • Cramér's V (for independence): √(χ²/(n × min(r−1, c−1))). Ranges from 0 (no association) to 1 (perfect association). Comparable across tables of different sizes.
  • A study with n = 10,000 might yield χ² = 20 and p < 0.001 (for df ≤ 4) yet have Cohen's w = √(20/10,000) ≈ 0.045, a negligible effect that would be meaningless in practice.

Standardised Residuals: Finding What's Driving the Difference

Once you know the overall test is significant, the next question is: which categories are responsible? Standardised residuals answer this:

Standardised residual = (O − E) / √E
|residual| > 2 → noteworthy (highlighted in orange)
|residual| > 3 → strong evidence of local departure from H₀ (highlighted in red)

Positive residuals (O > E) indicate more observations than expected; negative residuals (O < E) indicate fewer. The calculator displays residuals for every category and highlights cells where the absolute value of the standardised residual exceeds 2.
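
A small sketch of the residual calculation and the two thresholds described above. The function names, the labels returned by `flag`, and the counts are illustrative assumptions.

```python
import math


def standardised_residuals(observed, expected):
    """Return (O - E) / sqrt(E) for each category."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


def flag(residual):
    """Label a residual using the |r| > 2 and |r| > 3 thresholds."""
    size = abs(residual)
    if size > 3:
        return "strong"
    if size > 2:
        return "noteworthy"
    return "ok"


# Made-up counts: three categories, each expected 20 times.
residuals = standardised_residuals([30, 14, 16], [20, 20, 20])
print([round(r, 3) for r in residuals])  # [2.236, -1.342, -0.894]
print([flag(r) for r in residuals])      # ['noteworthy', 'ok', 'ok']
```

Only the first category stands out: it was observed about 2.2 standard units more often than expected.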

Real-World Applications of Chi-Square Testing

Field              Typical use case                                            Test type
Genetics           Observed vs. Mendelian ratios (e.g. 3:1, 9:3:3:1)           Goodness-of-Fit
Market research    Customer preference vs. market share targets                Goodness-of-Fit
Quality control    Defect rates vs. specification limits                       Goodness-of-Fit
Epidemiology       Disease exposure vs. health outcome (2×2 or r×c tables)     Independence
Social science     Survey responses vs. demographics                           Independence
Finance            Return categories vs. market regime                         Independence
Education          Grade distribution vs. expected distribution                Goodness-of-Fit
Clinical trials    Treatment outcome vs. control group                         Independence

When to Use Fisher's Exact Test Instead

  • Any expected cell frequency falls below 5 (especially in 2×2 tables).
  • The sample is small (total n < 20).
  • The data arise from a design with fixed marginal totals (rare in practice).
  • Fisher's exact test computes the exact probability of the observed table (and all more extreme tables) under H₀, bypassing the chi-square approximation entirely.
  • For larger tables (> 2×2) with small samples, simulation-based methods or exact permutation tests are preferred.
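
For a 2×2 table, the exact calculation described above can be sketched with the hypergeometric distribution. This version uses one common two-sided convention (summing all tables no more probable than the observed one); the function name and the counts are illustrative assumptions, and other two-sided conventions exist.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Illustrative small table: 8 observations, all margins equal to 4.
p = fisher_exact_2x2([[3, 1],
                      [1, 3]])
print(round(p, 4))  # 0.4857
```

Every expected count in this table is 2, well below 5, which is exactly the situation where the exact test is preferred over the chi-square approximation.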

Frequently Asked Questions

What is the null hypothesis in a chi-square test?

  • In a goodness-of-fit test, H₀ states that the population follows the specified distribution, i.e., any gap between the observed and expected frequencies is due to sampling variation alone.
  • In an independence test, H₀ states that the two categorical variables are independent: knowing the value of one gives no information about the other.
  • The alternative hypothesis H₁ is simply that H₀ is false: the distribution differs (GOF) or the variables are associated (independence).
  • Rejecting H₀ does not tell you how or why the distributions differ, only that they do, beyond what chance alone would produce.

What are the assumptions of the chi-square test?

  • Data must be counts (frequencies), not percentages, averages, or proportions.
  • Observations must be independent: each subject contributes to exactly one category or cell.
  • Expected cell frequencies should generally be ≥ 5 (rule of thumb).
  • Categories must be mutually exclusive and exhaustive: every observation belongs to exactly one category.
  • Data should be a random sample from the population of interest.
  • Violating these assumptions, especially the expected frequency rule, can make the p-value unreliable and may inflate the Type I error rate.

What does the p-value mean in a chi-square test?

  • The p-value is the probability of observing a χ² statistic at least as large as the one computed, assuming H₀ is true.
  • A small p-value (e.g. p < 0.05) means the observed data would be very unlikely if H₀ were true, which counts as evidence against H₀.
  • A large p-value does not prove H₀ is true; it only means the data are consistent with H₀.
  • The p-value does not measure the size or practical importance of the effect; that requires an effect size measure like Cohen's w or Cramér's V.
  • Common threshold: p < 0.05 is "statistically significant" by convention, but this is an arbitrary boundary.

What is Cramér's V, and how do I interpret it?

  • Cramér's V measures the strength of association between two categorical variables after a significant independence test.
  • Formula: V = √(χ² / (n × min(r−1, c−1))), where r = rows and c = columns.
  • V ranges from 0 (no association) to 1 (perfect association).
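  • Cohen's benchmarks: V < 0.10 = negligible, 0.10–0.29 = small, 0.30–0.49 = medium, ≥ 0.50 = large. These thresholds were defined for w and carry over to V directly when min(r−1, c−1) = 1; for larger tables the equivalent V thresholds are smaller.
  • Unlike χ², Cramér's V does not grow with sample size, whereas a large n can make even a tiny V statistically significant.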
  • Always report Cramér's V alongside the p-value so readers can judge practical importance.

Why must expected frequencies be at least 5?

  • The chi-square test relies on the chi-square distribution as an approximation of the exact discrete distribution of the test statistic.
  • This approximation works well when expected counts are large, but breaks down when cells have small expected values.
  • With small expected values, the test statistic behaves erratically and the p-value is unreliable, so the true Type I error rate can differ noticeably from α.
  • The ≥ 5 rule is a rough guideline. Many statisticians accept a table in which most cells have E ≥ 5, provided no cell is below 1.
  • Solutions for small expected counts: combine adjacent categories, collect more data, or switch to Fisher's exact test.

What is the chi-square test of independence?

  • The independence test examines whether two categorical variables are related or independent using a contingency table.
  • H₀: the variables are independent (knowing one gives no information about the other).
  • H₁: the variables are associated, i.e., the distribution of one changes depending on the other.
  • Example: does smoking status (smoker/non-smoker) affect the probability of lung cancer (yes/no)?
  • Expected cell frequencies are computed as E_ij = (row_i total × col_j total) / grand total, under the assumption of independence.
  • A significant result tells you the variables are associated, but not the nature or direction of the relationship.

What is Yates' continuity correction, and when should I use it?

  • Yates' correction is applied only when df = 1 (a 2×2 independence table or a GOF test with 2 categories).
  • It adjusts the formula to: χ²_Yates = Σ (|O − E| − 0.5)² / E, subtracting 0.5 from each |O − E| term.
  • The correction reduces the test statistic slightly, producing a more conservative (higher) p-value.
  • It was originally proposed to improve the chi-square approximation for small samples.
  • Modern statisticians disagree on its necessity; some argue it overcorrects and reduces statistical power.
  • The calculator displays both the standard χ² and the Yates-corrected value for df = 1, so you can compare both.

How do I report chi-square results in a paper or report?

  • APA format: χ²(df, N = n) = χ²_value, p = p-value.
  • Example: χ²(5, N = 60) = 1.00, p = .963 (this is the fair die example).
  • For independence tests with significant results, report Cramér's V as the effect size.
  • For GOF tests, report Cohen's w.
  • Always state the significance level (α) used.
  • If any expected cells were below 5, note this as a limitation and report whether Yates' correction or Fisher's exact test was applied.
  • Include the actual observed and expected frequencies in a table for transparency.
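
The APA pattern above can be produced with a small formatting helper. This is a sketch under stated assumptions: the function name is hypothetical, and reporting very small values as "p < .001" is a common APA convention that the list above does not spell out.

```python
def apa_chi_square(chi2, df, n, p):
    """Format a chi-square result as: χ²(df, N = n) = value, p = .xxx"""
    if p < 0.001:
        p_text = "p < .001"
    else:
        # APA style drops the leading zero for values that cannot exceed 1.
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    return f"χ²({df}, N = {n}) = {chi2:.2f}, {p_text}"


# The fair die example from this page.
print(apa_chi_square(1.0, 5, 60, 0.9626))  # χ²(5, N = 60) = 1.00, p = .963
```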
