Chi-Square Calculator

Perform chi-square goodness-of-fit and independence tests with p-value, critical value, effect size (Cramér's V, Cohen's w), Yates' correction, standardised residuals, and a chi-square distribution curve. Supports contingency tables with 2 to 5 rows and 2 to 5 columns, with real-world presets.

What Is the Chi-Square Calculator?

The chi-square (χ²) test is one of the most widely used non-parametric statistical tests for categorical data. It answers a deceptively simple question: are the frequencies you actually observed consistent with what you expected?

Karl Pearson introduced the chi-square goodness-of-fit test in 1900, a founding moment of modern mathematical statistics, and illustrated it by testing whether Weldon's dice data were consistent with fair dice. The same formula remains the workhorse of categorical data analysis in genetics, medicine, social science, and quality control today.

This calculator supports two test types. The goodness-of-fit test compares a single set of observed counts against theoretically expected counts. The independence test determines whether two categorical variables in a contingency table are associated or independent of each other.

Formula

Chi-Square Statistic

χ² = Σ (O − E)² / E

where:

O = observed frequency in each category

E = expected frequency under the null hypothesis

Σ = sum over all categories

χ² = chi-square statistic (always non-negative)
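As a minimal sketch, the statistic can be computed in a few lines of Python. The function name `chi_square_statistic` and the sample counts are illustrative assumptions, not part of the calculator itself.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over paired observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (made up for illustration).
observed = [18, 22, 20]
expected = [20, 20, 20]
statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.3f}")
```

Each term is divided by its own expected count, so a category with a small E contributes more for the same absolute gap than a category with a large E.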

Degrees of Freedom

Goodness-of-Fit: df = k − 1 (k = number of categories)

Independence: df = (r − 1)(c − 1) (r = rows, c = columns)

Expected Cell Frequencies (Independence Test)

E_ij = (row_i total × col_j total) / n

where n = grand total of all cells
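The sketch below shows one way to derive the expected counts from the row and column totals and then compute the independence statistic and its degrees of freedom. The function names and the 2×2 table are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence_test(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2x2 table (rows: group A/B, columns: outcome yes/no).
table = [[30, 20], [20, 30]]
expected = expected_counts(table)
chi2, df = independence_test(table)
print(expected, chi2, df)
```

With equal row and column totals of 50 and n = 100, every expected cell is 25, so each of the four cells contributes (5)²/25 = 1 to the statistic.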

Effect Size Measures

Cohen's w = √(χ² / n) (goodness-of-fit)

Cramér's V = √(χ² / (n × min(r−1, c−1))) (independence)
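Both effect sizes are one-line computations once χ² and n are known. The following sketch uses hypothetical function names; the first example reuses the fair-die numbers from this page (χ² = 1.0, n = 60), and the second uses made-up values for a 2×2 table.

```python
import math


def cohens_w(chi2, n):
    """Cohen's w for a goodness-of-fit test: sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)


def cramers_v(chi2, n, rows, cols):
    """Cramér's V for an r x c independence test."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))


w_die = cohens_w(1.0, 60)            # fair-die example: chi2 = 1.0, n = 60
v_table = cramers_v(4.0, 100, 2, 2)  # hypothetical 2x2 table: chi2 = 4.0, n = 100
print(f"w = {w_die:.3f}, V = {v_table:.3f}")
```

For a 2×2 table, min(r−1, c−1) = 1, so Cramér's V reduces to the same expression as Cohen's w.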

Yates' Continuity Correction (df = 1 only)

χ²_Yates = Σ (|O − E| − 0.5)² / E

Applied when df = 1 to improve accuracy with small samples
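To see how much the correction changes the statistic, the sketch below applies the formula exactly as written above to a hypothetical 2×2 table flattened into four cells. Some implementations additionally prevent |O − E| − 0.5 from going negative; that variant is not shown here, and the function name is an assumption.

```python
def yates_chi_square(observed, expected):
    """Yates-corrected statistic: sum((|O - E| - 0.5)^2 / E). Intended for df = 1."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2x2 table flattened row by row; expected counts are all 25.
observed = [30, 20, 20, 30]
expected = [25.0, 25.0, 25.0, 25.0]

uncorrected = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
corrected = yates_chi_square(observed, expected)
print(f"uncorrected = {uncorrected:.2f}, Yates = {corrected:.2f}")
```

Here the uncorrected value is 4.00 and the corrected value is 3.24, which moves the result from just above to just below the α = 0.05 critical value of 3.841.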

Critical Values (upper tail, common α levels)

df	α = 0.10	α = 0.05	α = 0.01	α = 0.001
1	2.706	3.841	6.635	10.828
2	4.605	5.991	9.210	13.816
3	6.251	7.815	11.345	16.266
4	7.779	9.488	13.277	18.467
5	9.236	11.070	15.086	20.515
6	10.645	12.592	16.812	22.458
8	13.362	15.507	20.090	26.125
10	15.987	18.307	23.209	29.588
15	22.307	24.996	30.578	37.697
20	28.412	31.410	37.566	45.315
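The table above can be turned into a simple lookup for the decision rule. This is only a sketch: the names `ALPHAS`, `CRITICAL_VALUES`, `critical_value`, and `reject_h0` are assumptions, and the lookup covers only the df and α values listed in the table.

```python
# Upper-tail critical values copied from the table above.
ALPHAS = (0.10, 0.05, 0.01, 0.001)
CRITICAL_VALUES = {
    1: (2.706, 3.841, 6.635, 10.828),
    2: (4.605, 5.991, 9.210, 13.816),
    3: (6.251, 7.815, 11.345, 16.266),
    4: (7.779, 9.488, 13.277, 18.467),
    5: (9.236, 11.070, 15.086, 20.515),
    6: (10.645, 12.592, 16.812, 22.458),
    8: (13.362, 15.507, 20.090, 26.125),
    10: (15.987, 18.307, 23.209, 29.588),
    15: (22.307, 24.996, 30.578, 37.697),
    20: (28.412, 31.410, 37.566, 45.315),
}


def critical_value(df, alpha):
    """Look up the tabulated critical value; raises if df or alpha is not listed."""
    return CRITICAL_VALUES[df][ALPHAS.index(alpha)]


def reject_h0(chi2, df, alpha=0.05):
    """Decision rule: reject H0 when the statistic exceeds the critical value."""
    return chi2 > critical_value(df, alpha)


print(critical_value(5, 0.05), reject_h0(1.0, 5))
```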

How to Use

1. Choose test type: Select Goodness of Fit (one distribution vs. expected) or Independence (two categorical variables in a contingency table).
2. Set significance level: Choose α = 0.10, 0.05 (most common), 0.01, or 0.001. This controls how strong the evidence must be before you reject the null hypothesis.
3. Enter your data: For GOF, fill in category labels, observed counts, and expected counts (or enable auto-equal). For Independence, set table dimensions, enter row/column labels, and fill in observed cell counts.
4. Try a preset: Load a classic example (Fair Die, Mendel's Peas, Smoking × Cancer, or Party × Region) to see how the calculator works with real data.
5. Calculate: Press Enter or click Calculate. Read the χ² statistic, p-value, critical value, decision, per-category contributions, chi-square distribution curve, and effect size.

Example Calculation

A die is rolled 60 times. If the die is fair, the expected count for each face is 60 / 6 = 10. The observed counts are:

Observed: 8, 9, 12, 11, 10, 10

Expected: 10, 10, 10, 10, 10, 10

Step 1: Compute contributions

Face 1: (8−10)²/10 = 4/10 = 0.400

Face 2: (9−10)²/10 = 1/10 = 0.100

Face 3: (12−10)²/10 = 4/10 = 0.400

Face 4: (11−10)²/10 = 1/10 = 0.100

Face 5: (10−10)²/10 = 0/10 = 0.000

Face 6: (10−10)²/10 = 0/10 = 0.000

Step 2: Sum contributions

χ² = 0.400 + 0.100 + 0.400 + 0.100 + 0 + 0 = 1.000

Step 3: Degrees of freedom

df = 6 − 1 = 5

Step 4: Compare to the critical value at α = 0.05

χ²_crit(5, 0.05) = 11.070

1.000 < 11.070 → Fail to reject H₀ (the data are consistent with a fair die)

p-value ≈ 0.9626 (no meaningful evidence against a fair die)

Why this result makes sense

A χ² of 1.0 with 5 df is extremely low: it falls well below even the α = 0.10 critical value of 9.236. The variation you see (faces 3 and 4 appearing slightly more often, faces 1 and 2 slightly less) is entirely within normal random fluctuation for 60 rolls. You would need χ² > 11.07 to call the die unfair at the 5% level.
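The fair-die example can be reproduced end to end with the standard library alone. In this sketch, `chi2_sf` is a hypothetical helper that evaluates the upper-tail probability for integer df using the closed-form recurrence for the incomplete gamma function; it is one possible approach, not necessarily how the calculator computes its p-value.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # ... then add one term per extra 2 degrees of freedom.
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


observed = [8, 9, 12, 11, 10, 10]
expected = [10] * 6

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = len(observed) - 1
p_value = chi2_sf(chi2, df)

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
# prints: chi2 = 1.000, df = 5, p = 0.9626
```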

Understanding Chi-Square

Goodness-of-Fit vs Independence: Which Test Do You Need?

Both tests use the same χ² formula, but they answer different questions and have different data structures:

›Goodness-of-Fit (GOF): you have a single categorical variable and want to know whether its observed distribution matches a theoretical distribution (fair die, Mendelian ratios, uniform distribution, etc.). You need one row of observed counts and one row of expected counts.
›Independence test: you have two categorical variables and want to know whether they are associated or independent. You arrange counts in an r×c contingency table. Expected values are computed automatically from row and column totals.
›The independence test can be thought of as testing whether knowing one variable changes your prediction about the other; for example, whether knowing a patient smoked changes the probability they developed lung cancer.
›Both tests share the same decision rule: if χ² exceeds the critical value at your chosen significance level (or equivalently, if p < α), reject the null hypothesis.

Assumptions: When Can You Use Chi-Square?

›Count data: the entries in your table must be frequencies (counts of observations), not percentages, proportions, means, or other transformed values.
›Independent observations: each subject or item contributes to exactly one cell. Repeated-measures data or matched pairs violate this assumption.
›Expected cell frequency ≥ 5: this is the most commonly cited rule of thumb. When expected values fall below 5, the chi-square approximation becomes unreliable. The calculator flags these cells. Solutions include combining categories, collecting more data, or using Fisher's exact test.
›Mutually exclusive, exhaustive categories: every observation must belong to exactly one category, and the categories must cover all possibilities.
›Random sampling: the data should represent a random sample from the population of interest.

The rule of 5, when it can be relaxed

Some statisticians argue that the "expected ≥ 5" rule is overly conservative. For larger tables, a widely used relaxation (often attributed to Cochran) accepts the test when at least 80% of cells have E ≥ 5 and no cell falls below 1. For 2×2 tables with small samples, Yates' correction (applied automatically here) reduces the test statistic, which makes the test more conservative. Fisher's exact test remains the gold standard for 2×2 tables when any expected count is below 5.
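A check of this kind is easy to automate. The sketch below flags small expected counts and evaluates both the strict rule and a relaxed rule; the 80% threshold used for "most cells" is an assumption (a commonly quoted guideline), and the function name and sample values are hypothetical.

```python
def check_expected_counts(expected, min_share=0.8):
    """Flag small expected counts in a flat list of expected frequencies."""
    below_5 = [i for i, e in enumerate(expected) if e < 5]
    below_1 = [i for i, e in enumerate(expected) if e < 1]
    share_at_least_5 = 1 - len(below_5) / len(expected)
    return {
        "strict_rule_met": not below_5,
        # Relaxed rule (assumed threshold): enough cells >= 5 and none below 1.
        "relaxed_rule_met": share_at_least_5 >= min_share and not below_1,
        "cells_below_5": below_5,
    }


# Hypothetical expected counts for a six-cell table.
report = check_expected_counts([12.0, 8.5, 6.0, 4.2, 9.3, 7.0])
print(report)
```

In this example one cell out of six falls below 5, so the strict rule fails while the relaxed rule is still satisfied.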

Effect Size: What Significance Alone Doesn't Tell You

A statistically significant result only tells you that the observed pattern is unlikely under H₀; it says nothing about the magnitude of the difference. With large samples, even trivial differences become significant. Always report an effect size alongside the p-value:

Cohen's w (same cut-offs for Cramér's V)	Effect size	Interpretation
w < 0.10	Negligible	The observed frequencies differ from expected by a trivial amount
0.10 ≤ w < 0.30	Small	Detectable difference; real but of limited practical importance
0.30 ≤ w < 0.50	Medium	Moderate association; likely meaningful in most applied contexts
w ≥ 0.50	Large	Substantial difference or association; practically important

›Cohen's w (for GOF): √(χ²/n). Benchmarks: 0.10 = small, 0.30 = medium, 0.50 = large. Independent of table dimensions.
›Cramér's V (for independence): √(χ²/(n × min(r−1, c−1))). Ranges from 0 (no association) to 1 (perfect association). Comparable across tables of different sizes.
›A study with n = 10,000 might yield χ² = 20 and p < 0.001 yet have Cohen's w = 0.045, a negligible effect that would be meaningless in practice.
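The benchmark table translates directly into a small classifier. The sketch below applies it to the n = 10,000 example from the list above; `effect_size_label` is a hypothetical name, and the cut-offs are the ones given on this page.

```python
import math


def effect_size_label(value):
    """Map Cohen's w (or Cramér's V) to the benchmark labels used in the table."""
    if value < 0.10:
        return "negligible"
    if value < 0.30:
        return "small"
    if value < 0.50:
        return "medium"
    return "large"


# Example from the text: chi2 = 20 with n = 10,000.
w = math.sqrt(20 / 10_000)
print(f"w = {w:.3f} ({effect_size_label(w)})")
```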

Standardised Residuals: Finding What's Driving the Difference

Once you know the overall test is significant, the next question is: which categories are responsible? Standardised residuals answer this:

Standardised residual = (O − E) / √E

|residual| > 2 → noteworthy (highlighted in orange)

|residual| > 3 → strong evidence of local departure from H₀ (highlighted in red)

Positive residuals (O > E) indicate more observations than expected; negative residuals (O < E) indicate fewer. The calculator displays residuals for every category and highlights cells whose standardised residual is above +2 or below −2.
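As a rough illustration, the residual formula and the two thresholds can be coded as follows. The function names and the flag labels are assumptions; the die counts come from the worked example on this page, and the final two calls use made-up counts to trigger each flag.

```python
import math


def standardised_residuals(observed, expected):
    """(O - E) / sqrt(E) for each category."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


def flag(residual):
    """Label a residual using the |r| > 2 and |r| > 3 thresholds."""
    size = abs(residual)
    if size > 3:
        return "strong"
    if size > 2:
        return "noteworthy"
    return "none"


# Fair-die example: no face stands out.
die_residuals = standardised_residuals([8, 9, 12, 11, 10, 10], [10] * 6)
die_flags = [flag(r) for r in die_residuals]
print([round(r, 3) for r in die_residuals], die_flags)

# Hypothetical single cells with larger gaps.
print(flag(standardised_residuals([17], [10])[0]))  # 7 / sqrt(10) is about 2.21
print(flag(standardised_residuals([20], [10])[0]))  # 10 / sqrt(10) is about 3.16
```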

Real-World Applications of Chi-Square Testing

Field	Typical use case	Test type
Genetics	Observed vs. Mendelian ratios (e.g. 3:1, 9:3:3:1)	Goodness-of-Fit
Market research	Customer preference vs. market share targets	Goodness-of-Fit
Quality control	Defect rates vs. specification limits	Goodness-of-Fit
Epidemiology	Disease exposure vs. health outcome (2×2 or r×c tables)	Independence
Social science	Survey responses vs. demographics	Independence
Finance	Return categories vs. market regime	Independence
Education	Grade distribution vs. expected distribution	Goodness-of-Fit
Clinical trials	Treatment outcome vs. control group	Independence

When to Use Fisher's Exact Test Instead

›Any expected cell frequency falls below 5 (especially in 2×2 tables).
›The sample is small (total n < 20).
›The data arise from a design with fixed marginal totals (rare in practice).
›Fisher's exact test computes the exact probability of the observed table (and all more extreme tables) under H₀, bypassing the chi-square approximation entirely.
›For larger tables (> 2×2) with small samples, simulation-based methods or exact permutation tests are preferred.
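For a 2×2 table, the exact test can be sketched with the hypergeometric distribution. The version below uses one common two-sided convention (sum the probabilities of all tables that are no more likely than the observed one, with the margins held fixed); the function name is hypothetical and the example table is made up.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    probs = [prob(k) for k in range(lowest, highest + 1)]
    # Small tolerance so ties with the observed table are counted reliably.
    p_value = sum(p for p in probs if p <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical small table with n = 8.
p = fisher_exact_2x2(3, 1, 1, 3)
print(f"p = {p:.4f}")
```

With only 8 observations, every expected count in this table is 2, well below the usual threshold of 5, which is exactly the situation where the exact test is preferred.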

Frequently Asked Questions

What is the null hypothesis in a chi-square test?

›In a goodness-of-fit test, H₀ states that the population follows the specified distribution, so any gap between observed and expected frequencies reflects sampling variation alone.
›In an independence test, H₀ states that the two categorical variables are independent: knowing the value of one gives no information about the other.
›The alternative hypothesis H₁ is simply that H₀ is false: the distribution differs (GOF) or the variables are associated (independence).
›Rejecting H₀ does not tell you how or why the distributions differ, only that they do, beyond what chance alone would produce.

What are the assumptions of the chi-square test?

›Data must be counts (frequencies), not percentages, averages, or proportions.
›Observations must be independent: each subject contributes to exactly one category or cell.
›Expected cell frequencies should generally be ≥ 5 (rule of thumb).
›Categories must be mutually exclusive and exhaustive: every observation belongs to exactly one category.
›Data should be a random sample from the population of interest.
›Violating these assumptions, especially the expected frequency rule, makes the p-value unreliable and can inflate the Type I error rate.

What does the p-value mean in a chi-square test?

›The p-value is the probability of observing a χ² statistic at least as large as the one computed, assuming H₀ is true.
›A small p-value (e.g. p < 0.05) means the observed data would be unlikely if H₀ were true, which counts as evidence against H₀.
›A large p-value does not prove H₀ is true; it only means the data are consistent with H₀.
›The p-value does not measure the size or practical importance of the effect; that requires an effect size measure like Cohen's w or Cramér's V.
›Common threshold: p < 0.05 is "statistically significant" by convention, but this is an arbitrary boundary.

What is Cramér's V, and how do I interpret it?

›Cramér's V measures the strength of association between two categorical variables after a significant independence test.
›Formula: V = √(χ² / (n × min(r−1, c−1))), where r = rows and c = columns.
›V ranges from 0 (no association) to 1 (perfect association).
›Cohen's benchmarks: V < 0.10 = negligible, 0.10–0.29 = small, 0.30–0.49 = medium, ≥ 0.50 = large.
›Unlike χ², Cramér's V does not grow with sample size. With a large n, even a tiny V can be statistically significant, so significance alone does not imply a strong association.
›Always report Cramér's V alongside the p-value so readers can judge practical importance.

Why must expected frequencies be at least 5?

›The chi-square test relies on the chi-square distribution as an approximation of the exact discrete distribution of the test statistic.
›This approximation works well when expected counts are large, but breaks down when cells have small expected values.
›With small expected values, the test statistic behaves erratically and the p-value is unreliable; the true Type I error rate can differ noticeably from the nominal α.
›The ≥ 5 rule is a rough guideline. Many statisticians accept the test when most cells have expected counts ≥ 5 and no cell is below 1.
›Solutions for small expected counts: combine adjacent categories, collect more data, or switch to Fisher's exact test.

What is the chi-square test of independence?

›The independence test examines whether two categorical variables are related or independent using a contingency table.
›H₀: the variables are independent (knowing one gives no information about the other).
›H₁: the variables are associated; the distribution of one changes depending on the other.
›Example: is smoking status (smoker/non-smoker) associated with lung cancer (yes/no)?
›Expected cell frequencies are computed as E_ij = (row_i total × col_j total) / grand total, under the assumption of independence.
›A significant result tells you the variables are associated, but not the nature or direction of the relationship.

What is Yates' continuity correction, and when should I use it?

›Yates' correction is applied only when df = 1 (a 2×2 independence table or a GOF test with 2 categories).
›It adjusts the formula to: χ²_Yates = Σ (|O − E| − 0.5)² / E, subtracting 0.5 from each |O − E| term.
›The correction reduces the test statistic slightly, producing a more conservative (higher) p-value.
›It was originally proposed to improve the chi-square approximation for small samples.
›Modern statisticians disagree on its necessity; some argue it overcorrects and reduces statistical power.
›The calculator displays both the standard χ² and the Yates-corrected value for df = 1, so you can compare both.

How do I report chi-square results in a paper or report?

›APA format: χ²(df, N = n) = χ²_value, p = p-value.
›Example: χ²(5, N = 60) = 1.00, p = .963 (the fair die example).
›For independence tests with significant results, report Cramér's V as the effect size.
›For GOF tests, report Cohen's w.
›Always state the significance level (α) used.
›If any expected cells were below 5, note this as a limitation and report whether Yates' correction or Fisher's exact test was applied.
›Include the actual observed and expected frequencies in a table for transparency.
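If you generate reports programmatically, the APA-style string can be assembled with a small helper. This is a sketch under stated assumptions: `format_apa` is a hypothetical name, and the choices to drop the leading zero of p and to print "p < .001" for very small values follow common APA practice rather than anything specified on this page.

```python
def format_apa(chi2, df, n, p):
    """Build an APA-style result string such as 'χ²(5, N = 60) = 1.00, p = .963'."""
    if p < 0.001:
        p_text = "p < .001"
    else:
        # Three decimals, without the leading zero.
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    return f"χ²({df}, N = {n}) = {chi2:.2f}, {p_text}"


# Fair-die example from this page.
print(format_apa(1.0, 5, 60, 0.9626))
# prints: χ²(5, N = 60) = 1.00, p = .963
```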
