Chi-Square Calculator
Perform chi-square goodness-of-fit and independence tests with p-value, critical value, effect size (Cramér's V, Cohen's w), Yates' correction, standardised residuals, and a chi-square distribution curve. Supports contingency tables from 2×2 up to 5×5, with real-world presets.
What Is the Chi-Square Calculator?
The chi-square (χ²) test is one of the most widely used non-parametric statistical tests for categorical data. It answers a deceptively simple question: are the frequencies you actually observed consistent with what you expected?
Karl Pearson introduced the chi-square goodness-of-fit test in 1900, a founding moment of modern mathematical statistics, and applied it to test whether Weldon's dice data were consistent with fair dice. The same formula remains the workhorse of categorical data analysis in genetics, medicine, social science, and quality control today.
This calculator supports two test types. The goodness-of-fit test compares a single set of observed counts against theoretically expected counts. The independence test determines whether two categorical variables in a contingency table are associated or independent of each other.
Formula
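Chi-Square Statistic
χ² = Σ (O − E)² / E, summed over all categories or cells, where O is the observed count and E is the expected count.
Degrees of Freedom
df = k − 1 for a goodness-of-fit test with k categories; df = (r − 1)(c − 1) for an independence test on an r×c table.
Expected Cell Frequencies (Independence Test)
E_ij = (row_i total × col_j total) / grand total
Effect Size Measures
Cohen's w = √(χ²/n); Cramér's V = √(χ²/(n × min(r−1, c−1)))
Yates' Continuity Correction (df = 1 only)
χ²_Yates = Σ (|O − E| − 0.5)² / E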
Critical Values (upper tail, common α levels)
| df | α = 0.10 | α = 0.05 | α = 0.01 | α = 0.001 |
|---|---|---|---|---|
| 1 | 2.706 | 3.841 | 6.635 | 10.828 |
| 2 | 4.605 | 5.991 | 9.210 | 13.816 |
| 3 | 6.251 | 7.815 | 11.345 | 16.266 |
| 4 | 7.779 | 9.488 | 13.277 | 18.467 |
| 5 | 9.236 | 11.070 | 15.086 | 20.515 |
| 6 | 10.645 | 12.592 | 16.812 | 22.458 |
| 8 | 13.362 | 15.507 | 20.090 | 26.125 |
| 10 | 15.987 | 18.307 | 23.209 | 29.588 |
| 15 | 22.307 | 24.996 | 30.578 | 37.697 |
| 20 | 28.412 | 31.410 | 37.566 | 45.315 |
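The critical values in the table can be cross-checked numerically. The following is a minimal sketch using only the Python standard library; `chi2_sf` and `critical_value` are illustrative names, not the calculator's own code, and the series-plus-bisection approach is just one simple way to do it.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Power series for the lower incomplete gamma function:
    # gamma(a, z) = z^a * e^(-z) * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def critical_value(alpha, df):
    """Find x such that chi2_sf(x, df) is approximately alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for df in (1, 2, 5, 10):
    print(df, round(critical_value(0.05, df), 3))
```

The same `chi2_sf` function gives the p-value for any computed statistic, so the decision "χ² exceeds the critical value" and the decision "p < α" always agree.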
How to Use
1. Choose test type: Select Goodness of Fit (one distribution vs. expected) or Independence (two categorical variables in a contingency table).
2. Set significance level: Choose α = 0.10, 0.05 (most common), 0.01, or 0.001. This controls how strong the evidence must be before you reject the null hypothesis.
3. Enter your data: For GOF, fill in category labels, observed counts, and expected counts (or enable auto-equal). For Independence, set table dimensions, enter row/column labels, and fill in observed cell counts.
4. Try a preset: Load a classic example (Fair Die, Mendel's Peas, Smoking × Cancer, or Party × Region) to see how the calculator works with real data.
5. Calculate: Press Enter or click Calculate. Read the χ² statistic, p-value, critical value, decision, per-category contributions, chi-square distribution curve, and effect size.
Example Calculation
A die is rolled 60 times. If the die is fair, the expected count for each face is 10. The observed counts are:
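The arithmetic for this kind of example can be sketched as follows. The counts below are hypothetical (they are not the page's own data); they were chosen so that the statistic comes out at 1.00, matching the fair-die result quoted in the FAQ, χ²(5, N = 60) = 1.00. The critical value 11.070 is taken from the table above (df = 5, α = 0.05).

```python
# Hypothetical observed counts for faces 1..6 (assumed for illustration only).
observed = [12, 9, 11, 8, 10, 10]

n = sum(observed)                      # 60 rolls in total
k = len(observed)                      # 6 categories
expected = [n / k] * k                 # fair die: 10 expected per face

# Per-category contribution (O - E)^2 / E, then the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = k - 1

critical = 11.070                      # table value for df = 5, alpha = 0.05
reject_h0 = chi2 > critical

print("contributions:", [round(c, 2) for c in contributions])
print("chi2 =", round(chi2, 2), "df =", df, "reject H0:", reject_h0)
```

With these counts the statistic is far below the critical value, so the fair-die hypothesis is not rejected.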
Why this result makes sense
Understanding Chi-Square
Goodness-of-Fit vs Independence: Which Test Do You Need?
Both tests use the same χ² formula, but they answer different questions and have different data structures:
- Goodness-of-Fit (GOF): you have a single categorical variable and want to know whether its observed distribution matches a theoretical distribution (fair die, Mendelian ratios, uniform distribution, etc.). You need one row of observed counts and one row of expected counts.
- Independence test: you have two categorical variables and want to know whether they are associated or independent. You arrange counts in an r×c contingency table. Expected values are computed automatically from row and column totals.
- The independence test can be thought of as testing whether knowing one variable changes your prediction about the other. For example, does knowing that a patient smoked change the probability that they developed lung cancer?
- Both tests share the same decision rule: if χ² exceeds the critical value at your chosen significance level (or equivalently, if p < α), reject the null hypothesis.
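For the independence test, the expected counts and the statistic can be computed from the table alone. This is a minimal sketch; the 2×2 table is made-up illustration data and the function names are assumptions, not the calculator's implementation.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Hypothetical 2x2 table: rows = exposure (yes/no), columns = outcome (yes/no).
table = [[30, 10],
         [20, 40]]

chi2, df, expected = chi_square_independence(table)
print("expected:", expected)
print("chi2 =", round(chi2, 3), "df =", df)
print("reject H0 at alpha = 0.05:", chi2 > 3.841)   # 3.841 is the df = 1 critical value
```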
Assumptions: When Can You Use Chi-Square?
- Count data: the entries in your table must be frequencies (counts of observations), not percentages, proportions, means, or other transformed values.
- Independent observations: each subject or item contributes to exactly one cell. Repeated-measures data or matched pairs violate this assumption.
- Expected cell frequency ≥ 5: this is the most commonly cited rule of thumb. When expected values fall below 5, the chi-square approximation becomes unreliable. The calculator flags these cells. Solutions include combining categories, collecting more data, or using Fisher's exact test.
- Mutually exclusive, exhaustive categories: every observation must belong to exactly one category, and the categories must cover all possibilities.
- Random sampling: the data should represent a random sample from the population of interest.
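The expected-frequency check is easy to automate. The sketch below assumes a goodness-of-fit test against a 9:3:3:1 Mendelian ratio with a hypothetical sample of 48; the helper names and the default threshold of 5 are illustrative choices.

```python
def expected_from_ratio(ratio, n):
    """Expected counts for a sample of size n split according to a theoretical ratio."""
    total = sum(ratio)
    return [n * part / total for part in ratio]

def low_expected_cells(expected, threshold=5.0):
    """Indices of categories whose expected count falls below the threshold."""
    return [i for i, e in enumerate(expected) if e < threshold]

ratio = [9, 3, 3, 1]          # Mendelian dihybrid ratio
n = 48                        # hypothetical sample size
expected = expected_from_ratio(ratio, n)
flagged = low_expected_cells(expected)

print("expected:", expected)
print("categories below 5:", flagged)
```

Here the rarest category has an expected count of 3, so it would be flagged; a larger sample or merging it with a neighbouring category would address the warning.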
The rule of 5: when it can be relaxed
Effect Size: What Significance Alone Doesn't Tell You
A statistically significant result only tells you that the observed pattern is unlikely under H₀; it says nothing about the magnitude of the difference. With large samples, even trivial differences become significant. Always report an effect size alongside the p-value:
| Cohen's w or Cramér's V | Effect size | Interpretation |
|---|---|---|
| < 0.10 | Negligible | The observed frequencies differ from expected by a trivial amount |
| 0.10 to < 0.30 | Small | Detectable difference; real but of limited practical importance |
| 0.30 to < 0.50 | Medium | Moderate association; likely meaningful in most applied contexts |
| ≥ 0.50 | Large | Substantial difference or association; practically important |
- Cohen's w (for GOF): √(χ²/n). Benchmarks: 0.10 = small, 0.30 = medium, 0.50 = large. Independent of table dimensions.
- Cramér's V (for independence): √(χ²/(n × min(r−1, c−1))). Ranges from 0 (no association) to 1 (perfect association). Comparable across tables of different sizes.
- A study with n = 10,000 might yield χ² = 20 and p < 0.001 yet have Cohen's w = 0.045, a negligible effect that would be meaningless in practice.
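Both effect sizes follow directly from the statistic and the sample size. A minimal sketch, with function names chosen for illustration and the labels taken from the benchmark table above:

```python
import math

def cohens_w(chi2, n):
    """Cohen's w for a goodness-of-fit test."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, rows, cols):
    """Cramér's V for an r x c independence test."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

def effect_label(value):
    """Map an effect size onto the benchmark labels from the table above."""
    if value < 0.10:
        return "negligible"
    if value < 0.30:
        return "small"
    if value < 0.50:
        return "medium"
    return "large"

# The large-sample example from the text: chi2 = 20 with n = 10,000.
w = cohens_w(20, 10_000)
print(round(w, 3), effect_label(w))
```

For a 2×2 table, min(r−1, c−1) = 1, so Cramér's V and Cohen's w coincide.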
Standardised Residuals: Finding What's Driving the Difference
Once you know the overall test is significant, the next question is: which categories are responsible? Standardised residuals answer this:
Positive residuals (O > E) indicate more observations than expected; negative residuals (O < E) indicate fewer. The calculator displays residuals for every category and highlights cells where the standardised residual is larger than 2 in absolute value.
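As a sketch of how such residuals can be computed, the block below assumes the Pearson form (O − E)/√E; the calculator may use a different variant (for example, adjusted residuals), and the counts are hypothetical.

```python
import math

def standardised_residuals(observed, expected):
    """Pearson residuals (O - E) / sqrt(E), one per category (assumed form)."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

def flag_large(residuals, cutoff=2.0):
    """Indices of categories whose residual exceeds the cutoff in absolute value."""
    return [i for i, r in enumerate(residuals) if abs(r) > cutoff]

# Hypothetical die data in which face 1 comes up far too often.
observed = [22, 9, 11, 8, 5, 5]
expected = [10.0] * 6

residuals = standardised_residuals(observed, expected)
print([round(r, 2) for r in residuals])
print("flagged categories:", flag_large(residuals))
```

With the Pearson form, the squared residuals sum to the χ² statistic, so each residual shows how much its category contributes and in which direction.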
Real-World Applications of Chi-Square Testing
| Field | Typical use case | Test type |
|---|---|---|
| Genetics | Observed vs. Mendelian ratios (e.g. 3:1, 9:3:3:1) | Goodness-of-Fit |
| Market research | Customer preference vs. market share targets | Goodness-of-Fit |
| Quality control | Defect rates vs. specification limits | Goodness-of-Fit |
| Epidemiology | Disease exposure vs. health outcome (2×2 or r×c tables) | Independence |
| Social science | Survey responses vs. demographics | Independence |
| Finance | Return categories vs. market regime | Independence |
| Education | Grade distribution vs. expected distribution | Goodness-of-Fit |
| Clinical trials | Treatment outcome vs. control group | Independence |
When to Use Fisher's Exact Test Instead
- Any expected cell frequency falls below 5 (especially in 2×2 tables).
- The sample is small (total n < 20).
- The data arise from a design with fixed marginal totals (rare in practice).
- Fisher's exact test computes the exact probability of the observed table (and all more extreme tables) under H₀, bypassing the chi-square approximation entirely.
- For larger tables (> 2×2) with small samples, simulation-based methods or exact permutation tests are preferred.
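For a 2×2 table, the exact calculation is short enough to sketch directly. The version below assumes the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; other two-sided conventions exist, and the function name is illustrative.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table at least as extreme (no more probable) than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Small hypothetical table where every expected count is 2.
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))
```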
Frequently Asked Questions
What is the null hypothesis in a chi-square test?
- In a goodness-of-fit test, H₀ states that the population follows the specified distribution, so any gap between observed and expected frequencies is due to sampling variation alone.
- In an independence test, H₀ states that the two categorical variables are independent: knowing the value of one gives no information about the other.
- The alternative hypothesis H₁ is simply that H₀ is false: the distribution differs (GOF) or the variables are associated (independence).
- Rejecting H₀ does not tell you how or why the distributions differ, only that they do, beyond what chance alone would produce.
What are the assumptions of the chi-square test?
- Data must be counts (frequencies), not percentages, averages, or proportions.
- Observations must be independent: each subject contributes to exactly one category or cell.
- Expected cell frequencies should generally be ≥ 5 (rule of thumb).
- Categories must be mutually exclusive and exhaustive: every observation belongs to exactly one category.
- Data should be a random sample from the population of interest.
- Violating these assumptions, especially the expected frequency rule, can distort the Type I error rate.
What does the p-value mean in a chi-square test?
- The p-value is the probability of observing a χ² statistic at least as large as the one computed, assuming H₀ is true.
- A small p-value (e.g. p < 0.05) means the observed data would be unlikely if H₀ were true, which counts as evidence against H₀.
- A large p-value does not prove H₀ is true; it only means the data are consistent with H₀.
- The p-value does not measure the size or practical importance of the effect; that requires an effect size measure like Cohen's w or Cramér's V.
- Common threshold: p < 0.05 is "statistically significant" by convention, but this is an arbitrary boundary.
What is Cramér's V, and how do I interpret it?
- Cramér's V measures the strength of association between two categorical variables after a significant independence test.
- Formula: V = √(χ² / (n × min(r−1, c−1))), where r = rows and c = columns.
- V ranges from 0 (no association) to 1 (perfect association).
- Benchmarks: V < 0.10 = negligible, 0.10–0.29 = small, 0.30–0.49 = medium, ≥ 0.50 = large. These are Cohen's benchmarks for w, which equals V when min(r−1, c−1) = 1.
- Unlike χ², Cramér's V does not grow with sample size for a given pattern of proportions; with a large n, even a tiny V can be statistically significant.
- Always report Cramér's V alongside the p-value so readers can judge practical importance.
Why must expected frequencies be at least 5?
- The chi-square test relies on the chi-square distribution as an approximation of the exact discrete distribution of the test statistic.
- This approximation works well when expected counts are large, but breaks down when cells have small expected values.
- With small expected values, the test statistic behaves erratically and the p-value is unreliable, so the true Type I error rate can differ noticeably from α.
- The ≥ 5 rule is a rough guideline. A common relaxation (Cochran's guideline) accepts the test when no expected count is below 1 and no more than 20% of cells have expected counts below 5.
- Solutions for small expected counts: combine adjacent categories, collect more data, or switch to Fisher's exact test.
What is the chi-square test of independence?
- The independence test examines whether two categorical variables are related or independent using a contingency table.
- H₀: the variables are independent (knowing one gives no information about the other).
- H₁: the variables are associated, meaning the distribution of one changes depending on the other.
- Example: is smoking status (smoker/non-smoker) associated with lung cancer (yes/no)?
- Expected cell frequencies are computed as E_ij = (row_i total × col_j total) / grand total, under the assumption of independence.
- A significant result tells you the variables are associated, but not the nature or direction of the relationship.
What is Yates' continuity correction, and when should I use it?
- Yates' correction is applied only when df = 1 (a 2×2 independence table or a GOF test with 2 categories).
- It adjusts the formula to: χ²_Yates = Σ (|O − E| − 0.5)² / E, subtracting 0.5 from each |O − E| term.
- The correction reduces the test statistic slightly, producing a more conservative (higher) p-value.
- It was originally proposed to improve the chi-square approximation for small samples.
- Modern statisticians disagree on its necessity; some argue it overcorrects and reduces statistical power.
- The calculator displays both the standard χ² and the Yates-corrected value for df = 1, so you can compare both.
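The effect of the correction can be seen by computing both versions side by side. This is a minimal sketch on a made-up 2×2 table, flattened into lists of cells; clamping the corrected difference at zero is an assumption added here so that the adjustment never overshoots when |O − E| is below 0.5.

```python
def chi_square(observed, expected, yates=False):
    """Chi-square statistic over paired cells, optionally with Yates' correction."""
    total = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            # Subtract 0.5 from |O - E|, but never go below zero (assumed safeguard).
            diff = max(0.0, diff - 0.5)
        total += diff ** 2 / e
    return total

# Hypothetical 2x2 table [[30, 10], [20, 40]], flattened row by row,
# with expected counts computed from its row and column totals.
observed = [30, 10, 20, 40]
expected = [20.0, 20.0, 30.0, 30.0]

plain = chi_square(observed, expected)
corrected = chi_square(observed, expected, yates=True)
print("standard:", round(plain, 3))
print("Yates:   ", round(corrected, 3))
```

The corrected statistic is always less than or equal to the standard one, which is why the corrected p-value is never smaller.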
How do I report chi-square results in a paper or report?
- APA format: χ²(df, N = n) = χ²_value, p = p-value.
- Example: χ²(5, N = 60) = 1.00, p = .963 (the fair die example).
- For independence tests with significant results, report Cramér's V as the effect size.
- For GOF tests, report Cohen's w.
- Always state the significance level (α) used.
- If any expected cell counts were below 5, note this as a limitation and report whether Yates' correction or Fisher's exact test was applied.
- Include the actual observed and expected frequencies in a table for transparency.
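The reporting pattern above can be produced by a small formatting helper. This is a sketch only; `format_apa` is a hypothetical name, and the conventions it applies (no leading zero on p, "p < .001" for very small values) follow common APA practice.

```python
def format_apa(chi2, df, n, p):
    """Format a chi-square result as 'χ²(df, N = n) = value, p = .xxx'."""
    if p < 0.001:
        p_text = "p < .001"
    else:
        # Three decimals; drop the leading zero for values below 1.
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    return f"χ²({df}, N = {n}) = {chi2:.2f}, {p_text}"

# Values matching the fair die example quoted in the list above.
print(format_apa(1.0, 5, 60, 0.9626))
```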