P-Value Calculator — Z, T, Chi-Square & F
Calculate the exact p-value from a z-score, t-statistic, chi-square statistic, or F-statistic. Supports one-tailed and two-tailed tests with significance interpretation.
What Is the P-Value Calculator — Z, T, Chi-Square & F?
This p-value calculator computes the exact p-value from four common test statistics — z, t, chi-square, and F — using accurate JavaScript approximations of the underlying probability distributions. Results include the significance verdict, interpretation text, and the complete decision framework.
- Four distribution types — standard normal (Z), Student's t, chi-square (χ²), and F-distribution, covering the most common hypothesis tests in statistics.
- One-tailed and two-tailed tests — choose upper, lower, or two-tailed depending on the direction of your hypothesis.
- Significance thresholds — compare your p-value against α = 0.001, 0.01, 0.05, 0.10, or 0.20 with an instant reject/fail-to-reject verdict.
- Interpretation guide — strength-of-evidence language (very strong / strong / moderate / weak) updates based on the computed p-value.
- Common misinterpretations flagged — the result panel reminds you what the p-value does and does not mean (it is not the probability H₀ is true).
Formula
Z-Test (Standard Normal)
Two-tailed: p = 2 · Φ(−|z|) where Φ is the standard normal CDF
One-tailed: p = Φ(−z) (upper) or p = Φ(z) (lower)
Normal PDF: f(x) = (1/√(2π)) · exp(−x²/2)
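The z-test formulas above translate directly into a few lines of Python. This is a sketch using the standard library's `erfc` (the identity Φ(x) = ½·erfc(−x/√2)), not the calculator's actual JavaScript source; the function names are illustrative:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def z_p_value(z: float, tail: str = "two") -> float:
    """p-value for a z statistic; tail is 'two', 'upper', or 'lower'."""
    if tail == "two":
        return 2.0 * normal_cdf(-abs(z))   # p = 2 * Phi(-|z|)
    if tail == "upper":
        return normal_cdf(-z)              # P(Z > z)
    return normal_cdf(z)                   # 'lower': P(Z < z)

print(round(z_p_value(2.58), 4))           # two-tailed p for z = 2.58 -> 0.0099
```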
T-Test (Student's t-distribution)
p computed from t(df) distribution with df degrees of freedom
Two-tailed: p = 2 · P(T > |t|) where T ~ t(df)
Chi-Square Test
p = P(χ²(df) > χ²) — upper tail probability
Requires χ² ≥ 0 and df ≥ 1
F-Test
p = P(F(df₁, df₂) > F) — upper tail probability
Transforms to beta distribution: x = df₁·F / (df₁·F + df₂)
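As a cross-check of the t, chi-square, and F formulas, each upper-tail probability can be approximated by integrating the density directly. The sketch below uses brute-force Simpson integration with fixed truncation windows (adequate for moderate df; very small df would need wider bounds) — production code, including this calculator's JavaScript, would use incomplete beta/gamma approximations instead:

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def t_upper_tail(t, df):
    """P(T > t) for Student's t: integrate the pdf from t far into the tail."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return simpson(lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2), t, t + 400.0)

def chi2_upper_tail(x, df):
    """P(X > x) for chi-square; x must be > 0 when df < 2 (pdf blows up at 0)."""
    c = 1.0 / (2 ** (df / 2) * math.gamma(df / 2))
    return simpson(lambda u: c * u ** (df / 2 - 1) * math.exp(-u / 2), x, x + 200.0)

def f_upper_tail(f_stat, df1, df2):
    """P(F > f) for the F(df1, df2) distribution, via its pdf in log form."""
    a, b = df1 / 2.0, df2 / 2.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    def pdf(x):
        return math.exp(a * math.log(df1 / df2) + (a - 1) * math.log(x)
                        - (a + b) * math.log(1 + df1 * x / df2) - log_beta)
    return simpson(pdf, f_stat, f_stat + 400.0)

# e.g. t_upper_tail(2.0, 20) ~ 0.0296, so the two-tailed p-value is ~ 0.0593
# consistency check: T^2 ~ F(1, df), hence f_upper_tail(4.0, 1, 20) ~ 0.0593 too
```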
| Symbol | Name | Description |
|---|---|---|
| p | P-value | Probability of a test statistic at least as extreme as the one observed, given H₀ is true |
| α | Significance level | Threshold for rejection; typically 0.05 — reject H₀ when p < α |
| z | Z-score | Standard normal test statistic — units of standard deviations from mean |
| t | T-statistic | Test statistic from Student's t-distribution with df degrees of freedom |
| χ² | Chi-square stat | Non-negative test statistic from the chi-square distribution |
| F | F-statistic | Ratio of variances from the F-distribution with df₁ and df₂ |
| df | Degrees of freedom | Parameter controlling distribution shape; typically n − 1 for t-test |
| H₀ | Null hypothesis | The default assumption being tested — rejected when p < α |
How to Use
1. Select test type: Choose Z-Test, T-Test, Chi-Square, or F-Test depending on your study design and data type.
2. Choose tail direction: Two-tailed for "is there any difference?", upper one-tailed for "is A > B?", lower one-tailed for "is A < B?".
3. Enter the test statistic: Type your z-score, t-statistic, χ² value, or F-statistic in the input. For t, also enter degrees of freedom (df = n − 1 for a one-sample t-test). For F, enter df₁ and df₂.
4. Set the significance level α: Select your α threshold from the dropdown (0.05 is the most common in social science; 0.01 in medical research).
5. Press Enter or click Calculate: The p-value, verdict (Reject H₀ / Fail to reject H₀), and interpretation text appear instantly.
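The decision rule and the strength-of-evidence wording the interpretation guide uses can be sketched in a few lines. The cutoffs for the strength bands below are an assumption for illustration (the page names the bands but not their exact boundaries):

```python
def verdict(p: float, alpha: float = 0.05) -> str:
    """Binary reject/fail-to-reject decision plus strength-of-evidence wording.

    Band cutoffs (0.001 / 0.01 / 0.05 / 0.10) are illustrative assumptions.
    """
    if p < 0.001:
        strength = "very strong evidence against H0"
    elif p < 0.01:
        strength = "strong evidence against H0"
    elif p < 0.05:
        strength = "moderate evidence against H0"
    elif p < 0.10:
        strength = "weak evidence against H0"
    else:
        strength = "little or no evidence against H0"
    decision = "Reject H0" if p < alpha else "Fail to reject H0"
    return f"{decision} at alpha = {alpha} ({strength})"

print(verdict(0.0099, alpha=0.01))  # Reject H0 at alpha = 0.01 (strong evidence against H0)
```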
Example Calculation
Example 1: Z-Test, two-tailed, z = 2.58
H₀: μ = μ₀, z = 2.58, α = 0.01, two-tailed
p = 2 · Φ(−|2.58|)
= 2 · Φ(−2.58)
= 2 · 0.00494
p ≈ 0.0099 < α = 0.01 → Reject H₀
Example 2: T-Test, two-tailed, t = 2.0, df = 20
H₀: μ = μ₀, t = 2.0, df = 20, α = 0.05, two-tailed
p = 2 · P(T₂₀ > 2.0)
≈ 2 · 0.02963 = 0.0593
p ≈ 0.0593 > α = 0.05 → Fail to reject H₀
Borderline results require context
p = 0.0593 is very close to the α = 0.05 threshold. In practice, report the exact p-value and effect size — a result just above α is not evidence for H₀, and a result just below α is not strong evidence against it. The p-value is a continuous measure, not a binary verdict.
Understanding P-Value — Z, T, Chi-Square & F
What Is a P-Value?
The p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one computed from your data, assuming the null hypothesis H₀ is true. It is a conditional probability: P(data this extreme | H₀ true).
A small p-value means the observed data would be rare under H₀ — which is evidence against H₀. How small is small enough? That is determined by the pre-chosen significance level α. If p < α, we reject H₀ and call the result statistically significant.
Key intuition
The p-value measures the strength of evidence against H₀ in your data. A p-value of 0.003 means results this extreme occur only 0.3% of the time under H₀ — strong evidence against it. A p-value of 0.42 means such results occur 42% of the time under H₀ — no reason to doubt it.
P-Value vs Statistical Significance
The p-value itself is a continuous measure of evidence. The significance level α converts it into a binary decision. Researchers typically set α before collecting data:
- α = 0.05 — most common in social science, psychology, and ecology. One in 20 chance of a false positive.
- α = 0.01 — used in medical and pharmaceutical research where false positives are costly.
- α = 0.001 — used in physics (e.g. the Higgs boson discovery required p < 3×10⁻⁷, roughly 5σ).
- α = 0.10 — sometimes used in exploratory research or economic studies.
Common P-Value Misinterpretations
- "p = 0.03 means there is a 3% chance H₀ is true." — Wrong. The p-value is not the probability that H₀ is true. It is the probability of data this extreme if H₀ were true — a fundamentally different statement.
- "p > 0.05 means no effect exists." — Wrong. Failing to reject H₀ does not prove H₀. Low statistical power (small sample) can produce large p-values even when a real effect exists.
- "A smaller p-value means a larger effect." — Wrong. P-values depend on sample size. A huge sample can produce p < 0.001 for a trivially small, practically meaningless effect.
- "p = 0.049 and p = 0.051 are completely different." — Wrong. The α threshold is a convention, not a sharp boundary in nature. Report the actual p-value and effect size, not just "significant/not significant."
One-Tailed vs Two-Tailed Tests
The tail choice depends on your research question before you see the data:
- Two-tailed — "Is there any difference between A and B?" Tests for effects in either direction. Use when you have no strong prior reason to expect the effect to go one way. For symmetric distributions (z, t), the two-tailed p-value is double the one-tailed value.
- Upper one-tailed — "Is A greater than B?" Tests only for positive effects. Use when theory or prior evidence strongly predicts the direction.
- Lower one-tailed — "Is A less than B?" Tests only for negative effects.
Switching from two-tailed to one-tailed after seeing data (to achieve p < 0.05) is a form of p-hacking and inflates the false-positive rate. Always specify the tail before analysis.
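To see concretely how much the tail choice matters, here is one fixed z statistic evaluated all three ways (a Python sketch; the statistic z = 1.80 is a made-up illustration):

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

z = 1.80  # one fixed test statistic, three tail choices
p_two   = 2 * normal_cdf(-abs(z))  # ~ 0.0719: not significant at alpha = 0.05
p_upper = normal_cdf(-z)           # ~ 0.0359: crosses 0.05 if chosen after the fact
p_lower = normal_cdf(z)            # ~ 0.9641: the opposite direction
```

The same data yield p = 0.072 or p = 0.036 depending on the tail, which is exactly why the choice must be fixed before the analysis.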
Effect Size Beyond the P-Value
Statistical significance does not imply practical importance. A study with n = 100,000 can detect a mean difference of 0.001 units with p < 0.001 — yet such a difference may be meaningless in practice. Always report an effect size alongside the p-value:
- Cohen's d for comparing means: d = (μ₁ − μ₂) / σ_pooled. Thresholds: small d = 0.2, medium d = 0.5, large d = 0.8.
- Pearson's r for correlation: small r = 0.1, medium r = 0.3, large r = 0.5.
- η² (eta-squared) for ANOVA: proportion of variance explained. Small: 0.01, medium: 0.06, large: 0.14.
- Confidence intervals — a 95% CI around the effect size gives much more information than a binary p-value verdict.
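For instance, Cohen's d from the list above can be computed from two raw samples with nothing but the standard library (a sketch; the sample data are made up):

```python
import math

def cohens_d(sample1: list, sample2: list) -> float:
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    v1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)  # sample variances
    v2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Hypothetical samples: a 2-unit mean shift against a pooled SD of ~1.58
print(round(cohens_d([3, 4, 5, 6, 7], [1, 2, 3, 4, 5]), 2))  # 1.26 -> a large effect
```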
Applications
| Field | Typical Test | Example Question |
|---|---|---|
| Medicine | Two-sample t-test | Does drug A reduce blood pressure more than drug B? |
| Social science | Z-test on proportion | Is the approval rate of policy X higher than 50%? |
| Manufacturing | Chi-square test | Is the distribution of defects independent of shift? |
| Finance | F-test | Do two investment strategies have equal variance in returns? |
| Psychology | One-sample t-test | Is the average IQ of this group different from 100? |
Frequently Asked Questions
What does p < 0.05 mean?
p < 0.05 means: if H₀ were true, data this extreme would appear less than 5% of the time.
- By convention (Fisher, 1925), p < 0.05 is the threshold for "statistical significance."
- It does NOT mean there is a 95% chance the effect is real.
- It does NOT mean the effect is large or practically important.
- The threshold α = 0.05 is a convention, not a law of nature — some fields use 0.01 or 0.001.
What is the difference between the p-value and the significance level α?
- α is set BEFORE the experiment — it is the maximum tolerable false-positive rate.
- The p-value is computed AFTER the experiment from the observed data.
- The decision rule is: reject H₀ if p < α.
- Changing α after seeing the data is p-hacking and inflates false-positive rates.
When should I use a one-tailed vs two-tailed test?
- Two-tailed: "is there any difference?" — the safest default, with no directional assumption.
- One-tailed: "is the effect in this specific direction?" — requires strong prior justification.
- For symmetric distributions, a one-tailed test gives half the two-tailed p-value when the effect is in the tested direction.
- The tail choice must be pre-registered before seeing the data, not chosen based on results.
Does a small p-value prove the alternative hypothesis H₁ is true?
Rejecting H₀ means the data are incompatible with H₀ — not that H₁ is certainly true.
- Sampling bias or confounding variables can produce small p-values without a real effect.
- Multiple comparisons inflate false-positive rates — use Bonferroni or FDR corrections.
- Replication is essential — a single p < 0.05 result has a meaningful false-positive rate.
- Always pair the p-value with an effect size and confidence interval.
What does a p-value of exactly 0 mean?
- p = 0 is a display artifact — the true p-value is extremely small but positive.
- Common software thresholds: R reports p < 2.2e-16; this calculator reports "< 0.0001" for very small values.
- Report it as "p < 0.001" or give the exact test statistic so readers can verify.
- An astronomically small p-value is very strong evidence against H₀, but still not proof.
What should I report alongside the p-value?
- Exact p-value — not just "significant" or "p < 0.05".
- Test statistic with degrees of freedom: e.g. t(29) = 2.45, p = 0.021.
- Effect size — Cohen's d, Pearson r, or η² tells you how large the effect is.
- 95% confidence interval — gives a plausible range for the true effect size.
- Sample size — small samples have low power and noisy p-values.
Why is α = 0.05 used so commonly?
- Fisher (1925) popularized α = 0.05 as a practical convenience threshold.
- It was never meant to be a universal law — it reflects a 1-in-20 false-positive rate.
- Replication crisis: many "p < 0.05" results in psychology and medicine failed to replicate.
- Some statisticians advocate α = 0.005 as the new threshold for "significance."
- Others recommend abandoning binary thresholds and simply reporting p-values with effect sizes.