P-Value Calculator — Z, T, Chi-Square & F

Calculate the exact p-value from a z-score, t-statistic, chi-square statistic, or F-statistic. Supports one-tailed and two-tailed tests with significance interpretation.

Quick Presets

Test Type

Tail

What Is the P-Value Calculator — Z, T, Chi-Square & F?

This p-value calculator computes the exact p-value from four common test statistics — z, t, chi-square, and F — using accurate JavaScript approximations of the underlying probability distributions. Results include the significance verdict, interpretation text, and the complete decision framework.

  • Four distribution types — standard normal (Z), Student's t, chi-square (χ²), and F-distribution, covering the most common hypothesis tests in statistics.
  • One-tailed and two-tailed tests — choose upper, lower, or two-tailed depending on the direction of your hypothesis.
  • Significance thresholds — compare your p-value against α = 0.001, 0.01, 0.05, 0.10, or 0.20 with instant reject/fail-to-reject verdict.
  • Interpretation guide — strength-of-evidence language (very strong / strong / moderate / weak) updates based on the computed p-value.
  • Common misinterpretations flagged — the result panel reminds you what the p-value does and does not mean (it is not the probability H₀ is true).

Formula

Z-Test (Standard Normal)

Two-tailed: p = 2 · Φ(−|z|)   where Φ is the standard normal CDF

One-tailed: p = Φ(−z) (upper) or p = Φ(z) (lower)

Normal PDF: f(x) = (1/√(2π)) · exp(−x²/2)
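The two-tailed and one-tailed Z-test p-values above can be sketched in JavaScript (the language the calculator itself uses, per the description), with Φ approximated by the Abramowitz–Stegun polynomial. Function names here are illustrative, not the calculator's actual code:

```javascript
// Standard normal CDF Φ(x) via the Abramowitz & Stegun 26.2.17
// polynomial approximation (absolute error below 7.5e-8).
function normalCdf(x) {
  if (x < 0) return 1 - normalCdf(-x); // symmetry: Φ(−x) = 1 − Φ(x)
  const t = 1 / (1 + 0.2316419 * x);
  const poly = t * (0.319381530 + t * (-0.356563782 +
               t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
  const pdf = Math.exp(-x * x / 2) / Math.sqrt(2 * Math.PI);
  return 1 - pdf * poly;
}

// p-value for a z-score; tail is "two", "upper", or "lower".
function zTestP(z, tail) {
  if (tail === "two") return 2 * normalCdf(-Math.abs(z)); // p = 2·Φ(−|z|)
  if (tail === "upper") return normalCdf(-z);             // p = Φ(−z)
  return normalCdf(z);                                    // p = Φ(z)
}
```

For instance, zTestP(2.58, "two") reproduces Example 1 below: p ≈ 0.0099.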

T-Test (Student's t-distribution)

p computed from t(df) distribution with df degrees of freedom

Two-tailed: p = 2 · P(T > |t|) where T ~ t(df)

Chi-Square Test

p = P(χ²(df) > χ²) — upper tail probability

Requires χ² ≥ 0 and df ≥ 1

F-Test

p = P(F(df₁, df₂) > F) — upper tail probability

Transforms to beta distribution: x = df₁·F / (df₁·F + df₂)
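The t, chi-square, and F upper-tail probabilities above can be sketched by numerically integrating each density with Simpson's rule. This is an illustrative approach assuming integer degrees of freedom (so every Γ argument is a multiple of ½), not necessarily how the calculator computes them:

```javascript
// Γ(x) for x a positive multiple of 1/2 (covers df/2 for integer df),
// via Γ(x) = (x−1)·Γ(x−1), Γ(1) = 1, Γ(1/2) = √π.
function gammaHalf(x) {
  if (x === 0.5) return Math.sqrt(Math.PI);
  if (x === 1) return 1;
  return (x - 1) * gammaHalf(x - 1);
}

// Composite Simpson's rule for ∫_a^b f with n (even) subintervals.
function simpson(f, a, b, n = 10000) {
  const h = (b - a) / n;
  let s = f(a) + f(b);
  for (let i = 1; i < n; i++) s += f(a + i * h) * (i % 2 ? 4 : 2);
  return s * h / 3;
}

// Two-tailed t-test: p = 2·P(T > |t|) = 2·(0.5 − ∫₀^|t| f), by symmetry.
function tTestTwoTailedP(t, df) {
  const c = gammaHalf((df + 1) / 2) /
            (Math.sqrt(df * Math.PI) * gammaHalf(df / 2));
  const pdf = x => c * Math.pow(1 + x * x / df, -(df + 1) / 2);
  return 2 * (0.5 - simpson(pdf, 0, Math.abs(t)));
}

// Chi-square upper tail P(χ²(df) > x); assumes df ≥ 2 so the
// density stays finite at 0 (no singularity for Simpson's rule).
function chiSqP(x, df) {
  const pdf = u => Math.pow(u, df / 2 - 1) * Math.exp(-u / 2) /
                   (Math.pow(2, df / 2) * gammaHalf(df / 2));
  return 1 - simpson(pdf, 0, x);
}

// F upper tail P(F(df1, df2) > f); assumes df1 ≥ 2 for the same reason.
function fTestP(f, df1, df2) {
  const c = gammaHalf((df1 + df2) / 2) /
            (gammaHalf(df1 / 2) * gammaHalf(df2 / 2)) *
            Math.pow(df1 / df2, df1 / 2);
  const pdf = u => c * Math.pow(u, df1 / 2 - 1) *
                   Math.pow(1 + df1 * u / df2, -(df1 + df2) / 2);
  return 1 - simpson(pdf, 0, f);
}
```

As a check, tTestTwoTailedP(2.0, 20) reproduces Example 2 below (p ≈ 0.059), and chiSqP(9.488, 4) lands on the familiar 0.05 critical value.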

Symbol | Name | Description
p | P-value | Probability of obtaining a test statistic ≥ observed, given H₀ is true
α | Significance level | Threshold for rejection; typically 0.05 — reject H₀ when p < α
z | Z-score | Standard normal test statistic — units of standard deviations from the mean
t | T-statistic | Test statistic from Student's t-distribution with df degrees of freedom
χ² | Chi-square statistic | Non-negative test statistic from the chi-square distribution
F | F-statistic | Ratio of variances from the F-distribution with df₁ and df₂
df | Degrees of freedom | Parameter controlling distribution shape; typically n − 1 for a t-test
H₀ | Null hypothesis | The default assumption being tested — rejected when p < α

How to Use

  1. Select test type: Choose Z-Test, T-Test, Chi-Square, or F-Test depending on your study design and data type.
  2. Choose tail direction: Two-tailed for "is there any difference?", upper one-tailed for "is A > B?", lower one-tailed for "is A < B?".
  3. Enter the test statistic: Type your z-score, t-statistic, χ² value, or F-statistic in the input. For t, also enter degrees of freedom (df = n − 1 for one-sample t). For F, enter df₁ and df₂.
  4. Set significance level α: Select your α threshold from the dropdown (0.05 is the most common in social science; 0.01 in medical research).
  5. Press Enter or click Calculate: The p-value, verdict (Reject H₀ / Fail to reject H₀), and interpretation text appear instantly.

Example Calculation

Example 1: Z-Test, two-tailed, z = 2.58

H₀: μ = μ₀, z = 2.58, α = 0.01, two-tailed

p = 2 · Φ(−|2.58|)

= 2 · Φ(−2.58)

= 2 · 0.00494

p ≈ 0.0099 < α = 0.01 → Reject H₀

Example 2: T-Test, two-tailed, t = 2.0, df = 20

H₀: μ = μ₀, t = 2.0, df = 20, α = 0.05, two-tailed

p = 2 · P(T₂₀ > 2.0)

≈ 2 · 0.02963 = 0.0593

p ≈ 0.0593 > α = 0.05 → Fail to reject H₀

Borderline results require context

p = 0.0593 is very close to the α = 0.05 threshold. In practice, report the exact p-value and effect size — a result just above α is not evidence for H₀, and a result just below α is not strong evidence against it. The p-value is a continuous measure, not a binary verdict.

Understanding P-Value — Z, T, Chi-Square & F

What Is a P-Value?

The p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one computed from your data, assuming the null hypothesis H₀ is true. It is a conditional probability: P(data this extreme | H₀ true).

A small p-value means the observed data would be rare under H₀ — which is evidence against H₀. How small is small enough? That is determined by the pre-chosen significance level α. If p < α, we reject H₀ and call the result statistically significant.

Key intuition

The p-value measures the strength of evidence against H₀ in your data. A p-value of 0.003 means results this extreme occur only 0.3% of the time under H₀ — strong evidence against it. A p-value of 0.42 means such results occur 42% of the time under H₀ — no reason to doubt it.

P-Value vs Statistical Significance

The p-value itself is a continuous measure of evidence. The significance level α converts it into a binary decision. Researchers typically set α before collecting data:

  • α = 0.05 — most common in social science, psychology, and ecology. One in 20 chance of false positive.
  • α = 0.01 — used in medical and pharmaceutical research where false positives are costly.
  • α = 0.001 — used in physics (e.g. the Higgs boson discovery required p < 3×10⁻⁷, roughly 5σ).
  • α = 0.10 — sometimes used in exploratory research or economic studies.

Common P-Value Misinterpretations

  • "p = 0.03 means there is a 3% chance H₀ is true." — Wrong. The p-value is not the probability that H₀ is true. It is the probability of data this extreme if H₀ were true — a fundamentally different statement.
  • "p > 0.05 means no effect exists." — Wrong. Failing to reject H₀ does not prove H₀. Low statistical power (small sample) can produce large p-values even when a real effect exists.
  • "A smaller p-value means a larger effect." — Wrong. P-values depend on sample size. A huge sample can produce p < 0.001 for a trivially small, practically meaningless effect.
  • "p = 0.049 and p = 0.051 are completely different." — Wrong. The α threshold is a convention, not a sharp boundary in nature. Report the actual p-value and effect size, not just "significant/not significant."

One-Tailed vs Two-Tailed Tests

The tail choice depends on your research question before you see the data:

  • Two-tailed — "Is there any difference between A and B?" Tests for effects in either direction. Use when you have no strong prior reason to expect the effect to go one way. The p-value is doubled compared to one-tailed.
  • Upper one-tailed — "Is A greater than B?" Tests only for positive effects. Use when theory or prior evidence strongly predicts the direction.
  • Lower one-tailed — "Is A less than B?" Tests only for negative effects.

Switching from two-tailed to one-tailed after seeing data (to achieve p < 0.05) is a form of p-hacking and inflates the false-positive rate. Always specify the tail before analysis.

Effect Size Beyond the P-Value

Statistical significance does not imply practical importance. A study with n = 100,000 can detect a mean difference of 0.001 units with p < 0.001 — yet such a difference may be meaningless in practice. Always report an effect size alongside the p-value:

  • Cohen's d for comparing means: d = (μ₁ − μ₂) / σ_pooled. Thresholds: small d = 0.2, medium d = 0.5, large d = 0.8.
  • Pearson's r for correlation: small r = 0.1, medium r = 0.3, large r = 0.5.
  • η² (eta-squared) for ANOVA: proportion of variance explained. Small: 0.01, medium: 0.06, large: 0.14.
  • Confidence intervals — a 95% CI around the effect size gives much more information than a binary p-value verdict.
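As one concrete effect-size computation, Cohen's d from two independent samples can be sketched as follows (an illustrative helper using the formula above, not part of the calculator):

```javascript
// Cohen's d = (mean1 − mean2) / pooledSD, where pooledSD combines the
// (n−1)-denominator sample variances of the two groups.
function mean(a) { return a.reduce((s, x) => s + x, 0) / a.length; }

function sampleVar(a) {
  const m = mean(a);
  return a.reduce((s, x) => s + (x - m) ** 2, 0) / (a.length - 1);
}

function cohensD(a, b) {
  const pooledSD = Math.sqrt(((a.length - 1) * sampleVar(a) +
                              (b.length - 1) * sampleVar(b)) /
                             (a.length + b.length - 2));
  return (mean(a) - mean(b)) / pooledSD;
}
```

For example, cohensD([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]) ≈ −1.26 — a large effect by Cohen's thresholds, regardless of what p-value the accompanying test produces.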

Applications

Field | Typical Test | Example Question
Medicine | Two-sample t-test | Does drug A reduce blood pressure more than drug B?
Social science | Z-test on proportion | Is the approval rate of policy X higher than 50%?
Manufacturing | Chi-square test | Is the distribution of defects independent of shift?
Finance | F-test | Do two investment strategies have equal variance in returns?
Psychology | One-sample t-test | Is the average IQ of this group different from 100?

Frequently Asked Questions

What does p < 0.05 mean?

p < 0.05 means: if H₀ were true, data this extreme would appear less than 5% of the time.

  • By convention (Fisher, 1925), p < 0.05 is the threshold for "statistical significance."
  • It does NOT mean there is a 95% chance the effect is real.
  • It does NOT mean the effect is large or practically important.
  • The threshold α = 0.05 is a convention, not a law of nature — some fields use 0.01 or 0.001.

What is the difference between the p-value and the significance level α?

  • α is set BEFORE the experiment — it is the maximum tolerable false-positive rate.
  • p-value is computed AFTER the experiment from the observed data.
  • The decision rule is: reject H₀ if p < α.
  • Changing α after seeing data is p-hacking and inflates false-positive rates.

When should I use a one-tailed vs two-tailed test?

  • Two-tailed: "is there any difference?" — safest default, no directional assumption.
  • One-tailed: "is the effect in this specific direction?" — requires strong prior justification.
  • One-tailed tests have half the p-value of two-tailed for the same statistic.
  • The tail choice must be pre-registered before seeing data, not chosen based on results.

Does a small p-value prove the alternative hypothesis H₁ is true?

Rejecting H₀ means the data are incompatible with H₀ — not that H₁ is certainly true.

  • Sampling bias or confounding variables can produce small p-values without a real effect.
  • Multiple comparisons inflate false-positive rates — use Bonferroni or FDR corrections.
  • Replication is essential — a single p < 0.05 result has a meaningful false-positive rate.
  • Always pair the p-value with an effect size and confidence interval.
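The Bonferroni correction mentioned above compares each of m p-values against α/m rather than α (equivalently, multiplies each p-value by m). A minimal illustration, with names chosen here for clarity:

```javascript
// Bonferroni correction: with m tests at family-wise level alpha,
// reject H0 for test i only when p_i < alpha / m.
function bonferroniReject(pValues, alpha = 0.05) {
  const m = pValues.length;
  return pValues.map(p => p < alpha / m);
}
```

With three tests at α = 0.05 the per-test threshold drops to ≈ 0.0167, so bonferroniReject([0.01, 0.03, 0.04]) rejects only the first hypothesis even though all three raw p-values are below 0.05.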

What does a p-value of exactly 0 mean?

  • p = 0 is a display artifact — the true p-value is extremely small but positive.
  • Common software thresholds: R reports p < 2.2e-16; this calculator reports "< 0.0001" for very small values.
  • Report as "p < 0.001" or give the exact test statistic so readers can verify.
  • An astronomically small p-value is very strong evidence against H₀, but still not proof.

What should I report alongside the p-value?

  • Exact p-value — not just "significant" or "p < 0.05".
  • Test statistic with degrees of freedom: e.g. t(29) = 2.45, p = 0.021.
  • Effect size — Cohen's d, Pearson r, or η² tells you how large the effect is.
  • 95% confidence interval — gives a plausible range for the true effect size.
  • Sample size — small samples have low power and noisy p-values.

Why is α = 0.05 used so commonly?

  • Fisher (1925) popularized α = 0.05 as a practical convenience threshold.
  • It was never meant to be a universal law — it reflects a 1-in-20 false-positive rate.
  • Replication crisis: many "p < 0.05" results in psychology and medicine failed to replicate.
  • Some statisticians advocate for α = 0.005 as the new threshold for "significance."
  • Others recommend abandoning binary thresholds and simply reporting p-values with effect sizes.

Related Calculators