P-Value Calculator — Z, T, Chi-Square & F
Calculate the exact p-value from a z-score, t-statistic, chi-square statistic, or F-statistic. Supports one-tailed and two-tailed tests with significance interpretation.
What Is the P-Value Calculator — Z, T, Chi-Square & F?
This p-value calculator computes the exact p-value from four common test statistics — z, t, chi-square, and F — using accurate JavaScript approximations of the underlying probability distributions. Results include the significance verdict, interpretation text, and the complete decision framework.
- Four distribution types — standard normal (Z), Student's t, chi-square (χ²), and F-distribution, covering the most common hypothesis tests in statistics.
- One-tailed and two-tailed tests — choose upper, lower, or two-tailed depending on the direction of your hypothesis.
- Significance thresholds — compare your p-value against α = 0.001, 0.01, 0.05, 0.10, or 0.20 with an instant reject/fail-to-reject verdict.
- Interpretation guide — strength-of-evidence language (very strong / strong / moderate / weak) updates based on the computed p-value.
- Common misinterpretations flagged — the result panel reminds you what the p-value does and does not mean (it is not the probability H₀ is true).
Formula
Z-Test (Standard Normal)
Two-tailed: p = 2 · Φ(−|z|) where Φ is the standard normal CDF
One-tailed: p = Φ(−z) (upper) or p = Φ(z) (lower)
Normal PDF: f(x) = (1/√(2π)) · exp(−x²/2)
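The z-test formulas above translate directly into a few lines of Python. This is a sketch using the standard library's `erfc` (the identity Φ(x) = ½·erfc(−x/√2)), not the calculator's actual JavaScript source; the function names are illustrative:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def z_p_value(z: float, tail: str = "two") -> float:
    """p-value for a z statistic; tail is 'two', 'upper', or 'lower'."""
    if tail == "two":
        return 2.0 * normal_cdf(-abs(z))   # p = 2 * Phi(-|z|)
    if tail == "upper":
        return normal_cdf(-z)              # P(Z > z)
    return normal_cdf(z)                   # 'lower': P(Z < z)

print(round(z_p_value(2.58), 4))           # two-tailed p for z = 2.58 -> 0.0099
```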
T-Test (Student's t-distribution)
p computed from t(df) distribution with df degrees of freedom
Two-tailed: p = 2 · P(T > |t|) where T ~ t(df)
Chi-Square Test
p = P(χ²(df) > χ²) — upper tail probability
Requires χ² ≥ 0 and df ≥ 1
F-Test
p = P(F(df₁, df₂) > F) — upper tail probability
Transforms to beta distribution: x = df₁·F / (df₁·F + df₂)
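As a cross-check of the t, chi-square, and F formulas, each upper-tail probability can be approximated by integrating the density directly. The sketch below uses brute-force Simpson integration with fixed truncation windows (adequate for moderate df; very small df would need wider bounds) — production code, including this calculator's JavaScript, would use incomplete beta/gamma approximations instead:

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def t_upper_tail(t, df):
    """P(T > t) for Student's t: integrate the pdf from t far into the tail."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return simpson(lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2), t, t + 400.0)

def chi2_upper_tail(x, df):
    """P(X > x) for chi-square; x must be > 0 when df < 2 (pdf blows up at 0)."""
    c = 1.0 / (2 ** (df / 2) * math.gamma(df / 2))
    return simpson(lambda u: c * u ** (df / 2 - 1) * math.exp(-u / 2), x, x + 200.0)

def f_upper_tail(f_stat, df1, df2):
    """P(F > f) for the F(df1, df2) distribution, via its pdf in log form."""
    a, b = df1 / 2.0, df2 / 2.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    def pdf(x):
        return math.exp(a * math.log(df1 / df2) + (a - 1) * math.log(x)
                        - (a + b) * math.log(1 + df1 * x / df2) - log_beta)
    return simpson(pdf, f_stat, f_stat + 400.0)

# e.g. t_upper_tail(2.0, 20) ~ 0.0296, so the two-tailed p-value is ~ 0.0593
# consistency check: T^2 ~ F(1, df), hence f_upper_tail(4.0, 1, 20) ~ 0.0593 too
```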
| Symbol | Name | Description |
|---|---|---|
| p | P-value | Probability of a test statistic at least as extreme as the one observed, given H₀ is true |
| α | Significance level | Threshold for rejection; typically 0.05 — reject H₀ when p < α |
| z | Z-score | Standard normal test statistic — units of standard deviations from mean |
| t | T-statistic | Test statistic from Student's t-distribution with df degrees of freedom |
| χ² | Chi-square stat | Non-negative test statistic from the chi-square distribution |
| F | F-statistic | Ratio of variances from the F-distribution with df₁ and df₂ |
| df | Degrees of freedom | Parameter controlling distribution shape; typically n − 1 for t-test |
| H₀ | Null hypothesis | The default assumption being tested — rejected when p < α |
How to Use
1. Select test type: Choose Z-Test, T-Test, Chi-Square, or F-Test depending on your study design and data type.
2. Choose tail direction: Two-tailed for "is there any difference?", upper one-tailed for "is A > B?", lower one-tailed for "is A < B?".
3. Enter the test statistic: Type your z-score, t-statistic, χ² value, or F-statistic in the input. For t, also enter degrees of freedom (df = n − 1 for a one-sample t-test). For F, enter df₁ and df₂.
4. Set the significance level α: Select your α threshold from the dropdown (0.05 is the most common in social science; 0.01 in medical research).
5. Press Enter or click Calculate: The p-value, verdict (Reject H₀ / Fail to reject H₀), and interpretation text appear instantly.
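The decision rule and the strength-of-evidence wording the interpretation guide uses can be sketched in a few lines. The cutoffs for the strength bands below are an assumption for illustration (the page names the bands but not their exact boundaries):

```python
def verdict(p: float, alpha: float = 0.05) -> str:
    """Binary reject/fail-to-reject decision plus strength-of-evidence wording.

    Band cutoffs (0.001 / 0.01 / 0.05 / 0.10) are illustrative assumptions.
    """
    if p < 0.001:
        strength = "very strong evidence against H0"
    elif p < 0.01:
        strength = "strong evidence against H0"
    elif p < 0.05:
        strength = "moderate evidence against H0"
    elif p < 0.10:
        strength = "weak evidence against H0"
    else:
        strength = "little or no evidence against H0"
    decision = "Reject H0" if p < alpha else "Fail to reject H0"
    return f"{decision} at alpha = {alpha} ({strength})"

print(verdict(0.0099, alpha=0.01))  # Reject H0 at alpha = 0.01 (strong evidence against H0)
```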
Example Calculation
Example 1: Z-Test, two-tailed, z = 2.58
H₀: μ = μ₀, z = 2.58, α = 0.01, two-tailed
p = 2 · Φ(−|2.58|)
= 2 · Φ(−2.58)
= 2 · 0.00494
p ≈ 0.0099 < α = 0.01 → Reject H₀
Example 2: T-Test, two-tailed, t = 2.0, df = 20
H₀: μ = μ₀, t = 2.0, df = 20, α = 0.05, two-tailed
p = 2 · P(T₂₀ > 2.0)
≈ 2 · 0.02963 = 0.0593
p ≈ 0.0593 > α = 0.05 → Fail to reject H₀
Borderline results require context
p = 0.0593 is very close to the α = 0.05 threshold. In practice, report the exact p-value and effect size — a result just above α is not evidence for H₀, and a result just below α is not strong evidence against it. The p-value is a continuous measure, not a binary verdict.
Understanding P-Value — Z, T, Chi-Square & F
What Is a P-Value?
The p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one computed from your data, assuming the null hypothesis H₀ is true. It is a conditional probability: P(data this extreme | H₀ true).
A small p-value means the observed data would be rare under H₀ — which is evidence against H₀. How small is small enough? That is determined by the pre-chosen significance level α. If p < α, we reject H₀ and call the result statistically significant.
Key intuition
The p-value measures the strength of evidence against H₀ in your data. A p-value of 0.003 means results this extreme occur only 0.3% of the time under H₀ — strong evidence against it. A p-value of 0.42 means such results occur 42% of the time under H₀ — no reason to doubt it.
P-Value vs Statistical Significance
The p-value itself is a continuous measure of evidence. The significance level α converts it into a binary decision. Researchers typically set α before collecting data:
- α = 0.05 — most common in social science, psychology, and ecology. One in 20 chance of a false positive.
- α = 0.01 — used in medical and pharmaceutical research where false positives are costly.
- α = 0.001 — used in physics (e.g. the Higgs boson discovery required p < 3×10⁻⁷, roughly 5σ).
- α = 0.10 — sometimes used in exploratory research or economic studies.
Common P-Value Misinterpretations
- "p = 0.03 means there is a 3% chance H₀ is true." — Wrong. The p-value is not the probability that H₀ is true. It is the probability of data this extreme if H₀ were true — a fundamentally different statement.
- "p > 0.05 means no effect exists." — Wrong. Failing to reject H₀ does not prove H₀. Low statistical power (small sample) can produce large p-values even when a real effect exists.
- "A smaller p-value means a larger effect." — Wrong. P-values depend on sample size. A huge sample can produce p < 0.001 for a trivially small, practically meaningless effect.
- "p = 0.049 and p = 0.051 are completely different." — Wrong. The α threshold is a convention, not a sharp boundary in nature. Report the actual p-value and effect size, not just "significant/not significant."
One-Tailed vs Two-Tailed Tests
The tail choice depends on your research question before you see the data:
- Two-tailed — "Is there any difference between A and B?" Tests for effects in either direction. Use when you have no strong prior reason to expect the effect to go one way. For symmetric distributions (z, t), the two-tailed p-value is double the one-tailed value.
- Upper one-tailed — "Is A greater than B?" Tests only for positive effects. Use when theory or prior evidence strongly predicts the direction.
- Lower one-tailed — "Is A less than B?" Tests only for negative effects.
Switching from two-tailed to one-tailed after seeing data (to achieve p < 0.05) is a form of p-hacking and inflates the false-positive rate. Always specify the tail before analysis.
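To see concretely how much the tail choice matters, here is one fixed z statistic evaluated all three ways (a Python sketch; the statistic z = 1.80 is a made-up illustration):

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

z = 1.80  # one fixed test statistic, three tail choices
p_two   = 2 * normal_cdf(-abs(z))  # ~ 0.0719: not significant at alpha = 0.05
p_upper = normal_cdf(-z)           # ~ 0.0359: crosses 0.05 if chosen after the fact
p_lower = normal_cdf(z)            # ~ 0.9641: the opposite direction
```

The same data yield p = 0.072 or p = 0.036 depending on the tail, which is exactly why the choice must be fixed before the analysis.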
Effect Size Beyond the P-Value
Statistical significance does not imply practical importance. A study with n = 100,000 can detect a mean difference of 0.001 units with p < 0.001 — yet such a difference may be meaningless in practice. Always report an effect size alongside the p-value:
- Cohen's d for comparing means: d = (μ₁ − μ₂) / σ_pooled. Thresholds: small d = 0.2, medium d = 0.5, large d = 0.8.
- Pearson's r for correlation: small r = 0.1, medium r = 0.3, large r = 0.5.
- η² (eta-squared) for ANOVA: proportion of variance explained. Small: 0.01, medium: 0.06, large: 0.14.
- Confidence intervals — a 95% CI around the effect size gives much more information than a binary p-value verdict.
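For instance, Cohen's d from the list above can be computed from two raw samples with nothing but the standard library (a sketch; the sample data are made up):

```python
import math

def cohens_d(sample1: list, sample2: list) -> float:
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    v1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)  # sample variances
    v2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Hypothetical samples: a 2-unit mean shift against a pooled SD of ~1.58
print(round(cohens_d([3, 4, 5, 6, 7], [1, 2, 3, 4, 5]), 2))  # 1.26 -> a large effect
```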
Applications
| Field | Typical Test | Example Question |
|---|---|---|
| Medicine | Two-sample t-test | Does drug A reduce blood pressure more than drug B? |
| Social science | Z-test on proportion | Is the approval rate of policy X higher than 50%? |
| Manufacturing | Chi-square test | Is the distribution of defects independent of shift? |
| Finance | F-test | Do two investment strategies have equal variance in returns? |
| Psychology | One-sample t-test | Is the average IQ of this group different from 100? |
Frequently Asked Questions
What does p < 0.05 mean?
p < 0.05 means: if H₀ were true, data this extreme would appear less than 5% of the time.
- By convention (Fisher, 1925), p < 0.05 is the threshold for "statistical significance."
- It does NOT mean there is a 95% chance the effect is real.
- It does NOT mean the effect is large or practically important.
- The threshold α = 0.05 is a convention, not a law of nature — some fields use 0.01 or 0.001.
What is the difference between the p-value and the significance level α?
- α is set BEFORE the experiment — it is the maximum tolerable false-positive rate.
- The p-value is computed AFTER the experiment from the observed data.
- The decision rule is: reject H₀ if p < α.
- Changing α after seeing the data is p-hacking and inflates false-positive rates.
When should I use a one-tailed vs two-tailed test?
- Two-tailed: "is there any difference?" — the safest default, with no directional assumption.
- One-tailed: "is the effect in this specific direction?" — requires strong prior justification.
- For symmetric distributions, a one-tailed test gives half the two-tailed p-value when the effect is in the tested direction.
- The tail choice must be pre-registered before seeing the data, not chosen based on results.
Does a small p-value prove the alternative hypothesis H₁ is true?
Rejecting H₀ means the data are incompatible with H₀ — not that H₁ is certainly true.
- Sampling bias or confounding variables can produce small p-values without a real effect.
- Multiple comparisons inflate false-positive rates — use Bonferroni or FDR corrections.
- Replication is essential — a single p < 0.05 result has a meaningful false-positive rate.
- Always pair the p-value with an effect size and confidence interval.
What does a p-value of exactly 0 mean?
- p = 0 is a display artifact — the true p-value is extremely small but positive.
- Common software thresholds: R reports p < 2.2e-16; this calculator reports "< 0.0001" for very small values.
- Report it as "p < 0.001" or give the exact test statistic so readers can verify.
- An astronomically small p-value is very strong evidence against H₀, but still not proof.
What should I report alongside the p-value?
- Exact p-value — not just "significant" or "p < 0.05".
- Test statistic with degrees of freedom: e.g. t(29) = 2.45, p = 0.021.
- Effect size — Cohen's d, Pearson r, or η² tells you how large the effect is.
- 95% confidence interval — gives a plausible range for the true effect size.
- Sample size — small samples have low power and noisy p-values.
Why is α = 0.05 used so commonly?
- Fisher (1925) popularized α = 0.05 as a practical convenience threshold.
- It was never meant to be a universal law — it reflects a 1-in-20 false-positive rate.
- Replication crisis: many "p < 0.05" results in psychology and medicine failed to replicate.
- Some statisticians advocate α = 0.005 as the new threshold for "significance."
- Others recommend abandoning binary thresholds and simply reporting p-values with effect sizes.