Effect Size Calculator | Cohen's d, η² & r
Calculate Cohen's d, Hedges' g, Glass's Δ, r, and η² effect sizes from means, t-statistics, F-statistics, or correlation r. Includes 95% CI, power, and required sample size.
| Measure | Small | Medium | Large |
|---|---|---|---|
| Cohen's d / Hedges' g / Glass's Δ | 0.2 | 0.5 | 0.8 |
| r (correlation / point-biserial) | 0.1 | 0.3 | 0.5 |
| η² / ω² (ANOVA variance explained) | 0.01 | 0.06 | 0.14 |
| Cohen's f | 0.10 | 0.25 | 0.40 |
| Cohen's f² | 0.02 | 0.15 | 0.35 |
All calculations are performed live in your browser using Cohen (1988) benchmarks and standard formulas. No data leaves your device. Power estimates use the normal approximation to the noncentral t distribution.
What Is the Effect Size Calculator | Cohen's d, η² & r?
Effect size measures how big a difference is, independently of sample size. A p-value can only tell you whether an effect is statistically distinguishable from noise; it says nothing about whether that effect is practically meaningful. Two studies of the same phenomenon can have wildly different p-values simply because one used 30 participants and the other used 3,000, even if the underlying effect is identical.
Cohen's d is the most widely used effect size for comparing two group means. It expresses the difference in units of the pooled standard deviation, so a d of 0.5 means the groups differ by half a standard deviation. This makes results comparable across studies using different measurement scales, which is why effect sizes are the building block of meta-analysis.
Hedges' g applies a small-sample bias correction (factor J) to Cohen's d. When each group has more than ~20 observations, d and g are nearly identical. Below that threshold, g is the preferred estimate. Glass's Δ is useful when the two groups have different population variances: it standardises using only the control group's SD, treating it as the natural baseline spread.
The 95% confidence interval for d (Hedges & Olkin approximation) shows the plausible range of the true effect, accounting for sampling variability. A wide CI with a medium d still leaves substantial uncertainty; a narrow CI with a small d may be ruling out practically meaningful effects. Always report both the point estimate and its CI.
Formula
| Symbol | Name | Description |
|---|---|---|
| M₁, M₂ | Group means | Arithmetic mean of each group |
| SD₁, SD₂ | Std deviations | Standard deviation within each group |
| n₁, n₂ | Sample sizes | Number of observations per group |
| SD_pooled | Pooled SD | Weighted average SD across both groups |
| d | Cohen's d | Mean difference in units of pooled SD |
| g | Hedges' g | d with small-sample bias correction factor J |
| Δ | Glass's Delta | Mean difference divided by the control group SD |
| r | Point-biserial r | Correlation equivalent of d; bounded ±1 |
| η² | Eta-squared | Proportion of total variance explained by group (ANOVA) |
| ω² | Omega-squared | Less biased η² estimator for small samples |
| f | Cohen's f | √(η²/(1−η²)); used in power analysis for ANOVA |
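The symbols in the table combine as shown below. This is a compact summary of the standard textbook forms, not a transcript of the calculator's source: the approximate form of J, the form of r derived from r = t / √(t² + df), and the one-way between-subjects form of ω² are assumptions, and N denotes the total sample size.

```latex
\begin{aligned}
SD_{\text{pooled}} &= \sqrt{\frac{(n_1-1)\,SD_1^2 + (n_2-1)\,SD_2^2}{n_1+n_2-2}} \\
d &= \frac{M_1 - M_2}{SD_{\text{pooled}}} \\
J &\approx 1 - \frac{3}{4(n_1+n_2-2) - 1}, \qquad g = J \cdot d \\
\Delta &= \frac{M_1 - M_2}{SD_{\text{control}}} \\
r &= \frac{d}{\sqrt{d^2 + \dfrac{(n_1+n_2-2)(n_1+n_2)}{n_1 n_2}}} \\
\eta^2 &= \frac{SS_{\text{between}}}{SS_{\text{total}}} = \frac{F \cdot df_b}{F \cdot df_b + df_e} \\
\omega^2 &= \frac{df_b\,(F-1)}{df_b\,(F-1) + N} \\
f &= \sqrt{\frac{\eta^2}{1-\eta^2}}
\end{aligned}
```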
How to Use
1. Choose an input mode: Select "From Means & SDs" (most common), "From t-statistic" (if you have a t-test output), "From F-statistic" (one-way ANOVA), or "From correlation r" to convert between measures.
2. Try a preset: In Means mode, click a preset (vocabulary test, therapy trial, drug study, or reading program) to load realistic example values and see what each effect size level looks like.
3. Enter your values: Fill in means, standard deviations, and sample sizes for both groups. Sample sizes default to 30 if left blank (this affects power and CI width, not d itself). Press Enter or click Calculate.
4. Read the main result: The orange card shows Cohen's d and its interpretation badge. Below it, the stat grid shows Hedges' g, Glass's Δ, point-biserial r, 95% CI bounds, and observed statistical power.
5. Check the required n panel: The blue panel tells you how many participants per group you would need to detect this effect with 80% power at α = 0.05, which is useful for planning future studies.
6. Inspect the step trace: Click "Show calculation steps" to see every intermediate value (pooled SD, bias correction J, SE for d, and CI computation), which is useful for coursework and replication.
7. Use the benchmark table: The Cohen (1988) benchmark table at the bottom shows small/medium/large thresholds for every measure. Remember that these are rough, domain-independent guidelines; a d of 0.2 may be large in some fields.
8. Reset or revisit: Press Reset or Esc to clear all inputs. Your last values are saved in browser storage and reload automatically on your next visit.
Example Calculation
Example 1: Maths tutoring intervention
A tutoring programme raises average exam scores from 68 to 75. Both groups have SD ≈ 11, n = 30 each.
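Example 1 can be worked through in a few lines of Python. This is a minimal sketch, not the calculator's own code: the function name `effect_sizes_from_means` is hypothetical, the untutored group is treated as the control for Glass's Δ, and J uses the common approximation 1 − 3/(4·df − 1).

```python
import math

def effect_sizes_from_means(m1, sd1, n1, m2, sd2, n2):
    """Return (Cohen's d, Hedges' g, Glass's delta), treating group 2 as control."""
    df = n1 + n2 - 2
    # Pooled SD: variance of each group weighted by its degrees of freedom
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)   # approximate small-sample correction (assumption)
    g = j * d
    delta = (m1 - m2) / sd2    # control group's SD only
    return d, g, delta

# Tutored group: mean 75; control group: mean 68; SD 11 and n = 30 in both
d, g, delta = effect_sizes_from_means(75, 11, 30, 68, 11, 30)
print(f"d = {d:.3f}, g = {g:.3f}, delta = {delta:.3f}")
# prints: d = 0.636, g = 0.628, delta = 0.636
```

Because both groups have the same SD here, d and Δ coincide; g is slightly smaller because of the bias correction. A d of about 0.64 sits between Cohen's medium (0.5) and large (0.8) benchmarks.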
Example 2: Converting from a reported t-statistic
A paper reports t(58) = 2.41, n = 30 per group. What is the effect size?
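Example 2 follows directly from the t-to-d and t-to-r conversions d = t × √(1/n₁ + 1/n₂) and r = t / √(t² + df). The sketch below uses hypothetical helper names (`d_from_t`, `r_from_t`) and assumes an independent-samples t-test.

```python
import math

def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def r_from_t(t, df):
    """Point-biserial r from t and its degrees of freedom."""
    return t / math.sqrt(t**2 + df)

t, n1, n2 = 2.41, 30, 30
d = d_from_t(t, n1, n2)
r = r_from_t(t, n1 + n2 - 2)
print(f"d = {d:.3f}, r = {r:.3f}")
```

For these inputs d comes out at roughly 0.62 and r at roughly 0.30, which is above the medium benchmark for d and right at the medium benchmark for r.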
Example 3: ANOVA, three teaching methods
One-way ANOVA with 3 groups (k = 3), N = 90 total. F(2, 87) = 4.85.
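Example 3 can be reproduced from the F-statistic and its degrees of freedom alone. This is a sketch under assumptions: the function name `anova_effect_sizes` is hypothetical, the design is a one-way between-subjects ANOVA, and ω² uses the form df_b(F − 1) / (df_b(F − 1) + N).

```python
import math

def anova_effect_sizes(f_stat, df_between, df_error):
    """Return (eta squared, omega squared, Cohen's f) for a one-way ANOVA."""
    n_total = df_between + df_error + 1
    eta_sq = (f_stat * df_between) / (f_stat * df_between + df_error)
    # Omega squared: less biased estimate (one-way between-subjects form, assumption)
    omega_sq = (df_between * (f_stat - 1)) / (df_between * (f_stat - 1) + n_total)
    cohens_f = math.sqrt(eta_sq / (1 - eta_sq))
    return eta_sq, omega_sq, cohens_f

# F(2, 87) = 4.85 with k = 3 groups and N = 90
eta_sq, omega_sq, cohens_f = anova_effect_sizes(4.85, 2, 87)
print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}, f = {cohens_f:.3f}")
```

The result is η² of about 0.10, ω² of about 0.08, and f of about 0.33: a medium-to-large effect by Cohen's benchmarks, with ω² visibly smaller than η² at this sample size.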
Understanding Effect Size | Cohen's d, η² & r
Why Effect Size Matters More Than p-Values
A p-value answers a narrow question: given this sample size, how surprising is this result if the null hypothesis were true? It conflates sample size with evidence strength. With n = 10,000, a trivially small difference, say a 1-point IQ gap (d ≈ 0.07 on a scale with SD = 15), will yield p < 0.001. With n = 15, even a large real effect might not reach significance.
Effect size separates signal from sample size. It tells you the magnitude of the difference in standardised, comparable units. The APA Publication Manual (7th edition), the British Psychological Society, and journals including Nature and JAMA now require or strongly recommend reporting effect sizes alongside p-values.
Cohen's d vs Hedges' g vs Glass's Δ
- Cohen's d: the default choice when both groups are large (n > 20 each) and have similar spread. Uses the pooled SD as the denominator, which weights larger groups more heavily. The most commonly reported effect size in psychology and medicine.
- Hedges' g: identical to d but multiplied by a correction factor J that shrinks the estimate toward zero as n decreases. For small samples (n < 20 per group), g is notably less biased. For n > 50, d and g differ by less than 1%.
- Glass's Δ: uses only the control group's SD in the denominator. Appropriate when the intervention itself changes the variance (e.g., a drug that both shifts the mean and reduces spread in the treated group). Preserves the interpretation "difference relative to untreated variability."
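To see how quickly the gap between d and g closes as samples grow, the correction factor can be tabulated for a few group sizes. This sketch assumes the common approximation J ≈ 1 − 3/(4·df − 1); the helper name `hedges_j` is hypothetical and the calculator may use a different (exact) form.

```python
def hedges_j(n1, n2):
    """Approximate small-sample bias correction factor J."""
    df = n1 + n2 - 2
    return 1 - 3 / (4 * df - 1)

for n in (5, 10, 20, 50):
    j = hedges_j(n, n)
    print(f"n = {n:>2} per group: J = {j:.4f}, g is {100 * (1 - j):.1f}% smaller than d")
```

Under this approximation the shrinkage is close to 10% at n = 5 per group, about 4% at n = 10, about 2% at n = 20, and below 1% at n = 50.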
Effect Size from ANOVA, η² and ω²
- Eta-squared (η²) is the proportion of total variance explained by the group factor: SS_between / SS_total. It is simple to compute from F and df, but it is positively biased: it overestimates the population effect, especially with small n per group.
- Omega-squared (ω²) corrects for this bias and gives a closer estimate of the population η². For large samples the two converge; for small samples or many groups, ω² can be meaningfully smaller. Always prefer ω² when reporting ANOVA effect sizes.
- Cohen's f is √(η²/(1−η²)), equivalent to the standardised between-group spread divided by the within-group spread. It is the direct input to power calculations for ANOVA (e.g., G*Power software).
Power and Sample Size Planning
Statistical power is the probability that your study will detect a true effect, given that one exists. A power of 80% (β = 0.20) is the conventional minimum. This calculator computes observed power from d and your sample sizes, and also estimates the per-group n required to achieve 80% power, which is essential to know before running a study.
For example, detecting a small effect (d = 0.2) at 80% power requires approximately 394 participants per group, 788 total. A medium effect (d = 0.5) requires about 64 per group. A large effect (d = 0.8) needs only 26 per group. Underpowered studies waste resources and produce inflated effect sizes when they do find significance ("winner's curse").
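The per-group sample size can be approximated with the normal-approximation formula n ≈ 2 × ((z₁₋α/₂ + z_power) / d)². The sketch below is one way to compute it, assuming a two-sided test with equal group sizes; the function name `required_n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def required_n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n per group = {required_n_per_group(d)}")
```

This returns 393, 63, and 25 for d = 0.2, 0.5, and 0.8. The values are one participant per group lower than the figures quoted above because the normal approximation slightly understates what a calculation based on the noncentral t distribution requires.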
Confidence Intervals for Cohen's d
The 95% CI for d uses the Hedges & Olkin (1985) standard error approximation: SE_d ≈ √((n₁+n₂)/(n₁×n₂) + d²/(2×df)). The exact CI uses the noncentral t distribution and is computationally intensive; this approximation is accurate for moderate to large samples. Narrow CIs indicate high precision; wide CIs call for more data before drawing conclusions. A CI that spans zero means the data are consistent with no difference.
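The standard error formula above translates into a short function. This is a sketch, assuming the interval is formed as d ± z × SE_d with a normal critical value; whether the calculator uses z or a t critical value is not stated, and the name `d_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate CI for Cohen's d; returns (lower, upper, se)."""
    df = n1 + n2 - 2
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * df))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # normal critical value (assumption)
    return d - z * se, d + z * se, se

# d = 7/11 (about 0.636) with 30 participants per group
lower, upper, se = d_confidence_interval(7 / 11, 30, 30)
print(f"SE = {se:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

With these inputs the interval runs from roughly 0.12 to 1.16: it excludes zero, but it is wide enough to include everything from a below-small to a large effect.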
Applications Across Disciplines
- Clinical research: Effect sizes determine whether a statistically significant treatment difference is clinically meaningful. A drug that reduces pain scores by d = 0.1 may be real but irrelevant to patients.
- Meta-analysis: Combining results across studies requires a common effect size metric. Cohen's d and r are the currency of meta-analysis; p-values cannot be directly pooled.
- Education research: The What Works Clearinghouse uses a minimum effect size threshold (typically d ≥ 0.25) to classify interventions as having a meaningful positive effect.
- A/B testing and product: Conversion rate differences between UI variants are often expressed as Cohen's h (for proportions) or d. Knowing the effect size helps decide whether to ship a change at current traffic levels.
- Psychometrics and validation: Test-retest reliability studies report r as an effect size; instrument sensitivity is evaluated via the minimum detectable d given target sample sizes.
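For the A/B testing case, Cohen's h is the difference between arcsine-transformed proportions: h = 2·arcsin(√p₁) − 2·arcsin(√p₂). This calculator does not list h among its outputs, so the sketch below is purely illustrative; the function name `cohens_h` and the conversion rates are made up.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: difference between two arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Hypothetical conversion rates: variant 12%, control 10%
h = cohens_h(0.12, 0.10)
print(f"h = {h:.3f}")
```

Here h is roughly 0.06, well under the 0.2 that Cohen's conventions treat as small, even though the variant converts 20% more visitors in relative terms.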
Data and Methods
Benchmarks (small/medium/large) follow Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Omega-squared formula follows Kirk (1996). The Hedges & Olkin SE approximation is from Hedges, L.V. & Olkin, I. (1985), Statistical Methods for Meta-Analysis. All computations are performed live in your browser; no data is transmitted to any server. Power calculations use the normal approximation to the noncentral t distribution (accurate for n > 10 per group).
Frequently Asked Questions
What is Cohen's d and what does it measure?
Cohen's d is the difference between two group means divided by their pooled standard deviation:
d = (M₁ − M₂) / SD_pooled
It tells you how far apart the groups are in units of their typical spread. A d of 1.0 means the groups differ by one full standard deviation, roughly the difference between the 50th and 84th percentile of the same distribution.
- d = 0.2 (small): visible only in careful studies
- d = 0.5 (medium): noticeable to an informed observer
- d = 0.8 (large): apparent without statistical analysis
These thresholds are rough guidelines from Cohen (1988); what counts as "large" depends heavily on the field and measurement context.
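The percentile reading of d (a d of 1.0 placing the average member of one group at about the 84th percentile of the other) can be checked with the standard normal CDF. This sketch assumes both groups are normally distributed with equal variance; the function name `percentile_of_mean` is hypothetical.

```python
from statistics import NormalDist

def percentile_of_mean(d):
    """Percentile of the comparison group at which the other group's mean falls."""
    return 100 * NormalDist().cdf(d)

for d in (0.2, 0.5, 0.8, 1.0):
    print(f"d = {d}: {percentile_of_mean(d):.1f}th percentile")
```

Under these assumptions the benchmarks correspond to roughly the 58th, 69th, and 79th percentiles, and d = 1.0 to the 84th.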
What is the difference between Cohen's d, Hedges' g, and Glass's Δ?
All three standardise a mean difference, but they differ in what goes in the denominator:
- Cohen's d: uses the pooled SD (weights both groups). Default choice for large, roughly equal-variance samples.
- Hedges' g: multiplies d by a bias-correction factor J. Preferred for small samples (n < 20 per group) where d systematically overestimates the true effect.
- Glass's Δ: uses only the control group's SD. Best when the treatment itself is expected to change the variance, so the control SD better represents the natural baseline spread.
For most published research with adequate sample sizes, the three measures are nearly identical. The choice mainly matters when n is small or when treatment-vs-control variance equality is questionable.
Why do I need effect size if I already have a p-value?
A p-value tells you the probability of your data (or more extreme results) given no true effect. It does not tell you how big the effect is. Two problems arise:
- Large n, tiny effect: With 10,000 participants, a difference of well under a tenth of a standard deviation (d ≈ 0.07) can yield p < 0.001, which is statistically significant but practically worthless.
- Small n, real effect: A large genuine effect may not reach p < 0.05 with n = 15, leading to a false negative conclusion.
Effect size is independent of sample size. It answers the scientifically important question: how meaningful is this difference? Major journals and APA guidelines now require or strongly recommend effect sizes alongside p-values.
How do I calculate effect size from a t-statistic?
If you have the t-value and both sample sizes:
d = t × √(1/n₁ + 1/n₂)
If you only have the degrees of freedom (df = n₁ + n₂ − 2) and assume equal groups:
d ≈ 2t / √(df + 2)
The point-biserial correlation r (equivalent to the effect size for a two-group comparison) is:
r = t / √(t² + df)
Use the "From t-statistic" mode in this calculator: enter the t-value and either df or both sample sizes.
What is eta-squared (η²) and when should I use it instead of Cohen's d?
Use eta-squared when you have three or more groups (ANOVA), not just two:
η² = SS_between / SS_total = (F × df_b) / (F × df_b + df_e)
- η² represents the proportion of total variance explained by group membership.
- It ranges from 0 to 1: η² = 0.06 means the groups explain 6% of all variance.
- Cohen's benchmarks: small = 0.01, medium = 0.06, large = 0.14.
Prefer omega-squared (ω²) over η² for small samples, because η² is positively biased and tends to overestimate the population effect size.
What does the 95% confidence interval for Cohen's d tell me?
The CI gives a plausible range for the true population d based on your sample. If you repeated the study many times, 95% of the resulting CIs would contain the true effect.
- CI excludes zero: consistent with a real nonzero effect (similar to p < 0.05).
- CI includes zero: data are consistent with no effect. Do not conclude the effect is zero, just that your study lacks the precision to detect it.
- Narrow CI: high precision, which in practice comes from a larger sample.
- Wide CI: low precision; more data needed before drawing firm conclusions.
This calculator uses the Hedges & Olkin (1985) standard error approximation: SE_d ≈ √((n₁+n₂)/(n₁n₂) + d²/(2df)).
How do I interpret observed statistical power?
Power is the probability that your study will detect a true effect of size d, given α = 0.05 and your sample sizes. The conventional target is 80% power (β = 0.20).
- Power < 50%: your study is more likely to miss a real effect than find it. Results should be interpreted cautiously.
- Power ≈ 80%: the standard minimum for published research.
- Power > 95%: well-powered; a negative result meaningfully argues against the specified effect size.
The "required n" panel shows how many participants per group you would need to reach 80% power. Use this for study planning before data collection begins.
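Observed power under the normal approximation can be sketched as below. This is an illustration rather than the calculator's implementation: it assumes a two-sided test, approximates the noncentral t by a normal distribution centred on d × √(n₁n₂/(n₁+n₂)), and uses the hypothetical name `approx_power`.

```python
import math
from statistics import NormalDist

def approx_power(d, n1, n2, alpha=0.05):
    """Approximate two-sided power for a two-sample comparison."""
    z = NormalDist()
    ncp = abs(d) * math.sqrt(n1 * n2 / (n1 + n2))   # expected test statistic
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability of landing beyond either critical value
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)

power = approx_power(0.5, 64, 64)
print(f"power = {power:.3f}")
```

For a medium effect (d = 0.5) with 64 participants per group, this gives a power of about 0.81, consistent with the 80% target. With d = 0 the function returns α, as it should.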
Can I compare effect sizes across different studies or outcomes?
Yes, this is one of the main reasons effect sizes exist. Because d, g, and r are dimensionless, you can directly compare:
- A drug study measuring blood pressure and an education study measuring test scores.
- Studies using different measurement instruments (e.g., two different depression scales).
- Results across cultures, time periods, or populations in a meta-analysis.
One important caveat: the same d value has different practical significance in different contexts. d = 0.2 may be negligible in a lab study but substantial in a public health intervention affecting millions of people. Always interpret effect sizes alongside subject-matter knowledge.