Logistic Regression Calculator

Log-Odds, Odds Ratios & ROC AUC

Fit a binary logistic regression model using iterative gradient descent. Computes log-likelihood, coefficients, standard errors, odds ratios, predicted probabilities, and ROC AUC. Includes confusion matrix with accuracy, precision, recall, and F1 score.

Each row: y(0/1) x₁ x₂

Overview

What Is the Logistic Regression Calculator?

Logistic regression models the probability of a binary outcome, using the sigmoid function to constrain predictions to the interval (0, 1). Instead of minimising squared errors as OLS does, it maximises the log-likelihood of the observed binary labels. This calculator does so iteratively, by gradient descent on the negative log-likelihood.

Coefficients are expressed as log-odds; exp(β) gives the odds ratio, representing how the odds of outcome=1 multiply for a one-unit increase in the predictor. The ROC AUC measures the model's ability to discriminate between classes across all thresholds (0.5 = random, 1.0 = perfect).

How it works

Logistic Regression Calculator Formula and Method

Sigmoid: σ(z) = 1 / (1 + e⁻ᶻ) where z = β₀ + β₁x₁ + β₂x₂
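The sigmoid and linear predictor above can be sketched in a few lines of Python. This is only an illustration, not the calculator's own code; the function names and the branch on the sign of z (used to avoid overflow in the exponential) are assumptions.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # for very negative z, exp(-z) would overflow
    return ez / (1.0 + ez)

def predict_proba(beta, x):
    """beta = [b0, b1, ...]; x = [x1, ...] holds one row's predictor values."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return sigmoid(z)
```

With z = 0 the predicted probability is exactly 0.5, which is why the sign of z decides the class at the default threshold.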

Log-likelihood: ℓ(β) = Σ[yᵢ·log(σ(zᵢ)) + (1−yᵢ)·log(1−σ(zᵢ))]
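As a rough sketch, the log-likelihood can be evaluated without ever forming log(0) by using the identities log σ(z) = −log(1 + e⁻ᶻ) and log(1 − σ(z)) = −log(1 + eᶻ). The helper name softplus and the row layout (y, [x₁, …]) are assumptions made for this example.

```python
import math

def softplus(t: float) -> float:
    """log(1 + e^t), computed in a numerically stable way."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def log_likelihood(beta, rows):
    """Sum of y*log(p) + (1-y)*log(1-p); rows is a list of (y, [x1, ...])."""
    total = 0.0
    for y, x in rows:
        z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
        # y = 1 contributes log sigma(z); y = 0 contributes log(1 - sigma(z))
        total += -softplus(-z) if y == 1 else -softplus(z)
    return total
```

With all coefficients at zero every row contributes log(0.5), so n observations give ℓ = n·log(0.5). That makes a convenient sanity check before fitting.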

Odds ratio = exp(β)

SE via Fisher information: I = XᵀWX, W = diag(p̂(1−p̂)); SE(βⱼ) = √[(I⁻¹)ⱼⱼ]
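For a single predictor the Fisher information is a 2×2 matrix, so the standard errors can be sketched with a hand-written inverse. The function names below are hypothetical. The Wald p-value is also an assumption: the page reports p-values but does not say how they are computed, and a Wald z-test is only one common choice.

```python
import math

def standard_errors_one_predictor(xs, p_hat):
    """SEs of (b0, b1) from I = X^T W X, W = diag(p(1-p)), X rows = [1, x]."""
    w = [p * (1.0 - p) for p in p_hat]
    a = sum(w)                                    # I[0][0]
    b = sum(wi * x for wi, x in zip(w, xs))       # I[0][1] = I[1][0]
    d = sum(wi * x * x for wi, x in zip(w, xs))   # I[1][1]
    det = a * d - b * b
    if det <= 0:
        raise ValueError("Fisher information is singular")
    # Inverse of [[a, b], [b, d]] is (1/det) * [[d, -b], [-b, a]]
    return math.sqrt(d / det), math.sqrt(a / det)

def wald_p_value(beta_j, se_j):
    """Two-sided p-value for H0: beta_j = 0 (assumed Wald z-test)."""
    z = beta_j / se_j
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Here p_hat holds the fitted probabilities at the final coefficients. Large standard errors together with very large coefficients are a typical symptom of the separation problem.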

AUC: area under the ROC curve, computed with the trapezoidal rule over observations sorted by predicted probability
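One way to implement the trapezoidal AUC is to sweep the threshold from the highest predicted probability downward and add up the area under the resulting (FPR, TPR) points. The sketch below treats tied scores as a single threshold step. That tie handling and the function name are assumptions, not a description of the calculator's internals.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via the trapezoidal rule."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    # Indices sorted by descending score
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    auc, tp, fp = 0.0, 0, 0
    prev_tpr, prev_fpr = 0.0, 0.0
    i = 0
    while i < len(order):
        j = i
        # Consume every observation that shares this score
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            tp += y_true[order[j]]
            fp += 1 - y_true[order[j]]
            j += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0  # trapezoid area
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return auc
```

If every observation receives the same score, the curve is the diagonal and the function returns 0.5, matching the "random" baseline.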

Instructions

How to Use

1. Choose the number of predictors (1 or 2) from the dropdown.
2. Enter data in the textarea, one row per observation, with y (0 or 1) first.
3. Use the preset (study hours vs pass/fail) as a reference format.
4. Click Fit Model to run gradient descent optimisation.
5. Read the classification metrics: accuracy, precision, recall, F1, and AUC.
6. Check the Coefficient Table for β (log-odds), exp(β) (odds ratio), and p-values.
7. Review the Confusion Matrix to see true/false positive and negative counts at threshold 0.5.
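The row format used in the textarea (y first, then x₁ and optionally x₂) could be parsed along these lines. This is a hypothetical sketch: the function name, the acceptance of commas as separators, and the sample values are all assumptions.

```python
def parse_rows(text, n_predictors):
    """Parse lines of 'y x1 [x2]' into a list of (y, [x1, ...]) tuples."""
    rows = []
    for line_no, line in enumerate(text.strip().splitlines(), start=1):
        parts = line.replace(",", " ").split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != n_predictors + 1:
            raise ValueError(
                f"line {line_no}: expected {n_predictors + 1} values, got {len(parts)}"
            )
        y = float(parts[0])
        if y not in (0.0, 1.0):
            raise ValueError(f"line {line_no}: y must be 0 or 1")
        rows.append((int(y), [float(v) for v in parts[1:]]))
    return rows

# Made-up sample in the 'y x1 x2' layout
sample = """0 1.5 2.8
1 6.0 3.4
1 4.5 3.9"""
rows = parse_rows(sample, n_predictors=2)
```

Rejecting rows with the wrong number of columns up front avoids silently fitting a model on misaligned data.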

Real example

Logistic Regression Calculator Example

Example 1 (pass/fail prediction): With y=pass(1)/fail(0), x₁=hours studied, x₂=prior GPA: β₁=0.8 means each extra study hour multiplies the odds of passing by e^0.8 ≈ 2.23, holding prior GPA constant. AUC=0.91 falls in the outstanding range of the benchmark table.

Example 2 (medical diagnosis): y=disease(1)/healthy(0), x₁=biomarker level. β₁=1.2, exp(β₁)≈3.32: each unit increase in the biomarker multiplies the odds of disease by about 3.3. Precision=0.80 means 80% of predicted-positive patients truly have the disease.
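The odds-ratio arithmetic in both examples can be checked directly. The snippet also shows that a fixed odds ratio does not translate into a fixed change in probability. The helper shift_probability and the starting probabilities of 0.50 and 0.90 are illustrative assumptions.

```python
import math

def shift_probability(p, beta, delta=1.0):
    """Probability after raising a predictor by delta, given its coefficient."""
    odds = p / (1.0 - p)
    new_odds = odds * math.exp(beta * delta)  # odds are multiplied by exp(beta*delta)
    return new_odds / (1.0 + new_odds)

print(round(math.exp(0.8), 2))                  # 2.23
print(round(math.exp(1.2), 2))                  # 3.32
print(round(shift_probability(0.50, 0.8), 3))   # 0.69
print(round(shift_probability(0.90, 0.8), 3))   # 0.952
```

The same coefficient moves a 50% chance to about 69% but a 90% chance only to about 95%, because the multiplication acts on odds rather than on probabilities.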

Guide

Understanding Logistic Regression

Classification Metrics Reference

Metric	Formula	Interpretation
Accuracy	(TP + TN) / N	Overall fraction correct; misleading for imbalanced classes
Precision	TP / (TP + FP)	Of predicted positives, what fraction are truly positive
Recall (Sensitivity)	TP / (TP + FN)	Of actual positives, what fraction were correctly identified
F1 Score	2·P·R / (P + R)	Harmonic mean of precision and recall; more informative than accuracy when classes are imbalanced
Specificity	TN / (TN + FP)	Of actual negatives, what fraction were correctly identified
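The formulas in the table translate directly into code. The sketch below returns NaN when a denominator is zero (for instance, precision when nothing is predicted positive). That convention and the example counts are assumptions chosen for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the table's metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }

# Hypothetical counts: 40 TP, 45 TN, 10 FP, 5 FN
metrics = classification_metrics(tp=40, tn=45, fp=10, fn=5)
```

For these counts accuracy is 0.85 and precision is 0.80, while F1 equals 2·TP / (2·TP + FP + FN) = 80/95.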

AUC Benchmark Table

AUC Range	Discrimination	Typical Use
0.50	No discrimination (random)	Baseline
0.50 – 0.70	Poor	Model needs improvement
0.70 – 0.80	Acceptable	Useful in many applications
0.80 – 0.90	Excellent	Good for clinical / business use
0.90 – 1.00	Outstanding	Exceptional, check for data leakage

Key Logistic Regression Concepts

▸Log-odds (logit): log(p/(1-p)) = β₀ + β₁x₁ + … — the linear predictor on the logit scale.
▸Separation problem: if classes are perfectly separable, MLE coefficients diverge to ±∞; gradient descent will converge very slowly or not at all.
▸Class imbalance: with highly unequal class proportions, accuracy is misleading — prefer F1, AUC, or precision-recall curves.
▸Regularisation (L1/L2) is not included in this calculator but is crucial for high-dimensional data to prevent overfitting.
▸McFadden pseudo-R² = 1 − ℓ(full)/ℓ(null) is a common goodness-of-fit measure for logistic models.
▸The decision threshold of 0.5 is not always optimal — lower it to increase recall; raise it to increase precision.
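The McFadden pseudo-R² mentioned above needs the log-likelihood of the null (intercept-only) model, whose fitted probability is simply the share of 1s in the data. A minimal sketch, with hypothetical function names:

```python
import math

def null_log_likelihood(ys):
    """Log-likelihood of the intercept-only model, where p = mean(y)."""
    n, n1 = len(ys), sum(ys)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    p = n1 / n
    return n1 * math.log(p) + n0 * math.log(1.0 - p)

def mcfadden_r2(ll_full, ll_null):
    """McFadden pseudo-R^2 = 1 - l(full) / l(null)."""
    return 1.0 - ll_full / ll_null
```

Both log-likelihoods are negative, so a fitted model that halves the magnitude of the null log-likelihood gives a pseudo-R² of 0.5, and a model no better than the intercept gives 0.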

Answers

Frequently Asked Questions

What is an odds ratio and how do I interpret it?

The odds ratio exp(β) describes how the odds of the outcome being 1 change for a one-unit increase in a predictor, holding the other predictors constant. OR > 1 means increased odds; OR < 1 means decreased odds. For example, OR=2.5 means the odds of the outcome are multiplied by 2.5 per unit increase in that predictor. This is a statement about odds, p/(1−p), not about probability: the outcome does not become 2.5 times as probable.

What is ROC AUC and what values are good?

AUC (Area Under the ROC Curve) measures classification ability across all thresholds. AUC=0.5 is random chance; 0.7–0.8 is acceptable; 0.8–0.9 is excellent; above 0.9 is outstanding. High AUC does not mean the model is well-calibrated — it only measures discrimination.

Why use gradient descent instead of the closed-form solution?

Logistic regression has no closed-form solution for maximum likelihood estimation, so the coefficients must be found iteratively. Gradient descent on the negative log-likelihood repeatedly moves the coefficient vector in the direction that increases the log-likelihood. With a learning rate of 0.1 (lr) and up to 1000 iterations, convergence is typically achieved within 50–200 steps for well-scaled data.
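A minimal sketch of such a fit, using the learning rate of 0.1 and the cap of 1000 iterations quoted above, might look like this. Averaging the gradient over the n observations, the stopping tolerance, and the study-hours data are assumptions, and the calculator's actual update rule may differ. The code performs gradient ascent on the log-likelihood, which is equivalent to gradient descent on its negative.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(rows, lr=0.1, max_iter=1000, tol=1e-6):
    """Batch gradient ascent on the mean log-likelihood.

    rows: list of (y, [x1, ...]). Returns (beta, steps_used).
    """
    k = len(rows[0][1])
    beta = [0.0] * (k + 1)
    n = len(rows)
    step = 0
    for step in range(1, max_iter + 1):
        grad = [0.0] * (k + 1)
        for y, x in rows:
            z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
            err = y - sigmoid(z)          # derivative of the row's log-likelihood w.r.t. z
            grad[0] += err
            for j, xi in enumerate(x, start=1):
                grad[j] += err * xi
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
        if max(abs(g) / n for g in grad) < tol:
            break                         # gradient is essentially zero
    return beta, step

# Made-up data: hours studied (1..8) and pass/fail, deliberately not separable
rows = [(0, [1.0]), (0, [2.0]), (1, [3.0]), (0, [4.0]),
        (1, [5.0]), (0, [6.0]), (1, [7.0]), (1, [8.0])]
beta, steps = fit_logistic(rows)
```

Predictors on very different scales slow this procedure down considerably, so standardising them before fitting is a common precaution.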

What does the confusion matrix show?

The 2×2 confusion matrix shows counts of: True Positives (TP, correctly predicted 1), True Negatives (TN), False Positives (FP, predicted 1 but actually 0), and False Negatives (FN, predicted 0 but actually 1). Precision = TP/(TP+FP) and Recall = TP/(TP+FN) reveal trade-offs important for imbalanced classes.
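Counting the four cells is straightforward once a threshold is fixed. In the sketch below a probability equal to the threshold is classified as 1. That tie rule, the function name, and the example probabilities are assumptions.

```python
def confusion_counts(y_true, probs, threshold=0.5):
    """Return (tp, tn, fp, fn) for predictions made at the given threshold."""
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

# Hypothetical labels and predicted probabilities
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]
at_half = confusion_counts(y_true, probs)            # threshold 0.5
at_lower = confusion_counts(y_true, probs, 0.35)     # lower threshold
```

At 0.5 one positive is missed (FN = 1). Lowering the threshold to 0.35 recovers it, which raises recall from 2/3 to 1 without adding false positives for this particular data.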

When should I use logistic regression vs linear regression?

Use logistic regression when your outcome is binary (0/1, yes/no, pass/fail). Linear regression on a binary outcome can produce predicted probabilities outside [0,1] and violates the constant-variance (homoscedasticity) assumption, because the variance of a binary outcome, p(1−p), changes with p. Logistic regression is the standard baseline for binary classification before exploring more complex models like decision trees or neural networks.
