The central limit theorem is why statistics works
The underlying idea
The central limit theorem (CLT) is the single most important result in statistics. It explains why the normal distribution appears everywhere, not because nature loves bell curves, but because of what happens when you average things together.
Take any population with any shape of distribution. It can be right-skewed, bimodal, uniform. It doesn’t matter. Draw many random samples and compute the mean of each one. The distribution of those sample means will approach a normal distribution. The larger each sample, the tighter the approximation.
This is why we can run hypothesis tests on real-world data without knowing the true distribution of the thing we’re measuring. As long as we’re working with means of reasonably-sized samples, the CLT guarantees the sampling distribution behaves normally.
Historical root
The CLT has a long lineage. Abraham de Moivre first described a version of it in 1733, showing that the binomial distribution approaches the normal for large n. Laplace generalized it in 1812. The modern formulation, proved rigorously for independent and identically distributed variables, is credited to Lyapunov (1901) and later Lindeberg (1922).
The name “central” doesn’t mean it’s the most central theorem in statistics (though it arguably is). It refers to the theorem’s concern with the center (the mean) of a distribution, and to the central role it plays in connecting probability to inference.
Key assumptions
Independence. Each observation must be drawn independently. Non-random sampling, clustering, or autocorrelation in time-series data violates this. The CLT does not automatically apply to your weekly sales figures.
Finite variance. The population must have a finite mean and variance. Distributions with very heavy tails (like the Cauchy, whose mean and variance are not even defined) fall outside the theorem, and the CLT breaks down entirely for them.
Sample size. The common rule of thumb is n ≥ 30, but this is context-dependent. If the underlying distribution is already close to normal, a much smaller sample may suffice. If it’s severely skewed, n = 100 or more may be needed before the approximation is reliable.
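The finite-variance assumption is easy to see empirically. The sketch below (illustrative parameters, not from the original) averages draws from an exponential population, where the CLT applies, and from a standard Cauchy, where it does not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Means of n = 1_000 draws each, repeated 1_000 times
n, reps = 1_000, 1_000
expo_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
cauchy_means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)

# Exponential means concentrate around the true mean of 1,
# with spread of roughly 1/sqrt(n) ≈ 0.032 ...
print(expo_means.std())
# ... while the mean of n Cauchy draws is itself standard Cauchy:
# no concentration, no matter how large n gets.
print(np.abs(cauchy_means).max())
```

Averaging does nothing to tame the Cauchy: the sample mean of n Cauchy draws has exactly the same distribution as a single draw.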
The math
Let X_1, X_2, …, X_n be independent and identically distributed random variables, each with mean μ and variance σ². Define the sample mean:

X̄ = (X_1 + X_2 + ⋯ + X_n) / n
The CLT states that as n → ∞, the standardized sample mean converges in distribution to a standard normal:

√n (X̄ − μ) / σ → N(0, 1)
In the practical form used for inference, this means that for large n:

X̄ ≈ N(μ, σ²/n)
The term σ/√n is the standard error, the standard deviation of the sampling distribution. Quadrupling your sample size halves the standard error, which is why larger studies produce narrower confidence intervals.
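The 1/√n scaling is easy to verify numerically. This quick check (parameters chosen here for illustration) estimates the standard error empirically at two sample sizes and confirms that quadrupling n halves it:

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_se(n, reps=20_000):
    # Std. dev. of many sample means = the empirical standard error
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    return means.std()

se_25 = empirical_se(25)    # theory: sigma/sqrt(25)  = 1/5  = 0.20
se_100 = empirical_se(100)  # theory: sigma/sqrt(100) = 1/10 = 0.10
print(se_25, se_100, se_25 / se_100)  # ratio close to 2
```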
The code
Here’s a direct demonstration. We’ll draw from an exponential distribution (skewed right, nothing like a bell curve), compute sample means repeatedly, and watch the CLT force them into normality.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Population: exponential distribution (skewed right, mean = 1)
population = rng.exponential(scale=1.0, size=100_000)

sample_sizes = [5, 30, 100]
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

for ax, n in zip(axes, sample_sizes):
    # Draw 5000 samples of size n, compute the mean of each
    sample_means = [
        rng.choice(population, size=n).mean()
        for _ in range(5_000)
    ]
    ax.hist(sample_means, bins=60, density=True, color='#1B9E77', alpha=0.8)
    ax.set_title(f'n = {n}', fontsize=12)
    ax.set_xlabel('Sample mean')
    ax.set_ylabel('Density')

plt.suptitle(
    'CLT: sample means from an exponential population',
    y=1.02, fontsize=13
)
plt.tight_layout()
plt.show()
Run this and watch the distributions transform: n = 5 is visibly skewed, n = 30 looks nearly normal, and n = 100 is indistinguishable from a bell curve. The population never changed. Only the aggregation did.
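The visual impression can be quantified. A minimal sketch (numpy only; the `skewness` helper is defined here, not a library function) measures the skewness of the sample-mean distributions at each n:

```python
import numpy as np

rng = np.random.default_rng(42)

def skewness(x):
    # Standardized third moment; 0 for a perfectly symmetric distribution
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

# The exponential has skewness 2; for means of n iid draws it
# shrinks like 2 / sqrt(n), so each panel is measurably less skewed.
skews = {}
for n in (5, 30, 100):
    means = rng.exponential(scale=1.0, size=(5_000, n)).mean(axis=1)
    skews[n] = skewness(means)
    print(n, round(skews[n], 3))
```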
Business application
The CLT underlies virtually every A/B test run in industry. When a product team runs an experiment (measuring average session length, conversion rate, or revenue per user), they are comparing two sample means. The validity of the t-test or z-test they use rests entirely on the CLT making those means approximately normally distributed.
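A two-sample z-test of this kind reduces to a few lines once the CLT is granted. In this sketch the metric, sample sizes, and effect size are all invented for illustration:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(7)

# Hypothetical experiment: average session length in minutes.
# The 5.0 / 5.3 scales (a ~6% lift) are made-up illustration values.
control = rng.exponential(scale=5.0, size=4_000)
treatment = rng.exponential(scale=5.3, size=4_000)

# The CLT makes each sample mean approximately normal,
# so their difference is approximately normal as well.
diff = treatment.mean() - control.mean()
se = sqrt(control.var(ddof=1) / control.size
          + treatment.var(ddof=1) / treatment.size)
z = diff / se
p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value
print(f"z = {z:.2f}, p = {p:.4f}")
```

Note that the raw session lengths are heavily skewed; the test is valid only because it compares means of thousands of observations, exactly the regime where the CLT applies.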
When CLT assumptions break down, standard tests mislead. Correlated users (network effects in social products), heavy-tailed metrics (revenue with large outliers), or tiny sample sizes all violate the guarantees. This is why platforms like Netflix and LinkedIn publish research on variance reduction techniques: they are engineering around CLT edge cases at scale, where a broken assumption in a significance test translates directly into a bad product decision shipped to millions of users.