Power laws: when averages lie


Why most events are small and a few are catastrophic

Pick any day and count forest fires. Most are small enough to go unreported. A handful burn a few hectares. Once in a while, one takes out half a valley. Plot the sizes and you get a curve that slopes steeply down and then drags off to the right in a long, stubborn tail. The same curve fits earthquakes, internet outages, stock-market drawdowns, and the follower counts of musicians. The word for it is power law: a heavy-tailed distribution where the frequency of an event drops as a fixed power of its size.

What is a power law?

A power law is a distribution in which the probability of an event is proportional to the event's size raised to some negative exponent: p(x) ∝ x^(-α) for some α > 0. Double the size and the frequency drops by a fixed factor. Double it again, and it drops by the same factor. The shape has no characteristic scale. There is no "typical" event to anchor an expectation on, which is why averages mislead: one giant event can dwarf the sum of every small event combined, so the mean tells you about the rare, not the common.
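To see "no characteristic scale" in numbers, here is a minimal Python sketch. It assumes a continuous power law with density exponent alpha = 2.5 and minimum size x_min = 1; both values and the function names are illustrative choices, not measurements.

```python
def power_law_density(x, alpha=2.5, x_min=1.0):
    """Normalised density p(x) = (alpha - 1) / x_min * (x / x_min) ** -alpha for x >= x_min."""
    if x < x_min:
        return 0.0
    return (alpha - 1) / x_min * (x / x_min) ** (-alpha)


def drop_factor(x, alpha=2.5):
    """How much rarer an event of size 2x is than an event of size x."""
    return power_law_density(2 * x, alpha) / power_law_density(x, alpha)


# The factor is the same whether you start from a small event or a huge one.
for size in (1, 10, 100, 1000):
    print(size, round(drop_factor(size), 4))
```

Every line prints the same factor, 2 ** -2.5 (about 0.1768), regardless of the starting size. A Gaussian would not do this: its drop factor gets dramatically smaller the further you move from the mean.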

Mark Newman's 2005 paper on power laws, Pareto distributions, and Zipf's law is the canonical plain-English review of where this shape shows up and how to test for it honestly. Spoiler: many things people casually call power laws are something else (log-normal, stretched exponential), but the real cases are plenty striking.
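As a rough illustration of what measuring the exponent involves, the sketch below draws synthetic data from a known power law and recovers the exponent with the standard maximum-likelihood estimator for a continuous power law. The sample size, seed, and function names are assumptions made for the example. It only estimates the exponent; it does not check whether a log-normal or stretched exponential would fit better, which is the harder half of testing honestly.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n values from a continuous power law by inverting its CDF."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power below is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_exponent(samples, x_min):
    """Maximum-likelihood estimate of alpha, with its approximate standard error."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha_hat = 1.0 + len(tail) / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(len(tail))
    return alpha_hat, std_err


data = sample_power_law(50_000, alpha=2.5)
alpha_hat, std_err = fit_exponent(data, x_min=1.0)
print(f"estimated alpha = {alpha_hat:.3f} +/- {std_err:.3f}")
```

The estimate should land close to the true value of 2.5. Fitting a straight line to a log-log histogram is the more common shortcut, and it is known to give biased exponents, which is one reason the estimator above is usually preferred.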

What a power law is not

The label gets glued onto any long-tailed chart. A few things that look power-law-ish but aren't.

Not a Gaussian. A bell curve has a sharp peak at the mean and vanishing tails. Human heights, measurement errors, and IQ scores live there. Power laws have no peak and no safe average. If you can name a "typical" value, the data isn't power-law.
Not rare outliers. A magnitude-8 earthquake is not an anomaly bolted onto a Gaussian of small tremors. It sits on the same curve as the magnitude-2s, just further down the tail. Treat giant events as predicted, not exceptional.
Not measurement error. Heavy tails routinely get cleaned out of datasets as "noise" before analysis. They aren't noise. The extreme values are the signal, and throwing them away destroys the shape you were trying to measure.
Not the 80/20 cliché. Pareto's 80/20 is one specific ratio on one specific distribution. Real power laws can be 90/10, 99/1, or anything else depending on the exponent, and they span many orders of magnitude, not one management-book soundbite.
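The link between the exponent and the "X/Y" split can be made concrete. For a continuous power law with density exponent alpha greater than 2, the top fraction P of events holds a share P ** ((alpha - 2) / (alpha - 1)) of the total. The sketch below applies that formula and inverts it; the function names are illustrative.

```python
import math


def top_share(fraction, alpha):
    """Share of the total held by the largest `fraction` of events (needs alpha > 2)."""
    if alpha <= 2:
        raise ValueError("the mean diverges for alpha <= 2, so shares are undefined")
    return fraction ** ((alpha - 2.0) / (alpha - 1.0))


def alpha_for_split(share, fraction):
    """Density exponent at which the top `fraction` holds exactly `share` of the total."""
    e = math.log(share) / math.log(fraction)  # e = (alpha - 2) / (alpha - 1)
    return (2.0 - e) / (1.0 - e)


for share, fraction in ((0.80, 0.20), (0.90, 0.10), (0.99, 0.01)):
    alpha = alpha_for_split(share, fraction)
    print(f"{share:.0%} held by top {fraction:.0%}: alpha = {alpha:.3f}")
```

The 80/20 split corresponds to an exponent of roughly 2.16, and the more lopsided splits push the exponent closer to 2. Small changes in the exponent move the ratio a long way, which is why a single fixed ratio is a poor summary.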

Where do you see power laws in the wild?

Most reliably in systems that build up energy slowly and release it through a connected substrate. Earthquakes follow the Gutenberg-Richter law; because magnitude is a logarithmic scale, that makes the energy released a power law, one that spans nine orders of magnitude. City populations follow Zipf's law: the second-biggest city is roughly half the size of the biggest, the third roughly a third, and so on. Word frequencies in any language follow Zipf as well. Asset returns, the sizes of internet outages, the reach of viral posts, file sizes on your laptop: all fat-tailed.
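Zipf's rank-size rule for cities is simple enough to write down directly. The sketch below is an idealised version: the largest population of 8,000,000 is a made-up figure, and real city data only follows the rule approximately.

```python
def zipf_sizes(largest, ranks, exponent=1.0):
    """Idealised Zipf sizes: the item at rank r has size largest / r ** exponent."""
    return [largest / rank ** exponent for rank in range(1, ranks + 1)]


# Hypothetical country whose biggest city has 8 million people.
for rank, size in enumerate(zipf_sizes(8_000_000, 5), start=1):
    print(rank, round(size))
```

With the exponent at 1, the second city comes out at half the first and the third at a third, matching the rule of thumb above. Fitted exponents for real countries vary around that value.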

The deep reason is usually some combination of slow buildup, local connectivity, and a system sitting near a phase transition. Put differently: power laws come from criticality. That is the engine underneath most of the examples above.

Why do power laws matter?

Because they change what "worst case" means. In a Gaussian world, planning for three standard deviations above the mean covers almost every case you will ever see. In a power-law world, the biggest event in the record so far is almost never the biggest event possible; the tail just hasn't had enough time to draw one yet. Risk models that assume a bell curve quietly under-price the rare catastrophe by orders of magnitude, and that is where most real losses come from.
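One way to put numbers on this is to ask how much bigger a one-in-a-million event is than a one-in-a-thousand event under each assumption. The sketch below compares a standard Gaussian with a power law whose density exponent is 2.5; the exponent, the minimum size of 1, and the function names are illustrative assumptions.

```python
from statistics import NormalDist


def gaussian_level(exceed_prob, mean=0.0, sd=1.0):
    """Size exceeded with probability `exceed_prob` under a Gaussian."""
    return NormalDist(mean, sd).inv_cdf(1.0 - exceed_prob)


def power_law_level(exceed_prob, alpha=2.5, x_min=1.0):
    """Size exceeded with probability `exceed_prob` under a continuous power law."""
    return x_min * exceed_prob ** (-1.0 / (alpha - 1.0))


for name, level in (("gaussian", gaussian_level), ("power law", power_law_level)):
    ratio = level(1e-6) / level(1e-3)
    print(f"{name}: the 1-in-a-million event is {ratio:.1f}x the 1-in-a-thousand event")
```

Under the Gaussian, watching a thousand times longer makes the worst case only about one and a half times bigger. Under this power law it makes the worst case a hundred times bigger, so the largest event in a short record says little about what a longer record will hold.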

The Gutenberg-Richter law is the classic worked example. It says the frequency of earthquakes drops by a fixed factor every time the magnitude goes up by one. A magnitude-7 is about ten times rarer than a magnitude-6, a magnitude-8 ten times rarer again. The law fits data from the 1930s to today across every seismic zone measured, which is why seismologists can predict the statistics of shaking over the next century even though no one can predict a specific quake. Mark Newman's 2005 review lines this example up next to Zipf's law for cities and words, solar-flare energies, and forest-fire sizes. Same curve, same implication: stop asking "how big on average?" and start asking "how heavy is the tail?"
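The law is usually written as log10 N = a - b * M, where N is the number of quakes of at least magnitude M over some period. The sketch below uses b = 1, which matches the "ten times rarer" rule above, and a = 8 as a purely illustrative placeholder rather than a fitted value. It also uses the standard approximation that radiated energy grows by a factor of 10 ** 1.5 per magnitude unit, which is what turns the law into a power law in energy.

```python
def annual_count(magnitude, a=8.0, b=1.0):
    """Expected yearly number of quakes at or above `magnitude` (a is a placeholder)."""
    return 10 ** (a - b * magnitude)


def energy_ratio(m_small, m_large):
    """Approximate ratio of radiated energy between two magnitudes."""
    return 10 ** (1.5 * (m_large - m_small))


for m in (6, 7, 8):
    print(f"M >= {m}: about {annual_count(m):.0f} per year")

# Each magnitude step is ~10x rarer but ~32x more energetic,
# so with b = 1 the count falls off as energy ** (-b / 1.5) = energy ** (-2/3).
print(f"energy step per magnitude unit: {energy_ratio(6, 7):.1f}x")
```

Because the count falls more slowly than the energy rises, the few largest quakes release most of the total energy, which is the "averages lie" point in seismic form.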

Try it in the sim

The Forest Fire simulation produces a power law in front of you. Trees grow slowly, lightning ignites rarely, fires spread to connected neighbours. The histogram panel keeps a running record of recent fire sizes.

Run the Critical regime preset for a couple of minutes. The histogram fills with many small burns and the occasional long bar. That long tail is the power law.
Push Growth rate toward Dense overload. The density overshoots criticality. Every fire is now huge. The distribution collapses onto the right side of the histogram. The tail eats everything.
Push it toward Sparse regrowth. Fires stay tiny because the forest never connects. The tail disappears; only the left side remains. Power laws need both extremes to co-exist.
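The three ingredients (slow growth, rare lightning, spread to connected neighbours) fit in a few dozen lines. What follows is a minimal sketch of a common simplified forest-fire model, not the code behind the sim on this site. The parameter `growth_per_strike` is a hypothetical stand-in for the Growth rate control: it sets how many planting attempts happen between lightning strikes.

```python
import random
from collections import Counter


def simulate_fires(side=64, growth_per_strike=50, strikes=5000, seed=1):
    """Return the size of every fire on a side x side grid."""
    rng = random.Random(seed)
    tree = [[False] * side for _ in range(side)]
    sizes = []
    for _ in range(strikes):
        # Slow buildup: try to plant trees at random cells.
        for _ in range(growth_per_strike):
            tree[rng.randrange(side)][rng.randrange(side)] = True
        # Rare ignition: one lightning strike at a random cell.
        r, c = rng.randrange(side), rng.randrange(side)
        if not tree[r][c]:
            continue  # struck bare ground, no fire
        # Spread: burn the whole cluster of 4-connected trees.
        tree[r][c] = False
        stack, burned = [(r, c)], 0
        while stack:
            y, x = stack.pop()
            burned += 1
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < side and 0 <= nx < side and tree[ny][nx]:
                    tree[ny][nx] = False
                    stack.append((ny, nx))
        sizes.append(burned)
    return sizes


def log_binned(sizes):
    """Count fires per power-of-two size bin: 1, 2-3, 4-7, 8-15, ..."""
    bins = Counter(size.bit_length() - 1 for size in sizes)
    return {2 ** k: bins[k] for k in sorted(bins)}


for low, count in log_binned(simulate_fires()).items():
    print(f"fires of size {low}-{2 * low - 1}: {count}")
```

Lowering `growth_per_strike` to 1 or 2 keeps the forest sparse and the fires tiny. Raising it well into the hundreds lets the grid fill up between strikes, so nearly every fire is large. The long tail shows up in between.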

Where power laws connect on this site

Power laws are the signature of criticality, so anywhere one shows up, the other is usually nearby. Cascades are the mechanism that populates the tail: one event trips the next, which trips two, and the occasional chain reaches across the whole system. Emergence is the reason no designer planned the curve; it falls out of the collective behaviour. The library holds the whole family. Feel free to link to this page from a course on complex systems, or to use the sim in a class discussion about fat tails.