I keep running across statements saying things like “The average self-published author only sells 500 books in his lifetime” or “The average author earns less than X dollars/euros/pounds a year.”
Here’s the problem, and it goes back to the idea of the “long tail.”
Average (or arithmetic mean) is a measure of central tendency. It’s useful for describing where normally distributed data (the bell curve) clusters, because the average lands at or near the middle of the curve. The median, another measure of central tendency, is the value that splits a data set in half: there are equal numbers of data points above and below it. On a bell curve, the mean and the median sit practically on top of each other. That’s really useful information when you’re working with a data set that’s normally distributed, like the average price per gallon of gasoline in the U.S., the average weight of a steer on your farm, or the average price per bushel of grain.
Gasoline, steers, and grain don’t have blockbusters. They don’t have flops. This station’s gasoline is pretty much the same as the next one’s. This grain may have a little higher or lower gluten content, but by and large, winter wheat produces a particular kind of flour. The price per unit is generally normally distributed across a relatively narrow range, so knowing whether you’re getting a good deal on gas is simply a matter of knowing what the average for your area is.
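To make that concrete, here’s a small Python sketch using made-up numbers. It simulates 10,000 gas stations whose prices follow a normal distribution centered on an assumed $3.50 a gallon with a $0.15 spread (illustrative figures, not real data), then compares the mean and the median. On a bell curve the two land almost on top of each other, and roughly half the stations sit on either side of the average.

```python
import random
import statistics

# Hypothetical gas prices: 10,000 draws from a normal distribution
# centered on $3.50/gallon with a $0.15 standard deviation.
# These numbers are invented for illustration only.
rng = random.Random(42)  # fixed seed so the run is repeatable
prices = [rng.gauss(3.50, 0.15) for _ in range(10_000)]

mean = statistics.mean(prices)
median = statistics.median(prices)
# Fraction of stations charging less than the average price
share_below_mean = sum(p < mean for p in prices) / len(prices)

print(f"mean   = ${mean:.2f}")
print(f"median = ${median:.2f}")
print(f"share below mean = {share_below_mean:.0%}")
```

Change the seed or the spread and the mean and median still track each other closely. That’s what makes the average a useful yardstick for fungible goods.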
Those tools, mean and median, aren’t as useful on data that isn’t normally distributed and instead follows a completely different distribution, like the “power law” curve, or what we know better as the “long tail.”
Think of movies. A few movies are blockbusters. They make millions, even billions, of dollars, but in any given period you can usually count the number of blockbusters on your fingers. There just aren’t that many that have such high sales that the FAA needs to issue a warning to general aviation anytime somebody draws the graph. There are a lot of other movies that do really well, and way more movies that don’t actually produce that much revenue: indie movies, underground films, and studio movies that tank for one reason or another.
Yes, you can calculate that the average new release in the U.S. makes some millions of dollars. But you can’t use that information to determine whether or not you should produce a movie. It could be a blockbuster or it could be a flop. The average doesn’t really matter because it’s the execution that counts.
It’s because movies are art. You can’t substitute a Thelma and Louise for an Empire Strikes Back. They’re not—to use an economic term—fungible.
Which brings us to authors and books.
Author earnings depend on book sales and on whatever contracts govern how those sales are paid out.
As with movies, book sales follow the power law curve. A very tiny percentage of all authors earn billions. A slightly larger percentage earn millions. A larger percentage still earn hundreds of thousands, even more earn tens of thousands, and almost everyone else earns less than a thousand dollars a year.
When you plot units sold by title or by author, you see something like the long tail emerge. Same with earnings. A tiny slice has the biggest numbers, and the curve falls off fast until you’re left with a very long tail that ends at zero, with thousands and thousands of authors selling zero books and earning zero dollars. That’s even before we count the people who want to write but never actually finish anything and offer it to the market. Average and median mean nothing against that curve. There’s no useful information you can glean from a statistic that measures central tendency on a data set that has no center. Books, like movies, are not fungible. Execution matters above all else.
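Here’s the same comparison run on a long tail, again as a Python sketch with invented numbers. It assumes a simple Zipf-style power law in which the author ranked k sells 1,000,000 / k copies, across a hypothetical market of 10,000 authors. Real sales curves are messier, but the shape is the point: the mean comes out near 980 copies, the median near 200, about 90% of authors fall below the “average,” and the top 1% account for more than half of all sales.

```python
import statistics

# Hypothetical long-tail market: 10,000 authors whose sales follow a
# Zipf-style power law, where the author at rank k sells TOP_SELLER / k copies.
# TOP_SELLER and N_AUTHORS are assumed values for illustration, not real data.
TOP_SELLER = 1_000_000
N_AUTHORS = 10_000
sales = [TOP_SELLER / k for k in range(1, N_AUTHORS + 1)]  # sorted best to worst

mean = statistics.mean(sales)
median = statistics.median(sales)
# Fraction of authors selling fewer copies than the "average" author
share_below_mean = sum(s < mean for s in sales) / N_AUTHORS
# Fraction of all copies sold that go to the top 1% of authors
top_1pct_share = sum(sales[: N_AUTHORS // 100]) / sum(sales)

print(f"mean   = {mean:,.0f} copies")
print(f"median = {median:,.0f} copies")
print(f"authors below the mean: {share_below_mean:.0%}")
print(f"sales going to the top 1%: {top_1pct_share:.0%}")
```

Neither number tells an individual author where they’ll land. The mean is dragged upward by a handful of blockbusters, and the median says nothing about the head of the curve, where most of the money is.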
The next time somebody poops on your parade by pointing out that the average author only makes enough to go out to dinner once a year or that they never sell a book to somebody who isn’t family, remember the ancient statistician’s curse.
“May you drown crossing a river that is—on average—only six inches deep.”
Picture by Hay Kranen