
In the last post we learned that by some measures Lance Armstrong is the best Tour rider of all time. In an earlier post it looked very much like Eddy Merckx deserves that appellation.
Even though it's not realistic to compare athletes from different eras in an absolute sense, let’s take a look at what simple, straightforward statistical methods can do to help us get an answer. Without sliding too far down the treacherous path of boring theory and equally numbing math, let’s summarize the techniques we can use to get an objective answer to the question of who is the best Tour rider of all time.
The science of statistics tells us that many events in nature occur randomly, but with frequencies that, when plotted, resemble a bell. These bell-shaped curves are called normal distributions. For each such distribution, we can calculate a mean and a standard deviation, characteristics of a bell-shaped curve that help us understand, amongst many other things, the rarity of any data point in the population represented by the curve. The standard deviation, which is a measure of dispersion or variation in a population, is commonly referred to by the name of the Greek letter used to represent it: sigma. When you hear someone refer to a characteristic, observation or performance as a second-sigma or third-sigma result, they are commenting on its rarity. For a reasonably symmetric and bell-shaped population, one sigma (in each direction from the mean) takes in about 68% of the results, 2 sigmas account for about 95% of the data and 3 sigmas cover about 99.7% of all results. Another way to look at this is to say that a data point more than 3 sigmas from the mean will happen only about once in every 370 observations.
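The coverage figures for an ideal normal curve can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names `coverage_within` and `one_in_n_outside` are made up for illustration, and real populations only approximate these values.

```python
import math

def coverage_within(k: float) -> float:
    """Fraction of an ideal normal population within k sigmas of the mean."""
    return math.erf(k / math.sqrt(2))

def one_in_n_outside(k: float) -> float:
    """On average, one observation in this many falls outside +/- k sigmas."""
    return 1.0 / (1.0 - coverage_within(k))

for k in (1, 2, 3):
    inside = coverage_within(k)
    print(f"{k} sigma: {inside:.2%} inside, about 1 in {one_in_n_outside(k):.0f} outside")
```

For an exact normal distribution this gives roughly 68.3%, 95.4% and 99.7% inside 1, 2 and 3 sigmas, so a result beyond 3 sigmas in either direction turns up about once in every 370 observations.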
With the definitions in the previous paragraph in mind, if the riders in a race such as the Tour de France represent a normal distribution of professional cycling talent, and if we assume that each year’s peloton of Tour racers is representative of the best group of professional racers available, we can use standard deviations to calculate which rider’s performance is farthest ahead of his peers. Here is a simple thesis: the rider who finishes the greatest number of standard deviations from the mean (in the “positive” direction) is the best rider.
An example from another (but equally drug-plagued) sport might help clarify this notion. Babe Ruth hit 60 home runs in 1927, a single-season record. In 2001, Barry Bonds set a new record by hitting 73 homers, about 22% more than the Babe could muster some 74 years earlier. Is Barry’s home run total the best ever? In terms of raw numbers, there can be no argument about it. But if you knew a little more about the normal levels of performance in the two years in question, perhaps you would think not.
In 1927, the average number of home runs hit by the population of batters with more than 300 at bats was 6.23. The average number of home runs by the same population in 2001 was 17.89. Both populations approximate somewhat skewed normal distributions, but the standard deviation in 1927 was 8.58, while it was 12.13 for 2001. A little additional calculating (subtract the mean from each slugger’s total, then divide by the standard deviation) brings us to this startling realization: in 2001, Barry Bonds’s performance was a beyond-the-4th-sigma result! But that extraordinary feat pales by comparison with the Babe’s. He slugged his way past the 6th sigma in 1927, an accomplishment so rare, it should be expected approximately never in a strict, reproducible normal distribution. Because seasonal home run totals are not rigid normal distributions and are subject to year-on-year variations in mean and standard deviation, it’s beyond our scope here to calculate how rare the Babe’s performance was, but it’s highly unlikely that any other statistic in the entire 130+ years of recorded baseball history comes close to Babe’s 6+ sigma effort in 1927.
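The sigma figures for the two sluggers follow directly from the means and standard deviations quoted above. Here is a small sketch of that arithmetic; the helper name `sigmas_above_mean` is an invention for this example, not a standard function.

```python
def sigmas_above_mean(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations by which value exceeds the population mean."""
    return (value - mean) / std_dev

# Season figures quoted in the text (batters with more than 300 at bats).
ruth_1927 = sigmas_above_mean(60, mean=6.23, std_dev=8.58)
bonds_2001 = sigmas_above_mean(73, mean=17.89, std_dev=12.13)

print(f"Ruth 1927:  {ruth_1927:.2f} sigma")   # 6.27
print(f"Bonds 2001: {bonds_2001:.2f} sigma")  # 4.54
```

Bonds hit more home runs, but Ruth sat almost two full sigmas farther from the average hitter of his own season.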
Having already risked putting every reader staring glazed-eyed at this article into preternatural slumber, we’ll dispense with the laborious calculations necessary to prove anything, and simply ask you to take it on faith that Tour results tend to take on bell-shaped characteristics. After calculating distribution characteristics for the final general classification (GC) results for every Tour ever run, then applying very minor adjustments to help smooth the results over the course of the Tour’s 100+ year history, it’s possible to come up with a ranking for every rider-year (all 7694 of them). A rider-year is one rider’s result in one Tour, so Lance Armstrong’s performance in 1999 is ranked separately from his performance in 2001.
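To make the idea of ranking rider-years concrete, here is a rough sketch of how such a ranking could be built. It is not the calculation used for this article: it assumes the sigma is computed from final GC elapsed time, it skips the smoothing adjustments entirely, and the two Tours, the rider labels and the times are all invented toy values.

```python
from statistics import mean, pstdev

# HYPOTHETICAL toy data: final GC time in minutes for two invented Tours.
toy_gc_minutes = {
    "Tour 1": {"Rider A": 300, "Rider B": 310, "Rider C": 320,
               "Rider D": 330, "Rider E": 340},
    "Tour 2": {"Rider F": 300, "Rider G": 330, "Rider H": 335,
               "Rider I": 340, "Rider J": 345},
}

def rank_rider_years(results):
    """Return (sigma, tour, rider) tuples sorted from best to worst."""
    rows = []
    for tour, times in results.items():
        values = list(times.values())
        mu = mean(values)
        sd = pstdev(values)  # standard deviation of that Tour's finishers
        for rider, minutes in times.items():
            # A lower time is better, so the sign is flipped:
            # a positive sigma means faster than the average finisher.
            rows.append(((mu - minutes) / sd, tour, rider))
    return sorted(rows, reverse=True)

ranking = rank_rider_years(toy_gc_minutes)
for sigma, tour, rider in ranking[:3]:
    print(f"{sigma:5.2f}  {tour}  {rider}")
```

In this toy data both winners finish in 300 minutes, yet the Tour 2 winner ranks higher because he finished farther ahead of his own field. That is the whole point of measuring in sigmas rather than raw times.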
Let’s cut to the chase. So who is the greatest Tour rider of all time, based on the “best” standard deviation in a single Tour? The list of the top 21 single-year performances follows, but fair warning, you’re not going to like it!
| Rank | Sigma | Year | Rider |
| --- | --- | --- | --- |
| 1 | 2.68 | 1986 | Greg LeMond |
| 2 | 2.60 | 1986 | Bernard Hinault |
| 3 | 2.39 | 1986 | Urs Zimmermann |
| 4 | 2.38 | 1969 | Eddy Merckx |
| 5 | 2.35 | 1971 | Eddy Merckx |
| 6 | 2.33 | 1958 | Charly Gaul |
| 7 | 2.31 | 1997 | Jan Ullrich |
| 8 | 2.28 | 1979 | Bernard Hinault |
| 9 | 2.27 | 1958 | Vito Favero |
| 10 | 2.26 | 1958 | Raphael Geminiani |
| 11 | 2.24 | 1972 | Eddy Merckx |
| 12 | 2.20 | 1958 | Jan Adriaensens |
| 13 | 2.18 | 1986 | Andrew Hampsten |
| 14 | 2.18 | 1989 | Greg LeMond |
| 15 | 2.18 | 1989 | Laurent Fignon |
| 16 | 2.17 | 1997 | Richard Virenque |
| 17 | 2.14 | 1987 | Stephen Roche |
| 18 | 2.13 | 1987 | Pedro Delgado |
| 19 | 2.12 | 1975 | Bernard Thevenet |
| 20 | 2.12 | 1991 | Miguel Indurain |
| 21 | 2.12 | 1999 | Lance Armstrong |