While chatting on Instagram with the authors of the great Extra Runde podcast (although most of their pods are in German, they have some special editions in English too!) they suggested researching the change in skiing score over the seasons: *In terms of research ideas it would be interesting to see what a -6% in skiing is worth today vs 5 or 10 years ago, if that’s possible. So how did the competitive level change in biathlon.*

It took me a while to fully understand what we are trying to research here, and **some assumptions and definitions need to be made**. These are the assumptions and definitions I used in the article and visualization, hoping they align with what the guys from Extra Runde intended:

1. “*-6% in skiing*” refers to **back from median (in %)**: the arithmetic mean of the percent back from each race’s median *Course Time*.

2. I assumed “*is worth today vs 5 or 10 years ago*” meant **how many seconds did you gain or lose in the different seasons**. I calculated the seconds behind or ahead of both the **course median time** and the **course fastest time**. For example, in the current season, an athlete with a -6% in skiing is 103.02 seconds faster than the course median time, and 7.9 seconds faster than the fastest skier of the races. These values are calculated per race, and then we look at the mean per season.

3. “*how did the competitive level change*” is not so much a number but a difference between seasons of what athletes would gain or lose compared to the median and fastest times.
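The three definitions above can be sketched in a few lines of Python. This is a minimal sketch, not the Tableau calculation behind the dashboard: the function names, the sign convention (negative = ahead, positive = behind) and the sample course times are all made up for illustration.

```python
from statistics import mean, median


def seconds_vs_field(course_times, score_pct):
    """Seconds a hypothetical skier with `score_pct` percent back from the
    median would be behind (+) or ahead (-) of the race's median and
    fastest course times."""
    med = median(course_times)
    fastest = min(course_times)
    # A -6% score means a course time 6% below the race median.
    hypothetical_time = med * (1 + score_pct / 100)
    return hypothetical_time - med, hypothetical_time - fastest


def season_average(races, score_pct):
    """Average the per-race values over all races of one season."""
    per_race = [seconds_vs_field(times, score_pct) for times in races]
    vs_median = mean(r[0] for r in per_race)
    vs_fastest = mean(r[1] for r in per_race)
    return vs_median, vs_fastest


# Two invented races with three course times each (in seconds).
example_season = [
    [1500.0, 1600.0, 1700.0],
    [1400.0, 1500.0, 1650.0],
]
vs_median, vs_fastest = season_average(example_season, -6)
print(f"vs median: {vs_median:+.2f} s, vs fastest: {vs_fastest:+.2f} s")
```

With these invented times, a -6% skier ends up about 93 seconds ahead of the median and about 7 seconds behind the fastest time on average. Comparing such seasonal pairs across seasons is what point 3 refers to.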

## The Data and Dashboard

The data is from Real Biathlon’s Patreon subscription. You only need access to two data sets, *Race Data Old Races (1958 – 2017)* and *Race Data Current Races (2018 – present)*. The dashboard referenced below contains **data from the 2009-2010 season all the way up to the first trimester of races of the 2020-2021 season**.

The dashboard can be found on my Tableau Public account, where it can be used interactively. Since it uses a chart type that is less common and sometimes confusing at first sight (a connected scatterplot, used for showing the evolution of two variables based on a third variable), below are some examples to better understand the dashboard.

By default, the dashboard shows a -6% skiing score, so 6% faster than the course median time (Wikipedia: a **median** is a value separating the higher half from the lower half of a data sample, a population or a probability distribution. For a data set, it may be thought of as “the middle” value.) Note that you can change this value by sliding the white dot to the left or right, or by using the two arrows:

The full dashboard looks like this and can be found on Tableau Public:

Since the data points are shown by season but the skiing times are calculated per individual race, the **data points show seasonal averages**. Let’s look at the first chart, specifically the two highlighted sections, and explain what they tell us (ft = fastest time, mt = median time):

For the highlighted women section, if you had a skiing score of -6% in the 2010-2011 season, you would have been 14.2 seconds slower than the Fastest Time, but 104.07 seconds faster than the Median Time. Again, this is an average over all races in the 2010-2011 season. In the next season, a -6% skiing score would have put you 29.51 seconds behind the lead skier and almost 103 seconds ahead of the median time. This tells us that between the 2010-2011 and 2011-2012 seasons the general competition (the difference to the Median Time) stayed roughly the same, but the lead skiers got stronger.
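The season-to-season comparison in this example is simple arithmetic, which can be written out as a small Python sketch. The numbers are the ones quoted above; the dictionary layout, the sign convention (positive = behind, negative = ahead) and the use of 103.0 for "almost 103" are assumptions of this sketch.

```python
# Women's seasonal averages at a -6% skiing score, as quoted in the text.
# Assumed sign convention: positive = seconds behind, negative = seconds ahead.
seasons = {
    "2010-2011": {"vs_fastest": 14.2, "vs_median": -104.07},
    "2011-2012": {"vs_fastest": 29.51, "vs_median": -103.0},  # "almost 103"
}


def season_change(before, after):
    """Difference in seconds between two seasons for each reference time."""
    return {key: round(after[key] - before[key], 2) for key in before}


change = season_change(seasons["2010-2011"], seasons["2011-2012"])
print(change)
```

The gap to the fastest time grows by roughly 15 seconds, while the lead over the median shrinks by only about 1 second, which is the "lead skiers got stronger, the field stayed the same" reading.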

For the highlighted men section, with a -6% skiing score, one would be ahead of the lead skier by 11.54 and 7.90 seconds respectively, while the difference to the Median Time stayed roughly the same.

Perhaps an even better example is the following: between the 2014-2015 and 2015-2016 seasons, a -6% skiing score went from being slower than the fastest skier in 2014-2015 to being faster than the fastest skier in the following season!

Hopefully, this is now a bit clearer. The next chart is really just a copy of the first chart with one difference: the minimum and maximum values of the vertical and horizontal axes are fixed, whereas in the first chart they change based on the data shown. The second chart shows more clearly how the seconds behind or ahead change as the skiing score moves from one value (-6%) to the other (+6%):

The third chart is a more traditional line chart showing men’s and women’s time behind or ahead of the fastest (top) and median (bottom) course times per season:

The final combination of charts shows the median time behind for every athlete for every race in the season. The boxplot (or box-and-whisker plot) indicates the distribution of values, or the spread of the data. Boxes indicate the middle 50 percent of the data (that is, the middle two quartiles of the data’s distribution). The black lines, called *whiskers*, display all points within 1.5 times the interquartile range (in other words, all points within 1.5 times the width of the adjoining box). Further, the combined chart shows the average fastest course time difference, the average median course time difference, the standard deviation (another way to describe the spread of the data) and the variance, yet another measure of how data points in a specific population are spread out.
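For readers who want to see how these spread measures relate to each other, here is a minimal Python sketch using only the standard library. It is not how Tableau computes them: the quartile method (`inclusive`), the use of sample (rather than population) standard deviation and variance, the function name and the sample values are all assumptions made for illustration.

```python
from statistics import quantiles, stdev, variance


def spread_summary(values):
    """Boxplot-style summary: quartiles, 1.5 x IQR whiskers, outliers,
    plus sample standard deviation and variance."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # width of the box
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
        "std_dev": stdev(values),
        "variance": variance(values),
    }


# Invented "seconds behind" values for one season, with one clear outlier.
sample = [0, 10, 20, 30, 40, 50, 60, 70, 200]
summary = spread_summary(sample)
print(summary)
```

In this invented sample the box runs from 20 to 60, the whiskers reach 0 and 70, and 200 falls outside them as an outlier. The variance is simply the standard deviation squared, so both describe the same spread on different scales.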

Above we can see the 2011-2012 season had a larger spread than the previous season.

## Conclusions

Although I acknowledge the main chart type is not easy to read at first sight, I hope the explanation above makes it easier to read, and that it does a good job of analyzing the question “How did the competitive (ski) level change in biathlon?”. Specifically for the -6% skiing score, the trends tell a clear story:

In the women’s field, the athletes’ spread has become narrower and the fastest skiers are closer to the median: before the 2015-2016 season a -6% skiing score would have put you behind the lead skier(s), whereas since that season it makes you the fastest skier (so there are fewer skiers who stand out based on speed). Your lead over the median is declining, which means the general ski speed is becoming faster.

For the men, there is also a declining lead over the median, meaning the general skiing speed of the field is improving, but the -6% skiing score gets you less of a lead over the fastest skiers, meaning there are more really fast skiers.

As one would expect, the opposite skiing score (+6%) shows the arrows going in the opposite direction: slower athletes are getting closer to the median and, since they are behind it, closer to the fastest skiers as well:

So a big thank you to the guys from Extra Runde Biathlon for suggesting the research idea, and go give them a listen! And as usual, any feedback is appreciated.