This is a research project about when to start in individual and sprint races, based on the impact of conditions. I wanted to see in the data whether starting early or late in the race had a positive or negative impact on the skiing and shooting of competitors.
For skiing I use the Z score and compare each athlete’s season average score to their actual race score, per athlete bib. This gives a much fairer number, because when looking strictly at the Z score per athlete, skiing ability comes into play. When comparing to the athlete’s season average you can say, regardless of skiing ability, whether an athlete was faster (negative) or slower (positive) than his or her average.
I also added the three weather data points: at the start, after the start (+30 min.) and at the end of the race (around +80 minutes), like so:
The logic of the weather timing is as follows: “at the start” is time 0, when the first athlete leaves. “After the start” is 30 minutes after the start of the race. In 30 minutes 60 athletes will start (half-minute intervals), and about 30 of them will spend most of their time in “at the start” weather, so the “at the start” group is bibs 1-30. The last measuring point is when the last athlete finishes, so quite a bit later, depending on the type of race. On average I’d say bibs 31-80 spend most of their time in the “after the start” weather, and the bibs after that spend most of their time in the “finish” weather.
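The bib-to-weather mapping described above can be sketched in a few lines of Python. This is only an illustration and not the way the dashboard itself is built: the function name `weather_group` is made up for this example, and the assumption that every bib above 80 belongs to the “finish” group is my reading of the rule of thumb above.

```python
def weather_group(bib: int) -> str:
    """Return the weather reading a bib is assumed to spend most time in.

    Boundaries follow the rule of thumb in the text: bibs 1-30 belong to
    "at the start", bibs 31-80 to "after the start", and (an assumption)
    every later bib to "finish".
    """
    if bib < 1:
        raise ValueError("bib numbers start at 1")
    if bib <= 30:
        return "at the start"
    if bib <= 80:
        return "after the start"
    return "finish"


if __name__ == "__main__":
    for bib in (1, 30, 31, 80, 81):
        print(bib, weather_group(bib))
```

The cut-offs are hard-coded to keep the sketch short; for a race with a different start interval or field size they would need to be adjusted.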
So to see how every athlete performed compared to their season average, I subtracted the season value from the actual race value; any score above zero means slower than season average, and anything below would be faster than season average:
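As a rough illustration of that subtraction, here is a small Python sketch. The function names and the sample numbers are invented for the example and are not taken from the race data.

```python
def delta_vs_season(race_z: float, season_avg_z: float) -> float:
    """Race Z score minus the athlete's season average Z score."""
    return race_z - season_avg_z


def describe(delta: float) -> str:
    """Translate the sign of the delta into words."""
    if delta > 0:
        return "slower than season average"
    if delta < 0:
        return "faster than season average"
    return "equal to season average"


if __name__ == "__main__":
    # Made-up values: an athlete with a season average Z score of -0.25
    # who skis a race Z score of -0.5.
    d = delta_vs_season(-0.5, -0.25)
    print(d, describe(d))
```

Because the athlete is compared only with his or her own average, a strong and a weak skier with the same delta count as having had an equally good or bad day.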
But this is still not giving much information, just lots of data. So to simplify and aggregate some data I looked at the bib numbers of athletes that fall into the different weather groups:
Well, it’s telling us more, but perhaps it is a little too aggregated? We should also have a look at the axis: this image suggests large differences, but the “at the start” group is only 0.0859 faster than season average, so the actual impact in this race example is very small.
To get to a level of detail somewhere in between, I grouped the athletes by tens of bib numbers (1-10, 11-20, 21-30, etc.), both for the actual vs. season average and for the delta, where I average the 10 athletes within each group:
This level of detail looks about right: we can still generally see the “weather groups” but have a bit more detail. The average lines also provide useful information when comparing the three weather groups, or the bib groups within a weather group.
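A minimal Python sketch of this grouping, assuming the per-athlete deltas are available as a mapping from bib number to delta. The function names and the sample values are hypothetical and only meant to show the mechanics.

```python
from collections import defaultdict
from statistics import mean


def bib_group_label(bib: int, size: int = 10) -> str:
    """Label of the bib group a bib falls in, e.g. 1-10, 11-20, 21-30."""
    start = ((bib - 1) // size) * size + 1
    return f"{start}-{start + size - 1}"


def average_delta_by_bib_group(deltas_by_bib: dict, size: int = 10) -> dict:
    """Average the deltas of all athletes within each bib group."""
    groups = defaultdict(list)
    for bib, delta in deltas_by_bib.items():
        groups[bib_group_label(bib, size)].append(delta)
    return {label: mean(values) for label, values in groups.items()}


if __name__ == "__main__":
    # Made-up deltas for three bibs, not real race data.
    sample = {1: 0.5, 2: -0.5, 11: 0.25}
    print(average_delta_by_bib_group(sample))
```

Groups with fewer than 10 finishers (for example because of athletes who did not start) are simply averaged over the athletes that are present.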
Now we can do the same for Shooting Z Scores:
Now that you have read this, please play with the dashboard on Tableau Public and see where the starting bib combined with the conditions had a positive or negative impact on the athletes.
About Post Author
Proud dad & husband; analyst & visualization specialist (Tableau, SQL & R); creator of Biathlon Analytics; blog poster on realbiathlon.com; passionate about biathlon, cross country skiing and canoeing