real biathlon

Recent Articles

  • Most improved athletes this winter
  • New biathlon point system
  • Historic biathlon results create expectations. But what about points?
  • What do you expect? Practical applications of the W.E.I.S.E.
  • Introducing W. E. I. S. E: the Win Expectancy Index based on Statistical Exploration, version 1

Categories

  • Biathlon Media
  • Biathlon News
  • Long-term trends
  • Statistical analysis
  • Website updates

Archives

  • 2022
    • December
    • June
    • May
    • March
    • February
    • January
  • 2021
    • December
    • November
    • September
    • July
    • June
    • May
    • April
    • March
    • February
    • January
  • 2020
    • December
    • November
    • August
    • June
    • March
  • 2015
    • December
  • 2013
    • August
    • July
  • 2012
    • July

Year: 2020

Is Oberhof the most challenging venue on the World Cup tour?

Posted on 2020-12-28 | by real biathlon | Leave a Comment on Is Oberhof the most challenging venue on the World Cup tour?

During the Christmas break, I worked on compiling a new data set: statistics for each World Cup location. The full stats are available as bonus content (if you are interested, have a look at the real biathlon Patreon page). Here’s a summary and some examples.

The upcoming World Cup stop, Oberhof, is probably not the most popular location among athletes due to its notoriously bad weather, but the Oberhof shooting range (in part because of the weather) has always been one of the most interesting. Here’s the data to back that up. Not only is Oberhof the venue with the lowest average hit rate (75.1%), it also has the highest average shooting time (36.8s) of all regular World Cup venues (not including Brezno-Osrblie, which held its last race in 2006, when shooting times were generally slower than they are now).

All-time shooting results for regular Biathlon World Cup venues

Venue | Nation | First year | Last year | Races | Total hit rate (%) | Prone hit rate (%) | Standing hit rate (%) | Shooting time (s) | Prone time (s) | Standing time (s)
Antholz-Anterselva | ITA | 1975 | 2020 | 238 | 77.6 | 81.2 | 74.1 | 32.9 | 33.5 | 31.9
Ruhpolding | GER | 1978 | 2020 | 235 | 80.6 | 83.9 | 77.4 | 32.9 | 33.6 | 32.3
Hochfilzen | AUT | 1978 | 2021 | 189 | 78.3 | 81.6 | 75.1 | 35.4 | 36.1 | 34.7
Oslo Holmenkollen | NOR | 1983 | 2019 | 172 | 80.0 | 82.5 | 77.6 | 30.5 | 31.8 | 29.2
Oestersund | SWE | 1970 | 2020 | 168 | 78.4 | 81.8 | 75.1 | 35.3 | 37.5 | 33.1
Oberhof | GER | 1984 | 2020 | 161 | 75.1 | 78.9 | 71.4 | 36.8 | 36.7 | 36.4
Pokljuka | SLO | 1993 | 2020 | 150 | 79.6 | 82.9 | 76.3 | 33.7 | 33.4 | 34.1
Kontiolahti | FIN | 1990 | 2021 | 95 | 78.5 | 81.7 | 75.4 | 33.5 | 34.2 | 32.4
Khanty-Mansiysk | RUS | 2000 | 2016 | 79 | 79.2 | 82.3 | 76.1 | 32.9 | 33.5 | 32.2
Brezno-Osrblie | SVK | 1996 | 2006 | 60 | 79.5 | 82.9 | 76.1 | 38.4 | 35.3 | 41.4
Lahti | FIN | 1980 | 2007 | 55 | 78.7 | 80.8 | 76.5 | 31.6 | 33.1 | 30.0
Nove Mesto | CZE | 2012 | 2020 | 41 | 79.2 | 82.8 | 75.7 | 31.6 | 32.6 | 30.6
PyeongChang | KOR | 2008 | 2018 | 33 | 77.8 | 81.3 | 74.3 | 34.8 | 35.4 | 34.2
Canmore | CAN | 1987 | 2019 | 27 | 76.8 | 79.5 | 74.1 | 34.5 | 35.5 | 33.5
Soldier Hollow, Utah | USA | 2001 | 2019 | 20 | 80.5 | 83.9 | 77.2 | 33.8 | 34.3 | 33.3
Annecy-Le Grand Bornand | FRA | 2014 | 2020 | 18 | 83.2 | 85.8 | 80.7 | 28.9 | 30.1 | 27.7
Sochi | RUS | 2013 | 2014 | 17 | 83.6 | 86.0 | 81.1 | 30.7 | 31.5 | 29.9
Cesana San Sicario | ITA | 2005 | 2006 | 16 | 78.1 | 80.9 | 75.4 | 34.1 | 35.3 | 32.9
Whistler | CAN | 2009 | 2010 | 16 | 82.1 | 84.9 | 79.4 | 33.4 | 34.0 | 32.8
Fort Kent, ME | USA | 2004 | 2011 | 12 | 81.1 | 84.2 | 78.0 | 29.2 | 30.6 | 27.8
Presque Isle, ME | USA | 2011 | 2016 | 11 | 76.9 | 80.6 | 73.2 | 33.9 | 34.8 | 32.9
Trondheim | NOR | 2009 | 2009 | 6 | 83.3 | 85.6 | 80.9 | 29.0 | 30.4 | 27.7
Tyumen | RUS | 2018 | 2018 | 6 | 83.7 | 85.5 | 81.9 | 28.6 | 29.8 | 27.4

Although I didn’t include the data here, it’s worth pointing out that Oberhof isn’t just challenging at the range; it also has one of the most difficult and selective tracks. On average, 13.1 of the top 30 athletes ski outside a +/- 30 sec range of the median, which is also the highest for any venue with more than 30 World Cup races.
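To make the selectivity figure above more concrete, here is a minimal R sketch of one way it could be computed. It assumes a data frame with one row per athlete per race at a venue, with hypothetical columns `race_id`, `rank` (final result) and `course_time` (in seconds), and it takes the median over each race’s full field; the actual definition behind the 13.1 figure may differ in these details.

```r
# Hypothetical input: one row per athlete per race at one venue, with the
# athlete's final rank and course (ski) time in seconds.
outside_band <- function(races, band = 30, top_n = 30) {
  per_race <- sapply(split(races, races$race_id), function(r) {
    med <- median(r$course_time)             # median course time of the race
    top <- r[r$rank <= top_n, ]              # the top-n finishers
    sum(abs(top$course_time - med) > band)   # how many ski outside +/- band
  })
  mean(per_race)                             # average over all races at the venue
}

# Toy data (not real results): two races with five athletes each
toy_races <- data.frame(
  race_id     = rep(c("A", "B"), each = 5),
  rank        = rep(1:5, 2),
  course_time = c(100, 110, 120, 130, 200, 50, 80, 101, 102, 103)
)
outside_band(toy_races, band = 15, top_n = 3)
```

With the real data you would call it with the defaults (`band = 30`, `top_n = 30`); the smaller values here only keep the toy example readable.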

The other German location, Ruhpolding, is almost the polar opposite: arguably the easiest regular World Cup range (average hit rate of 80.6%). Le Grand Bornand has an even higher hit rate (83.2%), but has also staged over 200 fewer races; that percentage will likely regress at least somewhat toward the mean if more events are held there. Antholz is noteworthy as well, with a relatively fast average shooting time but a poor average hit rate; apparently the nice weather there, combined with the altitude, is deceptive.

Overall hit rate (including relays) | Oberhof vs. Hochfilzen

The chart above compares overall hit rates (per race, 10-race moving average) for the last (Hochfilzen) and the upcoming (Oberhof) World Cup stops. On average, Hochfilzen had roughly 5% better shooting results over the last 15 years.
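A 10-race moving average like the one in the chart can be sketched in R as a trailing window over per-race hit rates. This assumes a simple unweighted average of the last ten per-race rates (a window pooled by number of shots, or a centered window, would be equally plausible choices); the hit and shot counts below are hypothetical.

```r
# Trailing moving average over the last k races (NA until k races exist).
# stats::filter is called explicitly because dplyr, loaded in the R code
# on this page, masks filter().
moving_avg <- function(x, k = 10) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Hypothetical per-race hits and shots, in chronological order
hits  <- c(78, 74, 80, 71, 76)
shots <- c(100, 100, 100, 100, 100)
moving_avg(hits / shots, k = 3)   # the first k - 1 values are NA
```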

Ski Speed (in km/h) in Oberhof | Men’s Non-Team races

Here’s the winner’s ski speed (in km/h) for men’s non-team events in Oberhof. Clearly, there are huge differences between seasons (a good example of why physical speed isn’t a great data point for long-term ski speed comparisons).
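For reference, a speed in km/h is just distance divided by time in hours. The post does not say whether the chart uses the total race time or the course time only, so the inputs in this small R sketch are assumptions, and the sprint example is hypothetical.

```r
# Average speed in km/h from a distance in km and a time in seconds
speed_kmh <- function(distance_km, time_s) {
  distance_km / (time_s / 3600)
}

speed_kmh(10, 25 * 60)   # a hypothetical 10 km skied in 25:00, i.e. 24 km/h
```

Because snow, weather and course changes shift this number from season to season, a relative measure (such as percent back from the median) is usually the better basis for long-term comparisons.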

Total Shooting Time comparison | Hochfilzen vs. Oberhof

Lastly, I added a chart of the average total shooting times (per race, 10-race moving average). Hochfilzen and Oberhof are actually closer in that category; however, shooting times in Hochfilzen got faster over the last decade, while there is no such trend in Oberhof.

Posted in Long-term trends, Statistical analysis | Tagged shooting

How did the competitive (ski) level change in biathlon?

Posted on 2020-12-27 | by biathlonanalytics | Leave a Comment on How did the competitive (ski) level change in biathlon?

While I was chatting on Instagram with the authors of the great Extra Runde podcast (although most of their episodes are in German, they have some special editions in English too!), they suggested researching how the skiing score changed over the seasons: “In terms of research ideas it would be interesting to see what a -6% in skiing is worth today vs 5 or 10 years ago, if that’s possible.” So how did the competitive level change in biathlon?

It took me a while to fully understand what we are trying to research here, and some assumptions and definitions need to be made. These are the assumptions and definitions I used in the article and visualization, hoping they align with what the guys from Extra Runde intended:

1. “-6% in skiing” refers to the back from median (in %) statistic: the arithmetic mean of the percentage back from each race’s median Course Time.

2. I assumed “is worth today vs 5 or 10 years ago” meant how many seconds you would gain or lose in the different seasons. I calculated the seconds behind or ahead of both the median course time and the fastest course time. For example, in the current season, an athlete with a -6% skiing score is 103.02 seconds faster than the median course time and 7.9 seconds faster than the fastest skier of the race. These values are calculated per race and then averaged per season.

3. “How did the competitive level change” is not a single number but the difference between seasons in what athletes would gain or lose compared to the median and fastest times.
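The three definitions above can be sketched in R as follows. This is an illustration under simplifying assumptions, not the code behind the dashboard: it assumes a per-race data frame with hypothetical columns `season`, `median_time` and `fastest_time` (course times in seconds), and it assumes an athlete with a skiing score of s% skis exactly s% off the median in every race, so the implied course time is `median_time * (1 + s / 100)`. The toy values are made up.

```r
# Definition 1: percent back from the race's median course time
# (negative = faster than the median); a skiing score is the mean over races.
pct_back_from_median <- function(course_time, median_time) {
  100 * (course_time - median_time) / median_time
}

# Definitions 2 and 3: seconds relative to the median and to the fastest
# course time implied by a given skiing score, per race, averaged per season.
score_to_seconds <- function(races, score_pct = -6) {
  implied <- races$median_time * (1 + score_pct / 100)  # implied course time per race
  per_race <- data.frame(
    season     = races$season,
    vs_median  = implied - races$median_time,           # negative = ahead of the median
    vs_fastest = implied - races$fastest_time           # negative = ahead of the fastest
  )
  aggregate(cbind(vs_median, vs_fastest) ~ season, data = per_race, FUN = mean)
}

races <- data.frame(                                    # hypothetical toy values
  season       = c("2019-20", "2019-20", "2020-21"),
  median_time  = c(1700, 1800, 1750),
  fastest_time = c(1600, 1680, 1630)
)
mean(pct_back_from_median(c(1598, 1692), c(1700, 1800)))  # a -6% skiing score
score_to_seconds(races, score_pct = -6)
```

Comparing the `vs_median` and `vs_fastest` columns from one season to the next is what the connected scatterplot in the dashboard visualizes.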

The Data and Dashboard

The data is from Real Biathlon’s Patreon subscription. You only need access to two data sets, Race Data Old Races (1958 – 2017) and Race Data Current Races (2018 – present). The dashboard referenced below contains data from the 2009-2010 season all the way up to the first trimester of races of the 2020-2021 season.

The dashboard can be found on my Tableau Public account, where it can be used interactively. Since it uses a chart type that is less common and sometimes confusing at first sight (a connected scatterplot, which shows the evolution of two variables along a third one, here the season), below are some examples to help you read the dashboard.

By default, the dashboard shows a -6% skiing score, i.e. 6% faster than the median course time (Wikipedia: a median is a value separating the higher half from the lower half of a data sample, a population or a probability distribution. For a data set, it may be thought of as “the middle” value.) You can change this value by sliding the white dot to the left or right, or by using the two arrows:

The full dashboard looks like this and can be found on Tableau Public:

Since the data points are shown by season but the skiing times are calculated per individual race, the data points show seasonal averages. Let’s look at the first chart, specifically the two highlighted sections, and explain what they tell us (ft = fastest time, mt = median time):

For the highlighted women section, if you had a skiing score of -6% in the 2010-2011 season, you would have been 14.2 seconds slower than the Fastest Time, but 104.07 seconds faster than the Median Time. Again, this is an average over all races in the 2010-2011 season. In the next season, a -6% skiing score would have left you 29.51 seconds behind the lead skier and almost 103 seconds ahead of the Median Time. This tells us that between the 2010-2011 and 2011-2012 seasons the general competition (the difference to the Median Time) stayed roughly the same, but the lead skiers got stronger.

For the highlighted men section, with a -6% skiing score one would have been ahead of the lead skier by 11.54 and 7.90 seconds in the two highlighted seasons respectively, while the difference to the Median Time stayed roughly the same.

Perhaps an even better example is the following: a -6% skiing score would have been slower than the fastest skier in the 2014-2015 season, but faster than the fastest skier in the following 2015-2016 season!

Hopefully this is now a bit clearer. The next chart is really just a copy of the first one with one difference: the minimum and maximum values of the vertical and horizontal axes are fixed, whereas in the first chart they adjust to the data being shown. The second chart therefore shows better how the seconds behind or ahead change as the skiing score moves from one value (-6%) to another (+6%):

The third chart is a more traditional line chart showing men’s and women’s time behind or ahead of the fastest (top) and median (bottom) course times per season:

The final combination of charts shows the time behind the median for every athlete in every race of the season. The box plot (or box-and-whisker plot) indicates the distribution, or spread, of the values. The boxes contain the middle 50 percent of the data (that is, the middle two quartiles of the distribution), and the black lines, called whiskers, extend to all points within 1.5 times the interquartile range (in other words, within 1.5 times the width of the adjoining box). The combined chart also shows the average fastest course time difference, the average median course time difference, the standard deviation (another measure of the spread of the data) and the variance (the square of the standard deviation), which likewise tells us how spread out the data points in a population are.

Above we can see the 2011-2012 season had a larger spread than the previous season.
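A per-season spread comparison like this one can be sketched in R with base statistics functions. The input vectors are hypothetical (time behind the median in seconds, one value per athlete per race, plus the season label), and the toy values are made up to show a season with a wider spread.

```r
# Spread of the "time behind the median" (in seconds) per season:
# standard deviation, variance (the square of the sd) and interquartile range.
season_spread <- function(behind, season) {
  per_season <- lapply(split(behind, season), function(x) {
    c(sd = sd(x), var = var(x), iqr = IQR(x))
  })
  do.call(rbind, per_season)   # one row per season
}

# Hypothetical toy values, not real race data
behind <- c(-20, 0, 10, 30, -40, 0, 20, 60)
season <- rep(c("2010-11", "2011-12"), each = 4)
season_spread(behind, season)
```

The standard deviation is in the same unit as the data (seconds), which usually makes it easier to interpret than the variance (seconds squared).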

Conclusions

Although I acknowledge the main chart type is not easy to read at first sight, I hope that after reading the above you better understand how to read it, and that it does a good job of further analyzing the question “How did the competitive (ski) level change in biathlon?”. Specifically for the -6% skiing score, the trends tell a clear story:

With a -6% skiing score in the women’s field, the athletes’ spread is narrower and the fastest skiers are closer to the median: before the 2015-2016 season such a score would have left you behind the lead skier(s), whereas since that season it makes you the fastest skier (so fewer skiers stand out based on speed). Your lead over the median is declining, which means the general ski speed is getting faster.

For the men, there is also a declining lead over the median, meaning the general skiing speed is improving for the field, but the -6% skiing score gets you less of a lead over the fastest skiers, meaning there are more really fast skiers.

As one would expect, the opposite skiing score (+6%) shows the arrows going in the opposite direction: slower athletes are getting closer to the median and, as slower skiers, are also moving closer to the fastest skiers:

So a big thank you to the guys from Extra Runde Biathlon for suggesting the research idea, and go give them a listen! And as usual, any feedback is appreciated.

Posted in Long-term trends, Statistical analysis

When something’s gotta give: Shooting Speed vs Accuracy

Posted on 2020-12-23 | by biathlonanalytics | Leave a Comment on When something’s gotta give: Shooting Speed vs Accuracy

Depending on the event, biathletes fire 10 or 20 shots per race, with the goal to a) hit the target and b) do it as fast as possible. At first, you might think that the longer you take per shot, the higher your chances of hitting the target. But further analysis shows that this train of thought could be wrong.

First things first. For this analysis, I used Real Biathlon data for the 2017-2018, 2018-2019, 2019-2020 and the first trimester of the 2020-2021 seasons, non-team races only. We’re talking 172,765 shots fired by both male and female athletes, of which 32,162 missed the target, a ratio of just over 18.5%.

But were those misses only influenced by how fast the shooter pulled the trigger after getting ready or after the previous shot? And what other factors, some of which we have reliable data on and some of which we don’t, would influence the result of a shot? First lap or last lap, being in front or behind, the weather (think wind, temperature, precipitation), visibility, the pressure resulting from race situations or the proximity of other athletes, distractions from the crowd (when present), officials or other athletes, a miss disrupting the flow, etc. With so many other factors, shooting time clearly isn’t the only factor influencing shot results. But it is probably one that athletes can at least try to control.

To analyze the shooting, I looked at the result of each shot and how many seconds after getting ready (first shot) or after the previous shot (shots two, three, four and five of every shooting stage) it was taken: the shot intervals. These shot intervals are measured in tenths of a second, so to make the data more manageable and understandable, the intervals are binned, i.e. similar shot intervals are grouped together in buckets. The following chart shows the number of shots per bucket (top) and the miss-rate per bucket (bottom):

For example, bucket 4 shows all shots with an interval between 4 and 4.99 seconds: 8,120 shots in total fall in this category, and 23.8% of those missed the target.

If we look at the top part of the chart, we can see that the majority of shots are fired within five seconds, or between 11 and 18 seconds. This clearly reflects the distinction between shots 2, 3, 4 and 5 on the one hand and first shots on the other.

The bottom of the chart shows the miss-rate: the percentage of shots missed in a particular bucket. We can see that the longer the athlete waits, the higher the miss-rate becomes, up to seven seconds. My guess is that after holding their breath for seven seconds, athletes have to take one or two breaths, which has a positive effect on the hit rate, making the miss-rate go down for a few seconds. The increase in miss-rate starting around the 13-second bucket likely shows the same effect of holding the breath for the first shot, which is typically taken after about 11-12 seconds. Even longer shot intervals resulting in more misses could be explained by lack of confidence, legs starting to shake, athletes overthinking their shot, self-doubt, etc. Rifle malfunctions can also play a role here, specifically for the long shot intervals.
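The binning described above (whole-second buckets, e.g. bucket 4 holding all intervals from 4.0 to 4.9 seconds) can be sketched in R as follows. The vectors `interval` (seconds, to a tenth) and `hit` (TRUE for a hit) are hypothetical inputs, and the toy shots are made up.

```r
# Bin shot intervals into whole-second buckets and compute the number of
# shots and the miss-rate per bucket.
miss_rate_by_bucket <- function(interval, hit) {
  bucket <- floor(interval)                          # e.g. 4.0-4.9 s -> bucket 4
  data.frame(
    bucket    = sort(unique(bucket)),
    shots     = as.vector(table(bucket)),            # shots per bucket
    miss_rate = as.vector(tapply(!hit, bucket, mean)) # share of misses per bucket
  )
}

# Hypothetical toy shots
interval <- c(0.8, 1.2, 1.9, 4.0, 4.5, 4.9)
hit      <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
miss_rate_by_bucket(interval, hit)
```

`table()` and `tapply()` both order their groups by the sorted bucket values, so the three columns line up row by row.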

Now let’s look at some examples of individual athletes. To avoid confusion the following charts only look at shots 2, 3, 4 and 5, as the first shot takes much longer than these other shots.

Wierer shoots fast and well, with the majority of her shots in the one- or two-second buckets. It’s clear when things don’t go as planned and a shot takes over five seconds: the miss-rate goes up fast.

Eckhoff clearly takes more time, yet with a higher miss-rate. Only her 4, 6 and 7 second buckets have a miss-rate under 15%.

Hanna Oeberg is another fast shooter who does well in all buckets under 6 seconds except for the three-second bucket.

Alimbekava is also a fast shooter, with almost all shots under 4 seconds. Her miss-rate varies.

Roeiseland clearly takes a bit more time than, say, Wierer, and she shows a clear pattern: the longer the shot takes, the higher the miss-rate.

Hauser is another fast shooter with a clear pattern: under 3 seconds she misses roughly 10%, and the longer the shot, the more she misses.

Super-fast skier Herrmann shows a generally higher miss-rate, but when shooting in under 2 seconds she actually misses less than 10%.

Davidova shoots quite slowly compared to the athletes above, with the miss-rate increasing as she takes more time.

Lastly, Zdouc, one of the best shooters this season, initially shows the “expected” pattern: more time leads to fewer misses. But that only applies to the first three buckets.

In general, the common expectation that taking longer for a shot leads to better results does not hold for the women above. Now let’s look at the men:

JT Boe shows good consistency (did I mention he leads the World Cup Standings?) up to 6 seconds, but he rarely takes that long.

Brother Tarjei shows a very similar pattern, but with a higher miss-rate.

QFM again shows the opposite of the expected trend: the longer he waits, the higher the miss-rate.

Fellow Frenchman Jacquelin never shoots in under a second, but takes almost all of his shots in under four. Again, the miss-rate goes up as the shots take longer.

Loginov is an extremely fast shooter, with almost all shots under three seconds. Based on his chart, taking 4 seconds per shot could lead to great results (though based on a small sample size).

Peiffer clearly is one of the slower shooters, hardly ever taking less than two seconds. His miss-rates are good though, especially in the 2-3 seconds range which he is in the most, by far.

Samuelsson shoots very consistently in two or three seconds per shot, but as we have now seen many times, the longer the shot takes the higher the miss-rate.

Eder, the leader of the Shooting Statistics list of the IBU, simply shoots very fast with a very low miss-rate. Again, even with the best shooter, the longer he takes, the higher the miss-rate.

Although the examples above don’t give a clear answer as to what influences shot results, it is clear that taking more time per shot does not lead to better results. Of course, the athletes above are only a very small subset of all participants and represent the upper regions of the standings. But when going through the athletes in the lower regions of the standings, the trend doesn’t change: taking more time per shot does NOT lead to better results.

The last chart doesn’t use the buckets but looks at all shots per specific shot interval (remember, down to a tenth of a second) and the miss-ratio for that shot interval. Here too, I took out all first shots.

The trend goes up rather than down, meaning more seconds per shot means more misses. Removing the shot intervals with fewer than ten shots gives a cleaner picture, but with the same conclusion: taking more time per shot does NOT lead to better results:

On Tableau Public I uploaded a dashboard that shows the same data in a slightly different visual presentation, but lets you filter by athlete, discipline, standing or prone, etc. Go have a look, play around with it, and let me know if you find anything interesting.
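The per-interval analysis described above (miss ratio per exact shot interval, first shots excluded, intervals with fewer than ten shots dropped) can be sketched in R as below. The inputs `interval` and `hit` are hypothetical, and fitting an unweighted least-squares line through the per-interval miss rates is an assumption of this sketch; the post does not say how its trend line was computed.

```r
# Miss rate per exact shot interval (tenths of a second), dropping intervals
# with fewer than min_shots shots, plus an unweighted linear trend.
interval_trend <- function(interval, hit, min_shots = 10) {
  key  <- round(interval, 1)            # guard against floating-point noise
  n    <- tapply(hit, key, length)      # shots per interval
  miss <- tapply(!hit, key, mean)       # miss rate per interval
  keep <- as.vector(n) >= min_shots
  x <- as.numeric(names(miss))[keep]
  y <- as.vector(miss)[keep]
  fit <- lm(y ~ x)
  list(points = data.frame(interval = x, miss_rate = y, shots = as.vector(n)[keep]),
       slope  = unname(coef(fit)["x"])) # > 0 means more time, more misses
}

# Hypothetical toy shots: 10 shots each at 1.0, 2.0 and 3.0 s, two at 9.9 s
interval <- c(rep(1, 10), rep(2, 10), rep(3, 10), 9.9, 9.9)
hit <- c(rep(c(FALSE, TRUE), c(1, 9)), rep(c(FALSE, TRUE), c(2, 8)),
         rep(c(FALSE, TRUE), c(3, 7)), TRUE, TRUE)
interval_trend(interval, hit)$slope
```

Dropping sparse intervals matters because a miss rate computed from one or two shots can only be 0%, 50% or 100%, which adds noise without information.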

Posted in Long-term trends, Statistical analysis | Tagged shot accuracy, shot speed
Page 3 from the Athletes Research Tool dashboard

Using Real Biathlon data to create a dashboard in Tableau

Posted on 2020-12-16 | by biathlonanalytics | 1 Comment on Using Real Biathlon data to create a dashboard in Tableau

Football and baseball are huge in the fantasy sports world. Biathlon is not; however, that doesn’t mean it is absent altogether. For example, the sports department of the German television corporation ARD has what they call the Biathlon Tipp Spiel, freely translated as the biathlon guessing game. It allows participants to predict the top 5 of any upcoming race in the IBU World Cup circuit, and although biathlon is thankfully unpredictable enough to make this pretty hard, I wanted to have a quick look into previous results to see “who’s hot and who’s not”. This blog post describes the steps I took to create the Puck Possessed Biathlon Athletes Research Tool on Tableau Public. For those of you who eagerly clicked on the link, please be patient, as the dashboard loads 3+ seasons of detailed race results. Update: I created a clone that drops the 2017-2018 season, resulting in better performance of the dashboards.

The data

Since the Real Biathlon data is now available through Patreon, I downloaded some of the more recent race results using R. There are many other programming languages and ways to do it, but since I’m most familiar with R, that is what I used. The following paragraphs describe how to get the data using R (assuming you have a subscription). If you’re not interested in the technical details, skip right ahead to the Data visualization section below.

First we need to connect to the MongoDB database with the username and password that come with the Patreon subscription:

# install.packages() expects a character vector of package names
install.packages(c("mongolite", "tidyverse", "dplyr", "jsonlite"))
library(mongolite)
library(dplyr)
library(tidyverse)
library(jsonlite)

# Set username and password
mongousr <- "--your username--"
mongopw <- "--your password--"

# Set the collection, database and cluster prefix to create the URL
rbcol <- "Races"
rbdb <- "Results"
rbpref <- "biathloncluster-ay3ak"
rburl <- paste0("mongodb+srv://", mongousr, ":", mongopw, "@", rbpref,
                ".mongodb.net/<dbname>?retryWrites=true&w=majority")

# Use the URL created above to connect to the correct MongoDB collection
rbmongo <- mongo(collection = rbcol, db = rbdb, url = rburl, verbose = TRUE)

Now we can query the database. To gather all the data I wanted for my dashboard, I first retrieved a list of all the raceIds I wanted to download. Then I created a loop to go through these raceIds one by one and download each file. Below is just the code to download a single race file for Tableau. Perhaps I’ll show the loop code in another blog post sometime.

# Get data from the Mongo connection created above by searching for one specific raceId
RaceBT2021SWRLCP01SWSP <- rbmongo$find('{"raceId" : "BT2021SWRLCP01SWSP"}')

# Convert the file to json
RaceBT2021SWRLCP01SWSPjson <- toJSON(RaceBT2021SWRLCP01SWSP)

# Write the json file to your computer
write(RaceBT2021SWRLCP01SWSPjson, "RaceBT2021SWRLCP01SWSPjson.json")

And that is all it takes to connect, load a file and save it as a json file. One could also save as a flat csv file here, but to do that you will have to manipulate the loaded file first as it comes with nested data, multiple levels deep. Since Tableau Public reads json files natively, I decided that using the power of Tableau Public is far more time-efficient.
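Since the loop itself is not shown, here is a minimal sketch of what it could look like, reusing the single-race steps above. It assumes you already have a character vector of raceIds; the helpers `race_query()`, `race_file()` and `download_races()` are hypothetical names introduced here, and `fetch` is any function that takes a query string, such as `rbmongo$find`.

```r
# Hypothetical helpers (not from the original post): the query string and the
# output file name for one raceId, mirroring the single-race example.
race_query <- function(race_id) sprintf('{"raceId" : "%s"}', race_id)
race_file  <- function(race_id, dir = ".") {
  file.path(dir, paste0("Race", race_id, "json.json"))
}

# Loop over raceIds, fetch each race and write it to its own JSON file.
download_races <- function(race_ids, fetch, dir = ".") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  for (id in race_ids) {
    race <- fetch(race_query(id))                   # e.g. rbmongo$find(query)
    write(jsonlite::toJSON(race), race_file(id, dir))
  }
  invisible(race_file(race_ids, dir))               # paths of the written files
}

# Usage (requires the rbmongo connection created above):
# download_races(c("BT2021SWRLCP01SWSP"), fetch = rbmongo$find, dir = "races")
```

Passing the fetch function as an argument keeps the loop independent of the connection, so it can be tried out with a stand-in function before hitting the database.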

Data visualization

Although the above code generates one json file for one race, for my specific dashboard I got a file for every race since the 2017-2018 season, creating over 200 files. With those sitting on my hard drive, eagerly awaiting visualization, do the following:

1. Open Tableau Public and connect to a JSON file.

2. Select one specific JSON file.

3. Drag all other JSON files (I assume all files are in the same folder as the first one) right below that first file.

4. Select the Schema Levels to only get the data you want to use (resist the temptation to select everything when you see all the goodness available in these files, and stick to the KISS principle).

Now you can create a new Sheet and start on your visualization. I must admit working with the nested json files takes a little time to get used to if you are used to dealing with flat files, but in the end it works quite well!


Since I wanted information on athletes specifically to help me pick future winners, I created three levels of information, or dashboards: one for a single race (the most recent one, or the most recent of the same type at the same location as the race I’m predicting for); one showing current form based on the results of the current season to date; and one for similar events in the past (e.g. all sprint races in the last couple of seasons, or all races in Hochfilzen).

Tab 1 Race Details shows information for one race, while highlighting one athlete of choice.

Tab 2 Current Season Information shows information about the selected athlete that gives the reader an idea if the athlete is hot or not, or on an upward or downward trend.

Tab 3 Similar Events Results shows how athletes have performed in previous races similar to the one you are predicting for.

So please go have a look at the dashboards (full and small) and let me know what you think. And good luck making your own dashboards based on the real biathlon Patreon data subscription!

Posted in Statistical analysis | Tagged Data subscription, data visualization, Patreon, R, Tableau

New features: box plots and course profiles

Posted on 2020-12-10 | by real biathlon | Leave a Comment on New features: box plots and course profiles

I made a few updates to the site, adding box plots to athlete and team stats pages, course profiles for all World Cup 3.3 km loops and an explanation page for the most used stats (courses and explanations can be found in the navigation bar ▷ More).

The box plot allows quick graphical examination of one or more data sets and is useful for comparing distributions between several groups or sets of data. Statistically speaking, it offers a more robust picture than the single values otherwise used on this site. A box plot is a standardized way of displaying a data set based on a five-number summary: minimum, lower quartile (Q1), median, upper quartile (Q3) and maximum. The box is drawn from Q1 to Q3, with a horizontal line drawn inside it to denote the median.

The distance between the upper and lower quartiles is known as the interquartile range (IQR). Above the upper quartile, a distance of 1.5 times the IQR is measured out, and a whisker is drawn up to the largest observed point in the data set that falls within this distance. Similarly, a distance of 1.5 times the IQR is measured out below the lower quartile, and a whisker is drawn down to the lowest observed point that falls within this distance. All other observed points are plotted as outliers.
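The box plot rules above can be sketched in R as follows. Quartile conventions differ between tools (R’s default `quantile()` type 7 is used here, while `boxplot.stats()` uses Tukey’s hinges), so this is an illustration of the 1.5 × IQR rule rather than the exact computation behind the charts on this site; the input values are made up.

```r
# Box plot statistics: quartiles, IQR, whisker ends (most extreme observed
# points within 1.5 * IQR of the box) and the remaining points as outliers.
box_stats <- function(x) {
  q   <- unname(quantile(x, c(0.25, 0.5, 0.75)))  # Q1, median, Q3 (type 7)
  iqr <- q[3] - q[1]
  lo  <- min(x[x >= q[1] - 1.5 * iqr])            # lower whisker end
  hi  <- max(x[x <= q[3] + 1.5 * iqr])            # upper whisker end
  list(q1 = q[1], median = q[2], q3 = q[3], iqr = iqr,
       whiskers = c(lo, hi), outliers = x[x < lo | x > hi])
}

box_stats(c(1, 2, 3, 4, 5, 6, 7, 8, 30))   # hypothetical values; 30 is an outlier
```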

The data for each athlete’s box plots can be filtered by season, discipline or even more precisely with a time range slider if you select “Specified Range“. Every single stat category (all except the first five in the dropdown list) also allows a per Season series visualization (the one you can see above).

Forum member PolitiskTeoriFan made these nice looking course profiles and agreed to have them posted here. Thanks a lot for that! I created a new page where you can click through all of them. Unfortunately, visualizations exist only for the 3.3km loops right now. However, they should still be useful, even for other races. At most venues this 3.3km loop is usually just an extension of shorter loops and you can use the split time positions for orientation; they rarely change between races.

Lastly, I added a page with general explanations for all major statistics. This was previously only available (hidden) under the info icon on the seasons stats page.

Posted in Website updates
