real biathlon
Author: Najtrebor

When something’s gotta give: Shooting Speed vs Accuracy

Posted on 2020-12-23 | by Najtrebor

Depending on the event, biathlon athletes fire 10 or 20 shots per race, with the goal of a) hitting the target and b) doing so as fast as possible. At first thought, you would expect that the longer you take per shot, the higher your chances of hitting the target. But further analysis shows that this train of thought could be wrong.

First things first. For this analysis, I used Real Biathlon data for the 2017-2018, 2018-2019, 2019-2020 and the first trimester of the 2020-2021 seasons, non-team races only. We’re talking 172,765 shots fired by both male and female athletes, of which 32,162 missed the target, a ratio of just over 18.5%.

But were those misses only influenced by how fast the shooter pulled the trigger after getting ready or after the previous shot? And what other factors, with or without reliable data, would influence the result of a shot? First lap or last lap, being in front or behind, the weather (think wind, temperature, precipitation), visibility, the pressure resulting from race situations or proximity of other athletes, distractions from the crowd (when present), officials or other athletes, a miss disrupting the flow, etc. With so many other factors, the shooting time clearly isn’t the only factor influencing shot results. But it is probably one that athletes can at least try to control.

To analyze the shooting I looked at the result of each shot and at its shot interval: how many seconds the shot was taken after getting ready (first shot) or after the previous shot (shots two, three, four and five of every shooting bout). Those shot intervals are measured in tenths of a second, so to make the data more manageable and understandable, the intervals are binned: similar shot intervals are grouped together in buckets. The following chart shows the number of shots per bucket (top) and the miss-rate per bucket (bottom):

For example, bucket 4 shows all shots with an interval between 4 and 4.99 seconds: 8,120 shots in total fall in this category, and 23.8% of those missed the target.
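The bucketing described above can be sketched in R. This is only a minimal illustration, not the code behind the chart: the `shots` data frame, its column names (`interval`, `miss`) and every value in it are made up for the example.

```r
# Hypothetical sample of shots: interval in seconds (tenths resolution)
# and whether the shot missed. Values are invented for illustration only.
shots <- data.frame(
  interval = c(1.8, 2.3, 2.9, 4.0, 4.5, 4.9, 11.2, 12.7),
  miss     = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
)

# Bucket n holds every interval from n up to (but not including) n + 1,
# so 4.0, 4.5 and 4.9 all land in bucket 4.
shots$bucket <- floor(shots$interval)

# Count shots and misses per bucket
shot_count <- tapply(shots$miss, shots$bucket, length)
miss_count <- tapply(shots$miss, shots$bucket, sum)

# One row per bucket with the miss-rate as a percentage
buckets <- data.frame(
  bucket    = as.integer(names(shot_count)),
  shots     = as.vector(shot_count),
  miss_rate = round(100 * as.vector(miss_count) / as.vector(shot_count), 1)
)
print(buckets)
```

With the invented data, bucket 4 contains three shots of which two missed, so its miss-rate comes out at 66.7%.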

If we look at the top part of the chart, we can see that the majority of the shots are fired within five seconds, or between 11 and 18 seconds. Clearly this is the general distinction between shots 2, 3, 4 and 5 and the first shots respectively.

The bottom of the chart shows the miss-rate: what percentage of shots were missed for a particular bucket. We can see that the longer the athlete waits, the higher the miss-rate becomes, up to seven seconds. My guess is that after holding their breath for seven seconds they have to take one or two breaths, which has a positive effect on accuracy and makes the miss-rate go down for a number of seconds. The increase in miss-rate starting around the 13-second bucket likely shows the same effect of holding the breath, this time for the first shot, which is typically taken after about 11-12 seconds. Even longer shot intervals resulting in more misses could be explained by lack of confidence, legs starting to shake, athletes starting to think too much about their shot, self-doubt, etc. Rifle malfunctions can also play a role here, specifically for the long shot intervals.

Now let’s look at some examples of individual athletes. To avoid confusion the following charts only look at shots 2, 3, 4 and 5, as the first shot takes much longer than these other shots.

Wierer shoots fast and well, with the majority of her shots in the one- or two-second buckets. It’s clear when things don’t go as planned and take over five seconds: the miss-rate goes up fast.

Eckhoff clearly takes more time, yet with a higher miss-rate. Only her 4, 6 and 7 second buckets have a miss-rate under 15%.

Hanna Oeberg is another fast shooter who does well in all buckets under 6 seconds except for the three-second bucket.

Alimbekava is also a fast shooter with almost all shots under 4 seconds. The miss-rate varies.

Roeiseland clearly takes a bit more time than say Wierer, and she shows a clear pattern: the longer the shot takes the higher the miss-rate.

Hauser is another fast shooter who shows a clear pattern: under 3 seconds misses roughly 10%, the longer the shot the more she misses.

Super-fast skier Herrmann has a generally higher miss-rate, but when shooting in under 2 seconds she actually misses less than 10%.

Davidova shoots quite slowly compared to the athletes above, with the miss-rate increasing as she takes more time.

Lastly, Zdouc, one of the best shooters in the current season, initially shows the “expected” pattern: more time leads to fewer misses. But that only applies to the first three buckets.

In general, the expectation that taking longer for a shot leads to better results does not hold for the women above. Now let’s look at the men:

JT Boe shows good consistency (did I mention he leads the World Cup Standings?) up to 6 seconds, but he rarely takes that long.

Brother Tarjei shows a very similar pattern, but with a higher miss-rate.

QFM again shows the opposite of the expected trend: the longer he waits, the higher the miss-rate.

Fellow Frenchman Jacquelin never shoots in under a second, but takes almost all his shots in under four. Again the miss-rate goes up as the shots take longer.

Loginov is an extremely fast shooter, with almost all shots under three seconds. Based on his chart, taking 4 seconds per shot could lead to great results (though this is based on a small sample size).

Peiffer clearly is one of the slower shooters, hardly ever taking less than two seconds. His miss-rates are good though, especially in the 2-3 seconds range which he is in the most, by far.

Samuelsson shoots very consistently in two or three seconds per shot, but as we have now seen many times, the longer the shot takes the higher the miss-rate.

Eder, the leader of the Shooting Statistics list of the IBU, simply shoots very fast with a very low miss-rate. Again, even with the best shooter, the longer he takes, the higher the miss-rate.

Although the examples above don’t give a clear answer as to what influences shot results, it is clear that taking more time per shot does not lead to better results. Of course, the athletes above are only a very small subset of all participants and represent the upper regions of the standings. But when going through the athletes in the lower regions of the standings the trend doesn’t change: taking more time per shot does NOT lead to better results.

The last chart doesn’t use the buckets but looks at all shots per specific shot interval (remember, down to a tenth of a second) and the miss-ratio for that shot interval. Here too, I took out all first shots.

The trend is going up rather than going down, meaning more seconds per shot means more misses. Removing the shot intervals with fewer than ten shots gives a nicer picture, but with the same conclusion: taking more time per shot does NOT lead to better results:
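As a rough sketch of that last step, the R code below computes the miss-ratio per exact shot interval, drops intervals with too few shots and fits a simple trend line. The function name `miss_ratio_by_interval`, its arguments and all data are hypothetical; the charts themselves were not necessarily made this way.

```r
# Miss ratio per exact shot interval (tenths of a second), keeping only
# intervals with at least `min_shots` shots. Names are hypothetical.
miss_ratio_by_interval <- function(interval, miss, min_shots = 10) {
  key    <- round(interval, 1)
  n      <- tapply(miss, key, length)
  misses <- tapply(miss, key, sum)
  out <- data.frame(
    interval   = as.numeric(names(n)),
    shots      = as.vector(n),
    miss_ratio = as.vector(misses) / as.vector(n)
  )
  out[out$shots >= min_shots, , drop = FALSE]
}

# Invented data: 12 shots at 2.0 s (2 misses), 10 at 3.5 s (3 misses),
# and 3 at 9.9 s (all missed), which is too few shots to keep.
interval <- c(rep(2.0, 12), rep(3.5, 10), rep(9.9, 3))
miss     <- c(rep(c(TRUE, FALSE), c(2, 10)),
              rep(c(TRUE, FALSE), c(3, 7)),
              rep(TRUE, 3))

per_interval <- miss_ratio_by_interval(interval, miss, min_shots = 10)
print(per_interval)

# A positive slope means longer intervals go together with more misses
trend <- lm(miss_ratio ~ interval, data = per_interval, weights = shots)
print(coef(trend)[["interval"]])
```

Weighting the fit by the number of shots is a design choice made for this sketch, so that sparsely populated intervals pull less on the trend line.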

On Tableau Public I uploaded a dashboard that shows the same data in a slightly different visual presentation, but it allows you to filter for athletes, disciplines, standing or prone, etc. Go have a look and play around with it, and let me know if you find anything interesting.

Posted in Long-term trends, Statistical analysis | Tagged shot accuracy, shot speed
Page 3 from the Athletes Research Tool dashboard

Using Real Biathlon data to create a dashboard in Tableau

Posted on 2020-12-16 | by Najtrebor

Football and baseball are huge sports in the fantasy sports world. Biathlon is not, but that doesn’t mean it is not there at all. For example, the sports department of the German television corporation ARD has what they call the Biathlon Tipp Spiel, freely translated as the biathlon guessing game. It allows participants to predict the top 5 of any upcoming race in the IBU World Cup circuit, and although thankfully biathlon is unpredictable enough to make this pretty hard, I wanted to have a quick look into previous results to see “who’s hot and who’s not”. The following blog post describes the steps I took to create the Puck Possessed Biathlon Athletes Research Tool on Tableau Public. For those of you who eagerly clicked on the link, please be patient as the data loads 3+ seasons of detailed race results. Update: I created a clone that eliminates the 2017-2018 season, resulting in better performance of the dashboards.

The data

Since the Real Biathlon data is now available through Patreon, I downloaded some of the more current race results using R. There are many other programming languages and ways to do it, but since I’m most familiar with R, that is what I used. The following section describes how to get the data using R (assuming you have a subscription). If you’re not interested in the technical stuff, skip right ahead to the Data visualization section below.

First we need to connect to the MongoDB database with the username and password that come with the Patreon subscription:

# Install the packages once; install.packages() expects a character vector
install.packages(c("mongolite", "tidyverse", "dplyr", "jsonlite"))
library(mongolite)
library(dplyr)
library(tidyverse)
library(jsonlite)

# Set username and password
mongousr <- "--your username--"
mongopw <- "--your password--"

# Set the collection, database and prefix to create the url
rbcol <- "RacesList"
rbdb <- "Results"
rbpref <- "biathloncluster-ay3ak"
rburl <- paste("mongodb+srv://",mongousr,":",mongopw,"@",rbpref,".mongodb.net/<dbname>?retryWrites=true&w=majority", sep="")

# Use the URL created above to connect to the correct MongoDB data
rbmongo <- mongo(collection = rbcol, db = rbdb, url = rburl, verbose = TRUE)

Now we can connect to the database. To gather all the data I wanted for my dashboard, I first got data that had all raceIds I wanted to download. Then I created a loop to go through these raceIds one by one and download the file. Below is just the code to get one single file into Tableau. Perhaps I’ll show the loop code in another blog post sometime.

# Get data from the Mongo connection created above by searching for one specific raceId
RaceBT2021SWRLCP01SWSP <- rbmongo$find('{"raceId" : "BT2021SWRLCP01SWSP"}')

# Convert the downloaded race data to json
RaceBT2021SWRLCP01SWSPjson <- toJSON(RaceBT2021SWRLCP01SWSP)

# Write the json file to your computer
write(RaceBT2021SWRLCP01SWSPjson, "RaceBT2021SWRLCP01SWSPjson.json")

And that is all it takes to connect, load a file and save it as a json file. One could also save as a flat csv file here, but to do that you will have to manipulate the loaded file first as it comes with nested data, multiple levels deep. Since Tableau Public reads json files natively, I decided that using the power of Tableau Public is far more time-efficient.
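The loop over raceIds mentioned earlier is not shown in this post. A minimal sketch of what such a loop might look like follows; `save_races`, its arguments and the file naming are hypothetical. The database lookup and the json conversion are passed in as functions, so the loop itself does not depend on a live connection.

```r
# Sketch: download several races and write one json file per raceId.
# `find_race` returns the data for one raceId, `to_json` converts it to a
# json string. Both are supplied by the caller (hypothetical design).
save_races <- function(race_ids, find_race, to_json, out_dir = ".") {
  paths <- character(0)
  for (race_id in race_ids) {
    race <- find_race(race_id)
    path <- file.path(out_dir, paste0("Race", race_id, ".json"))
    write(to_json(race), path)
    paths <- c(paths, path)
  }
  paths
}

# Intended use with the connection created earlier (not run here):
# find_race <- function(id) rbmongo$find(sprintf('{"raceId" : "%s"}', id))
# save_races(c("BT2021SWRLCP01SWSP"), find_race, jsonlite::toJSON)
```

The function returns the paths of the files it wrote, which makes it easy to check afterwards that every race ended up on disk.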

Data visualization

Although the above code generates one json file for one race, for my specific dashboard I got a file for every race since the 2017-2018 season, creating over 200 files. With those sitting on my hard drive, eagerly awaiting to be visualized, do the following:

Open Tableau Public and connect to a Json file

Select one json file specifically

Drag all other json files (I assume all files are in the same folder as the first file) right below the one file from the screenshot above

Select the Schema Levels to only get the data I want to use (resist the temptation to select all when you see all the goodness that is available in these files, and stick to the KISS principle)

Now you can create a new Sheet and start on your visualization. I must admit working with the nested json files takes a little time to get used to if you are used to dealing with flat files, but in the end it works quite well!


Since I wanted information on athletes specifically to help me pick future winners, I made three levels of information, or dashboards: one for a single race (the most recent one, or the most recent of the same type and at the same location as the one I’m predicting for), one to show current form by looking at the results for the current season to date, and one for similar events in the past (so all sprint races in the last couple of seasons, or all races in Hochfilzen, etc.).

Tab 1 Race Details shows information for one race, while highlighting one athlete of choice

Tab 2 Current Season Information shows information about the selected athlete that gives the reader an idea if the athlete is hot or not, or on an upward or downward trend.

Tab 3 Similar Events Results shows how athletes have performed in previous similar races as the one you are predicting for.

So please go have a look at the dashboards (full and small) and let me know what you think. And good luck making your own dashboards based on the real biathlon Patreon data subscription!

Posted in Statistical analysis | Tagged Data subscription, data visualization, Patreon, R, Tableau

Shooting Speed

Posted on 2020-12-02 | by Najtrebor

An analysis of shooting speed in biathlon, using the women’s individual race in Kontiolahti as an example. The data came from the real biathlon website, here is the exact link.

To get this data in a workable format, I just copied the table, pasted it in a text editor and copied/pasted that to Google Sheets. From there I had to do some splitting and moving things around but it was still fairly easy to get a working table. The only time-consuming part was manually assigning hits or misses, and for that reason I only did it for the top 30 athletes. Then I added some calculations for athlete averages, max and min shooting times, etc. Although that can be done in Tableau, I find that once you start working with filters etc. it becomes unnecessarily complicated there; it is just much easier to calculate the fields in Google Sheets.
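For readers who prefer code over a spreadsheet, the per-athlete calculations could be sketched in R as follows. This is an alternative to the Google Sheets approach described above, not what was actually used; the athlete labels, column names and shooting times are invented.

```r
# Invented shooting times (seconds per shot) for two hypothetical athletes
shot_times <- data.frame(
  athlete = c("Athlete A", "Athlete A", "Athlete A",
              "Athlete B", "Athlete B", "Athlete B"),
  seconds = c(2.1, 2.4, 3.0, 3.2, 2.8, 4.5)
)

# Average, fastest and slowest shot per athlete
summary_by_athlete <- aggregate(
  seconds ~ athlete, data = shot_times,
  FUN = function(x) c(avg = mean(x), min = min(x), max = max(x))
)

# aggregate() returns the three statistics as one matrix column;
# flatten it into the columns seconds.avg, seconds.min and seconds.max
summary_by_athlete <- do.call(data.frame, summary_by_athlete)
print(summary_by_athlete)
```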

Just a reminder the Tableau Dashboard below is interactive and intended to be used for further exploration of data. If you open it on the Tableau Public site you can use it full screen. Enjoy!

Posted in Statistical analysis | Tagged data visualization, Puck Possessed, shooting

Improvements season-to-season, putting it to use

Posted on 2020-11-19 | by Najtrebor

To highlight what a great site Real biathlon is, and how easily the data can be used to give some great insights, below is a step-by-step guide on how to make a quick interactive chart based on the data referenced in the previous article. I used Google Sheets and Tableau Public, but you can use any tools of this kind that you are comfortable with, and publish a lookup chart within an hour. That’s how easy the real biathlon site makes it to collect biathlon data!

Step 1 – download & store the data

For the chart I used the data of the last five seasons, for both men and women, limiting the data to only those athletes with at least 10 races. Since that never resulted in more than 100 athletes per gender and season, I did not set a filter on that.

After selecting the Season Statistics, Performance Score (for example, the 2019-2020 season for women), you can just select the table, copy, open a blank Google Sheet, select cell A1 and paste. In my case, my first row showed twice so I just removed one of them. Do this 10 times (five per gender), name the sheets appropriately (W1920, W1819, M1920, M1819, etc.) and export from Google Sheets to an xlsx file. You can export to text, but you would have to do that per sheet, where exporting to excel (xlsx) exports all sheets at once.

Step 2 – open data in Tableau

In Tableau Public connect to the spreadsheet by:

  • click on the Data menu > New Data Source
  • click on Microsoft Excel in the Connect window
  • select the xlsx file you just exported from Google Sheets
  • at the bottom of the list that shows your sheet names, double click New Union
  • drag all 10 sheets into the Union window
  • click OK

Step 3 – create some calculated fields

  • Full Name = [Given Name] + " " + [Family Name]
  • Gender =
    IF LEFT([Table Name],1) = "M" THEN "Men"
    ELSEIF LEFT([Table Name],1) = "W" THEN "Women"
    ELSE "Unknown"
    END
  • Season = MID([Table Name],2,5)
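To make clear what these calculated fields produce, here is the same logic sketched in R. The sheet names follow the naming from Step 1; the variable names and the placeholder athlete name are assumptions made for the example.

```r
# Sheet names as used in the spreadsheet export
table_name <- c("W1920", "W1819", "M1920", "M1819")

# Gender: decided by the first character of the sheet name
prefix <- substr(table_name, 1, 1)
gender <- ifelse(prefix == "M", "Men",
          ifelse(prefix == "W", "Women", "Unknown"))

# Season: everything after the first character, e.g. "1920"
season <- substr(table_name, 2, nchar(table_name))

# Full name: given and family name joined by a space (placeholder values)
full_name <- paste("Given", "Family")

print(data.frame(table_name, gender, season))
```

Note that Tableau's MID takes a start position and a length, while R's substr takes a start and an end position, which is why the R version uses nchar() instead of a fixed number.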

Step 4 – create the chart

Depending on what you want to show in your chart, the following differs, but to replicate the chart I made, drag the following pills in the Filters, Marks, Columns and Rows:

Step 5 – publish to Tableau Public

Once you are happy with your chart, just save the file to Tableau Public.

Now users can use highlighters to see how their favourite athletes stack up against the field, or see how certain Nations fare.

Posted in Statistical analysis | Tagged data use examples, data visualization, Tableau

Time Behind Score: comparing fruit, rather than apples and oranges

Posted on 2020-11-18 | by Najtrebor

As IBU ranking point systems vary over time and per level (Junior, IBU Cup and Senior), and points are typically awarded only to the top 30 athletes per race, I created the Time Behind Score to compare performances between races in different seasons and at different levels.

The Time Behind Score is based on the idea that at every level, every athlete is trying to be the fastest and wants to avoid being the last athlete crossing the finish line. As not all historic data, nor the data for all levels, include skiing and shooting details, this Score only uses the final time per race, regardless of the balance between skiing time and shooting results. Although this leads to a lack of depth for further analysis, it is the only way to compare between different level races from different eras, and in the end, the balance between skiing and shooting is less relevant when the only interest is which athletes cross the finish line first.

Calculation

For the Time Behind Score calculation, all total race times per race are converted to a 0-100 scale, where the fastest athlete gets a score of 100, the slowest athlete gets a score of 0, and all other athletes get a score based on the relative position between the fastest and slowest athlete. This also gives points based on relative times rather than a rank-score that ignores how much time difference exists between positions.

The figure below demonstrates the process of converting a race result to the Time Behind Score: the top half shows the race results of all athletes with the winner on the left and the last finisher on the right; the orange dots representing each athlete are placed depending on how many seconds they finished behind the winner (so the further to the right, the more seconds behind). Those “seconds behind the winner” are converted to a percentage between the winner and last finisher in the bottom half of the image (“Percentage time behind compared to maximum time behind”) with the winner being 0% and the last finisher 100%. The Time Behind Score is 100 minus this percentage, shown on the horizontal axis of the graph, so 100 for the winner and 0 for the last finisher:

Converting race results to Time Behind Score
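The calculation described above can be written as a small R function. This is a sketch based on the description in this article; the function name `time_behind_score`, the handling of a race where everyone has the same time, and the finish times are assumptions made for illustration.

```r
# Time Behind Score: 100 for the fastest athlete, 0 for the slowest, and
# a linear scale in between based on seconds behind the winner.
time_behind_score <- function(race_seconds) {
  fastest <- min(race_seconds)
  slowest <- max(race_seconds)
  if (slowest == fastest) {
    # Edge case not covered in the text: no spread, so everyone gets 100
    return(rep(100, length(race_seconds)))
  }
  # Percentage time behind compared to the maximum time behind
  pct_behind <- 100 * (race_seconds - fastest) / (slowest - fastest)
  100 - pct_behind
}

# Invented total race times in seconds
times <- c(1500, 1530, 1575, 1800)
print(time_behind_score(times))
```

In this invented race the last finisher is 300 seconds behind the winner, so an athlete 30 seconds behind is 10% of the way to last place and scores 90.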

When comparing race results between seasons and levels, I will be using the Time Behind Score as the measurement. I hope the above sufficiently explains the reasoning and the process to calculate these values. I understand that there are (as with any other score) pros and cons, but I like the pragmatic idea of scores based on how the athlete did compared to the rest of the field. Any comments or feedback are appreciated!

Posted in Statistical analysis | Tagged Puck Possessed, Ranking, Score, Time Behind Score
