(This article was written in April 2021; the example players were changed in October 2022. All data is still from April 2021 and earlier.)
What is finishing skill? Why do we need it? Who even are you?
Below I’ll cover what finishing skill is, then investigate how to predict it in the future using StatsBomb xG data (via FBref). My motivation comes from my own Kiwi-based model appearing to undervalue Kane (probably a good finisher). If you are an aspiring modeller/Excel-user/kiwi then feel free to follow along with the calculations in the investigation part of the piece, otherwise enjoy the ride!
Expected Goals (xG)
A more familiar concept now – basically in an xG model every shot is given an xG value equal to the probability that an average shot from that position on the pitch would score against an average goalkeeper. This is derived from analysing thousands of historical shots from similar positions and, in StatsBomb’s model, includes information such as ball height and defender locations.
However, some players are better at shooting than others. If Messi & Benteke get similar xG chances in a game, we rightly expect Messi to score more often. When using xG for FPL, we need a way to include “being good at shooting” in order to see what edge Maddison & Zaha might have over Bamford & Solanke. So how do we do it?
We want to go from predicting future xG to predicting future goals. For this, we look at how many goals a player scored from their xG in the past. It is usually accepted that finishing skill varies very little over a career, so for now I’ll assume the same. StatsBomb’s model has not been around very long, so the only data available is from the 17/18 season to now in the top 5 European leagues (Bundesliga, La Liga, Ligue 1, Premier League, Serie A).
As an example take Harry Kane, who in the previous 3 seasons scored 65 league goals (65G) from 50.2 xG. If we had correctly predicted that so far this season he would rack up 14.9 xG, how would we try to predict his goals? We could just guess 14.9, or we could try using his past ratio of goals to xG:
Predicted goals = 14.9 × 65 / 50.2 ≈ 19.3
This seems reasonable (Kane has actually scored 17 goals). What if we try the same thing on Christian Benteke? In previous seasons he scored 6 goals from 18.6 xG, and this season he has 5.0 xG:
Predicted goals = 5.0 × 6 / 18.6 ≈ 1.6
Not so reasonable (Benteke has scored 5). What happened?
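For anyone following along outside Excel, here is a minimal Python sketch of that naive calculation, using the Kane and Benteke figures quoted above. The function name `naive_prediction` is just an illustrative choice:

```python
def naive_prediction(future_xg: float, past_goals: float, past_xg: float) -> float:
    """Scale future xG by the player's past goals-to-xG ratio (no pull towards average)."""
    return future_xg * past_goals / past_xg


kane = naive_prediction(future_xg=14.9, past_goals=65, past_xg=50.2)
benteke = naive_prediction(future_xg=5.0, past_goals=6, past_xg=18.6)
print(f"Kane: {kane:.1f} predicted vs 17 actual")
print(f"Benteke: {benteke:.1f} predicted vs 5 actual")
```

Benteke's small past sample drags his prediction down to about a third of his xG, even though he is a professional striker.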
Goals are rare random events. Predictions can’t be bang on all the time (then football would be pointless), just as we can’t predict exactly which coin flips will be heads, even if we know each has a 50% chance. This means we have to be careful with small sample sizes like Benteke’s.
What we do have a huge sample of is average finishing skill – in theory on average 100 xG is converted into 100 goals (by definition). Before we have any data from Benteke, we assume that he is an average finisher. As he takes more shots we can slowly update that belief, but scoring his first shot should not make us believe he is 5 times better than Messi, and missing his first few shots should not make us believe he will never score again. So we need a formula that tells us how fast to update our belief away from “Benteke is average” to “Benteke is a little good/bad”, then to “Benteke is quite good/bad” and so on. The more shots he takes, the more confident we become that his past goals represent true skill rather than him getting lucky or unlucky.
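To get a feel for how much luck is involved, here is a rough simulation of a perfectly average finisher. It assumes, purely for illustration, that every shot is worth the same xG (0.1 here) and that shots are independent coin flips; real shot distributions are messier. It reports the range covering the middle 90% of simulated goals-to-xG ratios:

```python
import random


def ratio_range(total_xg: float, shot_xg: float = 0.1, trials: int = 2000, seed: int = 1):
    """Middle-90% range of goals/xG for an exactly average finisher.

    Simplifying assumption: every shot has the same xG and shots are independent.
    """
    rng = random.Random(seed)
    n_shots = round(total_xg / shot_xg)
    expected = n_shots * shot_xg
    # Each trial is one "career" of n_shots; a shot scores with probability shot_xg
    ratios = sorted(
        sum(rng.random() < shot_xg for _ in range(n_shots)) / expected
        for _ in range(trials)
    )
    return ratios[int(0.05 * trials)], ratios[int(0.95 * trials) - 1]


for xg in (5, 20, 50):
    low, high = ratio_range(xg)
    print(f"{xg:>3} xG: goals/xG between {low:.2f} and {high:.2f}")
```

Even an exactly average finisher can land well away from 1 on a small sample, while a larger xG total pins the ratio down much more tightly.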
The most popular method at the moment is adding another number, c, to the finishing skill formula, which pulls the estimate towards the average and has a bigger effect on small samples:
Finishing skill = (G + c) / (xG + c)
If c is 0 then we get the old formula again, whereas if c is very large then the result gets very close to 1, i.e. average finishing skill. The higher c is, the slower we update our belief away from a player being average, and the more we believe their success or failure so far has been due to luck rather than skill. Let’s see how Kane & Benteke are affected by using c = 100, which the Kiwi model used up to now:
Kane: 14.9 × (65 + 100) / (50.2 + 100) ≈ 16.4
Benteke: 5.0 × (6 + 100) / (18.6 + 100) ≈ 4.5
The Benteke number looks better, but Kane is undervalued. This is not much evidence by itself, but inspires the question:
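Here is the same calculation as a small Python sketch, with c as a parameter so you can see both extremes. The function names are illustrative, not from any library:

```python
def finishing_skill(past_goals: float, past_xg: float, c: float) -> float:
    """Goals-to-xG ratio pulled towards 1 (average) by adding c to both sides."""
    return (past_goals + c) / (past_xg + c)


def predict_goals(future_xg: float, past_goals: float, past_xg: float, c: float) -> float:
    """Future xG scaled by the regressed finishing skill."""
    return future_xg * finishing_skill(past_goals, past_xg, c)


for c in (0, 100, 10**9):
    kane = predict_goals(14.9, 65, 50.2, c)
    benteke = predict_goals(5.0, 6, 18.6, c)
    print(f"c = {c}: Kane {kane:.1f}, Benteke {benteke:.1f}")
```

With c = 0 this reproduces the naive ratio, and with a huge c the predictions collapse back to the raw xG figures of 14.9 and 5.0.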
What is the best value of c?
I restrict to non-penalty xG (npxG) as penalties are handled separately in most models. The aim is to find the value of c that is the most predictive. This should be repeatable with xG including penalties if anyone desires.
First the data. @FF_Trout provided scraped match-level data, but I chose to use season-level data instead, as FBref only provides xG values to one decimal place. For example, an xG of 0.2 from a match can mean anything from 0.15 to 0.25 (up to 25% different), whereas an xG of 10.0 from a season restricts it to the range 9.95 to 10.05 (up to 0.5% different). This data comes from the player stats tables on FBref, which can be copied into Excel by exporting in CSV format and then using Text to Columns in the Data ribbon. The columns to keep are Player, Born (to distinguish between players with the same name), Min, G-PK, PKatt, xG, npxG, xG (per 90) & npxG (per 90).
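The precision argument is easy to check. A value shown to one decimal place can be off by up to 0.05 either way, so the worst-case relative error shrinks as the value grows. A minimal sketch:

```python
def max_relative_error(shown: float, decimals: int = 1) -> float:
    """Worst-case relative error of a positive value rounded to `decimals` places."""
    half_step = 0.5 * 10 ** (-decimals)
    return half_step / shown


print(f"Match xG 0.2:   up to {max_relative_error(0.2):.1%} off")
print(f"Season xG 10.0: up to {max_relative_error(10.0):.1%} off")
```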
Why not just have G-PK & npxG?
We can use the other numbers to get values more accurate than the figure rounded to one decimal place in the npxG column. Each rounded number (with the per 90 figures multiplied back up by minutes played / 90) gives us a range of possible true values.
Using the two npxG numbers with the two xG numbers (minus 0.76 times the number of penalties taken), we get four different ranges of possible npxG. Then we can take the innermost range from them and take the midpoint of that. If the ranges don’t overlap then that means one incident has been awarded more than 0.76 penalty xG (e.g. a penalty rebound) – we won’t know the xG of this so we just have to stick to the two npxG numbers in these cases. I also assume that 0 npxG means 0 npxG (and remove all these entries) unless the player scored a goal that season, then I take the midpoint as normal. The cell formulas I used to calculate these are below:
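As an alternative to the spreadsheet, the same range logic can be sketched in Python. This is a rough translation under some assumptions of mine: the per 90 columns are taken to be shown to two decimal places (`per90_dp`), per 90 figures are converted to totals by multiplying by minutes / 90, each penalty attempt is worth 0.76 xG as above, and a lower bound of zero is applied since npxG cannot be negative. The function name `estimate_npxg` and the final fallback are hypothetical:

```python
from typing import List, Optional, Tuple

Range = Tuple[float, float]
PENALTY_XG = 0.76  # xG assumed per penalty attempt, as in the text


def rounded_range(shown: float, decimals: int) -> Range:
    """All true values that would display as `shown` at `decimals` places."""
    half = 0.5 * 10 ** (-decimals)
    return (shown - half, shown + half)


def scaled(r: Range, factor: float) -> Range:
    return (r[0] * factor, r[1] * factor)


def shifted(r: Range, delta: float) -> Range:
    return (r[0] + delta, r[1] + delta)


def innermost(ranges: List[Range]) -> Optional[Range]:
    """Intersection of all ranges, or None if they do not overlap."""
    low = max(r[0] for r in ranges)
    high = min(r[1] for r in ranges)
    return (low, high) if low <= high else None


def estimate_npxg(minutes: float, npg: int, pk_att: int, xg: float, npxg: float,
                  xg90: float, npxg90: float,
                  total_dp: int = 1, per90_dp: int = 2) -> Optional[float]:
    """Best estimate of a season's npxG, or None if the row should be dropped."""
    # A displayed 0.0 npxG is treated as exactly zero unless the player scored
    if npxg == 0 and npg == 0:
        return None
    to_total = minutes / 90
    npxg_ranges = [
        rounded_range(npxg, total_dp),
        scaled(rounded_range(npxg90, per90_dp), to_total),
        (0.0, float("inf")),  # npxG cannot be negative
    ]
    penalty_xg = PENALTY_XG * pk_att
    xg_ranges = [
        shifted(rounded_range(xg, total_dp), -penalty_xg),
        shifted(scaled(rounded_range(xg90, per90_dp), to_total), -penalty_xg),
    ]
    best = innermost(npxg_ranges + xg_ranges)
    if best is None:
        # No overlap: some penalty incident was not worth exactly 0.76 xG,
        # so fall back to the two npxG figures only
        best = innermost(npxg_ranges)
    if best is None:
        return npxg  # hypothetical fallback: keep the displayed total
    return (best[0] + best[1]) / 2


est = estimate_npxg(minutes=900, npg=3, pk_att=1, xg=4.3, npxg=3.5, xg90=0.43, npxg90=0.35)
print(f"{est:.3f}")
```

In this made-up example the npxG columns allow 3.45 to 3.55, the penalty-adjusted xG columns allow 3.49 to 3.59, and the midpoint of the overlap is 3.52.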
Now I put Player, G-PK and the calculated npxG in a large table (along with Born or some sort of ID). Most players have multiple entries, either from multiple seasons or from multiple clubs in the same season. We can therefore use the observed finishing skill in some combination of a player’s seasons to predict finishing skill in the unused seasons. For each player, I took every possible split of their seasons into two sides and removed any where one side of the split had <2.5 npxG (too small to bother with). Then I used each side of the split to predict the goals scored on the other side. Let’s run an example with Fabian Schär, who has 4 recorded seasons:
17/18: 2 goals from 3.50 npxG
18/19: 4 goals from 1.71 npxG
19/20: 2 goals from 1.10 npxG
20/21: 0 goals from 0.40 npxG
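This gives 2 valid splits: one with 17/18 by itself, and one where the 17/18 season combines with the 20/21 season. 18/19 has to be with 19/20 in order to break the 2.5 npxG barrier. Therefore we get these 4 predictions:
17/18 (2 goals, 3.50 npxG) predicts 18/19 to 20/21: 3.21 × (2 + c) / (3.50 + c), actual 6 goals
18/19 to 20/21 (6 goals, 3.21 npxG) predicts 17/18: 3.50 × (6 + c) / (3.21 + c), actual 2 goals
17/18 + 20/21 (2 goals, 3.90 npxG) predicts 18/19 + 19/20: 2.81 × (2 + c) / (3.90 + c), actual 6 goals
18/19 + 19/20 (6 goals, 2.81 npxG) predicts 17/18 + 20/21: 3.90 × (6 + c) / (2.81 + c), actual 2 goals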
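Enumerating splits by hand gets tedious for players with many seasons, so here is a minimal sketch of how it could be automated. The data layout and the function name are my own choices. Keeping the first season on side A means each split is only counted once:

```python
from itertools import combinations

MIN_NPXG = 2.5  # each side of a split needs at least this much npxG

# (season, non-penalty goals, npxG) for Fabian Schär, as listed above
schar = [("17/18", 2, 3.50), ("18/19", 4, 1.71), ("19/20", 2, 1.10), ("20/21", 0, 0.40)]


def valid_splits(seasons, min_npxg=MIN_NPXG):
    """Return every two-way split of the seasons where both sides reach min_npxg."""
    n = len(seasons)
    result = []
    for k in range(n - 1):  # side A = first season + k others, so side B is never empty
        for extra in combinations(range(1, n), k):
            a_idx = {0, *extra}
            side_a = [s for i, s in enumerate(seasons) if i in a_idx]
            side_b = [s for i, s in enumerate(seasons) if i not in a_idx]
            if all(sum(s[2] for s in side) >= min_npxg for side in (side_a, side_b)):
                result.append((side_a, side_b))
    return result


for side_a, side_b in valid_splits(schar):
    print([s[0] for s in side_a], "|", [s[0] for s in side_b])
```

For n seasons there are 2^(n-1) - 1 possible splits before the npxG filter, which is why players with many entries end up with so many.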
To find the best c, we take the difference between each prediction & result and square it. We then average across all predictions and square root back down to get the root mean square error (rmse), which is a standard tool for testing predictions. It rewards you for getting the average right, e.g. you get a better (lower) rmse by predicting 3 3 3 3 for the results 1 5 1 5 than predicting 5 5 5 5.
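A quick check of that rmse example in Python, using a throwaway helper:

```python
import math


def rmse(predictions, results):
    """Root mean square error between paired predictions and results."""
    squared = [(p - r) ** 2 for p, r in zip(predictions, results)]
    return math.sqrt(sum(squared) / len(squared))


results = [1, 5, 1, 5]
print(rmse([3, 3, 3, 3], results))  # errors of 2 every time
print(rmse([5, 5, 5, 5], results))  # errors of 4, 0, 4, 0
```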
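Say we were calculating the rmse on only Schär, with a value of c = 50. The four predictions come out as 3.12, 3.68, 2.71 and 4.14 goals against actual results of 6, 2, 6 and 2, so squaring the four errors, averaging and taking the square root gives an rmse of about 2.57.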
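The same Schär calculation as a Python sketch, assuming each side's prediction is the other side's npxG multiplied by (goals + c) / (npxG + c) as in the formula above:

```python
import math

C = 50
# (goals, npxG) on each side of Schär's two valid splits, from the seasons listed above
splits = [
    ((2, 3.50), (6, 3.21)),  # 17/18 | 18/19 + 19/20 + 20/21
    ((2, 3.90), (6, 2.81)),  # 17/18 + 20/21 | 18/19 + 19/20
]


def predict(source, target_npxg, c):
    """Predict goals on the target side from the source side's regressed finishing skill."""
    goals, npxg = source
    return target_npxg * (goals + c) / (npxg + c)


sq_errors = []
for a, b in splits:
    # Each side predicts the goals on the other side
    for src, tgt in ((a, b), (b, a)):
        pred = predict(src, tgt[1], C)
        sq_errors.append((tgt[0] - pred) ** 2)

rmse = math.sqrt(sum(sq_errors) / len(sq_errors))
print(f"rmse with c={C}: {rmse:.2f}")
```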
One last technical detail – players with 2 seasons will have at most one valid split, but players with more seasons can have exponentially more valid splits, e.g. Patrick Cutrone has 54. To prevent Cutrone’s seasons from contributing multiple times, I divide the square error from each split by the number of valid splits from that player. Then each season effectively contributes once, so we don’t just end up with the best predictor of Patrick Cutrone.
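One way to implement that weighting, as a sketch: every squared error is multiplied by 1 / (number of valid splits for that player), and the total is then divided by the summed weights rather than the raw count. That final normalisation is my assumption of a sensible way to average, not something stated above, and the toy numbers are invented purely to show the effect:

```python
import math


def weighted_rmse(players):
    """players: one list per player, holding one list of squared errors per valid split."""
    total = weight = 0.0
    for splits in players:
        w = 1 / len(splits)  # a player's splits share one split's worth of weight
        for split in splits:
            for sq_error in split:
                total += w * sq_error
                weight += w
    return math.sqrt(total / weight)


# Hypothetical squared errors: player 1 has one split, player 2 has three
toy = [
    [[4.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
]
flat = [e for player in toy for split in player for e in split]
print(f"weighted:   {weighted_rmse(toy):.3f}")
print(f"unweighted: {math.sqrt(sum(flat) / len(flat)):.3f}")
```

Without the weighting, player 2's extra splits dominate and pull the overall figure towards their errors.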
Graph time! I used a total of 3581 splits, so 7162 predictions from 2838 seasons from 749 players. I tested values of c from 0 to 125 in steps of 5. First here is a scale large enough to fit everything – you can see that using no adjustment at all gives a large error. The dotted line shows the result of predicting purely from npxG (not including finishing skill), which gives a smaller error than using any c under 18.
Zooming in on the c>25 area, it looks like the most predictive value of c (on this dataset) is somewhere between 50 and 60, whereas my old value of 100 gives about the same error as a value of 35. Based on this I would recommend using c = 55, though any nearby value should work well.
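Putting the pieces together, here is a rough sketch of the whole search, from splits to weighted rmse to the grid of c values, with `c=None` standing for the pure npxG baseline (finishing skill fixed at 1). The weighting normalisation is my assumption, and it runs on Schär's four seasons only, so it shows the mechanics and will not reproduce the 50 to 60 result, which needs the full dataset of 749 players:

```python
import math
from itertools import combinations

MIN_NPXG = 2.5
C_GRID = range(0, 130, 5)  # 0, 5, ..., 125 as in the article


def splits_for(seasons):
    """Yield two-way splits of (goals, npxG) seasons with at least MIN_NPXG on each side."""
    n = len(seasons)
    for k in range(n - 1):
        for extra in combinations(range(1, n), k):
            in_a = {0, *extra}
            a = [s for i, s in enumerate(seasons) if i in in_a]
            b = [s for i, s in enumerate(seasons) if i not in in_a]
            totals = [(sum(g for g, _ in side), sum(x for _, x in side)) for side in (a, b)]
            if all(x >= MIN_NPXG for _, x in totals):
                yield totals


def rmse_for_c(players, c):
    """Weighted rmse across players; c=None predicts goals as plain npxG."""
    total = weight = 0.0
    for seasons in players:
        splits = list(splits_for(seasons))
        if not splits:
            continue
        w = 1 / len(splits)
        for a, b in splits:
            for (g_src, x_src), (g_tgt, x_tgt) in ((a, b), (b, a)):
                skill = 1.0 if c is None else (g_src + c) / (x_src + c)
                total += w * (g_tgt - x_tgt * skill) ** 2
                weight += w
    return math.sqrt(total / weight)


players = [[(2, 3.50), (4, 1.71), (2, 1.10), (0, 0.40)]]  # Schär only
for c in (0, 50, 100):
    print(f"c = {c}: rmse {rmse_for_c(players, c):.2f}")
print(f"npxG only: rmse {rmse_for_c(players, None):.2f}")
print("best c on this toy data:", min(C_GRID, key=lambda c: rmse_for_c(players, c)))
```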
The final kiwi formula for calculating finishing skill for StatsBomb xG is:
Finishing skill = (G-PK + 55) / (npxG + 55)
I hope this was interesting to modellers and non-modellers alike, gave a little more insight into the Kiwi model, and will help you to factor in finishing skill appropriately for FPL decisions. Big thanks to @FPL_ElStatto for having faith I could convert from Twitter threads to articles, StatsBomb & FBref for the data, the FPL Twitter analytics community for motivating this analysis, and finally @FF_Trout for offering both data and feedback.
PS: Thanks also to @GeeLedger for bringing the previous examples to my attention, motivating the update.