I have shown that statistics is a powerful tool for the analysis of competitions where scores are given by a jury.
The statistical analysis of the Sochi Ladies Figure Skating results has shown the presence of systematic bias in the scores, in both Technical Elements and Program Components. The largest bias was assigned in both cases to the first skater, which probably explains the “uproar” that followed the end of the competition.
The analysis has also shown the following results:
- The “trimmed average”, used in almost all sports with a jury, is a very rough method to correct for bias. A better method should eliminate only the scores that lie very far from the mean, using the RMS of the distribution as the reference distance.
- The resolution power of a typical jury is about 1.3 points. This indicates that score differences smaller than (about) 1.3 points have no “objective” meaning: they are just statistical fluctuations. This happens in about 30% of the cases, considering all the skaters. However, the medal standings can be considered “objective” in about 90% of the cases (every time the score differences are larger than about 1.3 points);
- World Records in Figure Skating have no intrinsic meaning, due to the “global bias” which depends on the particular jury. Score differences are more relevant. In any case, the actual World Record holder in Ladies Figure Skating is Korea’s Kim Yuna;
- Fluctuations in performance can be very important in all sports. In the men’s 100 m race they are relevant, on average, in about 33% of cases.
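The two recipes in the first point can be sketched in a few lines of Python. The panel scores below are made up for illustration, and the cut value k = 2 (in RMS units) is an arbitrary choice, not a prescription from the text:

```python
import statistics

def trimmed_mean(scores):
    """Standard "trimmed average": drop the single highest and lowest score."""
    s = sorted(scores)
    return statistics.mean(s[1:-1])

def rms_cut_mean(scores, k=2.0):
    """Drop only scores farther than k * RMS from the mean, then re-average."""
    m = statistics.mean(scores)
    rms = statistics.pstdev(scores)  # RMS deviation from the mean
    kept = [x for x in scores if abs(x - m) <= k * rms] or scores
    return statistics.mean(kept)

# Hypothetical 9-judge panel: eight consistent scores plus one strongly biased one.
scores = [7.25, 7.50, 7.25, 7.00, 7.50, 7.25, 7.00, 7.50, 9.00]
print(round(trimmed_mean(scores), 3))   # 7.321 (also drops a legitimate low score)
print(round(rms_cut_mean(scores), 3))   # 7.281 (drops only the outlier at 9.00)
```

Note the difference: the trimmed average always sacrifices two legitimate scores, while the RMS-based cut removes only the score that is statistically incompatible with the rest of the panel.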
On the basis of this analysis, an exercise to “correct” the official scores has also been performed. The result (Table 3.4) shows clearly that first place should be assigned to the second skater, Kim Yuna. The next scores (second and third places) are more or less equivalent within the errors, with a small advantage of Carolina Kostner over Adelina Sotnikova. In any case, a truly unbiased result can only come from an unbiased jury.
As a final remark, I would like to stress once again that the simple methods shown here can in principle be used by anybody who wants to perform a scientific check of any competition with scores.
In the following, a sample of discussions from the web is presented. The list doesn’t claim to be complete or “fair”; it simply reflects what I mostly found.
Coming back to the statistical test proposed in Par. 2.1.2, the amount of bias can be better quantified by looking at the distribution of N, reported again in the next figure. Let me recall the meaning of N: it is the number of times that a judge has provided a score above the average, for each element.
In the Free Program there are 12 elements, so N should be in the range 0–12. It is clear that if a judge is above the average most of the time, he is most likely biased. It is possible to evaluate a priori the probability that this happens, as N is expected to follow a binomial distribution, just like the number of times you get red/black at roulette, given the number of trials (12 in this case).
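The binomial expectation can be checked numerically. Assuming that a fair judge is above the average with probability p = 0.5 (an approximation: above/below the average are taken as equally likely), the chance of observing a large N by pure statistical fluctuation is the binomial tail:

```python
from math import comb

def p_at_least(n_above, n_elements=12, p=0.5):
    """Probability that an unbiased judge scores above the average
    at least n_above times out of n_elements (binomial tail)."""
    return sum(comb(n_elements, k) * p**k * (1 - p)**(n_elements - k)
               for k in range(n_above, n_elements + 1))

# e.g. 10 or more "above average" scores out of 12 elements:
print(round(p_at_least(10), 4))  # 0.0193, i.e. under 2% by chance
```

So a judge who sits above the average for 10 or more of the 12 elements would be that far out less than 2% of the time if the scores were unbiased.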
In the previous figure the distribution is compared with the expected theoretical binomial distribution (black dots). In the shaded area the number of entries is much larger than what can be expected on a statistical basis (*). This clearly indicates bias in the scores, which is not a surprise, considering that for many skaters there is a judge of their own nationality on the panel (†).
I will evaluate here the total amount of bias that can be introduced in the score, as a function of the number N of biased judges.
For this purpose, I have recalculated all the scores after modifying the points given by a number N of individual judges. The change was performed by adding some extra points, as explained later. The number N was varied from 1 to 4 (hopefully no more than 4 biased judges are present in a competition!).
The results of the analysis are presented in the next figures, for both Technical Elements (TE) and Program Components (PCS). The scores are summed over the Short and Free programs.
Let me recall once again that in all cases a “trimmed average” is performed (exclusion of the highest and lowest scores). In figure A.2 the differences in scores (TES, sum of SP and FP) are reported as a function of the number N of “biased judges” (up to 4). The black dots correspond to the case where the judges give, for each GOE, one point more than the “true” score (moderate bias). The red dots correspond to the case of two points more for each GOE (strong bias). For instance, with two biased judges you get a difference of about 2 points for “moderate” bias, and 3 points for “strong” bias. The error bars indicate the range in which the difference can be found (this depends on the details of the scores). Note that a difference of about 1 point is found even for N=1. In this case, however, it does not depend on the amount of bias, because of the trimmed average.
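The mechanism can be illustrated with a toy model (a hypothetical 9-judge panel with invented GOE values for a single element, not the actual Sochi data): N judges each add a fixed number of extra GOE points, and the trimmed averages with and without the bias are compared:

```python
import statistics

def trimmed_panel(goe_scores):
    """Trimmed average of a 9-judge GOE panel: drop highest and lowest score."""
    s = sorted(goe_scores)
    return statistics.mean(s[1:-1])

def bias_effect(true_goes, n_biased, extra):
    """GOE shift on one element when n_biased judges each add `extra` points."""
    biased = [g + extra if i < n_biased else g for i, g in enumerate(true_goes)]
    return trimmed_panel(biased) - trimmed_panel(true_goes)

# Hypothetical "true" GOEs of a 9-judge panel for one element:
goes = [1, 2, 1, 1, 2, 1, 1, 2, 1]
for n in range(1, 5):
    print(n, round(bias_effect(goes, n, 1), 3), round(bias_effect(goes, n, 2), 3))
```

With this particular panel, the N=1 shift is the same (1/7 of a GOE point) for both bias strengths, because the trimmed average removes one extreme score; from N=2 on, the stronger bias leaks through. The exact numbers depend on the details of the scores, which is what the error bars in figure A.2 express.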
In the figure below (figure A.3) the differences in Program Components (PCS) are considered. The three colors correspond to three different bias levels, from 0.5 points (moderate bias) to 1.5 points (strong bias). This means that a number N of judges gives, for each component, 0.5 points more than the “real” score, and so on.
Again, a difference of about 1 point can be seen for N=1, independently of the amount of bias (because of the trimmed average). The bias on the total score can be obtained simply by summing the TES and PCS differences. Note that, as judges can also give negative bias, the overall score difference between two skaters can be twice this difference.
Note also that the largest part of the TES score comes from the “Base Values”, which are not considered here. So, in principle, a further, large bias can also be introduced by the “Technical Panel”.
From the previous figures it is possible to evaluate the amount of bias that can be introduced into the total score by a specific number of “unfair” (biased) judges. For instance, with N=2 and moderate bias (black points for TES and red points for PCS), you get an average bias of about 1.8+1.8=3.6 points. As judges can also provide negative bias (by lowering the scores), and due to the symmetry of the situation, the total score difference between two skaters in this example can be about 7 points!
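In numbers, the worked example above reads as follows (the 1.8-point shifts are the average values read off figures A.2 and A.3):

```python
# Worked example: N = 2 biased judges, "moderate" bias.
tes_bias = 1.8                    # average TES shift, Short + Free (figure A.2)
pcs_bias = 1.8                    # average PCS shift, Short + Free (figure A.3)
one_sided = tes_bias + pcs_bias   # bias in favour of one skater: 3.6 points
# The same judges can also lower a rival's scores by the same amounts,
# so the maximum swing between two skaters is twice as large:
max_swing = 2 * one_sided
print(one_sided, max_swing)  # 3.6 7.2
```

This is how a handful of biased judges can open a gap of about 7 points, far above the jury’s ~1.3-point resolution.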
The discussed results can be found here:
The table of SOV used in the exercise can be found here:
The best runners’ performances are available here:
See also references therein.
Please send questions, comments, etc. to:
* According to general statistical laws, the expected number of entries in the shaded area should be about 2.5% of the total (about 5 units). The observed number is 23.
† Indeed, the first 13 skaters in the Sochi ranking had a judge of their own nation, with the exception of Korea.