Judgement Methodology #1 - Ranking and Scoring scale

A judge's main role isn't to give an accurate grade out of 10 in a specific criterion. What judges really do is rank spinners. The more spinners you rank, the more reliable your results will be.

For this article, I will only talk about execution, and I will act as if execution were an undivided criterion, like it used to be.

Ranking

When you start judging, you might be tempted to give a score to a combo right after watching it. That's something that comes with experience and it's not that easy to do.

To begin with, you need to rank spinners relative to each other. Below is a very simple example that will help me illustrate what follows.

Each mark corresponds to the execution score of a spinner, represented by a letter. E.g. spinner D has a better exe score than spinners C, E and F.

In an ideal world with a very comprehensive rulebook, I believe judges would all agree on the ranking and disagree only on the score distribution. That score distribution would represent the "community disagreement".

Scoring scale

Absolute scoring scale and tournament-specific scoring scale

Something that isn't well understood yet is that every tournament needs its own scoring scale. It's especially obvious for execution, but also quite easy to picture for difficulty.

The scoring scale is mostly defined by its extremities. As shown in the image below, you get completely different scores depending on how you define them.

2 kinds of extremities

The first could be the one used by DarKT and Xound in R2 (e.g. Xound's scores for control: from 2.5 to 3.5 / 5)

The second could be the one I used (e.g. my scores for control: from 0 to 5 / 5)

I think I overdid it a bit in the first round of WT, and that motivated me to adjust my extremities after R3 (or R4?)

These extremities are different depending on the tournament. There are 3 major kinds of tournaments:

Beginner tournaments where execution is almost always crap because of the lack of experience
National tournaments where there's a huge disparity in skill level
International tournaments where mistakes are more heavily punished most of the time

example of extremities : blue = beginner ; orange = national ; green = international

Why is all this so important? Because the way you define the extremities decides how much impact each criterion has on the total score. Let's say judges in a tournament score execution from 5 to 7 and originality from 3 to 9. The tournament becomes an originality-oriented tournament. It's not a bad thing of course, but I don't feel it's clear to everyone.
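To put a number on that example, here is a small Python sketch. It assumes that a criterion's weight in the final result is simply the width of the score range judges actually use for it. That's my simplification for illustration, not an official formula, and the function name `spread_share` is made up.

```python
# Score ranges actually used by judges in the example above (0-10 scale).
ranges = {"execution": (5, 7), "originality": (3, 9)}

def spread_share(ranges):
    """Return each criterion's share of the total usable score spread."""
    # Spread = highest score given minus lowest score given.
    spreads = {name: high - low for name, (low, high) in ranges.items()}
    total = sum(spreads.values())
    return {name: spread / total for name, spread in spreads.items()}

shares = spread_share(ranges)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the spread")
```

With these ranges, originality can move the total by 6 points and execution only by 2, so originality decides three quarters of the gap between two spinners.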

These extremities also mean that you can have multiple spinners below the minimum score and above the maximum score depending on how you define them.
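A minimal Python sketch of what happens outside the extremities. It assumes, purely for illustration, that a spinner's execution level can be placed on some raw 0-100 scale and that scores are spread linearly between the two extremities. The function name `to_score` and every number here are hypothetical.

```python
def to_score(level, low, high, max_score=10.0):
    """Map a raw level to a score, clamping outside the extremities.

    low  = raw level that earns the minimum score (0)
    high = raw level that earns the maximum score (max_score)
    """
    fraction = (level - low) / (high - low)
    # Anything below `low` is clamped to 0, anything above `high` to max_score.
    return max_score * min(1.0, max(0.0, fraction))

# Hypothetical extremities for one tournament: 20 earns 0/10, 80 earns 10/10.
levels = {"weak": 5, "also_weak": 15, "middle": 50, "strong": 85, "stronger": 95}
scores = {name: to_score(level, low=20, high=80) for name, level in levels.items()}
print(scores)
```

Both "weak" spinners end up at 0 and both "strong" spinners at 10, even though they are not at the same level: the scale simply stops separating them past its extremities.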

Scoring

Once you have your extremities and your ranking, you can start to score.

The idea is to take 2 spinners close to each other and ask yourself: "Does this deserve a point difference or not? If so, how many points?"

If we consider F = 0 and B = 10, even though we previously established that spinner D > spinner C, it doesn't necessarily mean they will get different scores
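Here is a rough Python sketch of that step: walk up the ranking and add whatever point difference you decided on for each neighbouring pair. The full order of the six spinners and the gap values are assumptions I picked for illustration (only F = 0, B = 10 and D > C, E, F come from the example), and `scores_from_gaps` is a made-up helper.

```python
# Ranking from worst to best. The exact order is assumed for illustration.
ranking = ["F", "E", "C", "D", "A", "B"]

# Point difference the judge decides on between each spinner and the next
# one up. A gap of 0 means "ranked higher, but not by enough for a point".
gaps = [2, 3, 0, 2, 3]

def scores_from_gaps(ranking, gaps, lowest=0):
    """Turn a ranking plus neighbour-to-neighbour gaps into scores."""
    scores = {ranking[0]: lowest}
    current = lowest
    for name, gap in zip(ranking[1:], gaps):
        current += gap
        scores[name] = current
    return scores

scores = scores_from_gaps(ranking, gaps)
print(scores)
```

D is still ranked above C, but since the gap between them is 0 they end up with the same score, while F stays at 0 and B lands on 10.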

This part is quite easy, because all the work has been done previously.

Importance of the rulebook

Imo we need to put more work into defining the extremities for international competition, with a lot more examples. The current definition of 0 = 7 days of penspinning, 10 = eban Japen 9th is outdated and leads to very narrow scoring.

That's a lot of work, but I might try to give some examples in a future article.

Consistency

The most important part of judging like this is that you'll get better consistency. On my side, a 5/10 in difficulty is the same in my R2 scores and in my R4 scores, or at least I tried. That sounds logical to someone not really involved in judging, but trust me, it's a lot harder to do than it seems.

Also, if your scoring is based on a ranking of combos, the total score of each combo will be independent of the others. What I mean by that: when you're a beginner and you start judging battles, you judge one combo against another, and only one, instead of against a pack of combos as shown above. That's not necessarily a bad thing, but it's clearly less effective than judging a combo in comparison to a comprehensive database you built yourself. When poorly done, the judge just "chooses" the winner and the battle ends up with really close total scores, even if it's not close at all.

Tschüss

Next week: some implicit rules in setupology
