
Components experiment

Discussion in 'The Trash Can' started by gkelly, Nov 30, 2010.

  1. gkelly

    gkelly Well-Known Member

    Based on the discussion in the SS/TR thread, I'd like to try a little game, using some men's technical programs from 1994.

    Let's pretend we're working under a modified version of the IJS where we constitute a component judging panel that doesn't need to worry about marking the individual technical elements (except insofar as they contribute to the overall impressions of the program) and just mark the whole-program aspects of each performance.

    You can choose to use the five program components as currently used in 2010-11, with increments of 0.25 on a 0-10 scale.

    Or you can use JunY's proposed two-component system and mark only Blade Skills and Expression. You can use increments of 0.1 if you feel you need to make finer distinctions.

    Does anyone want to take a shot at defining what should be considered under each of those two marks? Or do you just want to combine the current Skating Skills and Transitions criteria under Blade Skills, and everything currently under Performance/Execution, Choreography, and Interpretation under Expression?

    Choose at least three of the programs below to analyze, assign your marks, and explain your reasoning.

    Have fun. And let's hope for some interesting discussion. :)








    And in case you feel like it,




    * * * * *

    A couple from that year's Worlds because I couldn't find their Olympic SPs on youtube:


  2. Jun Y

    Jun Y Well-Known Member

    I will play but I'll need a few days. Thanks for suggesting the game. :D
  3. dinakt

    dinakt Well-Known Member

    I want to play too, and also am falling behind in pretty much everything these days. With no GP this week, I hope to catch up.
  4. Marco

    Marco Well-Known Member

    Millot wuzrobbed. :(
  5. gkelly

    gkelly Well-Known Member

    Here you go, put Millot in the mix
  6. Jun Y

    Jun Y Well-Known Member

    First, let me try to define the criteria:

    1. Blade Skills: speed, power, smoothness, control, bidirectional skating, transitions in between elements and immediately preceding jump/spin entries (quantity and quality), depth of edges, and other technical aspects not included in jumps and spins. Step sequences inevitably overlap with skating skills and cannot be separated from them in judging, but the focus is on between-element content.

    2. Interpretation and Expression: The non-technical part of the entire program, including posture and extension, choreography (ice coverage and patterns, originality of movements), how well the skater times his movements to music variations, how difficult the music is to interpret, how well he projects to the audience, anything "intangible" that influences the impression of quality.

    It is very difficult for me to assign a number to each of the programs, because I have never memorized or practiced the current component scoring system (a scale of 1-10) and do not have a "gut-level" sense of what kind of skating is a 5, 6, 7, etc. So I just ranked the programs on #1 Blade Skills and #2 Interpretation and Expression. Below are my impressions.

    1. Blade Skills (somewhat hampered by the inability to judge speed and ice coverage on TV):
    Cousins (with the most transitions and bidirectional skating of the lot; good speed)
    Tataurov (not a lot of transitions but the speed and smoothness look good)
    Jung (pretty good edges; speed looks well maintained throughout)
    Millot (? very hard to see on the video; very few transitions etc.; but the edges and speed seem good)
    Carr (hard to differentiate with Tylleson on video)
    Tylleson (His yellow boots are extremely distracting)

    2. Interpretation and Expression:
    Cousins (nuanced interpretation in the first half; excellent projection to audience)
    Jung (I'm surprised that I like this so much; difficult music, well choreographed)
    Tataurov (boring and predictable music and interpretation, but the best posture of the lot)
    Tylleson (goofy performance but humorous and in character)
    Millot (very boring music and interpretation; not choreographed to music details; good posture and extension)
    Quattrocecere (predictable music and choreography, but movements kept up with the music till the end)
    Carr (I like the music but his interpretation fell off after the first 30 seconds)

    I can see that the problem with merging the current components is that it becomes difficult to tell which consideration pulls a skater up or drags him down. Some may have better edges but few transitions. Some have better interpretation but boring music or no projection. OTOH, under the current system it is also impossible to judge these elements accurately in real time.
    Last edited: Dec 5, 2010
  7. gkelly

    gkelly Well-Known Member

    That being the case, what if anything would be the advantage to merging them? If anything, it would probably be more accurate and informative to split them further, especially P/E. Or maybe redefine the criteria to end up with 3-5 separate scores that overlap less than the existing 5.

    But I can't see what the advantage would be of combining to end up with 2 instead of 5.

    But is the current system any worse than using 2 component scores would be?

    Back in 2002-03, maybe a stronger case could have been made for giving one score for technical content and quality between the elements and one score for presentation that just refined the old Presentation mark, since that would have been less of a change from the existing system at the time.

    Now we're used to 5 components, so if 5 or 2 are equally inaccurate, we might as well stick with what everyone is now used to.

    If 5 is not as accurate as we'd like but 2 is even less accurate, then using only 2 would be a bad recommendation for several reasons.
  8. Jun Y

    Jun Y Well-Known Member

    Look, first, I'm not advocating for actually implementing a 2-component PCS system in reality, or claiming that a 2-component system is definitely superior to a 5-component system. Besides, who would listen to me if I did? There is no danger that the ISU will abandon the current judging system tomorrow because of what a single fan wrote on FSU. This is a simulated game that we play just for fun as fans. I have no evidence, research, or data to prove which of the 1- (like 6.0's second mark), 2-, 3-, or 5-component systems is most reliable, consistent, practically fair, or objective, how it would best be defined, or whether my suggested criteria, which I came up with on the fly, are the most effective.

    Second, I suggested combining the components down to 2 because I am very skeptical that more components necessarily equal higher accuracy, any more than adding more digits after the decimal point necessarily improves precision. One factor (or hurdle to accurate judging decisions) is how quickly judges must come up with a score in real time and how many criteria they have to apply. Most judges get to see a program and performance just once, with access to video replay but under tremendous time pressure. Plus they have to sit there for hours processing 6-24 or more programs. I'm not a judge, nor have I conducted psychological experiments on the consistency and reproducibility of the average judge's decisions (although I suppose "double-blind" experiments could be done).

    If all 5 components can be consistently and accurately applied in practice, then of course a 5-component system is better than a 2-component one. However, the question is whether this is, or can be, done. And if it cannot, is part of the problem having too many components, each of which must be assigned a number? What the limit of judging accuracy is for something as complex and multifaceted as figure skating is an empirical question that can be answered by systematically applied methods. If the 5 component scores currently given by each judge appear to be very close, is that so different from a 1-component system?

    Psychological research has come up with some interesting observations about how the brain makes decisions. Jonah Lehrer reported on this extensively in his book "How We Decide." When people are asked to make a quick decision based on a few criteria (e.g., choosing a car, furniture, or a favorite food), they make excellent choices. But when they are given a lot of variables to consider and/or are required to articulate arguments to defend their choices, they choose poorly (see his accounts of the furniture-buying experiments conducted by Ap Dijksterhuis and the fruit-jam and poster experiments by Timothy Wilson). Thinking too much and too consciously, relying predominantly on the prefrontal cortex, appears to interfere with our ability to make accurate decisions quickly. Instead, faced with a lot of information to process, the brain performs better when it recruits subcortical neurons that work fast but are not very articulate.

    This is the basis for my suspicion that having 5 components could make the quality of judges' decisions worse than having fewer components. Note that each of the 5 components already contains many definitions and elements within it (e.g., bidirectional skating, speed, and edges are separate considerations). How do we know that mixing and matching these elements into 5 components yields better or worse judging performance than further dividing them into 7, 8, or 10 components?

    Lastly, the language used to define the judging criteria is perhaps inherently inadequate for helping judges consistently interpret what's superior or inferior in music, dance, and performance; perhaps such things cannot and should not be quantified in the first place. And so far we have not taken into account language barriers and translation-fidelity issues in international judging.


    If I had the funding, it would be entirely possible to test the validity of any judging system with double-blind experiments. Judges could be shown videos of skaters at various levels, arranged in random order, with faces and costumes digitally altered or bleached; all judges would apply the same criteria, and their inter-tester variability could be calculated. A cheap version of this test might be having all of the Russian international judges test-judge a US sectional competition and then comparing their results with the US judges' results, comparing not just the rankings but the scores.
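    The cross-panel comparison proposed above can be sketched as follows (a toy example: both panels and all of the marks are invented for illustration). It compares the panels at the score level, via mean absolute difference, and separately at the ranking level, since two panels can agree on ordering while disagreeing on scores, or vice versa:

```python
import statistics

# Invented marks for six performances on a 0-10 scale, one mark per panel.
panel_a = [7.50, 6.75, 8.25, 5.50, 6.25, 7.00]
panel_b = [7.25, 7.00, 8.00, 5.75, 6.75, 6.50]

# Score-level agreement: mean absolute difference between the panels.
mad = statistics.mean(abs(a - b) for a, b in zip(panel_a, panel_b))

# Ranking-level agreement: order the performances by each panel's marks.
rank_a = sorted(range(len(panel_a)), key=lambda i: -panel_a[i])
rank_b = sorted(range(len(panel_b)), key=lambda i: -panel_b[i])

print(f"mean absolute score difference: {mad:.3f}")
print(f"identical ordering: {rank_a == rank_b}")
```

    With these invented marks the panels agree to within about a third of a point on average yet still order the middle of the field differently, which is why comparing "not just the rankings but the scores" gives a fuller picture of inter-tester variability.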
    Last edited: Dec 6, 2010