Interesting article: Natural Bias in sports judging

Discussion in 'Great Skate Debate' started by maatTheViking, Nov 4, 2010.

  1. maatTheViking

    maatTheViking Well-Known Member | Joined: Jun 27, 2008 | Messages: 1,756
    So, my main sport is dressage, and one of the main (independent!) dressage news sites has posted a very interesting article about Natural Bias in sports judging, and about how judging complex movements (as in figure skating, gymnastics or dressage) is simply too hard for the brain without falling back on some kind of prior bias or knowledge.

    The guest article is written by a researcher in sports psychology, Inga Wolframm, and I thought it was very enlightening, and applicable to figure skating as well.

    Enjoy!

    http://www.eurodressage.com/equestrian/2010/11/04/natural-bias-hidden-controversy-judging-sports


    Reading the article, do you think figure skating judging could be made simpler? How? I don't have a firm enough grasp on all the technical stuff to have any ideas.

    I also wondered whether the judges, sitting that close together in competition, can see each other's marks/screens.
     
  2. mag

    mag Well-Known Member | Joined: Jan 18, 2006 | Messages: 7,081
    That is a fascinating article. Thanks for posting.
     
  3. joeperryfan

    joeperryfan Well-Known Member | Joined: Feb 4, 2002 | Messages: 2,014
    Thanks for posting, this article is quite interesting and I'll make sure to comment once I've read it thoroughly. :)
     
  4. gkelly

    gkelly Well-Known Member | Joined: Sep 26, 2003 | Messages: 10,564
    The author writes:

    In skating, I think that equates to breaking down the technical scoring into scoring each technical element separately, and getting different panels to determine what was performed and how well.

    Looking at all the numbers generated by that process is more complicated than just looking at one Technical Merit mark, but the process of coming up with each of those numbers is a lot simpler than trying to keep all the elements in mind to assign a single mark, and it's less subject to skate order effects.

    The PCS scoring is still holistic and therefore still subjective and still subject to all these effects.

    There is such a thing as a filter that can be put over monitors to make them unreadable from anything but a straight-on view, but I don't know if it's common practice to use them on the judges' monitors.
    http://www.tech-faq.com/computer-privacy-screen.html
     
  5. dinakt

    dinakt Well-Known Member | Joined: Feb 20, 2008 | Messages: 3,786
    Very interesting article and very relevant.
     
  6. overedge

    overedge Well-Known Member | Joined: Jan 21, 2005 | Messages: 17,508
    Great article. Thanks for posting the link.

    I would guess not - they look like small screens with a lot of information on them - e.g. video replay, which has to be large-ish so the details are visible - and the judges are sitting with some space between each seat. So unless they are entering the marks in 48-point font or some such, I think it would be pretty difficult to read a screen that was not your own.
     
  7. krenseby

    krenseby New Member | Joined: Mar 18, 2006 | Messages: 908
    This is the point that I found the most interesting: "Furthermore, in aesthetic sports... different movements are extremely complex, consisting of a number of technical and artistic elements that all need to be considered at once. However, research has shown that the processing of such complex information simply exceeds human capabilities. In order to be still be able to provide relevant scores within the given timeframe, judges fall back on schemas.. or “short-cuts” .. based on a number of different information sources, such as the athlete’s reputation, their previous performances, which team they belong to etc., [these] .. help judges come up with judgement decisions that, in their mind, approximate actual performances."

    This basically means that skaters are actually graded based on previous performances rather than on the current one, because all the movements in the program are too complicated to process and judges automatically fall back on an impression of the skater's previous performances.
     
  8. Blair

    Blair New Member | Joined: Aug 4, 2002 | Messages: 103
    This offers a great explanation for how PCS have manifested themselves over the last 7 years since the IJS was introduced.

    Really interesting article! Thanks for posting.
     
  9. Ziggy

    Ziggy Well-Known Member | Joined: Oct 7, 2002 | Messages: 20,569
    There are no exact criteria and no methodology for coming up with the score.

    When Monika and I tried an experiment judging PCS at 2010 Worlds, we had to come up with our own methodology based on the very vague guidelines.

    The other thing that really bugs me is judges noticing only the first thing and failing to notice anything that comes afterwards. It's well studied and described in social psychology, but I can't remember what the effect is called.

    So in effect: somebody puts a hand down and then puts a foot down, and the judges give -1 GOE for that element.

    When you look at the deduction sheet, they should have deducted -1 for the hand down and -2 for the foot down, which in combination gives you a -3 GOE deduction.
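A minimal sketch of the arithmetic in the post above, assuming that the reductions for separate errors on one element simply add up and that the result is capped at the bottom of the -3 to +3 GOE scale. The reduction values are the ones quoted in the post, not an official ISU table, and the names (`REDUCTIONS`, `goe_from_errors`) are hypothetical:

```python
# Hypothetical reduction values, taken from the example in the post;
# NOT an official ISU reduction table.
REDUCTIONS = {
    "hand_down": -1,
    "foot_down": -2,
}

# Assumed GOE scale of -3 to +3
GOE_MIN, GOE_MAX = -3, 3


def goe_from_errors(errors):
    """Add up the reduction for every error on one element, capped to the GOE scale."""
    total = sum(REDUCTIONS[error] for error in errors)
    return max(GOE_MIN, min(GOE_MAX, total))


# Counting both errors versus stopping after the first one noticed
print(goe_from_errors(["hand_down", "foot_down"]))  # -3
print(goe_from_errors(["hand_down"]))               # -1
```

The second call is the "only noticed the first thing" case: the judge who stops after the hand down lands on -1 instead of -3.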
     
  10. alilou

    alilou Crazy Stalker Lady | Joined: Oct 28, 2005 | Messages: 4,012
    Very interesting article. I'm glad I read it. I don't have any brilliant insight or suggestions, but I just want to say that it kind of helps me relax about it all, like I can finally exhale about the judging, because it's just the way human beings are: no matter what system is used, these same "schemas" will still apply because it's a judged sport. Still, I do think separate panels for TES and PCS would really help, but I doubt that's ever going to happen.
     
    Last edited: Nov 5, 2010
  11. Ziggy

    Ziggy Well-Known Member | Joined: Oct 7, 2002 | Messages: 20,569
    It would help, but with the ISU making all these cutbacks... no chance, yep.
     
  12. dinakt

    dinakt Well-Known Member | Joined: Feb 20, 2008 | Messages: 3,786
    Probably no chance, but that's my very strong wish as well: separate the panels so that people have specific, limited tasks and can actually pay attention to technique and to performance/choreography/artistry separately.
     
  13. BreakfastClub

    BreakfastClub Active Member | Joined: Jun 17, 2002 | Messages: 782
    Go back to 6.0 and crack down on the cheating.

    I'm really not trying to be a jerk saying that. 6.0 was a very simple system: rank the skaters. Period. Cognitive science research has proven over and over that the human mind is much more effective at comparing things to each other (6.0) than at measuring them against an arbitrary standard (COP).

    Sure, it was easy to get a bloc together, and that led to controversial 4/5 and 6/3 splits on the medal stand at the elite level that were debatable for reasons of preference, politics or reputation.

    They had to toss out the toe tappers, the Marie-Reines, and the Alla Shekhovtsovas, but 6.0 was simpler and generally led to more logical results than the craziness of COP. Now judges need to assign 7 to 12 or more GOE marks against arbitrarily assigned pages of standards/criteria, then assign five more overall PCS marks based on even more arbitrary criteria they need to memorize.

    And they need to do this all while trying to guess and stay "in the corridor" (the ultimate piece of BS) based on a skater's reputation.

    Then add in the fact that the base value for each technical element is arbitrary (yank your blade over your head and get more points, wheeee!!! Quads are suddenly worth more this year, wheeee!!!), and the fact that there's a powerful caller out there splitting hairs to assign a level, a downgrade, etc....

    Ah, 6.0, where did you go?

    Great article. And I love dressage. Thanks for posting!
     
  14. aftershocks

    aftershocks Well-Known Member | Joined: Dec 8, 2009 | Messages: 4,559
    Thanks for posting that article ... fascinating read. I too love dressage and anything to do with horses and horseback riding. [sidenote: Johnny attributes his erect posture on jump landings to his equestrian skills]

    I think what the writer said about judges relying on politics and athletes' reputations to help decide their scoring is pretty much the main modus operandi in figure skating judging. Let's not forget, too, that the Code of Points was essentially rushed into being mainly to protect the judges rather than to judge the skaters fairly. Whatever benefits may accrue as the system continues to develop are, IMO, tarnished by anonymous judging and the way CoP was rushed and forced into existence.

    So true re skaters being judged by previous performances -- case in point, Jeremy Abbott at Worlds 2010 (judges apparently couldn't forget images of Abbott falling and stumbling through his Olympic short program -- one of the two best SPs of last season, the other of course belonging to Daisuke). Abbott skated his SP beautifully at Worlds, but was marked lower than he deserved.

    There are rare occasions (Michelle Kwan many times, Brian Boitano at the '88 Olympics, Rudy Galindo at 1996 Nationals and Worlds) when the judges had in mind to score differently (i.e., politically, and based on things other than the skaters' performances) but couldn't in light of magical, bring-down-the-house performances.

    Generally, I think figure skating is even more difficult to judge against a set of criteria than other sports such as gymnastics and diving, because figure skating is both sport and performance art. Gymnastics and diving have important aesthetic aspects, but the performance aspect does not play as significant a role as it does in figure skating. For me, taking a skater's performance apart to score specific elements, without also truly looking at the whole and judging the whole without political bias and manipulation of PCS, is largely what sucks about the current system, along with the anonymous judging.
     
  15. millyskate

    millyskate Well-Known Member | Joined: Feb 28, 2003 | Messages: 9,039
    This can go further... When I was on a few piano panels, people would often get completely obsessed with one detail, something they'd noticed at the start, and then fail to pay attention to any of the rest.
    It was generally the all-rounders who suffered.

    Starting off strong and collapsing at the end, or collapsing at the start and pulling it together at the end, was often forgivable, but encountering a few problems interspersed throughout was generally the kiss of death.

    Being small and cute was a MASSIVE bonus. Any child who was tall for their age or slightly overweight was doomed unless they were outstanding; they rarely got more than a pass. I used to take :EVILLE: pleasure in looking up all the birth dates and pointing out that the "small cute one" was the oldest of the pack.
     
  16. Japanfan

    Japanfan Well-Known Member | Joined: Mar 1, 2002 | Messages: 12,800
    Do you have a source? I'm not convinced that comparing things is easier than measuring them against a standard. It also really depends on the context. And, COP is hardly arbitrary.

    Under 6.0, bias and judging on reputation were arguably even more prevalent than they are under COP. Remember those competitions where there was no movement whatsoever between the phases of the dance competition?

    Plus, judging in the LP was made really easy by the fact that the top three controlled their destiny (which also made for some major upsets, e.g. the ladies at the 2002 Olympics). It was therefore easier to pick the podium in advance and manipulate the scores. Certainly judges manipulate PCS just as judges used to manipulate the second mark under 6.0. But there is more room for movement, and given all the numbers and computing involved, it's harder to fix the final results.
     
  17. Ziggy

    Ziggy Well-Known Member | Joined: Oct 7, 2002 | Messages: 20,569
    6.0 meant that the majority of what skaters did on the ice did not count and was not taken into consideration.

    With CoP at least they know what they are marked for.

    The system is far from perfect and there is a lot of room for improvement but at least skaters and coaches now are getting feedback and can work on improving individual elements.

    As for cognitive science, which was supposed to be the new direction and the beacon of light in psychology, the majority of it has been proven to be methodologically unsound if not outright falsified (i.e., studying 100 people and taking the results of only 3 into consideration :D). When the correlations they reported were checked mathematically, it turned out that a lot of them would have been impossible to achieve.
     
    Last edited: Nov 5, 2010
  18. Ziggy

    Ziggy Well-Known Member | Joined: Oct 7, 2002 | Messages: 20,569
    That's another thing.

    The halo effect.

    Beautiful = good.

    I mean Korpi's PCS this season are seriously :huh:.
     
  19. zaphyre14

    zaphyre14 Well-Known Member | Joined: Mar 14, 2002 | Messages: 4,662
    I've seen the IJS screens up close. The judges' screens are small (approx. 4" x 6") and have nothing on them other than the list of element codes and the keypad for marks. It's pretty difficult to see what your neighbor is marking.

    The tech panel has the large monitors for replay and entering all the codes. The Accountant and Data people have full-size monitors too, but even seated shoulder to shoulder, it's pretty hard to see each other's screens because they were meant to be viewed straight on. Also, the print is really small (in order to get everything on there). Reading each other's screens is difficult at best and, for the average human, pretty close to impossible.
     
  20. gkelly

    gkelly Well-Known Member | Joined: Sep 26, 2003 | Messages: 10,564
    Agreed.

    From one point of view, it's simpler to just rank skaters and give two marks. Much less complicated than looking at each aspect of the performances separately and giving lots of different marks.

    The protocols looked a lot simpler when it was just two marks per skater per judge and used up a lot less paper.

    But is that what the author of the article means by "simpler"?

    Think about what a judge has to do in order to rank skaters with some degree of "accuracy": Evaluate the basic skating skills, count the jumps and their difficulty, evaluate the quality of each jump, determine the difficulty and quality of all the spins and steps and spirals and other in-betweens, evaluate the skater's carriage and line, projection to the audience, connection between the movement and the music, etc. Were there obvious errors that should be penalized even more than the loss of credit for whatever skills they represented failures of? Was there any content that was unique in its difficulty or originality that should be rewarded for its uniqueness in addition to its actual technical value? Etc.

    Oh, and then decide which of the preceding skaters this skater was better or worse than.

    And somehow all that needs to get boiled down into two numbers, making sure to leave enough room between this skater's numbers and those who were immediately better or worse among the preceding skaters so that there will also be enough numbers left to slot in subsequent skaters above and/or below as needed.

    The resultant numbers look simple, but the thought processes required to arrive at those numbers are extremely complicated. Plenty of room for important details to get overlooked or for judges (and fans) to differ significantly in how they weight the most salient aspects of the performances. And plenty of room for "noise" such as reputation or skate order to overshadow the "signal" of the immediate performance as a deciding factor in a judge's decision on where to rank skaters with relatively comparable performances.

    On the other hand, evaluating an individual jump element is much simpler. There are clear guidelines defining the required takeoff and number of revolutions: the tech panel just has to decide yes or no whether those definitions were met, and if the rotation was short then by how much. Combos or sequences with unexpected errors might be a little trickier to define, but there are published guidelines for how to handle most situations.

    And then the judges just have to evaluate the element on a scale of -3 to +3 according to clearly spelled out guidelines, and then move on.

    There's no need to weigh the difficulty of one element against another or decide how much to value quality over difficulty or vice versa. Most of those weightings have been built into the scale of value and taken out of the hands of the judges, making the judges' task simpler.

    Defining spin and step levels is more complicated for the tech panel under the current rules. That's because most of the common-sense and gut feeling decisions about difficulty have now been codified, in ways that encourage certain kinds of difficulty and discourage others.

    We might disagree with some of those choices and want to see the rules and feature definitions and the scale of values rewritten to reflect our own preferences for what should be rewarded. But whatever those rules are, when calling a program the tech panel and judges don't need to make value judgments about what they think should or shouldn't be worth more, as the judges did under 6.0.

    The tech panel just needs to decide whether each attempted feature was achieved or not. Just a series of yes/no decisions, not value judgments.

    For the judges the process is similar to what they do for jumps: evaluate the element on a scale of -3 to +3 according to clearly spelled out guidelines, and then move on.

    There are a lot of separate decisions -- by two groups of officials -- producing a lot of separate numbers. Which in one way looks complicated. But it's a lot of separate simple decisions.

    It's a lot easier to decide "This spin meets four of the bullet points for positive GOE and doesn't have any errors -- +2 GOE" than it is to think "This spin was really good, but it wasn't very difficult and the rest of the program was generally sloppy and there were a couple of major errors on other elements, and this skater has never had good results in the past and is from a small country with no political influence -- wait, I'm not supposed to be judging those last facts -- well, it was a bad program and deserves low scores, but that was a very nice spin -- was it nice enough to score this skater above the skater I currently have in last place or not?"

    See what I mean?

    Now, the PCS are never going to be that simple.
     
  21. query5

    query5 New Member | Joined: Jul 3, 2009 | Messages: 672
    Kind of agree with the article, but figure skating bias goes a bit further than natural bias.
     
  22. Skittl1321

    Skittl1321 Well-Known Member | Joined: Feb 1, 2007 | Messages: 11,181
    Isn't bias like this part of the reason judges watch practices (actually, I don't know if they still do that) and why skaters submit a planned program sheet?

    It's easier to judge if they know what to expect than if all the information is presented to them for the first time as they watch.
     
  23. Ania

    Ania Active Member | Joined: Feb 20, 2002 | Messages: 244
    The majority of cognitive science research is experimental and/or computational. Studies rarely rely ONLY on correlational evidence.
    Are you by any chance referring to Vul & Pashler's paper on "puzzlingly high correlations"? (http://psy2.ucsd.edu/~pwinkiel/vul-etal_correlations-main-2009.pdf) This paper has nothing to do with behavioral cognitive science (it's a critique of fMRI data analysis techniques in the field of social neuroscience).

    While there is some sloppy research in cognitive science (as in any other endeavor humans undertake, scientific or not), you'd be hard pressed to find factual support for saying that "the majority of it has been proven to be methodologically unsound if not outright falsified".
     
  24. Jun Y

    Jun Y Well-Known Member | Joined: Dec 8, 2005 | Messages: 1,095
    There are so many areas in the current and past figure skating judging systems that are subject to unconscious bias that I don't know where to start.

    One of many problems: the program components are poorly defined, too complicated, and overlap too much (vague rules, cognitive limitations, etc.). My very unscientific and very subjective observation of recent competition results has convinced me that many judges frequently do not follow the guidelines. They are probably unable, rather than unwilling, to adhere to the rules, although I don't really know because I'm not in their heads. (It doesn't help that the rules keep shifting and changing every year.)

    All the hoopla about Transitions last year exposed the widespread problem, which remains uncorrected and probably will stay that way for the foreseeable future. Except for Skating Skills, the other four components often cannot stand up to much scrutiny, IN MY HUMBLE OPINION.

    As a sport, figure skating is fair only to a moderate extent. The internal complexity and contradictions make it impossible to produce a truly reliable, consistent, reproducible, and fair judging system. IMHO. If I had kids I would be very reluctant to let them get into a career in competitive figure skating.

    (In theory I think a holistic judging system based on competent and honest judges' overall impression may not be inherently more biased than the IJS. However, such a system makes it nearly impossible to detect intentional cheating and gross incompetence.)
     
    Last edited: Nov 5, 2010
  25. Visaliakid

    Visaliakid Well-Known Member | Joined: Feb 1, 2003 | Messages: 3,309
    Returning to 6.0 will never happen! The cheating that was prevalent under that system sounded its death knell. Unfortunate but inevitable.
     
  26. leafygreens

    leafygreens Well-Known Member | Joined: May 26, 2009 | Messages: 1,648
    Combining this with the other thread about more skating events in the Olympics, wouldn't that result in fairer judging? If judges were judging only jumps in one event and then only spins in another event, that would cut down on all the cognitive confusion of being bombarded with multiple elements and ways to mark them.
     
  27. aftershocks

    aftershocks Well-Known Member | Joined: Dec 8, 2009 | Messages: 4,559
    :respec: :respec: :respec:


    I like the idea of more events and opportunities for skaters to medal on their strengths. However, figure skating is a tradition-bound sport and extremely slow to change.* The only reason the scoring system changed so radically was the 2002 Olympic judging scandal. Pressure was placed on the ISU to do something about the judging fiasco in order to repair figure skating's reputation as an Olympic sport. The changes were largely a smokescreen for business as usual for the judges, with a lot more protection and hiding room thanks to anonymity. Of course, the scoring rules have continued to change and be reworked because they were rushed into being. And the changes have drastically affected how we view the sport, how the skaters train, and how programs are put together (in many ways adversely). A lot of fans, especially younger fans, IMO, love the accessibility of the scoring, the know-it-all ability and the numbers fix they get with CoP.

    Meanwhile, it was very possible for changes to have been instituted in a more thoughtful, reasoned way with the utmost purpose in mind of improving the sport and fairly judging the skaters, not protecting the judges.

    *Other changes, such as the creation of the short program in the early 70s (a good thing) and the complete dumping of figures in the early 90s (not so good), also came about due to pressures placed on the ISU. The short program was created because of television: media and viewers were astounded and confused about why a gorgeous Janet Lynn received bronze instead of gold for her beautiful free skating at 1972 Worlds (figures counted for more, and Beatrix Schuba was a genius at figures). The following year at Worlds the short program was in place for the first time, and Janet Lynn faltered (perhaps due to nerves and pressure, because she was supposed to win now that she had two opportunities to showcase her free skating abilities). In 1973, Lynn came in second behind Karen Magnussen of Canada, and Schuba had retired as a result of the decrease in the overall importance of figures in scoring.

    In the early 90s, largely because of the viewing demands of television, figures were completely dumped instead of being slowly phased out, or better yet, reduced to a separate event that didn't have to be widely covered (it was hardly covered prior to being dumped anyway). FS honchos failed to realize the importance of figures in helping skaters develop their edging skills. Obviously that is why a number of skaters today have problems with edging technique on the takeoff of their jumps. Figures could have been phased out of competition for singles skaters, but still kept as an important skill to practice and be tested on.
     
  28. Aussie Willy

    Aussie Willy Well-Known Member | Joined: Feb 18, 2005 | Messages: 18,051
    I might as well be honest. It is not that difficult to see what other judges have given. We have donated laptops that we set up for judging. It depends on the size of the screen you use.
     
  29. Ziggy

    Ziggy Well-Known Member | Joined: Oct 7, 2002 | Messages: 20,569
    Zaphyre14, I'm not sure what you've seen, but at competitions judges use normal-size monitors (I'd guess 17").
     
  30. gkelly

    gkelly Well-Known Member | Joined: Sep 26, 2003 | Messages: 10,564
    Maybe all the international competitions use the same kind (or maybe not -- I don't know), but within the US I've seen different kinds of monitors at different competitions. And domestic events in other countries probably use different systems.