[eDebate] My thoughts 50 point scale at Wake

NEIL BERCH berchnorto
Thu Nov 1 19:41:16 CDT 2007

I really like Will's idea and think someone should try it if it's technically feasible.  A couple of other thoughts:
1.  About the Russian judge:  before the figure skating system got "fixed" (after the ice dancing was "fixed"), they actually did use a system that was similar to judge variance.  While much attention was paid to whether a judge gave a skater a 5.4 or a 5.5, the actual scoring was ordinal.  Each judge's scores were counted separately, and skaters were assigned ordinal ranks by each judge based upon the scores given to all skaters by that judge (with the technical merit score serving as tiebreaker in compulsories, and the artistic expression score serving as tiebreaker in the free skate).
2.  Why not use opponent wins as the primary tiebreaker for clearing and seeding?
3.  Since debate is a team sport, why have speaker awards at all?
--Neil Berch
West Virginia University
  For reasons that I'll spell out in a moment, I do not support a long-term 
  move to a 50-point scale. 

  That said, the most important item I'd like to raise in this post is that -- 
  irrespective of my personal opinion of the system -- I will abide by the 
  guidelines set forth by the tournament. 

  Point # 1    Going unilateral = bad 

  Seems to ruin the experiment. Even if you sense the experiment will prove a 
  bad idea, doesn't it still seem useful to have good data on the 
  effectiveness of the *experiment* ?... 

  Also seems to defy the basic idea of being *invited* to a tourney. Wake 
  asked us to their party -- we should strive to be reasonably gracious. 

  Point # 2    Trying to avoid accidentally jacking the scale 

  There simply won't be too many iconoclasts actively seeking to ruin the new 

  But, I fear non-iconoclasts could accidentally hurt the experiment. In hopes 
  of sparking a consistent read of the new scale, I included my read of it. If 
  people read Ross's scale differently, let's talk about it before -- not 
  after -- the tourney. 

  Assuming the scale Ross posted earlier today holds: 

  a) I'll probably issue zero or next-to-zero 49's or 50's. More likely the 

  It's fairly easy to "imagine a better performance". If the tabsheet has as 
  many 50's as a typical tabsheet has 30's, or (gulp) shows as many 49's as 
  29's, then people probably missed the point. 

  b) I won't import the 30 point model. 

  There are debaters to whom I consistently issue a 28.5, but who are sorta 
  "closer" to a 29.0 than a 28.0. 

  Suppose one of these debaters really stepped it up. I might have given them 
  a 29.0, but I do not think I am going to give them a 49.0. 

  c) I'll keep in mind that roughly 30 teams break to elims of the NDT. 

  Thus the second category "NDT elim worthy performance" is larger than one 
  might expect. 

  In fact, in *many* debates I judge, at least one person has at least an 
  "early NDT elim round performance". 

  Therefore, I think it's more likely-than-not that most of my ballots will 
  award at least one of the competitors a 47 (or higher). 

  That said, a close read of Ross's scale has a 47 applying to BOTH the bottom 
  of the second tier and the top of the third tier. This makes sense, as 
  roughly as many teams clear at WFU as clear at the NDT. 

  It seems, then, that a 48 (for me) will be issued to students that give a 
  performance worthy of quarters or later at the NDT/Wake. 
  A 47 will be for something akin to an octas/doubles performance. A 46 seems 
  to be warranted if someone put forth a "bubble-clearing" performance for a 
  typical major. A 45 feels like a performance akin to those of a strong 
  4-4ish team or weak 5-3ish team. 

  d) If I judge two completely likeable but inexperienced debaters, I will 
  not be afraid to issue 42 points. 

  If I sense they put forth a 2-6ish style showing, that seems to be a 
  generous read of the new scale. 

  MANY such points should be issued at the Wake tourney. 

  To put it another way, it would be highly abnormal to judge a 4 round 
  commitment at the Wake tournament and NOT issue MORE THAN one set of 43's 
  (or lower). You are QUITE LIKELY to judge a few teams that are currently 
  "below-average" relative to the field. If you refuse to give such points, 
  you are inflating the scale. 

  Point # 3 -- the reason I do not favor a long-term move to a 50-point scale. 

  I favor a different experiment. It uses a revised judge-variance scheme for 
  issuing speaker points (and as the first tiebreaker for clearing). 

  Ross's post begins with a critique of "judge variance". 

  I think judge variance is only meaningless b/c the sample size for variance 
  is currently set to track only variance *within* the Shirley tournament (or 
  any given tourney). 

  I, however, feel that judge variance can be tweaked. I'd prefer an 
  experiment that uses season-long (or, even better, career-long) judge 
  variance by accessing archives from Bruschke's system. 

  Under this system, it would not matter if the Russian judge (Hardy) gives 
  everyone "low-points" (9.3 or lower to every gymnast). What matters is that 
  variation is meaningful within that judge's scale. 
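A rough sketch of what career-long judge variance could look like in practice. The post does not spell out a formula, so this assumes a plain standard score (points minus the judge's career mean, divided by that judge's career standard deviation). The function name, the `history` dictionary standing in for an archive export, and the sample numbers are all made up for illustration; none of this reflects how any actual tabulation software works.

```python
from statistics import mean, pstdev

def judge_adjusted_points(history, ballots):
    """Return each speaker's average standard score.

    history: judge -> list of every point value that judge has given
             (hypothetical stand-in for a career-long archive).
    ballots: list of (judge, speaker, points) from the current tournament.
    """
    scores = {}
    for judge, speaker, points in ballots:
        past = history[judge]
        mu = mean(past)
        sigma = pstdev(past)
        # A judge who always gives the same number carries no information.
        z = (points - mu) / sigma if sigma > 0 else 0.0
        scores.setdefault(speaker, []).append(z)
    return {speaker: mean(zs) for speaker, zs in scores.items()}

# Invented data: one "low-pointer" and one judge who gives 29.0 like candy.
history = {
    "low_pointer": [27.0, 27.5, 28.0, 27.5],
    "high_pointer": [28.5, 29.0, 29.5, 29.0],
}
ballots = [
    ("low_pointer", "Speaker A", 28.0),
    ("high_pointer", "Speaker B", 29.0),
]
adjusted = judge_adjusted_points(history, ballots)
```

With these invented numbers, Speaker A's 28.0 from the low-pointer comes out ahead of Speaker B's 29.0 from the high-pointer, because the 28.0 sits well above its judge's own average while the 29.0 is exactly average for its judge.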

  To contextualize, many of us like Aaron Hardy as a judge and we KNOW that a 
  29.0 from him means SO MUCH MORE than a 29.0 from someone who gives them out 
  like candy. Why should speaker awards (or clearing) be a referendum on the 
  luck of which judges you got during the prelims ?... Worse, why should it 
  discourage us from pref-ing some of our favorite critics solely b/c they 
  tend to be "low-pointers" ?.. 

  The 50 point scale remedies none of this -- and only would do so if you 
  believed that inflation is somehow NOT the result of very human and very 
  foreseeable long-term variables. 

  From diving to gymnastics, points have always gradually inflated. 

  To me, the 50 point scale may be a useful experiment -- but is ultimately 
  cut from the same cloth. In time, the scale will ride-up for some, but only 
  some. We will be left with comparable frustrations. 

  The heart of the issue is that it is exceptionally difficult to get a large 
  group of judges to look at any speaker-point scale in the same way. It is 
  more workable, however, to ask them to have stare decisis within their own 

  At worst, I fear the 50 point experiment will be counter-productive. I even 
  think there may be a disad to the perm of trying both experiments. 

  Specifically, I fear it may trade-off with a broader move towards using 
  (career-long) judge variance. 

  I know of two tournaments that have both been seriously toying with the idea 
  of loading all of the Bruschke archives, creating a far larger pool of data 
  from which to draw "variance", and using career-long judge variance as the 
  standard for speaker awards and-or clearing. 

  One such tournament director opted against this experiment b/c the Wake 
  system (50 points) changes the baseline and makes variance a touch odd. 

  Another tournament director may still proceed with the experiment, but would 
  need to exclude the Wake '07 data (which is a really, really large and 
  useful pool for sample size). 

  In the end, I will support the Wake system -- over time, Ross (and others) 
  have used the Wake tourney to experiment and I think the community has grown 
  because of it. 

  But, I would encourage other tourneys to consider the proposal on the table. 

  It seems to move away from the unworkable notion that we will read the scale 
  the same, and move toward a model where we read OUR OWN scale consistently. 

   -- Will 

