[eDebate] My thoughts 50 point scale at Wake

William J Repko repkowil
Thu Nov 1 19:27:19 CDT 2007


For reasons that I'll spell out in a moment, I do not support a long-term 
move to a 50 point scale. 

That said, the most important item I'd like to raise in this post is that -- 
irrespective of my personal opinion of the system -- I will abide by the 
guidelines set forth by the tournament. 

Point # 1    Going unilateral = bad 

Seems to ruin the experiment. Even if you sense the experiment will prove a 
bad idea, doesn't it still seem useful to have good data on the 
effectiveness of the *experiment*? 

Also seems to defy the basic idea of being *invited* to a tourney. Wake 
asked us to their party -- we should strive to be reasonably gracious 
guests. 

Point # 2    Trying to avoid accidentally jacking the scale 

There simply won't be too many iconoclasts actively seeking to ruin the new 
scale. 

But, I fear non-iconoclasts could accidentally hurt the experiment. In hopes 
of sparking a consistent read of the new scale, I included my read of it. If 
people read Ross's scale differently, let's talk about it before -- not 
after -- the tourney. 

Assuming the scale Ross posted earlier today holds: 
(http://www.ndtceda.com/pipermail/edebate/2007-November/072820.html) 

a) I'll probably issue zero or next-to-zero 49's or 50's. More likely the 
former. 

It's fairly easy to "imagine a better performance". If the tabsheet shows as 
many 50's as typical 30's, or (gulp) as many 49's as 29's, then people 
probably missed the point. 

b) I won't import the 30 point model. 

There are debaters to whom I consistently issue a 28.5, but who are sorta 
"closer" to 29.0 than to 28.0. 

Suppose one of these debaters really stepped it up. I might have given them 
a 29.0, but I don't think I'm going to give them a 49.0. 

c) I'll keep in mind that roughly 30 teams break to elims at the NDT. 

Thus the second category -- "NDT elim worthy performance" -- is larger than 
one might expect. 

In fact, in *many* debates I judge, at least one person has at least an 
"early NDT elim round performance". 

Therefore, I think it's more likely-than-not that most of my ballots will 
award at least one of the competitors a 47 (or higher). 

That said, a close read of Ross's scale has a 47 applying to BOTH the bottom 
of the second tier and the top of the third tier. This makes sense, as 
roughly as many teams clear at WFU as clear at the NDT. 

It seems, then, that a 48 (for me) will be issued to students who give a 
performance worthy of quarters or later at the NDT/Wake. A 47 will be for 
something akin to an octas/doubles performance. A 46 seems warranted if 
someone put forth a "bubble-clearing" performance at a typical major. A 45 
feels like a performance akin to those of a strong 4-4ish team or a weak 
5-3ish team. 
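My reading of those tiers could be jotted down as a simple lookup. This is 
purely illustrative -- the labels and the `describe` helper are my own 
shorthand for the bands above, not anything official from Ross's post:

```python
# My (Will's) reading of the 45-48 band under Ross's 50-point scale.
# Labels and thresholds are one judge's interpretation, not the official scale.
TIERS = {
    48: "NDT/Wake quarters-or-later performance",
    47: "NDT/Wake octas/doubles performance",
    46: "bubble-clearing performance at a typical major",
    45: "strong 4-4ish / weak 5-3ish team performance",
}

def describe(points: int) -> str:
    """Return what a given score signals under this reading of the scale."""
    return TIERS.get(points, "outside the 45-48 band discussed above")

print(describe(47))
```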

d) If I judge two completely likeable but inexperienced debaters, I will 
not be afraid to issue 42 points. 

If I sense they put forth a 2-6ish style showing, that seems to be a 
generous read of the new scale. 

MANY such points should be issued at the Wake tourney. 

To put it another way, it would be highly abnormal to judge a 4 round 
commitment at the Wake tournament and NOT issue MORE THAN one set of 43's 
(or lower). You are QUITE LIKELY to judge a few teams that are currently 
"below-average" relative to the field. If you refuse to give such points, 
you are inflating the scale. 

Point # 3 -- the reason I do not favor a long-term move to a 50-point scale. 

I favor a different experiment. It uses a revised judge-variance scheme for 
issuing speaker points (and as the first tiebreaker for clearing). 

Ross's post begins with a critique of "judge variance". 

I think judge variance only looks meaningless b/c the sample size for 
variance is currently set to track only variance *within* the Shirley 
tournament (or any given tourney). 

I, however, feel that judge variance can be tweaked. I'd prefer an 
experiment that uses season-long (or, even better, career-long) judge 
variance by accessing archives from Bruschke's system. 

Under this system, it would not matter if the Russian judge (Hardy) gives 
everyone "low-points" (9.3 or lower to every gymnast). What matters is that 
variation is meaningful within that judge's scale. 

To contextualize, many of us like Aaron Hardy as a judge, and we KNOW that a 
29.0 from him means SO MUCH MORE than one from someone who gives out 29.0's 
like candy. Why should speaker awards (or clearing) be a referendum on the 
luck of which judges you got during the prelims? Worse, why should it 
discourage us from pref-ing some of our favorite critics solely b/c they 
tend to be "low-pointers"? 
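The idea above -- that a score only means something relative to the judge's 
own career distribution -- is basically a z-score. A minimal sketch, 
assuming nothing about Bruschke's actual system (the function name, sample 
histories, and formula here are all my own illustration):

```python
from statistics import mean, stdev

def normalize(points: float, judge_history: list[float]) -> float:
    """Express a score relative to a judge's own career distribution
    (a z-score), so a low-pointer's 29.0 can outrank a generous
    judge's 29.0.  Illustrative only -- not Bruschke's system."""
    mu = mean(judge_history)
    sigma = stdev(judge_history)
    return (points - mu) / sigma

# A stingy judge whose career points cluster around 27.5:
stingy = [27.0, 27.5, 27.5, 28.0, 27.5, 28.5]
# A generous judge who hands out 29.0 "like candy":
generous = [29.0, 29.0, 28.5, 29.5, 29.0, 29.0]

print(normalize(29.0, stingy))    # well above this judge's norm
print(normalize(29.0, generous))  # about average for this judge
```

Under this model, the raw baseline ("low-pointer" vs. "high-pointer") 
washes out, and only meaningful variation within each judge's own scale 
counts toward awards or clearing.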

The 50 point scale remedies none of this -- and would only do so if you 
believed that inflation is somehow NOT the result of very human and very 
foreseeable long-term variables. 

From diving to gymnastics, points have always gradually inflated. 

To me, the 50 point scale may be a useful experiment -- but is ultimately 
cut from the same cloth. In time, the scale will ride-up for some, but only 
some. We will be left with comparable frustrations. 

The heart of the issue is that it is exceptionally difficult to get a large 
group of judges to look at any speaker-point scale in the same way. It is 
more workable, however, to ask them to have stare decisis within their own 
scale. 

At worst, I fear the 50 point experiment will be counter-productive. I even 
think there may be a disad to the perm of trying both experiments. 

Specifically, I fear it may trade-off with a broader move towards using 
(career-long) judge variance. 

I know of two tournaments that have both been seriously toying with the idea 
of loading all of the Bruschke archives, creating a far larger pool of data 
from which to draw "variance", and using career-long judge variance as the 
standard for speaker awards and-or clearing. 

One such tournament director opted against this experiment b/c the Wake 
system (50 points) changes the baseline and makes variance a touch odd. 

Another tournament director may still proceed with the experiment, but would 
need to exclude the Wake '07 data (which is a really, really large and 
useful pool for sample size). 

In the end, I will support the Wake system -- over time, Ross (and others) 
have used the Wake tourney to experiment and I think the community has grown 
because of it. 

But, I would encourage other tourneys to consider the proposal on the table. 

It seems to move-away from the unworkable notion that we will read the scale 
the same, and move-towards a model where we read OUR OWN scale consistently. 


 -- Will 
