The problem with the season or career variance scale is that it assumes we judge the same way at every tournament.  For example, I tend to adjust my scale a little based on the tournament I am judging at or the division I am judging in.  A 28 in the novice division means something different than a 28 at the Kentucky round robin.  I generally give points relative to the pool of competitors at that tournament rather than relative to all the debaters I have seen in my life.  I am also giving slightly higher points this year than I did last year because I decided to raise my scale, so a career comparison seems highly inaccurate.
I must admit that I am nervous about how the 50 point scale plays out.  I remember Emory University experimenting with a 50 point scale when I was competing (I think in 79 or 80?) and the points were all over the map, with huge skewing because some judges used far more of the scale than others.  Perhaps the effort to give some guidelines will help reduce the variance.  However, if some judges use a 42-47 scale while some use a 44-48 scale and others use a 45-50 scale, it can have a substantial impact on who clears at a tournament where so few 5-3's clear.  It is a scary experiment when so much hangs in the balance.
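
To put rough numbers on that worry, here is a minimal sketch. The three ranges are the ones mentioned in the paragraph above; the judge labels, the linear mapping from "quality" to points, and the judge draws are all made-up assumptions for illustration, not data from any tournament.

```python
# Hypothetical judges, each using one of the ranges mentioned above
# as (low, high) on the 50 point scale.
JUDGE_RANGES = {"A": (42, 47), "B": (44, 48), "C": (45, 50)}


def raw_points(judge, quality):
    """Map a quality in [0, 1] linearly onto a judge's personal range.

    The linear mapping is an assumption made only for this sketch.
    """
    low, high = JUDGE_RANGES[judge]
    return low + quality * (high - low)


def total_points(draw, quality):
    """Total raw points for a debater of constant quality across a judge draw."""
    return sum(raw_points(judge, quality) for judge in draw)


# Two equally good debaters (quality 0.5) who happen to draw different judges.
lucky = total_points(["C", "C", "B"], 0.5)
unlucky = total_points(["A", "A", "B"], 0.5)
print(lucky, unlucky, lucky - unlucky)
```

Under these assumptions the two debaters end up six points apart over three rounds purely because of the draw, which is the kind of gap that could decide which 5-3's clear.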


Larson addresses the idea of career or season-long datasets to compute
z-scores, proving that it is a Wizard of Oz solution with reduced, not
improved, validity:
"Additionally, the whole premise of variance is that the observed is
held constant and we are measuring and correcting variability among the
observers. Rather than improving reliability by increasing the sample
size, we decrease reliability by radically decreasing the comparability
of the samples of debates seen by each of the observers."
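
One way to read Larson's point is with a small sketch. The ballots below are invented, and the choice of a population standard deviation is an assumption; neither comes from any real tab data or from Larson's own method. Two judges use the identical 30 point scale, but one has mostly seen novice rounds this season and the other mostly round robin rounds.

```python
from statistics import mean, pstdev


def z_score(point, reference):
    """Standardize one ballot against a judge's reference sample of ballots."""
    mu = mean(reference)
    sigma = pstdev(reference)
    return (point - mu) / sigma


# Hypothetical season-long ballots for two judges with the SAME scale.
season_a = [26.0, 27.0, 28.0]  # judge A saw mostly novice rounds
season_b = [28.0, 29.0, 30.0]  # judge B saw mostly round robin rounds

# Both judges give the same speaker a 28 at the same tournament.
z_a = z_score(28.0, season_a)
z_b = z_score(28.0, season_b)
print(round(z_a, 2), round(z_b, 2))
```

With these made-up numbers the same 28 comes out well above average for one judge and well below average for the other, even though the judges do not differ at all in how they score. The season-long z-score is picking up the difference in what they observed, not a difference between the observers.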

Furthermore, it's not as if this idea was really publicly vetted, or
perceived as imminent. I am as "in the loop" as just about anyone on tab
procedures and did not know GSU and Southern Cal were really about to
embark on a variance experiment and that the Wake 50 point scale would
be getting in the way. There is no such thing as alternate actor fiat.
Wake should act on the basis of what we perceive others are likely to
do, not on the basis of what we wish they would do.

Otherwise, I could just wave my magic Hoe fiat wand and argue the 50
point scale should be rejected because judges "should" just use the 20
point scale better. I can and did consider persuading them to do so, but
have witnessed the inefficacy of that approach over the years. Adoption
of a different scale is a persuasive technique in and of itself, which
brings me to the final little point of this particular post . . .

My description of how one might conceive the 50 point scale was meant to
be the start of a discussion, not the end of it. For instance, Will said
he would likely give "zero or next-to-zero 49's or 50's. More likely the
former. It's fairly easy to 'imagine a better performance'." My word
choice was, perhaps, unfortunate. To me, there are a decent, if small,
number of "hard to imagine better" performances. But that's because I
mean something more like "hard to imagine a real live mortal college
student doing *better*" or "*unlikely* to be more than a handful of
performances that good."

But that's just my personal view that has a lot to do with my own
opinion that "perfect" is a stupid category. For instance, I have seen
some speeches (all in elims) that I would assign a 31 or 32 to on a 30
point scale because they really were that much better than the 29 point
speeches. No speech is "perfect" but some might be sufficiently better
than the speeches that are a half point below the top of the scale to
warrant giving maximum points.

Again, that's just me. I kind of preferred my grades analogy as a way of
describing how to use the 50 point scale to the
subjective/impressionistic list that I plunked at the bottom of that
original post.

I hope people offer amendments and other ways of interpreting the scale
that make sense. I will then try to reformulate the "suggested use"
guidelines to reflect what people think makes sense.

I especially appreciate Will's engagement and emphasis on the idea that
it is worse than unhelpful to just reject the attempt at a shared
standard. I also appreciate people's nervousness with change and the
overwhelmingly conservative nature of our community (culture is
conservative, by definition). I would not "shake things up" just for the
hell of it. Were the SQ not so broken, we would not just "experiment" on


