[eDebate] Response to Lacy
Sun Nov 26 19:17:16 CST 2006
JP identifies a VERY important principle with respect to mutual
preference. When I used the words "slightly worse than chance," it
might leave the impression that statistical chance is a bad thing. When
we consider the global objectives of MPJ from a tournament
administration perspective, chance is the IDEAL outcome. It would mean
that preference of one team or another for the judge fails to be a
predictor of who wins. For tournament directors, topic committees,
software programs, etc., the ultimate goal is that the only true
predictor of which team will win the debate is which team debates
better.
So I think that we are getting better and that the degree of
non-mutuality that continues is not enough to disadvantage the team that
preferred the judge less. Now that might mean that we're now so
close in mutuality that it is unlikely to matter. It could also mean
that some teams are bad at picking judges (assuming that they try to
pick judges that will vote for them :-). My "worse than chance"
observation is a better scenario than a "better than chance" one. If the
odds of the team that prefers the judge winning are significantly better
than chance, then we aren't adequately preserving mutuality. But that
appears not to be the case.
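One way to check whether a win rate is "significantly better than chance" is an exact two-sided binomial test against 50%. The sketch below is only an illustration: the function name and the round counts are hypothetical and are not taken from the tournament data discussed here.

```python
from math import comb

def two_sided_binomial_p(wins, rounds, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed number of wins."""
    pmf = [comb(rounds, k) * p**k * (1 - p)**(rounds - k)
           for k in range(rounds + 1)]
    observed = pmf[wins]
    # Small tolerance so that ties in probability are counted despite rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

# Hypothetical counts (an assumption for illustration only):
# rounds where the prefs were not mutual, and how many of those
# were won by the team that preferred the judge more.
rounds = 200
wins_by_preferring_team = 96

print(wins_by_preferring_team / rounds)                       # observed win rate
print(two_sided_binomial_p(wins_by_preferring_team, rounds))  # p-value vs. chance
```

A large p-value means the observed win rate cannot be distinguished from a coin flip, which is the outcome described above as ideal.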
If rankings are slightly worse than chance:
Are we collectively bad at picking judges? Or does this statistic show
that we can collectively pick good judges?
If we're only slightly off pure chance, maybe mutual preference is
strong enough that we can pick fair judges.
Maybe debaters and coaches are getting smart enough to pick the judges
who will do their best to determine the fair winner. A mutual 100 judge can
only pick one winner.
The bottom line is the holy grail--every team in the tournament gets a
judge who they think can fairly judge their debate.
The real question is: How much lack of mutuality is a predictor of who
wins? Or, when does the difference predict an outcome?
The point where it becomes a significant difference should be the cutoff
for mutuality in the whole preference vs mutuality mess.
--JP "still learning statistics" Lacy
ps--While I agree in principle with having a "bright line" or "cap" for
strikes, shouldn't people be able to figure this out for themselves if
they filled out a sheet that made their Z-score of a 0 LESS than -1? The
Z-scores are on the sheet as you submit them.
I might well agree but during an experimental phase, I do understand
that folks might not immediately figure out the implications of
strategies that they might employ. As I've said in several posts, I was
more surprised than I should have been that competitive desires to
either maximize or minimize the odds of getting certain judges trumped a
"naive" rating of all of the judges based on some standard of
pps--Given an unfettered 0-100 system, I disagree with translating scores
into ordinals for an additional comparison point for the tab room. Ordinals
are useful, but they don't reflect how teams fill out a sheet in an
unfettered 0-100 system. People are counting on the Z score to reflect
differences between clusters when they fill out an unfettered 0-100 sheet.
Ordinals can't reflect that.
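A small sketch of that point, using two made-up five-judge sheets (the judge labels and scores are hypothetical): both sheets produce identical ordinals, but only the z-scores show that the first sheet has a large gap between its top cluster and its bottom cluster.

```python
from statistics import mean, pstdev

def ordinals(sheet):
    """Rank judges 1..n from highest to lowest rating."""
    order = sorted(sheet, key=sheet.get, reverse=True)
    return {judge: rank for rank, judge in enumerate(order, start=1)}

def z_scores(sheet):
    """Standardize ratings using the sheet's own mean and population stdev."""
    mu, sigma = mean(sheet.values()), pstdev(sheet.values())
    return {judge: (score - mu) / sigma for judge, score in sheet.items()}

def gap(z):
    """Distance in z-score units between the third and fourth judge."""
    return z["C"] - z["D"]

# Hypothetical sheets: same order, very different spacing.
clustered = {"A": 100, "B": 95, "C": 90, "D": 40, "E": 35}
even = {"A": 100, "B": 80, "C": 60, "D": 40, "E": 20}

print(ordinals(clustered) == ordinals(even))            # same ordinal ranking
print(gap(z_scores(clustered)), gap(z_scores(even)))    # different cluster gaps
```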
Z-scores remained the significant determiner of mutuality. But as I
said in the original description of the research, z-scores are most
statistically defensible when the distributions that you wish to
normalize are substantially "normal" in the first place. Bimodal or
discontinuous distributions are not particularly amenable to a strict
z-score normalization. For example, the team that ranked 42 judges as
100 and 117 judges as 0 created severe problems both on the top and
bottom of their distribution. The 0's had a z-score of less than 1
stdev below their mean while the 100's had a z-score of roughly 2, a
higher value than the highest rated judge for most teams. So if z-score
mutuality is pushed too far, the only highly mutual matches are the 0's.
And while it might be viewed as "just deserts" for that team to get
0's, it has the additional effect of causing their opponents to get
below-average prefs (typically the equivalent of 7's for them). While I
appreciate the sentiment in JP's post, the assumption that everyone
understands the implications of radically skewed distributions is an
assumption I didn't dare make.
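The arithmetic behind that example can be sketched as follows. This assumes the sheet contained only the two values mentioned (42 judges at 100 and 117 at 0) and uses the population standard deviation; the actual sheet and the tab software's exact formula may have differed.

```python
from statistics import mean, pstdev

# Assumed sheet: only the two clusters described in the post.
ratings = [100] * 42 + [0] * 117

mu = mean(ratings)       # about 26.4
sigma = pstdev(ratings)  # about 44.1 (population standard deviation)

def z(score):
    """Z-score of a rating relative to this team's own sheet."""
    return (score - mu) / sigma

print(round(z(0), 2))    # about -0.6: less than 1 stdev below the mean
print(round(z(100), 2))  # about 1.67: well above a typical top rating
```

Under these assumptions the 0's sit only about 0.6 standard deviations below the mean, so they look like mildly below-average judges, while the 100's land far out in the upper tail.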
ppps--The 9-0 system isn't good enough. Has any system beaten ordinals in
terms of overall preference? Despite whines to the contrary, ordinal ranking
is the easiest way to fill out a preference sheet. Get a stack of 4x6 cards
and put them in order if you can't figure out how to do it on a computer.
Honestly, it is much easier to figure out if X judge is better than Y than
if X judge should be deemed equal to your A+ judges. If sheet gaming (as
reflected in categorical 9-0 prefs) is valued by the community, it is
still preserved in an ordinal system. Add a guaranteed strike cap or
"cut-off" to that system and you have the best we can do for the time
being.