[eDebate] Response to Lacy

Gary Larson Gary.N.Larson
Sun Nov 26 19:17:16 CST 2006


JP identifies a VERY important principle with respect to mutual
preference.  When I used the words "slightly worse than chance," it
might leave the impression that statistical chance is a bad thing.  When
we consider the global objectives of MPJ from a tournament
administration perspective, chance is the IDEAL outcome.  It would mean
that one team's stronger preference for the judge fails to be a
predictor of who wins.  For tournament directors, topic committees,
software programs, etc., the ultimate goal is that the only true
predictor of which team will win the debate is which team debates
better.

So I think that we are getting better and that the degree of
non-mutuality that continues is not enough to disadvantage the team that
preferred the judge less.  Now that might mean that we're now so
close in mutuality that it is unlikely to matter.  It could also mean
that some teams are bad at picking judges (assuming that they try to
pick judges that will vote for them :-).  My "worse than chance"
observation is a better scenario than a "better than chance" one.  If the
odds of the team that prefers the judge winning are significantly better
than chance, then we aren't adequately preserving mutuality.  But that
appears not to be the case.
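The "better than chance" check described above is, in effect, a binomial test: under the null hypothesis, the team that preferred the judge more wins 50% of the time. A minimal sketch, with entirely hypothetical round counts (the post does not report the actual numbers):

```python
from math import comb

def binomial_p_value(wins: int, debates: int, p: float = 0.5) -> float:
    """One-sided probability of seeing at least `wins` wins in `debates`
    rounds if the preferring team really wins at chance rate p."""
    return sum(comb(debates, k) * p**k * (1 - p)**(debates - k)
               for k in range(wins, debates + 1))

# Hypothetical example: out of 200 rounds with a mutuality gap, suppose
# the team that preferred the judge more won 112 of them.
p_val = binomial_p_value(112, 200)
print(f"p-value: {p_val:.3f}")  # ~0.05: not clearly better than chance
```

If the resulting p-value is small, preference is predicting outcomes and mutuality is being compromised; a result at or below chance is the healthy one by Larson's argument.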


If rankings are slightly worse than chance:

Are we collectively bad at picking judges? Or does this statistic prove
that we can collectively pick good judges?

If we're only slightly off pure chance, maybe mutual preference is
becoming strong enough that we can pick fair judges.

Maybe debaters and coaches are getting smart enough to pick the judges
who will do their best to determine the fair winner. A mutual 100 judge
can only pick one winner.

The bottom line is the holy grail--every team in the tournament gets the
judge who they think can fairly judge their debate.

The real question is: how much lack of mutuality is a predictor of who
wins? Or, when does the difference predict an outcome?

The point where the difference becomes significant should be the cut-off
for mutuality in the whole preference vs. mutuality mess.

--JP "still learning statistics" Lacy

ps--While I agree in principle with having a "bright line" or "cap" for
strikes, shouldn't people be able to figure this out for themselves if
they filled out a sheet that made the z-score of a 0 LESS than -1? The
numbers are on the sheet as you submit them.

I might well agree, but during an experimental phase I do understand
that folks might not immediately figure out the implications of the
strategies they might employ.  As I've said in several posts, I was
more surprised than I should have been that competitive desires to
either maximize or minimize the odds of getting certain judges trumped a
"naive" rating of all of the judges based on some standard of
preference.

pps--Given an unfettered 0-100 system, I disagree with translating
things into ordinals for an additional comparison point for the tab
room. Ordinals are useful, but they don't reflect how teams fill out a
sheet in an unfettered 0-100 system. People are counting on the z-score
to reflect differences between clusters when they fill out an unfettered
0-100 sheet. Ordinals can't reflect that.

Z-scores remained the significant determiner of mutuality.  But as I
said in the original description of the research, z-scores are most
statistically defensible when the distributions that you wish to
normalize are substantially "normal" in the first place.  Bimodal or
discontinuous distributions are not particularly amenable to a strict
z-score normalization.  For example, the team that ranked 42 judges as
100 and 117 judges as 0 created severe problems at both the top and
bottom of their distribution.  The 0's sat less than one standard
deviation below their mean (a z-score above -1), while the 100's had a
z-score of roughly 2, a higher value than the highest-rated judge for
most teams.  So if z-score mutuality is pushed too far, the only highly
mutual matches are the 0's.  And while it might be viewed as "just
deserts" for that team to get 0's, it has the additional effect of
causing their opponents to get below-average prefs (typically the
equivalent of 7's for them).  While I appreciate the sentiment in JP's
post, the assumption that everyone understands the implications of
radically skewed distributions is an assumption I didn't dare make.
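The skew described above is easy to reproduce. A minimal sketch using population statistics on the example sheet (42 judges at 100, 117 at 0); the exact figures depend on whether sample or population deviation is used, but the qualitative point survives either way:

```python
from statistics import mean, pstdev

# The bimodal sheet from the example: 42 judges rated 100, 117 rated 0.
ratings = [100] * 42 + [0] * 117

mu = mean(ratings)       # ~26.4
sigma = pstdev(ratings)  # ~44.1

z_zero = (0 - mu) / sigma       # ~ -0.60: the 0's sit inside 1 stdev
z_hundred = (100 - mu) / sigma  # ~ +1.67: the 100's become extreme
print(f"z(0) = {z_zero:.2f}, z(100) = {z_hundred:.2f}")
```

Because 117 of 159 ratings pile up at 0, the mean is dragged low and the spread is huge, so the struck judges look deceptively "close" to the mean while the 42 preferred judges get z-scores above nearly every other team's top judge, which is exactly the mutuality distortion Larson describes.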

ppps--The 9-0 system isn't good enough. Has any system beaten ordinals
in terms of overall preference? Despite whines to the contrary, ordinal
ranking is the easiest way to fill out a preference sheet. Get a stack
of 4x6 cards and put them in order if you can't figure out how to do it
on a computer. Honestly, it is much easier to figure out whether judge X
is better than judge Y than whether judge X should be deemed equal to
your A+ judges. If sheet gaming (as reflected in categorical 9-0 prefs)
is valued by the community, it is still preserved in an ordinal system.
Add a guaranteed strike cap or "cut-off" to that system and you have the
best we can do for the time being.
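The ordinals-vs-z-scores tradeoff in this exchange can be illustrated with two hypothetical 0-100 sheets (judge names and ratings invented for the example): sheets with very different cluster gaps collapse to identical ordinals, which is precisely the information Lacy says teams count on z-scores to preserve.

```python
def to_ordinals(sheet: dict) -> dict:
    """Convert a 0-100 rating sheet to ordinal ranks (1 = most preferred)."""
    ranked = sorted(sheet, key=sheet.get, reverse=True)
    return {judge: i + 1 for i, judge in enumerate(ranked)}

# Sheet A: two near-equal top judges, then a big drop.
sheet_a = {"Judge1": 100, "Judge2": 99, "Judge3": 40}
# Sheet B: one clear favorite, then a near-equal pair.
sheet_b = {"Judge1": 100, "Judge2": 60, "Judge3": 59}

# Both collapse to Judge1=1, Judge2=2, Judge3=3: the gaps vanish.
print(to_ordinals(sheet_a) == to_ordinals(sheet_b))  # True
```

A z-score (or any spread-sensitive normalization) would keep Judge2 close to Judge1 on sheet A and close to Judge3 on sheet B, while ordinals treat the two sheets identically.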



