[eDebate] Mutual Preference experiment - part 1

Gary Larson Gary.N.Larson
Wed Nov 22 08:32:39 CST 2006


Since the discussion has turned to a critique of mutual preference, it
is probably an appropriate time to discuss the experiment that was
conducted at Kentucky and Wake this fall.  Rather than provide one
overwhelmingly long post, I will submit a number of posts that discuss
the various outcomes and conclusions that can tentatively be made
regarding the experiment.  (I'll respond to Repko directly in a separate
post)

First, some quick background that is familiar to many of you.  In the
35 years that I've been involved in debate tab rooms and particularly
the 22 years during which I've been involved in debate software
development, I've gone through several stages with respect to judge
assignment.  It began with a project to write an algorithm for Zarefsky
to automate and improve the performance of mutual preference at the 1987
NDT.  Perhaps ironically, with Wheaton's participation in the growing
CEDA movement and years during which CEDA Nats and the NDT were held on
the same weekend, I soon became identified as the architect of purely
random judging.  But for both the NDT and the CEDA community the most
common strategy for tournament management at the time was "tab room"
preference.  Both tournaments that advertised themselves as using mutual
preference and those that advertised random typically used a significant
amount of tab room intervention to "improve" outcomes.  My project from
the beginning was predicated not only on improving tab room efficiency
by automating the process but even more critically to increase actual
and perceived fairness by minimizing tab room intervention.

Early critiques of the computer were instructive.  At CEDA Nats, we had
a difficult time accepting computer-generated random judging because it
actually WAS random.  For the first two national tournaments with random
assignments, the tournament director reserved the right to "fix" elim
round panels.  At the same time, tournaments that were using mutual
preference were uncomfortable with the notion of automating the process
with the computer because the computer could NEVER know enough to encode
all of the information that Kloster finds useful.  So I would routinely
be told, "sure the computer says that it's an AA match, but I know that
when that team is aff they would never want that judge against that
opponent because he/she will never vote for them."  While the statement
might have been absolutely accurate, I'm uncomfortable with tab room
intervention as the vehicle to fix it.  While I've been in tab
rooms for 35 years and believe that tournament directors are remarkably
wise and ethical in their management of their tournaments, I simply
don't believe that any tab room staff is sufficiently omniscient or just
to fairly manage judge assignments for EVERY team in the tournament
equally. 

So the Holy Grail for the last several years has been to devise a
system that permits the participants enough input so that tab rooms
don't need to intervene on their behalf AND provides a powerful enough
algorithm to balance all of the conflicting global concerns that exist
in the pairing of a tournament.  Not surprisingly, most critiques that
get forwarded back to me resolve down to a very "local" question: in
such and such a critical round, we got a judge that we believe harmed
our chance at success.  More often than not, the complaint comes after
the judge has voted against the team.

While NO system has been or ever will be devised that could ensure that
no round out of 552 debates (at Wake) has the outcome described
above, the goal has clearly been to limit those instances AND more
importantly to fairly distribute outcomes across the participants (with
the caveat that we favor those in break rounds and disfavor to some
extent those who have been eliminated).  With that goal in mind, we've
examined a variety of rating systems and a wide variety of algorithms. 
The experiment at Kentucky and Wake was less an attempt to replace
current systems with a new one than an effort to test some of the
fundamental assumptions that underlie each of the systems we currently
use.  In subsequent posts I'll talk about the hypotheses, the data, what
I think we discovered, and perhaps where we should go from here.
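To make the "balance local preferences against global fairness" idea concrete, here is a minimal sketch of mutual preference as an assignment problem. All team names, judge names, ratings, and the penalty formula are hypothetical illustrations, not the actual tab-room software described in this post; real systems use a proper assignment algorithm rather than brute force, and weigh many more constraints.

```python
# Hypothetical sketch: each team rates each judge (1 = most preferred).
# A judge's cost for a debate combines both teams' ratings plus a
# mutuality gap, so a judge both sides rate 2 beats one rated 1 and 4.
from itertools import permutations

prefs = {
    "Aff A": {"Judge X": 1, "Judge Y": 3, "Judge Z": 5},
    "Neg B": {"Judge X": 2, "Judge Y": 1, "Judge Z": 4},
    "Aff C": {"Judge X": 4, "Judge Y": 2, "Judge Z": 1},
    "Neg D": {"Judge X": 5, "Judge Y": 2, "Judge Z": 1},
}

debates = [("Aff A", "Neg B"), ("Aff C", "Neg D")]
judges = ["Judge X", "Judge Y", "Judge Z"]

def penalty(debate, judge):
    # Sum of the two ratings plus the gap between them (mutuality).
    a, n = prefs[debate[0]][judge], prefs[debate[1]][judge]
    return a + n + abs(a - n)

def best_assignment(debates, judges):
    # Brute force over every way to give each debate a distinct judge;
    # fine for a toy example, infeasible for a 552-debate tournament.
    best, best_cost = None, float("inf")
    for perm in permutations(judges, len(debates)):
        cost = sum(penalty(d, j) for d, j in zip(debates, perm))
        if cost < best_cost:
            best, best_cost = dict(zip(debates, perm)), cost
    return best, best_cost

assignment, cost = best_assignment(debates, judges)
```

Note that the globally cheapest assignment can give one debate a slightly worse judge so that another debate avoids a much worse one, which is exactly the local-versus-global tension described above.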