[eDebate] 0-100 system
Mon Nov 6 18:29:34 CST 2006
Since Charles is asking a question that impacts several folks, I'm
posting a response to the list.
While there were some cases where there were "strange" results due to
the interaction of preference and mutuality, I'm not sure that they were
any greater than are usually seen. In most cases they seemed to occur
when there was a particularly idiosyncratic distribution. A few people
have suggested that we create constraints on the distribution, but doing
so would defeat much of the purpose in terms of ease of use and the
ability to create "permanent" pref sheets that can be used across
tournaments. At present I'm still testing whether z-score normalization
can create sufficient commensurability to make arbitrary restrictions on
the distribution unnecessary.
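To illustrate what z-score normalization buys here, a minimal sketch (the function name and sample ratings are hypothetical, not from the actual tabulation software): two sheets with very different raw scales but the same shape become identical after normalization, which is the commensurability in question.

```python
import statistics

def z_scores(ratings):
    """Normalize one team's 0-100 ratings to mean 0 and standard
    deviation 1, so sheets with different scales become comparable."""
    mean = statistics.mean(ratings)
    stdev = statistics.pstdev(ratings)
    return [(r - mean) / stdev for r in ratings]

# Two hypothetical sheets: one generous, one harsh, same shape.
generous = [90, 80, 70, 60, 50]
harsh = [50, 40, 30, 20, 10]
assert z_scores(generous) == z_scores(harsh)
```

The point is that a team that rates everyone 50-100 and a team that rates everyone 0-50 end up on the same scale without any imposed restrictions on how they fill out the sheet.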
To be clear, once I collect the data, I create four different sheets.
The first contains the raw data. The second contains the z-scores. The
third contains the data transformed into an ordinal distribution. The
fourth contains the data transformed into categories using the
assumption that 11% of the judges fall in each of 9 categories (too
stringent a requirement for teams that rank more than 11% of the pool as
1's, and too lax for teams that put a disproportionate number of
partial-commitment judges as 1's).
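A rough sketch of the last two transformations, with hypothetical function names (the raw and z-score sheets are just the data itself and the normalization above):

```python
def ordinal(ratings):
    """Rank ratings from most preferred (1) to least preferred (n);
    ties are broken by position on the sheet."""
    order = sorted(range(len(ratings)), key=lambda i: -ratings[i])
    ranks = [0] * len(ratings)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def categories(ratings, n_cats=9):
    """Map ratings into n_cats equal-sized buckets (about 11% of the
    pool each for 9 categories), category 1 = most preferred."""
    per_cat = len(ratings) / n_cats
    return [min(n_cats, int((r - 1) / per_cat) + 1)
            for r in ordinal(ratings)]
```

With an 18-judge pool, for example, the top two ranked judges land in category 1, the next two in category 2, and so on; the equal-bucket assumption is exactly what the real sheets may violate.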
The purpose of the categorical transformation is just a reality check:
it lets me evaluate how the new system is doing compared to the old one.
It is on this basis that I can conclude that the judge
assignments weren't any worse or more idiosyncratic than previous
Kentucky tournaments (even though the pool was tight). The ordinal
transformation demonstrated that in strict ordinal terms, the judging
was more highly preferred and more mutual (in aggregate) than if I had
attempted to pair the entire tournament with the nine categories.
Regarding the balance between mutuality and preference, I maintain the
same personal preference: a 1-2 is better than a 3-3 and as good as a
2-2 (assuming 9 or more categories). Using z-scores rather than
categories to manage the mutuality gives a slightly different balance,
but not one that was statistically significant.
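That tradeoff can be expressed as a simple pairing cost; this is only a sketch, and the function name and weights are hypothetical, chosen so the stated orderings hold, not taken from the actual pairing software.

```python
def pairing_cost(pref_a, pref_b, w_pref=1.0, w_mutual=1.0):
    """Hypothetical pairing cost, lower is better: the sum of the two
    teams' category ranks measures preference, and the gap between
    them penalizes lack of mutuality.  With equal weights, a 1-2
    ties a 2-2 and beats a 3-3."""
    return w_pref * (pref_a + pref_b) + w_mutual * abs(pref_a - pref_b)

print(pairing_cost(1, 2), pairing_cost(2, 2), pairing_cost(3, 3))
# -> 4.0 4.0 6.0
```

Shifting the relative weights moves the balance toward preference or toward mutuality; using z-scores instead of category numbers as the inputs changes the granularity of both terms.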
>>> "Charles Olney" <olneyce at gmail.com> 11/06/06 6:06 PM >>>
Hi, I'm sure you're busy, but I've got a quick question about the
system. I got the impression from talking to Ken Strange (who, I
think, talked about this with you) that part of the reason why some
people were getting "strange" results was that it was difficult to
find mutual highly-ranked judges.
Is this different than the old system? I always thought that the
general preference was to weight the "preference" component of MPJ
more heavily than the "mutual" component. So, a 1-2 was definitely
better than a 3-3, and about as good as a 2-2. Or, if I'm wrong about
the numbers, at least that was the general principle.
Is that not the case with the new system, or is there some difference
in how it plays out because the increased ability to delineate judges
makes differences in mutuality more apparent? Or is this just
For the purposes of filling out prefs for this tournament, even a very
quick answer would definitely be appreciated. In a broader sense, I
am very curious how it all works and what kind of difference the
greater variation is able to make. But that can clearly wait until
after the tournament.
For what it's worth, the general consensus on the Dartmouth team is
strong support for the 0-100 system, both in terms of our perceived
quality of judging at Kentucky, and in terms of the ease of use. I
know the sample size is probably too small for the former to mean
much, but the latter is probably significant.