[eDebate] A less risky experiment

Gary Larson Gary.N.Larson
Tue Nov 6 10:36:17 CST 2007

While I'm always intrigued by the opportunity to research something (rather than actually having to do it), we would need to clearly understand the limitations of the "controlled" study that Stefan and David are proposing.  I'm not opposed to collecting two different scores for each debate.  I'd even be open to arguments as to which of the scores counts towards the actual results and which is simply correlated (though all of this is really WFU's decision to make rather than mine).
But the proposed experiment would prove to be the classic case where studying a behavior potentially changes the behavior in question.  When we start the exercise by saying, "assign the points that you would assign on a 30-point scale and then indicate the points you would give on a 100-point (or 50-point) scale," we have no idea how the process of identifying two different scores AND consciously correlating them impacts one or both of the scores assigned.  My first hypothesis would be that the scores on the 30-point scale would NOT, in fact, be the scores that would have been assigned had the scale research not been in progress.
Imagine the judge who now gives 27.5s to 80% of the debaters they judge in a tournament.  I can imagine that some such judges "might" argue that it was just a matter of discrimination: none of those debaters were quite good enough to merit a 28 and none were poor enough to get a 27.  But for others, the 27.5 is just a polite fiction or convenience.  They did enough work deciding who won the debate.  I suspect that the experiment would result in some of those judges giving a broader range of scores on the 30-point scale.
If that happened, wouldn't it just prove that reform was unnecessary, that judges could reconceptualize the current scale and produce a more discriminating set of scores?  Perhaps.  Though without the force of the experiment, we'd be back in the status quo, where friendly persuasion hasn't done much over the years.
But some others might actually give a narrower range of scores.  If they start the exercise by using the 100-point scale to provide discrimination, they might discover that all of those scores can translate back to the same 30-point equivalent.  Once again, writing two scores down on the ballot with explicit instructions that they should correlate in accordance with a predefined conversion scale might influence the scores assigned, including the 30-point versions.
The other question I have is how we would evaluate the outcome of the experiment.  David provides one metric.  The new scale succeeds if we get a normal distribution of +/- 2 points surrounding each of the scores that represent the 30-point scaled score times 3.3.  Of course, that's assuming that the current scores are correct and that the only issue is increased discrimination (inflation being irrelevant).  To be honest, one of the dilemmas, whether we change the scale or just do the research, is that we don't really have any control.  There is no "right" score, correct seed order, or correct set of speaker awards to which our outcome can be compared.  If the two scales produce slightly different outcomes, we have grounds for arguments but not really for conclusions.  And if the two-score research creates an extremely high correlation, that in itself doesn't prove that the reform is unnecessary, since the research itself might have prodded the outcome.
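To make the arithmetic behind David's metric concrete, here is a minimal sketch in Python.  It only covers the band check (is the 100-point score within +/- 2 of the 30-point score times 3.3?) and says nothing about whether the distribution is normal.  The function names and the sample ballots are hypothetical, invented purely for illustration; only the 3.3 factor and the 2-point band come from the proposal as described above.

```python
SCALE_FACTOR = 3.3  # conversion quoted above: 30-point score times 3.3
TOLERANCE = 2.0     # the +/- 2 point band quoted above


def expected_100(score_30: float) -> float:
    """Convert a 30-point score to its 100-point equivalent."""
    return score_30 * SCALE_FACTOR


def within_band(score_30: float, score_100: float,
                tolerance: float = TOLERANCE) -> bool:
    """True if the 100-point score lies within the band around the conversion."""
    return abs(score_100 - expected_100(score_30)) <= tolerance


def share_within_band(pairs) -> float:
    """Fraction of (30-point, 100-point) score pairs that fall inside the band."""
    if not pairs:
        raise ValueError("need at least one pair of scores")
    hits = sum(within_band(s30, s100) for s30, s100 in pairs)
    return hits / len(pairs)


# Hypothetical ballots, not real data: (30-point score, 100-point score)
ballots = [(27.5, 91), (27.5, 89), (28.0, 92), (27.0, 93), (28.5, 94)]

if __name__ == "__main__":
    for s30, s100 in ballots:
        print(s30, s100, round(expected_100(s30), 2), within_band(s30, s100))
    print("share within band:", share_within_band(ballots))
```

Note that a check like this treats the 30-point score as the reference value, which is exactly the assumption questioned above: a high share within the band shows that the two scores agree, not that either one is correct.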
