[eDebate] Ermo's Proposal
Tue Nov 6 15:17:29 CST 2007
I REALLY like this proposal [ (X-20) x 10 ] and I hope it will get close consideration as a recommended scale. I'm terrible with numbers, and methods was never really my thing, but unless I'm missing something, this proposal seems to solve all of the major concerns on both sides. It gives critics the comfort of using the 30 point scale as a base, but is distinct from the 30 point scale in that a judge would have more discretion to adjust up or down. This is especially true if critics couldn't repeat point assignments in a given debate (which makes sense given that no two speeches in a debate have the exact same value and we already create these distinctions by giving ranks in the squo). The other thing I like is that, using Ermo's example [ (28.5-20) x 10 = 85 ], it's very easy for everyone to use: you can do the math in your head without giving yourself a headache, and it provides an intuitive result that still gives critics a reasonable range to play with.
I'm particularly concerned that some critics aren't going to come out of the 90-100 range. I'm also concerned, as Ede mentioned earlier, that argumentative choice along with MPJ sheets will have an even greater influence over seeding. This is just as true when a team gets a critic who doesn't like what the team has to say or how they say it as when a critic likes what the team has to say and how they say it TOO much. Block 100's will have a much greater impact on seeding than block 30's b/c the distinction between a 30 and a 28.5 is a lot smaller than the difference between an 85 and a 100. A difference of 1.5 points can cause some skew, but not nearly the skew that 15 points would cause when everyone's points are adjusted and averaged.
Does anyone else have any thoughts on Ermo's proposal?
Veronica M. Guevara
Weber State University
Department of Communications
1605 University Circle
Ogden, UT 84408
Date: Tue, 6 Nov 2007 11:15:07 -0600
From: EricMorris at MissouriState.edu
To: Gary.N.Larson at wheaton.edu; edebate at ndtceda.com
Subject: [eDebate] Making the experiment less 'risky'
If we end up with a 100 point scale with guidelines, then I will try to follow that scale and hope others will too.
The lowest risk version of this is probably the early suggestion for decimal places (28.4, etc.).
Of course, the "20" is superfluous, since basically all speaker points start with that number. "8.4" is just as meaningful, and would save some data entry time.
"8.4" is pretty similar to "84". Just move the decimal point.
Thus, Wake could say use a 100 point scale where you start with the formula (X-20)x10 and then make quality gradients from there.
Thus, if Wake decides to put me in charge of writing the "scale", here is my proposed text:
"Begin with your preferred points on a 30 point scale. Subtract 20. Multiply by ten. Make minor adjustments, using whole numbers as you see fit. For example, a debater might give a performance you would have called 28.5. Subtract 20, leaving 8.5. Multiply by 10, so the score is 85. Feel free to move the number up or down up to 2 points to make a finer quality distinction."
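For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of the steps in the proposed text. The function name `to_hundred` and the `adjustment` parameter are hypothetical, made up for illustration; the only parts taken from the proposal are the (X-20) x 10 formula and the whole-number adjustment of up to 2 points.

```python
def to_hundred(points_30, adjustment=0):
    """Convert a 30-point score to the 100-point scale via (X - 20) x 10.

    `adjustment` is a whole-number tweak of at most 2 points either way,
    as the proposed text allows. The name and signature are illustrative.
    """
    if abs(adjustment) > 2:
        raise ValueError("adjustment is limited to 2 points up or down")
    # round() guards against floating-point noise, e.g. (28.4 - 20) * 10
    base = round((points_30 - 20) * 10)
    return base + adjustment


print(to_hundred(28.5))      # the example from the proposed text
print(to_hundred(28.5, -2))  # same speech, nudged down for a finer distinction
```

The rounding step is a design choice of this sketch, not of the proposal: decimal inputs such as 28.4 are not stored exactly as floats, so the result is snapped to the nearest whole number before the adjustment is applied.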
The benefit of allowing quality gradations is achieved with less risk that individual judges will privilege or harm debaters by using different mid-point assumptions. That feels like the best of both worlds; shaking things up more might be interesting but is probably NOT the best of both worlds.
A few possible addenda:
1. Wake could require that no two debaters in a round share the same number. Thus, a block 28 round might end up with 82, 81, 80, and 79, probably in rank order.
2. Any points under 60 could require verification of intention, as with low point wins.
3. The scale could cap at 99 instead of 100, in case the 3rd whole digit creates programming hassles. I doubt it would, since we have 3 significant digits now (though we only "use" two of them), but I'd defer to him on that question.
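The three addenda above could be checked mechanically when a ballot is entered. This is only a sketch under assumptions: the function name `check_ballot`, the message wording, and the idea of returning a list of problems are all hypothetical, and nothing here reflects how any actual tab software works.

```python
def check_ballot(scores, cap=100):
    """Return a list of problems with one round's 100-point scores.

    Hypothetical helper illustrating the three possible addenda:
    no shared numbers, a confirmation step under 60, and a cap.
    """
    problems = []
    # Addendum 1: no two debaters in the round share the same number
    if len(set(scores)) != len(scores):
        problems.append("duplicate points")
    for s in scores:
        if s > cap:
            # Addendum 3: cap could be 100 or 99
            problems.append(f"{s} exceeds cap of {cap}")
        elif s < 60:
            # Addendum 2: low scores need the judge to confirm intent
            problems.append(f"{s} is under 60: confirm intent")
    return problems


print(check_ballot([82, 81, 80, 79]))           # the block 28 example
print(check_ballot([100, 81, 80, 55], cap=99))  # over the cap and under 60
```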
From: edebate-bounces at www.ndtceda.com [mailto:edebate-bounces at www.ndtceda.com] On Behalf Of Gary Larson
Sent: Tuesday, November 06, 2007 10:36 AM
To: edebate at ndtceda.com
Subject: [eDebate] A less risky experiment
While I'm always intrigued by the opportunity to research something (rather than actually having to do it), we would need to clearly understand the limitations of the "controlled" study that Stefan and David are proposing. I'm not opposed to collecting two different scores for each debate. I'd even be open to arguments as to which of the scores counts towards the actual results and which is simply correlated (though all of this is really WFU's decision to make rather than mine).
But the proposed experiment would prove to be the classic case where studying a behavior potentially changes the behavior in question. When we start the exercise by saying, "assign the points that you would assign on a 30-point scale and then indicate the points you would give on a 100 point (or 50 point) scale", we have no idea how the process of identifying two different scores AND consciously correlating them impacts one or both of the scores assigned. My first hypothesis would be that the scores on the 30 point scale would NOT, in fact, have been the scores that would have been assigned had the scale research not been in progress. Imagine the judge that now gives 27.5's to 80% of the debaters that they judge in a tournament. I can imagine that some such judges "might" argue that it was just a matter of discrimination: none of those debaters were quite good enough to merit a 28 and none were poor enough to get a 27. But for others, the 27.5 is just a polite fiction or convenience. They did enough work deciding who won the debate. I suspect that the experiment would result in some of those judges giving a broader range of scores on the 30-point scale. If that happened, wouldn't it just prove that reform was unnecessary, that judges could reconceptualize the current scale and produce a more discriminating set of scores? Perhaps. Though without the force of the experiment, we'd be back in the SQ where friendly persuasion hasn't done much over the years. But some others might actually give a narrower range of scores. If they start the exercise by using the 100-point scale to provide discrimination, they might discover that all of those scores can translate back to the same 30-point equivalent. Once again, writing two scores down on the ballot with explicit instructions that they should correlate in accordance with a predefined conversion scale might influence the scores assigned, including the 30-point versions.
The other question I have is how we would evaluate the outcome of the experiment. David provides one metric. The new scale succeeds if we get a normal distribution of +/- 2 points surrounding each of the scores that represent the 30-point scaled score times 3.3. Of course, that's assuming that the current scores are correct and that the only issue is increased discrimination (inflation being irrelevant). To be honest, one of the dilemmas, whether we change the scale or just do the research, is that we don't really have any control. There is no "right" score, correct seed order, or correct set of speaker awards to which our outcome can be compared. If the two scales produce slightly different outcomes, we have grounds for arguments but not really for conclusions. And if the two-score research creates an extremely high correlation, that in itself doesn't prove that the reform is unnecessary, since the research itself might have prodded the outcome.