The curving process should be linear, I think.

BeetleB · on Sept 27, 2016

The curving process shouldn't exist.

If you make an exam that is difficult enough to need curving, you're getting a poor measure of ability. This is because exams ask only a few questions, and unreasonably difficult exams result in low scores even from high-achieving students. Low scores are very susceptible to noise (the delta between 50% and 60% is greater than between 85% and 95%).

If that doesn't convince you, take my argument to the obvious extreme. The Putnam Competition in mathematics is tough. Sometimes over half of the people score 0. Getting 1 question correct (out of 12) at times puts you in the top 20%.

Imagine I gave this as an exam to a math class of 20 students. One person scores a 1. The rest score 0. Is it meaningful to curve this? I could correct by giving partial credit: Some people get 0.25, others 0.5, and others 0.75, so we now have 5 different grades. Should I just give A, B, C, D and F?

The lower the scores, the higher the effect of noise. It's a bad idea.

smallnamespace · on Sept 27, 2016

> Low scores are very susceptible to noise (the delta between 50% and 60% is greater than between 85% and 95%)

This seems untrue from information theory.

The entropy of a question is maximized when its probability of being answered correctly is exactly 50% [1]. If your only goal is to have the least amount of measurement noise given a fixed number of questions, then you'll want each question to be hard enough to filter 50% of the class out, and to minimize the correlation between questions.

For example, 10 independent 50% questions reveal as much information as about 16 independent 85% questions.
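The 10-versus-16 figure can be checked with the binary entropy function from [1]. This is a minimal sketch that assumes each question is an independent yes/no outcome; the function and variable names are illustrative, not from the thread.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no outcome with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Total information from ten questions that half the class gets right.
bits_at_50 = 10 * binary_entropy(0.50)

# Information from a single question that 85% of the class gets right.
bits_per_q_at_85 = binary_entropy(0.85)

# How many 85% questions carry the same total information.
questions_needed = bits_at_50 / bits_per_q_at_85

print(f"{bits_per_q_at_85:.3f} bits per 85% question")
print(f"{questions_needed:.1f} such questions match ten 50% questions")
```

Under these assumptions an 85% question carries roughly 0.61 bits, so a little over 16 of them are needed to match ten 50% questions.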

> Imagine I gave this as an exam to a math class of 20 students. One person scores a 1. The rest score 0. Is it meaningful to curve this?

You're looking at only 1 tail but ignoring the other. By symmetry, a question that only 1 person gets correct tells you exactly as much as a question that 19 get correct. An exam with a 90% pass rate (before curving) is no better than a 10% exam.

[1] https://en.wikipedia.org/wiki/Binary_entropy_function

mcguire · on Sept 27, 2016

Thanks. I'm now having flashbacks to the summer of 1990, when I worked on an Education professor's research project, creating a math test in a HyperCard stack that presented the question that provided the most information about the testee's abilities, given their previous answers.

smallnamespace · on Sept 27, 2016

The computerized GMAT and LSAT attempt to do this - the difficulty level of questions dynamically adjusts to your current performance level (higher means harder questions).
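For a rough idea of how "pick the most informative question" can work, here is a sketch that assumes a Rasch (one-parameter logistic) model. That model, the item bank, and all function names are assumptions for illustration; nothing in the thread says the HyperCard stack or any real exam works this way.

```python
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) model: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def item_information(ability: float, difficulty: float) -> float:
    """Fisher information of one item at the given ability: p * (1 - p)."""
    p = p_correct(ability, difficulty)
    return p * (1.0 - p)

def pick_next_item(ability: float, difficulties: list[float]) -> int:
    """Index of the item that is most informative at this ability estimate."""
    return max(range(len(difficulties)),
               key=lambda i: item_information(ability, difficulties[i]))

# Hypothetical item bank: difficulties on the same scale as ability.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
easy_pick = pick_next_item(-0.9, bank)  # current estimate is low
hard_pick = pick_next_item(1.8, bank)   # current estimate is high
```

In this model the information `p * (1 - p)` peaks when the test taker has a 50% chance of answering correctly, so the selection rule reduces to choosing the item whose difficulty is closest to the current ability estimate. A full adaptive test would also update the ability estimate after each answer and skip items already used, which this sketch leaves out.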

BeetleB · on Sept 28, 2016

First, while you may have a point, you misunderstood my percentages. I did not mean them to be the pass rate, but rather the score.

> An exam with a 90% pass rate (before curving) is no better than a 10% exam.

I do not encourage a 90% pass rate. I do not endorse exams where most of the students score 90%. I'm saying an exam which allows for a large variation in scores (anywhere from 10 to 100) yields more information, whereas an exam where a really brilliant student will get 50, with the next highest score being a 30 by someone who is only very smart, tends to be less likely to yield useful information about most of the students who will get 20 or less. A fairly bright student and a fairly average student may both score a 20 on such a test - yet the test failed to distinguish between them.

(BTW, I had an instructor whose exams were like this - I think I once had the highest score at around 25-30 out of 100).

A less demanding, but not trivial test, will separate out the average from the brighter.

Regardless, why the need for a curve? Your grading system should not depend on which students are present. It can lead to poor students getting a good grade and smarter students getting a poorer grade - in different semesters, for the very same tests.

aianus · on Sept 28, 2016

> Regardless, why the need for a curve? Your grading system should not depend on which students are present. It can lead to poor students getting a good grade and smarter students getting a poorer grade - in different semesters, for the very same tests.

The distribution of skill between one 300-person calc class and the next is going to vary much less than the difference in teaching styles and exam difficulties. This is exactly why a curve is required -- so that your grade reflects how well you perform relative to others doing the same thing instead of some absolute level of competence that would vary from school to school, prof to prof, semester to semester.

Consider an employer or graduate school admissions committee that needs to decide who to interview. Looking at curved grades makes it easy to pick the top X% of students, whereas looking at uncurved grades leaves a lot more to chance (maybe a C was the highest grade in your section, but there was an easier prof the following year where the highest grade was an A).

BeetleB · on Sept 28, 2016

That works fine when you have classes with 100+ students. In the ones I attended, it would range from 15 to 40 (the latter being considered high). Lower numbers tend to be impacted more by noise.

> Consider an employer or graduate school admissions committee that needs to decide who to interview. Looking at curved grades makes it easy to pick the top X% of students

As an employer, I'm not interested in the candidate's ranking in the class. I'm interested in their skills. While one is often used as a proxy for the other, I do not.

As a student, I want feedback on how much knowledge I learned, not how I did in comparison with the class. This was the original purpose of scoring tests.

Having gone through the PhD route, I know that "A" grade students who were always focused on the metric of relative ranking rather than knowledge acquired eventually were more likely to do a poor thesis or drop out, compared to "A" grade students who were focused on acquiring knowledge.

This was more acute among students who came from top undergrad schools: Very competitive background with heavy curving - and they would take their A as a faulty indicator that they were "doing well". In grad school, even though the courses are more challenging, most professors give A's and B's. Only rarely were C's given. The professors want to focus on learning and theses - grades are a distraction. Suddenly these students were getting A's, thinking they were doing well and not learning much. Their internal barometers were measuring the wrong thing, so their research suffered.

aianus · on Sept 28, 2016

> As a student, I want feedback on how much knowledge I learned, not how I did in comparison with the class. This was the original purpose of scoring tests.

My university automatically attached our grades to all internship applications. It's pretty clear the purpose of grades (other than pass/fail) is employer or grad school evaluation, not for student feedback. For better or for worse.

BeetleB · on Sept 28, 2016

Which is why in grad school, many professors subvert this by never giving C, and only giving B's if you're fairly poor. They lost the battle for undergrad, but grad school (at least in science/engineering) is still their domain.

smallnamespace · on Sept 28, 2016

I was unclear before -- an exam with an average score of 90 is equally bad as one with an average score of 10.

The variation in score is maximized when the exam has an average score of 50.
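One way to see this is to assume, as a simplification not stated in the thread, that the exam has $n$ independent questions and each is answered correctly with the same probability $p$. The number of correct answers $S$ is then binomial with mean $np$:

```latex
\operatorname{Var}(S) = n\,p\,(1-p), \qquad
\frac{d}{dp}\,\bigl[n\,p\,(1-p)\bigr] = n\,(1-2p) = 0
\;\Longrightarrow\; p = \tfrac{1}{2}
```

The variance is symmetric around $p = 1/2$: average scores of 90% and 10% both give $0.09\,n$, against $0.25\,n$ at 50%.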

Since an uncurved score of 50 is a failing grade, you'd need curving to make that system work.

BeetleB · on Sept 28, 2016

> I was unclear before -- an exam with an average score of 90 is equally bad as one with an average score of 10.

I agree but it's not what I'm advocating for.

> Since an uncurved score of 50 is a failing grade, you'd need curving to make that system work.

A curve is inherently about grading relative to one's peers, and I see no reason why it is required. If your test will have an average score of 50, make that a B (or whatever), and perhaps 75 an A. Just do it and fix it to those grades. Do not keep changing the threshold based on your current batch of students.

I think the fundamental difference between the two camps boils down to this question:

Should grades be an absolute measure or a relative measure? I'm strongly in the absolute camp.

smallnamespace · on Sept 28, 2016

> If your test will have an average score of 50, make that a B (or whatever), and perhaps 75 an A. Just do it and _fix_ it to those grades.

That's still a form of curving, since you can't know ahead of time on a new test how students will do. E.g. if you arbitrarily fixed it at 50 / 75 and your students do a lot worse than you expected, do you just fail the lot of them? If not, and you move the score thresholds lower, then that's just what curving is.

In fact, I had a couple of professors who told us ahead of time that curving could only be used to adjust our scores upward, but never downward, e.g. 70% was passing, but so was 60%, if enough of the class did poorly. Made the students feel much better, since they had a fixed bar to reach.

Seems like you'd be satisfied though if the curve used at least ~100 students to calculate so that it didn't change much due to random variation (e.g. if your class is small, aggregate multiple years of data)? Still a form of curving (since you are, at the end of the day, being compared to other students), just with different methodology.

Bartweiss · on Sept 27, 2016

But I've seen a lot of courses where it isn't. Among the many techniques I've seen used:

- Raw average, then set letter cutoffs for a normal distribution across students. Arbitrary on distribution, normal on letter.

- "Skew high" by counting missed points more weakly as you miss more. Inflated letters with a score-accurate relative distribution.

- "Fixed and curved", where each cutoff is based on the more (or less!) generous of standard deviations and 'conventional' grade brackets: the floor for A is either 92%+ or +2 SD from mean, B is 83%+ or +1 SD, etc. Bizarre and distorted everything, since the relevant cap is determined by the local distribution.

- God knows what else.

So honestly, I think post-curve data is a hopelessly course-specific, nonlinear mess.
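To make the "fixed and curved" scheme concrete, here is a rough sketch using only the A and B floors quoted above. The use of population standard deviation, the sample scores, and all names are assumptions for illustration; actual courses will differ.

```python
import statistics

# Conventional fixed floors quoted in the comment (A and B only).
FIXED_FLOORS = {"A": 92.0, "B": 83.0}
# Curve floors, in standard deviations above the class mean.
SD_OFFSETS = {"A": 2.0, "B": 1.0}

def fixed_and_curved_floors(scores, generous=True):
    """Per-letter floor: the lower (generous) or higher (strict) of the
    fixed bracket and mean + k * SD for this particular class."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # assumption: population SD
    choose = min if generous else max
    return {letter: choose(FIXED_FLOORS[letter], mean + SD_OFFSETS[letter] * sd)
            for letter in FIXED_FLOORS}

def letter_for(score, floors):
    """Highest letter whose floor the score reaches."""
    for letter in ("A", "B"):
        if score >= floors[letter]:
            return letter
    return "below B"

# Hypothetical scores from a hard exam.
hard_exam = [35, 42, 48, 50, 55, 58, 61, 66, 70, 88]
floors = fixed_and_curved_floors(hard_exam)
strict_floors = fixed_and_curved_floors(hard_exam, generous=False)
print(letter_for(88, floors), letter_for(88, strict_floors))
```

With these made-up scores, the same 88 is an A under the generous rule and a B under the strict one, which shows how much the result depends on the local distribution.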

matteotom · on Sept 27, 2016

I've seen a few different curving processes:

1. Formula, like square root of grade times 10, that's designed to bring up grades. There are a lot of possibilities here.

2. Top x% get A's, next y% get B's, etc.

3. Look for clusters: top cluster gets A's, next B's, etc.

These can also be applied to classes as a whole or to individual assignments and tests.
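The first two processes are easy to sketch. This assumes scores on a 0-100 scale; the 20/30/50 split, the sample scores, and the function names are hypothetical, and ties are broken arbitrarily. The cluster approach is not covered here.

```python
import math

def sqrt_curve(score: float) -> float:
    """Square root of the grade times 10, for scores on a 0-100 scale."""
    return math.sqrt(score) * 10

def rank_based_letters(scores, shares=(("A", 20), ("B", 30), ("C", 50))):
    """Top share (in percent) gets the first letter, the next share the
    second, and so on. The default shares are made up for illustration."""
    # Indices of students from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    letters = [None] * len(scores)
    start = 0
    cumulative = 0
    for letter, share in shares:
        cumulative += share
        end = round(cumulative * len(scores) / 100)
        for i in order[start:end]:
            letters[i] = letter
        start = end
    return letters

scores = [95, 40, 72, 88, 60, 55, 81, 30, 67, 77]
curved = [round(sqrt_curve(s), 1) for s in scores]
letters = rank_based_letters(scores)
print(curved)
print(letters)
```

The square root formula leaves 0 and 100 unchanged and raises everything in between, with the largest boosts in the low-to-middle range (a raw 49 becomes 70). The rank-based version ignores the raw values entirely and only looks at order.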

tamana · on Sept 27, 2016

That sounds funny, but it's true under the common modern curving regime that is really just lowering the bar for raw score to letter grade conversion, not forcing a bell curve.

turnip1979 · on Sept 27, 2016

Square root of grade times 10 :-p