Bayesian truth serum, grading and student evaluations

In one of my recent posts, I examined some proposals for making university grading more equitable and less prone to grade inflation. Currently, professors are motivated to inflate grades because high grades correlate with high student evaluations, and these evaluations are often the only metrics of teaching effectiveness available. Is there a way to assess professors' teaching abilities independent of students' subjective views? Similarly, is there a way to get students to provide more objective evaluations?

It turns out that one technique may be able to do both. Drazen Prelec, a behavioral economist at MIT, has a very interesting proposal for motivating people to give truthful opinions even when they know their opinion is a minority view. In this technique, awesomely named "Bayesian truth serum" (BTS)*, respondents provide two pieces of information: the first is their honest opinion on the issue at hand, and the second is an estimate of how they think other people will answer the first question.

How can this method tell whether you are giving a truthful response? The algorithm assigns more points to answers that are "surprisingly common", that is, answers that are more common than collectively predicted. For example, let's say you are being asked which political candidate you support. A candidate who is chosen (in the first question) by 10% of the respondents, but predicted as being chosen (in the second question) by only 5% of the respondents, is a surprisingly common answer. This technique elicits more truthful opinions because people systematically believe that their own views are unique, and hence underestimate the degree to which other people will predict their true views.
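To make the scoring concrete, here is a minimal Python sketch of the BTS scoring rule: each respondent earns an information score for giving a surprisingly common answer (the log of the actual frequency over the geometric mean of everyone's predicted frequencies), plus a prediction score that penalizes inaccurate forecasts of the group. The function name, data layout, and epsilon smoothing are my own illustrative choices, not anything from Prelec's paper.

```python
import math

def bts_scores(answers, predictions, alpha=1.0):
    """Bayesian-truth-serum-style score for each respondent.

    answers:     one chosen option per respondent
    predictions: one dict per respondent mapping each option to the
                 fraction of the population predicted to choose it
    Higher scores indicate more credible (truth-telling) respondents.
    """
    options = sorted({opt for pred in predictions for opt in pred})
    n = len(answers)
    eps = 1e-6  # floor to avoid log(0)

    # Actual frequency of each answer across respondents
    freq = {k: max(answers.count(k) / n, eps) for k in options}
    # Geometric mean of the predicted frequencies for each answer
    geo = {k: math.exp(sum(math.log(max(p.get(k, 0), eps))
                           for p in predictions) / n)
           for k in options}

    scores = []
    for ans, pred in zip(answers, predictions):
        # Information score: bonus for a "surprisingly common" answer
        info = math.log(freq[ans] / geo[ans])
        # Prediction score: penalty for forecasting the group badly
        forecast = sum(freq[k] * math.log(max(pred.get(k, 0), eps) / freq[k])
                       for k in options)
        scores.append(info + alpha * forecast)
    return scores
```

Plugging in the candidate example above: if 10% of respondents choose a candidate whom everyone predicted at 5%, those supporters collect a positive information score of log(0.10/0.05) ≈ 0.69, so their "surprising" answer is rewarded rather than punished.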

But, you might reasonably say, people also believe that they hold reasonable and popular views. They are narcissists and assume that others will tend to believe what they themselves believe. It turns out that the Bayesian truth serum handles this case as well. Let's say that you are evaluating beer (as I like to do), and let's also say that you're a big fan of Coors (I don't know why you would be, but for the sake of argument....). As a lover of Coors, you believe that most people like Coors, but you also recognize that you like Coors more than most people do. You therefore adjust your estimate of Coors' popularity downward to correct for this bias, and end up underestimating Coors' actual popularity in the population.

The same method can also be used to identify experts: it turns out that the people with the most meta-knowledge are also the people who provide the most reliable, unbiased ratings. Let's go back to the beer-tasting example. Some characteristics of a beer might taste very good but reveal poor brewing technique, say, a lot of sweetness. Conversely, some properties of a beer are normal for a particular process but seem strange to a novice, such as yeast sediment. An expert will know that too much sweetness is bad and that the sediment is fine, and will also know that a novice won't know this. Hence, while the novice will believe that most people agree with his opinion, the expert will accurately predict the novice opinion.
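The expert-detection idea above comes down to the prediction-score component of BTS: a respondent's forecast of the group's answers is compared against the actual answer frequencies, and a forecast that matches them earns the maximum score of zero (the score is a negative KL divergence). A toy sketch, with entirely made-up numbers for the beer example:

```python
import math

def prediction_score(pred, actual_freq, eps=1e-6):
    """Meta-knowledge component of a BTS score: the negative KL
    divergence from the actual answer frequencies to the respondent's
    forecast. A perfect forecast scores 0; worse forecasts go negative."""
    return sum(f * math.log(max(pred.get(k, 0), eps) / f)
               for k, f in actual_freq.items())

# Hypothetical poll on an overly sweet beer: 70% of tasters (mostly
# novices) rate it "good", 30% rate it "flawed". Numbers are invented.
actual = {"good": 0.7, "flawed": 0.3}

novice = {"good": 0.9, "flawed": 0.1}  # projects his own taste onto others
expert = {"good": 0.7, "flawed": 0.3}  # anticipates the novice majority
```

Sorting raters by this score surfaces the expert, who correctly predicted the novice-dominated outcome even though his own rating was in the minority.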

So, what does this all have to do with grades and grade inflation? Glad you asked. Here, I propose two independent uses of BTS to help the grading problem:

1. Student work is evaluated by multiple graders, and the grade the student receives is the "surprisingly common" answer. This motivates graders to be more objective about the piece of work. We can also identify the most expert graders by sorting them according to meta-knowledge. Of course, this throws more resources at grading in an already strained system.

2. When students evaluate the professor, the evaluation itself uses BTS in an attempt to elicit objective responses.

* When I become a rock star, this will be my band name.