Evaluations: Pros & Cons

A partial essay on systematized evaluation. It argues that evaluation systems are an alternative to costly signaling, using athletes (Sabermetrics/Moneyball) and software engineers as examples of fields where performance is easy to demonstrate, contrasted with intellectuals who advertise credentials. It distinguishes objective quantitative measures from the subjective qualitative evaluations needed for harder-to-measure things. The second half concerns evaluation agencies and power: that such bodies depend on a brand of authority, that highly-ranked parties have incentives to reinforce the evaluator’s prestige (Princeton Review and elite universities; award-ceremony dynamics), producing self-reinforcing cycles, and that there is little evaluation of evaluators despite this seeming especially important. The draft is unfinished: it retains an apparent stray header line (“AJoys and Negatives of Evaluations”), relies on one referenced image, and ends mid-thought on an incomplete sentence (“The Oscars”).

AJoys and Negatives of Evaluations

Systematized Evaluations as a Rationality Power Tool

Systematized evaluation procedures aren’t generally considered an exciting topic, but it should be clear that they are an important one.

Evaluations and Signaling

In cases where there’s a lot of unnecessarily costly signaling happening, evaluation systems can be an answer. Costly signaling is a sign that there aren’t existing quality evaluation systems in place; if there were, it’s not clear how signaling would be useful.

Athletes don’t spend much time advertising their quality. Instead they focus on doing well. If a young athlete gave long interviews or wrote a series of articles about what makes them great, I don’t think this would convince coaches much, because it’s so easy for athletes to demonstrate their quality in simple tests. Baseball now has comprehensive, systematized sabermetrics, which computes specific aggregates of quantitative performance measures to inform hiring decisions, as described in Moneyball. These metrics don’t bother analyzing how well players argue for their quality; they just look at the stats. Players in such a system are incentivized not to signal but to perform, so the system is fairly well aligned.

Many software engineers barely have resumes. If they’re good, it’s often evident from rather short interviews. Facebook has been known to encourage people to drop out before finishing college. Compare this with some intellectuals, who seem to spend a considerable amount of time detailing their awards and recognitions in long CVs, even publicly announcing these recognitions at the beginnings of talks.

Simple, objective, quantitative measures are ideal where they are available. We can repeatedly measure the karats of gold, the population of a country, the total profit of a company. But many important things aren’t suited to such measures. There’s no one number for how good an employee is or how well reasoned a book is. For things like these we need subjective and often qualitative measures. These can be somewhat arbitrary and time-intensive, but they’re often the only serious option on the table. The question is often not “should we use systematic evaluations?” but rather, “what should our systematic evaluation systems be?”

Evaluation Agencies and Power

Subjective and qualitative evaluations in particular require a fair amount of coordination. There’s a lot of human work to coordinate and pay for. Often this is easiest with one agency in charge, but centralized power can be corruptible. Power can breed more power and unchecked power can lead to big problems.

Evaluation bodies tend to have brands that scream out, “We’re authoritative, you can trust us!” Often there doesn’t seem to be much more to these brands. I rarely see these organizations publicly raise or discuss concerns about the quality of their own work, which is interesting given how difficult it is to actually do a good job. I haven’t yet seen an award ceremony include details of the potential problems of the award selection process.

If I were ever to run an awards agency, I’d want to name it something like, “The highly uncertain best guesses of X.” Perhaps this is one reason I’ve never had an awards agency.

For example, take courts, which have some of the most over-the-top, authoritatively branded buildings out there. The histories of courts are filled with challenges, and the results are often fairly random, but you wouldn’t have any idea of that from looking at the buildings. Courts do lots of marketing; it’s just all spent on architecture.

[Figure from “Evaluations: Pros & Cons”]

Why is authoritativeness so important? Given that evaluation bodies focus on evaluations, they are listened to about as much as people respect their evaluations. Their power is proportional to their respect among powerful actors.

Evaluative authoritativeness is beneficial to those ranked highly. If you represent a top-rated product in Consumer Reports, you’d prefer that people trust Consumer Reports. This means that those ranked highly will be inclined to “go along with it” or reinforce the authority of the evaluation agency. The Princeton Review giving good scores to Harvard and Stanford in part helps Harvard and Stanford keep their prestige, and then Harvard and Stanford (the most prestigious schools) can give back by being positive about The Princeton Review. Everyone at the top does well when The Princeton Review becomes more reputable, so the groups in power help make sure that happens. No one (in power) wants to see much criticism of The Princeton Review.

A lighter example would be the speeches at the ends of award ceremonies. The winners have an obvious incentive to pretend that the awards process was respectable. Imagine how odd it would be for a Nobel Prize winner to spend the resulting interview time detailing the faults of the Nobel Prize selection process. The losers get less attention, so they have trouble complaining (plus, they seem like sore losers when they do).

I’m sure norms around respect and reciprocity also have something to do with it, but this is a complicating factor more than an altruistic one. If you were to attend a prestigious ceremony and later insult the organizers, you’d be in essence insulting everyone who supported the organizers, whether they did so for valid reasons or selfish ones. The fancy parties hosted by awards agencies for top participants can act a lot like bribes.

So you can get nasty cycles where evaluation agencies (and evaluation systems, in general) reinforce their own authority over time. They help make specific actors powerful, and those actors are incentivized to further empower the agencies.

Sometimes broad consensus is even more important than evaluation quality. Perhaps it’s very important that there’s consensus that one Presidential candidate fairly won, more important even than whether they actually did win fairly.

Interestingly, there seems to be very little in terms of evaluator evaluations. My guess is that, for one, the evaluator bodies really like being at the top of the proverbial food chain and are not fans of having their authority questioned. Add the fact that “evaluation evaluations” may seem pretty abstract to people. But if we think that evaluations are valuable and important, it seems particularly important to evaluate the evaluators. Evaluators should be subject to the most evaluation. If we make a mistake there, this mistake will cascade to what could be many other decisions.

The US FDA reviews (a kind of evaluation) food and medicine. Academic journal administrators select submissions and organize peer review for their respective journals. The Oscars