Criticize this post

A November 2020 essay arguing that the effective-altruism community, particularly its longtermist and meta wings, was underdeveloped in public feedback, evaluation, and candor. It contrasts GiveWell-style charity evaluation with the near-absence of evaluation for longtermist/meta organizations, and attributes this partly to the community being “almost all friends,” where honest negative evaluation risks one’s own funding and future job prospects. It discusses a cultural lack of candor (comparing EA with the Rationalist community), uses EA Global as a case of consensus over controversy, and proposes a simple rubric (internal/external feedback, mentorship, prioritization, cultural candidness, decentralized justified trust) with the author’s ratings shown in a referenced image. Later sections connect the theme to forecasting and evaluation systems, an explicit invitation for criticism, and a reading list. The author flags it as a messy exploration rather than a polished essay; Nuño Sempere’s restored marginal comments are included at the end.

TLDR: I think the Effective Altruist community and research organizations, particularly around meta and long-termism, are fairly early with regard to methods of public feedback, evaluation, and candor. I believe this is an area people should pay attention to and that there are likely significant effective gains to be made here going forward.

To be clear I think many other groups are also poor here and often worse. Much of this post could apply equally well outside EA. But some groups (entrepreneurs, top businesses, effective scientific institutions) are better. And I think it might be wise to aim much further than any existing groups as a 10 to 100 year goal.

This post is very much a messy exploration; think of it as a collection of quick rants rather than a polished essay. I budgeted a day or two and took a week. I’d love to attempt a full rewrite or make it into a series, but I really don’t have the time for that now. Feel free to ignore this or do a quick skim.

Note: this post assumes a fair amount of background knowledge of the Effective Altruist community and culture. It’s based primarily on personal accounts and discussions, from my history of being around the community for the last several years.

Effective Altruism (2020), Criticism, and Feedback

On the surface, it may seem that Effective Altruists should be at the cutting edge in terms of evaluating their work and being candid about disagreements. I think in practice we have a fair way to go. I’m interested in working to help improve things and am curious to get other thoughts on the topic.

The Effective Altruism movement largely grew around GiveWell, which specialized in evaluating the effectiveness of nonprofits. The old argument was that nonprofits couldn’t be fairly evaluated between cause areas or that nonprofit work was simply too complicated to understand quantitatively. The GiveWell approach looked to the critics as an incredibly arrogant endeavor, but now is recognized at least within the community as having done a fairly good job, especially compared to what had come before.

So it’s frustrating that many of the new high-profile cause areas seem almost evaluation-free. There is no GiveWell or Animal Charity Evaluators for longtermist or meta nonprofits. One reason for this is that there’s been a shift towards longtermism, where it’s substantially more difficult to estimate effectiveness. Another is that these are relatively new areas. But these excuses obviously shouldn’t mean a free pass.

I think one thing that’s happened is that the longtermist / meta community is almost all friends with one another. It’s easy to rank organizations that neither you nor most of your readers will ever personally know. It’s much more awkward when you have to call out friends for doing a bad job, knowing it means they might lose their jobs because of it. It can be personally ruinous if it’s likely that you might want to work for one of the organizations you give a bad score to down the line, or that someone you give a bad score to might be an important Board member one day.

I know this in large part because I’m friends with many people in the community and have experienced this. I like the community a whole lot and have an absolute ton of respect for them. I worked at 80,000 Hours in 2014, have attended something like five Effective Altruism Global events, and most recently worked at FHI for two years in the Research Scholars Program.

I’ve previously asked a few relevant figures if they’d be up for publicly evaluating long-termist/meta work, and got some rather hesitant responses. You might notice that many of the public research analyses emphasize good work much more than poor work. It’s much easier politically to highlight the good things than the bad.

The existing solution I see is background chatter around funders. Organizations that do work that seems sketchy get flagged and that gets communicated fairly haphazardly between people. This is a low cost strategy that’s pragmatic for small setups but has a bunch of limitations. Organizations that are refused funding often have a very challenging time finding out why. They often continue to search for increasingly-distant funding sources, many of whom have not heard the rumors. Onlookers can’t understand what’s happening and might either fund poor organizations or worse, start poor organizations only to get rejected (or worse, accidentally funded) several years later.

I don’t want to over-emphasize feedback on a per-organization level. There’s also blog posts, articles, and cause areas. On articles, for example, a lot of content by smaller organizations is written without peer review. Even content written under peer review has problems, as modern peer review processes can be quite myopic in focus and restricting in formats.

I can point to my work as a case where feedback has been challenging. I’ve found handfuls of people who can help review my Google Docs papers, but it’s an ad-hoc process. Posts on LessWrong and the EA Forum don’t really get comprehensive reviews. The karma system is a good start but does not replace a quality rubric and evaluation. Going up higher, I’m not sure who to go to to get the best feedback on how I’m doing as a whole, or how well a research agenda is doing. I don’t want to produce a bad effort, I’d prefer that a quality system help tell me if my work is of high expected value or a net loss.

When I’m asked by young Effective Altruists how to get started with research, the best advice I can give them is often something like, “Try to post on the EA Forum and cross your fingers you’ll get useful feedback.”

So, to summarize, we don’t have many public evaluations or feedback, and also have prosaic private feedback mechanisms.

Effective Altruism, Certainty, and Candor

Part of the issue feels like a cultural lack of candor.

Candid communication is not always enjoyable, especially for people not used to it. But a lack of candor can be stifling in the long run.

Candidness is one area where I feel the EA community could learn from the Rationalist/LessWrong community. The Rationalist community has a long history of attracting disagreeable people. For a while the comment threads were highly unpleasant with lots of over-the-top criticism. But over-the-top criticism is at least criticism, and things have gotten more friendly (though less critical). Many of the non-AI-safety writers seem to have very individualistic approaches and agendas.

My gut feeling is that the Effective Altruist community is more agreeable and conformist than the Rationalist community. This is beneficial for coordination. I’m sure it has helped speed up the time from someone joining the field fresh to them heading for a path to work at one of the most EA-reputable AI Safety organizations. Agreeable people tend to be good team members, as long as the ship is heading in the right direction.

For example, on the EA Forum, I’ve seen almost no serious feedback or criticism of the primary organizations (Open Phil, FHI, CEA, CSER, OpenAI, etc.). It’s like the main ideas and actions are going unchallenged. I’ve heard a fair amount of mumblings behind the scenes, but very little shown publicly or even presented to the relevant people. I’m guilty of this myself. I’ve noticed that it’s particularly scary to speak up when there’s a lack of precedent for it.[1]

When I see posts and writing of altruistic people I’m hesitant to provide criticism. There’s a culture of positivity on both LessWrong and the EA Forum (and most forums, to be fair). This is great in some ways but bad in others. Public & online criticism can be scary, especially because we don’t know who will be reading things and what possible future job opportunities are on the line. At the same time though, it’s really difficult to improve without real feedback. Also it would be useful common knowledge for people to know at least some of which posts and projects aren’t good and why that is the case. Negative case studies are often the best ones.

Effective Altruism Global

Let’s use Effective Altruism Global as an example of a cultural point. I’m very thankful for EAG events (As noted, I’ve been to several), but I’ve been frustrated that they seemed to emphasize consensus over controversy. EAG presentations give an aura of authoritativeness. Presenters are (literally) put on a stage and introduced with glowing speeches. The opening and closing sessions are typically highly optimistic about Effective Altruism, and many of the talks seem a lot like they are standing in as “the definitive take on X.” There are sometimes talks about possible disagreements, but these are rather few, and it’s not always clarified that the disagreements are substantial. I recognize that debates have a lot of problems, but I think I’d be more excited about what would come out of an “EA Cause Area Debate” or similar than the marginal presentation. There could be a lot more emphasis on controversy and ways that the respected actors are wrong.

My read is that one of the main goals of EAG is to get newcomers up to speed with the EA expert understanding of things. There are very valid reasons for this. The majority of people I talk to that have criticisms of EA are people who haven’t thought about the issue much. On the margin, I’d expect their beliefs would be better if they accepted the primary results of EA investigation. But there are downsides to a confident image as well. It’s easy to give the impression that we’re far more sure than we actually are. I’ve seen this happen several times, and it has been complained about a lot in the background. See the Earning to Give controversy for perhaps the clearest example. More importantly, we really would benefit from people who question the things we have wrong. I’m sure we’re making lots of mistakes that aren’t yet obvious.

I think one could often come away from existing EAG events thinking that all Effective Altruists agree with one another. This is either very wrong, or we have a much bigger problem to worry about.

I’m not saying that EAG is worse than other conferences. I’m rather the kind that finds most similar conferences fairly insufferable. I would like for EA to find ways of presenting things better. It might be the case that because most conferences project plastic, uniform, proud, and overoptimistic images of themselves, ones that don’t would be seen as strange and irrelevant. Perhaps there is little such thing as a big conference that emphasizes humility, doubt, and self-reflection. But I’m hopeful. It might be easier now that many of the EAG conferences have become smaller and some aimed more at the most experienced people, as opposed to trying to attract newcomers.

Candidness between cultures

One cause of the culture could be that Effective Altruism is attempting to be very encompassing to get a diverse set of skills. As such one might expect the culture to approximate the average of the cultures that it draws from. It’s already highly selective for talent, so there’s not much room to be additionally selective for candor. I’m used to startup culture, so in comparison, the “average of intelligent groups in Western Nations” is not particularly good at being honest and candid. This might be what’s common, but that doesn’t make it good enough.

A more encompassing issue is just that all existing communities have limitations regarding candidness. I don’t believe we have any examples of groups that are as strong as what we would ideally want. Startup culture is good in some tactical areas but quite poor in moral ones. Bridgewater culture seems great for business performance, but I doubt it has escaped the confines of all Western biases and honesty limitations. If you were to go back in time, all civilizations seemed to have some core unquestionable assumptions and honesty norms common among all subcultures.

At some point you enter uncharted territory. I imagine that to do this well would require a fair amount of innovation and consideration. This is the kind of challenge that could take a while. Perhaps one would desire some serious research efforts to navigate and test possible cultural changes. I imagine that big gains would take many years, perhaps 5 to 100 or so.

Counter Examples

I have noticed Effective Altruists being highly candid or critical in a few areas:

Criticisms against Effective Altruism
Proposals of new cause areas
Research from people new to Effective Altruism, often who aren’t used to the writing style and evidential standards
Culture war issues

I have mixed feelings about the criticized areas. I think the criticism is often warranted, though I think there’s room to improve with regards to empathy and respect in how it’s delivered. I hear that many other altruists have had bad experiences engaging Effective Altruists and gotten a sour taste for the movement. But perhaps the main thing here is that this is an indication that critique is very possible, it just doesn’t seem to be applied much to some of the most core and important topics.

It should be clear that a community being critical about opposing beliefs does not gain it many points in being critical for self correctness. Every intense community is critical about opposing beliefs. Christians have many fierce debaters that have stood ground against atheists in argument. I’m sure there are many intelligent Scientologists and other cult members who are exceedingly clever in defending their held orthodoxies. So it’s interesting how good communities can be at exploring arguments against other groups while at the same time refusing to apply similar measures to their own, often exceedingly dubious claims. Really, communities that are good at attacking criticism without applying self reflection aren’t just ineffective; they can be actively dangerous.

Candor and certainty

If the key decisions Effective Altruists make were obvious, then candor wouldn’t be as important. However, I think the decisions are clearly not. The solution space of “all of the ways to help the world” remains vast and perplexing.

The flip side to candor is certainty. Candor that opposes things that are certain is often paranoia. It’s counterproductive to have people doubting things that are actually true. If you have to rally the troops to defeat actual Nazis, you don’t want your people to be spending their efforts on metaphysical definitions that preclude the meaning of war. In business there’s a phrase for such problems: disagree and commit. At some point a decision has to be made, after which it isn’t useful to debate it. What then matters is execution.

A lot of the main Effective Altruist beliefs are clearly not certain. We have lots of time to change things going forward. So we rarely need to disagree and commit.

It’s useful for individual organizations to make fixed assumptions, but this applies less to the EA community. For example, I’m happy that the Against Malaria Foundation doesn’t spend resources figuring out if global welfare is cost effective vs. Artificial Intelligence safety work, but I very much want other groups to be questioning this.

A Simple Rubric

We can try inventing a rubric to clarify where communities or organizations stand in these areas. Here’s a simple breakdown that could do the job for now. I think of things as split between “processes” and “systems”. Processes are procedures that ideally have operationalizations and regular implementations. Systems refer more to all-encompassing measures of culture and knowledge.

Processes: Internal Feedback, External Feedback, and Mentorship

These all have similar purposes but different methods. I think they are self descriptive. Mentorship is arguably a type of internal feedback but was broken out as it’s typically distinct. Mentorship could include things like line managers; any kind of people who check in with people on a recurring basis and give them advice to improve.

Prioritization Abilities

How well do all levels of a community prioritize, especially around company-level strategic factors? Are priorities both correct and clear to all relevant members?

Cultural Candidness

Is candid communication common? Are people who use it actively rewarded when it turns out to be beneficial?

Decentralized Justified Trust

“Justified Trust” means that individuals and organizations are both trusted and also deserve that trust. Decentralized means that this is widely spread out. If some leaders are trusted more than they should be that’s bad, if newer members aren’t trusted enough that’s also bad.

Here are my quick intuitive ratings for where EA stands, focussing on Longtermist (includes AI safety and Bio) and meta cause areas. This is a hard rubric for modern communities; top hedge funds and tech companies would probably get a lot of 4s and a 5 or two, most communities I can think of would mostly get 1s and 2s. Also, note that there is some selection effect in regard to the rubric. There are other equally important rubrics I could have imagined where the Effective Altruist community is doing quite well. I was focussed on this one because I wanted to write about an area where I was excited to see improvement.

figure from Criticize this post

Feedback, Candidness, and Forecasting

If you’re familiar with some of my recent work, you might be wondering where this topic fits in. I’ll briefly go over its relevance to forecasting platforms.

A lot of potential forecasting value is bottlenecked by the acceptable candidness or honesty of a community. Many of the benefits of forecasting could come from delivering evaluations. See the Prediction-Evaluation post for details here. Unfortunately in current cultures honest and impactful evaluations are very hard to publish without deeply upsetting some people and getting a lot of pushback.

One of the key reasons why internal forecasting setups haven’t succeeded in total seems to be their transparency. Project managers typically don’t seem to want information on the expectation of their success (of project success or timelines), if it means the information will also be public to others. It’s far more convenient to pretend things are going well and make up excuses last minute for why you couldn’t have seen failure coming. A different transparency challenge is that the introduction of forecasting platforms to respected analysts is often met with trepidation. They are already respected, so they only stand to lose respect by honestly tracking their accuracy. This is one reason why the Expert Political Judgment tests were done anonymously. So forecasting platforms require levels of transparency to operate that make traditional analysts uncomfortable, and when they are used they create transparency in areas where often leaders don’t want it.

New evaluation measures in general are disliked by people who don’t do well on them, so they are very difficult to introduce and substantially change for this reason. Top scientists (by citation count) don’t like altmetrics, popular hospitals don’t like price transparency.

Regarding Effective Altruism, we could begin to estimate the value of all EA projects and make that public. We could have estimates for how valuable each organization is at each point in time. We could have measures for our expectations of possible job candidates and identify promising potential that naive measures might have missed. This could be incredibly valuable but it would probably make some people uncomfortable or worse. If you identify that an organization doesn’t seem very beneficial, that could lead that organization to stop getting funding and talent. Hopefully if there are bad projects or organizations they would get fewer resources, but someone would feel worse for it.

The few of us playing with these ideas can use ourselves as guinea pigs. Nuño Sempere recently scored my work. I’d love to make a lot more about myself and the organization more transparent, but am reluctant to be particularly radical.

Speaking more generally, within our culture (both EA culture and Western culture) it’s expected that we keep a lot of things private. If one person reveals their mental health issues publicly, it will be seen negatively for potential collaborators, even if all things considered the person isn’t in particularly bad shape. I noticed this in the startup scene where founders would produce highly inflated images of their progress. This happens at a systematic level, so any new founder that doesn’t is considered particularly unpromising. Even if a Venture Capitalist appreciates the honesty, they will be suspicious that potential hires and future potential Venture Capitalists will not.

In a world where people and organizations promote highly filtered and overconfident information about themselves, radicals who try to be honest can look quite poor by comparison. And in this world, evaluations that try to be honest can dramatically interfere with the convenient self-images so carefully put together.

So when designing a community wide forecasting/evaluation system, we really need to decide just how transparent and honest people are willing to be, and if there are measures that could make such honesty more tolerable.

Opt-in Candidness Invitation

Without a culture that actively encourages candid critiques it can be risky to begin. However, individuals or organizations can make things easier by publicly asking for criticism. Consider this section an open invitation to be critical of this post and future posts I work on. Go crazy, write a scathing critique. Honestly I would really appreciate it, I’m sure that I’m making a lot of mistakes and overlooking key points. Here’s an admonymous link if you want.

I would appreciate it if you phrased things respectfully, but where there are trade-offs, I’d appreciate honesty more than niceness.

What do we want for Effective Altruism?

I think that a lot of key community members aren’t too thrilled with the current situation. The EA community could be doing a lot worse, but it could also be doing a lot better. I have very high hopes for the community in the long term and think that doing better on these measures might be one of the most powerful instruments for substantial long term impact.

There are a bunch of changes we could make, but they might be either difficult or uncomfortable. I have a list of proposed methods for improvement and am actively working on some of them. However, to not anchor the community, I’m curious to get people’s responses first. I’ll write another post with my thoughts here later.

Comments from Nuño Sempere

Restored with permission (Nuño’s comments, with Ozzie’s replies).

On “.”:

Nuño Sempere: Also, it makes you more vulnerable if anyone decides to hunt for dirt on the EA community

[2 comments by other collaborators omitted]

On “My read is that one of the main goals of EAG is to get newcomers up to speed with the EA expert understanding of things”:

Nuño Sempere: Unclear if true

On “If one person reveals their mental health issues publicly, it will be seen negatively for potential collaborators”:

Nuño Sempere: Depends on how much status the person had in the first place. When Rob Wiblin talks about how he takes anti-depressants, he gets some bonus points.

Ozzie Gooen: It’s complicated. Might discuss edge cases.

On “I noticed this in the startup scene”:

Nuño Sempere: Yes, but you were talking about EA culture and Western culture, and the whole point is about EA, so this example doesn’t really contribute to the point you’re making. It would if you had an equivalent EA example

On “radicals who try to be honest can look quite poor by comparison”:

Nuño Sempere: Not necessarily, because then honesty is a strong signal of quality “I don’t have to be overconfident because I am so great.” / countersignaling.

Ozzie Gooen: I didn’t mean this absolutely. If I were to provide clarifications on all edge cases this would have been much, much longer. Finding the balance is difficult.

On “The few of us playing with these ideas can use ourselves as guinea pigs. Nuño Sempere recently scored my work. I’d love to make a lot more about myself and the…”:

Nuño Sempere: This seems too self-congratulatory

Ozzie Gooen: I imagine you mean just the first paragraph here? I don’t think I follow if you are referring to the entire section.

On “Here’s an admonymous link if you want.”:

Nuño Sempere: I think admonymous links are a great social mechanism, because they align virtue signalling with actually creating mechanisms to give feedback. Might be worth explicitly pointing that out.