Understanding Forecasting Systems for EA purposes

  • Empirical Accuracy
    • Forecasting with a track record can align incentives on accuracy.
    • Over time, data will be collected on empirical accuracy, giving a better sense of how much trust to have.
    • Aggregation systems could improve a group’s forecasting ability, and they should get better over time. Foretold currently uses a very simple aggregation method, but this will be improved.
  • Openness
    • Having formal forecasts on important questions is a convenient way of providing useful information to a bunch of people, even if the only forecasters are internal.
  • Tracking
    • Seeing the forecasts of many people could be interesting. Understanding when and why forecasts change can give a better sense of what is important.
  • Calibration Practice
    • Calibration abilities often fade (according to direct discussions with Douglas Hubbard), unless they are continuously used. An ongoing forecasting effort could help with this.
  • Skepticism and calibration
    • Experienced forecasters are generally calibrated. If they have enough information to make a good forecast, then they can be expected to be less biased than experts, who are often less well-calibrated. This could be especially useful in areas where others would be expected to be particularly biased. For instance, the question “When will we complete this project?” is one where bias could be expected, but the question “How many views will the Apple.com website have in 2024?” would likely have less bias (assuming the group is not related to Apple).
  • Crowdsourcing research
    • Forecasting is a way of outsourcing research work. This works best in situations that don’t require a lot of domain expertise. One advantage of crowdsourcing is that it can be scalable and simple; one disadvantage is that it could be expensive. One remote forecasting full-time equivalent is probably less effective than one in-house full-time equivalent, because remote forecasters will have much less context. In some cases, we could bring in forecasters with very specific domain expertise, and that could help in these areas.
    • We probably won’t have access to the best remote researchers in the beginning, but can work to get that if it seems particularly useful. It may take some time if we have to pay them significant amounts. The existing Superforecasters were relatively inexpensive but still cost significant amounts.
  • Recruiting
    • If we attract a bunch of people to help in forecasting efforts, then this could help identify those who are particularly good at or interested in different efforts. This could be useful for identifying future hires or encouraging other smart people to apply for positions.
  • Question ideation & discussion
    • Coming up with specific questions that external forecasters can understand is surprisingly difficult. Once you make one version, you may get pushback or further questions from forecasters. For example, if you ask “What are the chances that nuclear war will happen by 2025?”, you may get flooded by a series of questions on which specific types of nuclear weapons would count as being part of a nuclear war. One issue here is that most people are used to relatively vague terminology, but once things are pinned down to a specific question, users demand much more specific terminology. This could be a surprise for many people. That said, one benefit is that it could force you to recognize vagueness and make other discussions clearer.
    • External forecasters may have many domain-specific questions for their forecasting efforts. If the question writers are responsive, this could be highly valuable, but this also presents a significant distraction.
  • Resolving questions
    • Questions can be difficult to resolve. If the question isn’t already clearly posted on a website, the answer may require a significant amount of research and/or evaluation.
    • If questions are to be resolved in the far future, then the question itself presents a type of debt that one should prepare to pay off in many years. It could be difficult to track this debt and to ensure that the resources will exist to adequately resolve it. In the worst cases, an organization could take on a significant amount of “resolution debt” before it has experience understanding the cost, and get left with a surprisingly high amount of work without adequate resources to deliver. That said, this could, of course, be mitigated by forecasting these costs and capabilities.
  • Information Liabilities
    • Forecasting important questions often creates informational openness that may be unusual to many groups. For instance, the accurate answer to the question “Will our project run over schedule?” is typically one highly guarded by a few managers, but in forecasting systems it is exposed more broadly. I believe this is one main reason why forecasting systems are currently used by very few groups. Forecasting can be seen as a very transparent tool, similar in style to organizational practices at Bridgewater. Some people really dislike these kinds of practices.
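
The points above about aggregation and tracking empirical accuracy can be made concrete with a minimal sketch. This is not Foretold’s actual aggregation method, which the text only describes as “very simple”; the unweighted mean and median below, the Brier score as the accuracy measure, and the function names `aggregate` and `brier_score` are all assumptions chosen for illustration.

```python
from statistics import mean, median


def aggregate(probabilities, method="mean"):
    """Combine several forecasters' probabilities for one binary question.

    Hypothetical helper: an unweighted mean or median, not Foretold's method.
    """
    if not probabilities:
        raise ValueError("need at least one forecast")
    if any(p < 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in [0, 1]")
    if method == "mean":
        return mean(probabilities)
    if method == "median":
        return median(probabilities)
    raise ValueError(f"unknown method: {method}")


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecasts and 0/1 outcomes.

    Lower is better; 0.0 is a perfect record on resolved questions.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    return mean((p - o) ** 2 for p, o in zip(forecasts, outcomes))


if __name__ == "__main__":
    crowd = [0.2, 0.4, 0.9]  # three forecasters on the same question
    print(aggregate(crowd), aggregate(crowd, method="median"))
    # Track record over three resolved questions (1 = happened, 0 = did not).
    print(brier_score([0.9, 0.2, 0.6], [1, 0, 0]))
```

Scoring each forecaster, and the aggregate, on resolved questions is one way to build the track record described above. The Brier score is only one common accuracy measure, and a real system would likely weight forecasters rather than average them equally.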

Possible Best-Practices for Question Organization

  • Focus on highly similar & structured questions.
    • Defining questions, debating the specifics, and resolving them can take a significant amount of time. The more similar different questions are, the cheaper they will be for all involved. For example, instead of having a custom success metric for each nonprofit, a single universal but slightly worse metric could be a good first pass. This also helps with organization; keeping a high-level overview in your head of 100 different questions could be very hard, but if they are all specific variations of each other it is much easier.
  • Keep things nonsensitive
    • Questions that must be private or questions that would raise controversy or hurt feelings can be liabilities.
      • Security
        • While the chances shouldn’t be high, there is always a chance that someone’s account could be breached or similar, so data will never be 100% secure.
      • Controversy / Pain
        • There are some questions that would anger specific people or groups. For example, the question “How likely is Startup X to fail within 2 years?” could put the founders of Startup X on edge. In some of these cases, there are very similar questions that could get around these issues.
          • Privacy
            • When in doubt, questions like this could be private.
            • You could even use pseudonyms for specific names, to be extra careful. We may eventually add some extra tooling to make this painless.
          • Generality
            • “How many of these 50 YC startups will fail in 2 years?” is likely to be protested less.
          • Grade on curves
            • Rather than ask, “How much value will organization X create?”, you can make a rubric for organizations which maps their value to a score of A-F or similar, with most groups getting a C or better. Even if the mapping is clearly stated, a letter grade makes the underlying estimate less visible, so people worry less.
          • Focus on the best items
            • Many awards go only to the very best participants. Similarly, you could forecast things like, “How likely is this group to be in the top 3 of the rankings?”, or just reveal the top few results.
  • Questions should be interesting to other groups and to forecasters
    • Forecasting can be expensive. If the results of these forecasts are useful to more people, that’s generally more efficient. Likewise, if you can come up with questions that would be useful to multiple EA organizations or similar, those would be particularly interesting.
    • Forecasters find some questions more interesting than others. While I don’t have a great model here at the moment for the specifics, I think you can imagine what kinds of things they may prefer to work on. Especially for volunteers, this could matter a lot.
  • EA-related volunteers / part-time consultants
    • There are a few handfuls of EAs who have expressed interest in volunteer work; some would be willing to spend more time if paid. A few of these are experienced or well ranked on Metaculus or the Good Judgment Project.
      • College Groups
        • One particularly interesting type of volunteers could be those in college groups. It could eventually be interesting to have competitions between different colleges for forecasting value.
  • Full-time forecasters
    • We may eventually hire some forecasters to spend 20+ hours per week on forecasting. Here it would be more of a job than a hobby. This could get expensive but could be very reliable. These forecasters would likely be remote.
  • Organizational employees
    • Individuals inside an organization can be used to make forecasts. This works well for short questions in areas they know well. It can also help the organization become better calibrated. However, participants are unlikely to spend much time on forecasting unless it is seen as official work or is strongly integrated into the culture.
  • External EA experts/employees
    • It could be useful for various individuals at different organizations to directly participate in shared forecasting efforts. This could be particularly beneficial in order to better understand what a diverse set of community members thinks on various issues. It may also be good if some individuals have specific but highly applicable expertise.
  • A domain-specific dictionary.
    • A lot of terminology may be shared between questions but remain vague. It can be good to keep a defined list of terms somewhere for repeated use.
    • You can see one example of this at the AI Dictionary.
  • A “resolution council”
    • There could be some questions that require judgements by individuals. For instance, “In 2025, on a scale from 1-10, how good of a job did the new CEA do?”, or even specific questions that may still have some hidden assumptions like, “In 2022, will project X have been started?”. In these cases, it can be useful to specify who is responsible for answering the questions. The less objective the questions are, the more important it is that a robust and respected council is established.
    • The Parallel Forecast team is currently putting together a resolution council for AI purposes.
  • A “knowledge graph”
    • As terms are defined, it may be useful to build up a substantial knowledge graph of information, similar to that in Wikidata. Foretold currently has limited support for this, but will have more later.
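
The “grade on curves” suggestion earlier in this list amounts to a fixed mapping from an estimated value to a coarse letter grade. A minimal sketch follows, assuming a hypothetical 0-100 value score and made-up cutoffs; none of these numbers or names come from Foretold. The cutoffs are deliberately lenient so that most groups would land at C or better, as the text suggests.

```python
# Hypothetical cutoffs on an assumed 0-100 value score, highest grade first.
# They are intentionally lenient so that most organizations receive C or better.
GRADE_CUTOFFS = [(80, "A"), (60, "B"), (30, "C"), (15, "D")]


def grade(value_score):
    """Map a numeric value estimate to a letter grade from A to F."""
    for cutoff, letter in GRADE_CUTOFFS:
        if value_score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Made-up scores for three unnamed organizations.
    for score in (85, 42, 10):
        print(score, grade(score))
```

Forecasters could then be asked for the probability of each grade rather than for a point estimate of value, which keeps the published question coarser than the underlying number.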