Project CAIRN: A Longtermist Wiki Project
CAIRN: Comprehensive AI Impact & Risk Navigator
Section titled “CAIRN: Comprehensive AI Impact & Risk Navigator”Using LLMs to make an extensive longtermist wiki, focusing on generating strategy insights
Ozzie Gooen, The Quantified Uncertainty Research Institute
LLMs have recently become able to produce compelling research reports and build complex software. So far this has been slow to translate to much better longtermist prioritization.

How can we convert Claude Code into better decision-making? It’s unclear what the exact best approach is, but there are several somewhat-obvious steps that could be useful:
- Gather the relevant information.
- Build knowledge graphs of critical longtermist factors.
- Make lists of potential cruxes / key questions.
- Provide rough estimates of a great number of key variables. For example, rating hundreds of interventions in terms of neglectedness/importance/tractability.
- Brainstorm new intervention ideas, especially in high-prioritization areas.
- Make strong interfaces both for public use and for LLM use.
It can be very easy to get overwhelmed by overzealous approaches. But one nice thing is that this can be done very iteratively. We begin with a small wiki, and gradually add more and more scale and complexity. We suggest a lean approach, where we attempt to iteratively publish. We already have a basic tool that does an introductory job on these tasks.
This is an experiment to see whether LLM-assisted research infrastructure can surface non-obvious insights at a cost/speed that traditional research cannot match.

Current Status
Section titled “Current Status”We have a basic, highly-experimental prototype here. This used Claude Code for almost all writing and estimation. I think this can be useful as an experiment, but note that it is very messy right now.
A Scalable Hunt for Critical Insights
Section titled “A Scalable Hunt for Critical Insights”It can be easy to spend a lot of effort researching a topic with little to really come from it. For example, many mathematical models might be highly accurate and technically detailed, but not lead to decision changes.
We can label a “Critical Insight” for a research finding that is a good mix of being surprising, important, and simple. We can estimate all of these with LLMs.
-
Surprising: Do LLMs dramatically update their beliefs when they get this information?
-
Important: Is there a strong argument that this information is on a crucial topic?
-
Simple: Is this argument easy to defend? High-quality critical insights include things like:
-
The importance of X-risk: “If something kills everyone in the world, that would be catastrophic. It’s easy to argue that current cost/benefit analyses would highlight such interventions.”
-
Concrete intervention ideas:“We’ve found that there’s a certain cluster of politicians in Germany who are unusually important and unusually influenceable. We have evidence for this that’s been extensively critiqued by different AIs, and seems robust.”
-
Surprising & Actionable Research: “When investigating organization X, we found clear and straightforward evidence of extreme fraud.” LLM-generated knowledge bases can become enormous quickly. It’s critical to focus on providing real value that can be very simple to convey. This is useful both for LLMs (which have limited context windows) and for humans (who have limited time and attention).
A “Hunting for Critical Insights” strategy would likely involve a lot of experimentation and prioritization. This would generate a great deal of data for the AIs to navigate, but many of the outputs could be very simple.

You can see a current dashboard of these here.

Building Blocks
Section titled “Building Blocks”A project like this can be decomposed into a series of concrete outputs that can be created using LLMs. Each output type would require its own tweaking, evaluation procedure, visualization, and prioritization.
Deep Research
Section titled “Deep Research”For a certain topic, produce an extensive research report. LLMs are already quite good at this. Several products for “Deep Research” exist, though few provide API access. Common challenges include things like minimizing hallucinations, and scaling this so that research work doesn’t get duplicated when researching similar topics.

High-Level Models
Section titled “High-Level Models”Having a long list of topics can quickly become overwhelming. To assist, it’s useful to have some key models. These generally contain important concepts/terminology, with assumptions about how they relate to each other. Below is one example.
These models would likely require some of the greatest human assistance at this stage. They are very high leverage, as once you have a good one, it becomes doable to have LLMs automate a lot of research on them.

As we get concrete models like these, we can then use estimations to figure out what longtermists might want to prioritize.

Factor Maps
Section titled “Factor Maps”For any parameter in question (e.g., “Total probability of catastrophe from 2026 to 2030”, “US Energy Production in 2026 to 2030”), generate a multi-layer network of contributors. Estimate the matrix of all interactions. This would take in the output of Deep Research.
Further work would go into ensuring that all influence diagrams don’t have redundancies with each other. One way to do this is to occasionally synchronize them with one super-diagram of all of the sub-nodes. This way we could ensure that we don’t need to duplicate research, for example in the situation of one sub-node that’s present in multiple influence diagrams.

https://ea-crux-project.vercel.app/diagrams/misalignment-potential

https://ea-crux-project.vercel.app/diagrams/master-graph?full=true&level=detailed
Large Tables
Section titled “Large Tables”We can organize much of the work into large organized tables. This is helpful to test semantic data specific to specific kinds of data.
Some examples:

Structured Data
Section titled “Structured Data”Deep Research is useful for putting together wiki-style writing on important topics, most of which is long unstructured text. In addition to this, we’ll want to organize a lot of structured data. Different types of data will require different functions for retrieval.
Putting together a full comprehensive structured repository of long-term information can be a great deal of work. There are already some organizations working on this. So for this project, we’ll focus on the essentials.
Some relevant types of structured data include:
- Publication information. For all research documents of interest, put together information about the authors, citations, publication, etc.
- Organization information. For each important organization that’s relevant to longtermism, organize some basic data (people, outputs, board members, etc).
- Relevant legislation. For relevant AI/bio/nuclear legislation, gather relevant links/authors/etc.
Controlled Vocabularies & Categories
Section titled “Controlled Vocabularies & Categories”Such a project will likely require thousands, if not hundreds of thousands, of concrete concepts to be researched, brainstormed on, and organized. Existing longtermist terminology was typically optimized out of convenience for local situations, with limited regard for wide consistency. This includes having consistent categories for things.
https://ea-crux-project.vercel.app/internal/reports/controlled-vocabulary/
Questions & Critiques:
Section titled “Questions & Critiques:”Why now?
It’s only in the last 10 months or so that Claude Code has made it easy to build out wikis and many of the other project components.
Wouldn’t the result just be AI slop?
With some work, it seems possible to produce outputs that at very least act as good research summaries. If you are picturing the naive results of chat applications like ChatGPT or the Claude interface, remember that these are minimalist applications that use single LLM calls to be fast and cheap. Extra scaffolding, costs, and runtime can go a long way.
How far can this get using existing AI?
I’m unsure. I think we could have interesting knowledge bases that track, say, 10,000 items, and provide a set of interesting estimates. As projects expand, they become more difficult, and it’s possible that this scale will lead to diminishing returns at some point. When doing technical projects, it can be very difficult to get the timing not too early and not too late. It’s possible that this sort of project might be a bit too early.
Will the results reflect the biases of the creators?
My goal is to aim for a good mix of “reasonable to the longtermist community” and “few unusual assumptions”. This is a tricky balance to get right. There will definitely be some biases and pet worldviews that become part of the project. However, there are some techniques that can help here. One idea is to really attempt to flag any critical and speculative assumptions, then allow users to make adjustments and get personalized takes from that. We’ll also try to focus on topics that don’t require as many personal worldview assumptions.
Further: Ambitious Potential Diagrams
Section titled “Further: Ambitious Potential Diagrams”Below are some example images of what future complex visualizations could look like. These specific images are fancier than what we could provide at this point, and have some nonsense parts, but can be useful to help showcase what we’re aiming for.


