Strongly Bounded AI: Definitions and Strategic Implications
Ozzie Gooen - April 14 2025, Draft. Quick post for the EA Forum / LessWrong.
Epistemic status: Exploratory concept with moderate confidence.
This represents my current thinking on a potentially valuable framing for AI safety, drawing on established engineering principles. In the last few years discussion around these topics has exploded, and I wouldn’t be surprised if there were great existing works that I don’t know about and can’t quickly find.
—
I feel the AI safety conversation lacks terminology for limited, safe, and useful AI systems that address takeover risks rather than misuse by humans. This concept goes beyond alignment to include capability restrictions, reliance on established technologies, and intelligence limitations for predictability.
On the Terminology
One thing I feel is missing from AI safety conversations is strong and versatile terminology for limited, safe, and useful AI systems.
This concept isn’t just about alignment. It’s also about:
- Substantial capability restrictions (using older models, strong compute limits)
- Exclusive use of highly-tested and well-established technologies
- Intelligence limitations that make behavior highly predictable
I think some potential names for these systems could be:
- Strongly Bounded AI
- Highly-Reliable AI
- Boring AI
For the rest of this document, I’ll go with “Strongly Bounded AI.”
“Strongly Bounded AIs” are not necessarily ones with substantial alignment or safeguards, but rather AIs that we can reasonably argue do not pose severe AI takeover risks. This means they can either be very weak systems (like many of the systems of today) without safeguards, or stronger systems with a much greater degree of safeguards.
We already have somewhat understood areas of “Control,” “Scalable Oversight,” etc., which approach similar ideas. But I believe these fields typically investigate “specific AI layers directly overseeing risky AIs” rather than broader AI services/agents that are doing more regular duties in the world.
We also have the fields of “Comprehensive AI Services” (see Drexler) and “Guaranteed Safe AI” (see Davidad). These are closer to the idea I’m proposing, but are more specific. I think neither is necessary for “Strongly Bounded AI.”
A “Bounded” AI is also arguably different from an aligned or a safe AI. Both “aligned” and “safe” currently have fairly broad and imprecise definitions in comparison. I’d also flag that “Boundedness” is really about accident risks, not misuse risks. A bad actor could use a bounded system to do significant harm. This is akin to reliability in military technology: such reliability is useful for the military, but the technology can obviously still be used destructively if desired.
Engineering Culture and Established Patterns
In tech companies, there’s an established virtue of using “boring tech” – Postgres, SQL, REST, etc. There’s always something fancier trending on Hacker News, but these cutting-edge systems come with major uncertainties and liabilities. Typically, new programmers enthusiastically advocate for the latest JavaScript framework while experienced engineers spend time arguing for proven technologies.
Engineering also features many well-understood and distinct subfields for highly-reliable systems: “Fault-Tolerant Systems,” “Ultra-Reliable Systems,” “High-Assurance Systems,” “Formal Verification,” etc. I believe these concepts effectively carve out market positions for unusually secure technology. Major software products like Microsoft Windows or Google Docs don’t advertise themselves as “Fault-Tolerant Systems” or “Formally Verified” – these terms are reserved for genuinely reliability-focused systems. While these terms can function as marketing buzzwords, I think they still help establish meaningful categories.
Current State and Future Potential
I think most AI agents today are weak and highly limited. I expect that 99% of them could not cause catastrophic damage (say, $100B in damages) anytime soon – the technology is simply too weak and expensive. I’d feel fairly comfortable using many current systems without worrying about major alignment risks.
My strong expectation is that a tremendous amount of good and global stability could come from developing “Strongly Bounded AIs.” And perhaps most importantly, I think the game plan should entail using “Strongly Bounded AIs” to help us reason about, develop, and control cutting-edge AI technologies.
I believe the real AI takeover threat comes from frontier AI agents. I think the capability gap between frontier models and our controlled AI systems represents the potential damage frontier AI could cause. If it’s “a powerful malicious frontier AI agent” versus humans alone, there’s a massive potential for takeover. If it’s the same agent versus “robust, reliable and controllable AIs,” I’d feel much better about our defensive position.
Applications and Evolution
Over time, I think we’ll develop better methods for creating “Strongly Bounded AIs” that push the frontier of effectiveness while maintaining safety. One of the main things we should probably do with cutting-edge AIs (to the extent we use them) is to help us create better “Strongly Bounded AIs.”
What could “Strongly Bounded AIs” do? In my view:
- Oversee personal data on devices
- Make strategic recommendations for organizations
- Secure key resources beyond traditional access management (e.g., AI monitoring bank withdrawals for signs of duress)
- Handle bounded high-reliability operations in medicine and defense
- Assist auditors examining potentially dangerous organizations/systems
Addressing Common Questions
“Doesn’t delegating to AI systems increase takeover risks?” I think this is an oversimplified view and would often argue the opposite. I’d expect that “Strongly Bounded AIs” could make the world much more secure against frontier adversarial AIs, not less. But of course, one would need to implement smart tradeoffs.
“Isn’t opposing frontier AI while supporting limited AI confusing?” I think engineering has a long history of distinguishing between safe and unsafe technologies. I don’t think the difference between AI systems is unusually strange compared to previous work in reliability engineering and computer security.
“Won’t this term become meaningless marketing?” I’m not that cynical. I think safety-minded people should develop clear standards for safe systems, then work to form the language. We already have some terminology for highly-trustworthy technology. Even when one term gets semantically diluted due to marketing, others can emerge to take its place.
Ultimately, we want systems where:
- We can reliably predict their behavior patterns.
- We can have assurances that the downside risks of using them are minimal.
Other unique bits
A few points from a rougher second pass of this draft, not covered above:
- It depends on trusting those in power. There are, of course, many ways one could mess up such a strategy — so promoting it depends partly on how reasonable one expects those in power to be.
- Framed as objections: “Isn’t this just Drexler’s Comprehensive AI Services?” and “Isn’t this just Davidad’s Guaranteed Safe AI?” — both are close, but I’d treat each as one possible (and not necessary) route to a Strongly Bounded AI rather than the whole idea.
- Strongly Bounded AIs could also act as assistants to auditors of dangerous organizations or systems (cf. “superhuman governance”).