Missing AI Safety Goalposts

January 6, 2025

I’ve been involved in existential safety in some way, shape, or form since I was a teenager. It seemed self-evident to me that either nuclear weapons or future AI, both existentially risky technologies, could plausibly bring about our destruction. That would be very bad, and therefore we ought to prevent it. I’ve spent my life trying to do just that.

The rough safety goalposts I’ve held for AI development:

  1. Ensuring we would build a global community of smart, dedicated people tackling existential risks. This was partly achieved through effective altruism, which I helped with in its early days, but the movement has since failed in many ways to lead on this front.
  2. Ensuring no one would develop existentially risky frontier AI without extremely stringent internationally-set safeguards. Many of us assumed an Ex Machina-style, airgapped course of careful development by an individual or small group of innovators. That would have been extremely reckless, but not as reckless as what has actually happened. OpenAI being founded to explicitly build and “open source” artificial general intelligence (AGI) broke this goalpost. It was an unjustified risk and should not have been allowed to happen in a sane society. AI can be used to create weapons of mass destruction or tools for societal control, and there is no known way to prevent every malevolent actor from doing so. Similarly, no one should open source nuclear or bioengineered weapons technology.
  3. Ensuring we would globally shut down all existentially risky AI development upon the first empirical evidence of AI’s immense capabilities. ChatGPT’s launch demonstrated these growing capabilities, but there was no immediate pause. The Future of Life Institute’s 6-month pause letter was fantastic, but insufficient. In a sane society, world leaders would have immediately endorsed it, given that the alternative would likely be an AI suicide race.
  4. Ensuring we would globally shut down all existentially risky AI development upon the first empirical evidence of AI’s willingness and ability to hurt and/or deceive humans. Once we began seeing empirical evidence of AI’s harms, it would finally be impossible to dismiss the risks as “science fiction”. This should have led to a period of intense global reflection and scrutiny. Cost-benefit analyses needed to be debated in every corner of the world, and global consensus needed to be reached.
  5. Ensuring no one would ever create AGI without extremely stringent internationally-set safeguards. Creating AGI without such safeguards is a self-evident extinction risk to nearly every serious thinker in the space. Many believed humanity should not even contemplate working on AGI until we could provide plausibly airtight arguments for how it could be developed in a provably safe manner. This seemed like the most prudent approach if you thought in expected value terms and valued all life, not just your own. The claim that you need to build existentially risky technology in order to learn how to make it safe was never defensible to me. Similarly, we should not do gain-of-function research on lethal pathogens, even in BSL-4 environments, given our atrocious track record on safety.
  6. Ensuring no one would ever create artificial superintelligence (ASI) without extremely stringent internationally-set safeguards. A prematurely created ASI would almost certainly mean the near-immediate extinction or disempowerment of humanity, so it was hard to imagine people with common worldviews and high levels of psychological wellbeing wanting this. Surprisingly to me, many today seem to embrace exactly that outcome.

We’re likely to miss the fifth safety goalpost in a few months or years, and we will likely never recover from missing it. It’s not easy to put humanity’s genie back in the lamp when eight billion people want their own personal genie.

If history is any indication, we will likely soon reach ASI and cause our own extinction. The painful irony is that this would be entirely avoidable if we were more rational, individually and collectively.

It was always abundantly clear to me that we shouldn’t gamble with the Singularity without a full understanding of ethics, intelligence, and game theory, among many other things. We needed a universally (or perhaps nearly universally) accepted solution to human flourishing, nonhuman flourishing, and international governance before we even seriously attempted to create AGI. 

We don’t have any of that, yet some among us choose to gamble anyway without everyone else’s consent. This is immoral and absolutely unacceptable. I hope it becomes illegal soon, with severe punishments for offenders.