Risk and Safety#


  • Safety is a core concern for model builders these days

    • As much as any other technical concern such as scalability

    • It’s highlighted in nearly every major paper and product announcement

  • Safety is both the concern and the responsibility of many parties

    • Model builders, government, product developers, users

    • The lines are still being drawn though

  • AI safety isn’t as defined as other fields yet

    • It is under quite active development

    • You can get involved

The discussion of AI is moving quickly, is quite confusing, and is coming from many different points of view, so let’s ignore AI systems for a second and just think about bicycle safety.

A primer with bicycles#

You hear about an accident on a bicycle where someone got hurt and are asked to provide an opinion on bicycle safety. What questions come to mind?

  1. Was the bicycle constructed safely? Were all the sharp edges ground down? Was the material the bike was made from free of defects?

  2. Where was the bike being used? In a city or on a mountain? Different bicycles are designed for different environments. Was this the wrong one for where it was being used? [1]

  3. Did the rider wear a helmet? Were they acting responsibly?

  4. Did the government set up safe local infrastructure or close poorly maintained trails?

  5. Do we have effective systems? A legal framework for disputes, an engineering framework for failure testing, certifications for helmets?

  6. Do we know what could possibly go wrong, and can we make our own assessments of risk?

  7. Does the bicycle in question one day spontaneously turn into a high-powered motorcycle without the user knowing?

  8. If I park my bicycle and it gets stolen, whose fault is it?

  9. If I park my bicycle and it gets stolen, and I didn’t lock it, whose fault is it?

So many considerations#

For the simple question of bicycle safety there are so many angles from which to approach the problem, especially if we consider it from a holistic perspective. Even if we ask you, personally, to assess your risk tolerance for riding a bike, relevant questions would be:

  • Do I individually have the information to assess my risk?

  • Who is responsible for my safety, the manufacturer, me, or the government?

  • Who is providing the bike? Do I own it or does someone give me a new one of a different design each day?

The problem simplifies some, but not a lot. Even in today’s world there are ongoing challenges with bicycle safety, such as the responsibility cities bear to design safe bike paths, certifications for helmet impact ratings, or manufacturing recalls if bicycles are constructed poorly.

AI safety#

All these same questions can be applied to AI safety. The difference, though, is that with AI the frameworks, assessments, responsibilities, and social expectations are all being defined at the same time. Even the language is imprecise. Since this is a guide book meant to give you a mental model of the topic and its various viewpoints, I’ve listed my recommended readings below, though let me provide two caveats first.

  1. This was written on 2023-09-30. Again, this is a fast-moving area, so depending on when you’re reading it the ideas may have shifted. In fact, two of the resources I list below were published in the last month, one in the last week.

  2. Currently this list comes from very USA/UK-centric perspectives (including my own). I’m looking for authoritative voices from folks based in other communities. I suggest you do the same. If you have

With that, here’s my ranked list of readings.

The Coming Wave#

This is a recently released book from Mustafa Suleyman, a founder of DeepMind and now of another AI startup, Inflection.ai. The book overall provides an insider’s perspective on modern AI, and motivates how synthetic biology and AI will come together in a “dual wave” of change. A word he uses frequently in this book is containment. Most would associate this with the biological space and pathogen control, for instance containment of COVID-19. In this book he refers to containing the dual wave mentioned above. Section 5 specifically outlines a 9 component containment plan for these dual technologies, the idea being that containment is essentially the safest path through these waves. He also highlights how safety typically gets less than 1% of the total investment in AI and synthetic biology, and how he hopes more folks will move into this subset of the field.

Though the book is a longer read, I suggest it because it provides a holistic view of what humanity as a whole is likely to see, and approaches the problem from the widest perspective. An issue I have with this book is that it’s largely self-serving in sections, such as the descriptions of the author’s own accomplishments. While the plan presented seems altruistic and the dual wave argument is compelling, the self-aggrandizing casts some doubt on the larger message. Nonetheless I still recommend it as a starting point, particularly to motivate why a focus on safety is necessary.

Concrete Problems in AI Safety#

By modern AI standards this paper, published in 2016, is ancient, written before the Transformer was invented. I suggest it for three reasons:

  1. Written before the LLM era, it takes a broader view of AI and the challenges in designing a safe cleaning robot.

  2. It’s written in an approachable way by authors from a mix of companies and universities.

  3. One of its primary authors is Dario Amodei, who is now a founder of Anthropic, a company where user safety is core to its mission.

Towards Comprehensive Risk Assessments and Assurances of AI Based Systems#

After the GenAI explosion the discussion about AI safety has similarly expanded, and, importantly, claims such as AI alignment being the same thing as safety are being made with abandon.

This paper really forces you to think about the terminology used in AI safety, such as how AI alignment and safety assessments are not the same thing, and how risks and hazards each have their own meaning. It is also written by a company that does not itself build any AI systems, but is quite notable in the cybersecurity space, providing an alternative perspective to that of organizations with a vested interest in AI proliferation.

This particular paper also contains a safety assessment framework, motivating why other frameworks fall short. I suggest reading this paper before reading other modern papers because with this perspective you’ll be able to be more critical of other papers written about specific facets of AI safety. This guide book also contains a deep dive into this paper.

Blueprint for an AI Bill of Rights#

The US White House published this document in October 2022, coincidentally just before the launch of ChatGPT in November 2022. I find myself rarely saying this, but this government document is quite well designed, visually appealing, and quite timely in its arrival.

The document applies to any automated system, which includes not just “AI”, but “traditional” ML, or even plain old computer programs hooked up to data sources.

The notable aspect here is the publisher, the highest office of the US executive branch, and its focus on citizens.

The document itself is laid out quite practically, with each section:

  • Listing a principle and motivating why it’s important

  • Detailing the expectation of the automated system

  • Providing advice on how the principles can be used in practice

Because this is coming from the executive branch, some of its advice is paired with fairly heavyweight directives, such as Executive Orders requiring that federal agencies adhere to certain requirements when designing systems. Even if you’re not running a country, these principles frame safety challenges from a user perspective. This differs from most other papers here, which target fellow researchers or AI developers.

Anthropic’s Responsible Scaling Policy#

Fast forward 7 years from Dario’s paper above, through Transformers, ChatGPT, and every other paper in this list, and we get this. This is a self-accountability document Anthropic published as a company, in which they detail how they think about scaling their AI models responsibly. It was published around the same time Amazon announced a $4 billion investment in the company, definitely scaling the amount of capital and compute available to this organization for subsequent model scaling.

This document outlines Anthropic’s view on AI Safety Levels, categorizing them from ASL-1 to ASL-4. It details how they came up with 4 levels, the harms a model of each level could produce, and how they intend to assess whether a model has reached a certain level, or at least how they think they will.

I find this paper notable in how speculative it is. Most research papers tend to come to a firm conclusion, with some novelty clearly stated as a truth. This document is notable in how often Anthropic states that no one has any idea what might actually happen, but here is how we would notice things going off the rails, and how we might contain it.

This guide provides insight into how leading companies approach AI safety across dimensions like security, capabilities, and risk evaluation, including internal and external theft, capability improvements and their associated risks, and evaluations across the board.


In this chapter the primary resources are directly linked in the headers. Here are some additional ones.

  • NIST AI Risk Management Framework - Similar to the Trail of Bits risk assessment framework, this is another one, provided by NIST. It’s interesting to compare the two to see the differences in motivation and suggested action between an independent consultancy and a government institution.