EA - Announcing the SPT Model Web App for AI Governance by Paolo Bova

The Nonlinear Library: EA Forum - Podcast created by The Nonlinear Fund



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing the SPT Model Web App for AI Governance, published by Paolo Bova on August 4, 2022 on The Effective Altruism Forum.

Modeling Cooperation is excited to announce our web app SPT Model, an interactive tool to explore a model of AI competition built in collaboration with Robert Trager.

Will AI safety breakthroughs always lead to safer AI systems?

Before long, we may be capable of creating AI systems with high levels of general intelligence. Such capable systems may present considerable risks when misaligned. Yet, suppose that thanks to incredible work by the AI safety community, we make a breakthrough which enables us to align many (but not all) AI systems and make them act according to our values at a relatively low performance cost. We also communicate this research to all relevant companies and governments. Everyone knows how to make their AI systems safer.

Fast forward several years, and we may find that despite these efforts, the AI systems that companies deploy are just as risky as they were before the breakthrough. Our question to the reader is: why did this happen?

Announcing the web app

One way of looking at this question is to explore the safety-performance tradeoff model (SPT Model) of AI competition created by Robert Trager, Paolo Bova, Nicholas Emery-Xu, Eoghan Stafford, and Allan Dafoe. At Modeling Cooperation, we worked with Robert Trager to implement this model in an interactive web app we call SPT Model.

The web app lets you explore a model of a competition to build emerging technologies. The model shows how failure scenarios like the one above can emerge, and there are many more insights contained within. We built this tool because we believe that when trying to understand a model, there is no substitute for exploring it yourself.
Our web app lets you set the parameters for two different scenarios and compare the effects graphically.

Answering the question

Let's return now to the puzzle we presented above. What obstacles prevent AI safety breakthroughs from improving the safety of AI systems? We will provide an answer below. Before that, you may wish to first experiment with the web app.

One phenomenon at play in the puzzle above is "risk compensation". Risk compensation occurs when making an activity safer motivates people to take more risk. Most of the time, risk compensation erodes but does not eliminate the benefits of safety technologies (think of seatbelts and bicycle helmets, for example). However, for emerging technologies, such as AI, a key difference is competition. Companies will take on more risk to outcompete their rivals.

We visualize these effects of a safety breakthrough in the safety-performance tradeoff plot below. The safety-performance tradeoff plots the maximum level of performance we can achieve at each level of safety. If companies accept a higher level of risk, then, conditional on avoiding a disaster, they can expect to produce a more capable AI system than otherwise and so reap a higher benefit. Such a scenario is analogous to the risk-return tradeoff employed in stock-market investing. It seems reasonable to expect companies and other institutional actors to consider a similar tradeoff when investing in new, potentially risky, strategic technologies.

In the puzzle we presented above, the breakthrough means that if firms invest in AI systems with the same performance as before, those systems will be safer (see the upward arrow in Figure 2). However, this breakthrough has also enabled companies to take on even riskier AI systems to outcompete their rivals (the downward arrow in Figure 2). This competition can completely erode the benefits of the safety insight. Of course, this isn't always true.
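To make the risk-compensation logic concrete, here is a minimal toy sketch. It is not the actual SPT model: the frontier shape and the payoff function below are illustrative assumptions we introduce here (a linear safety-performance frontier, and an expected payoff equal to the probability of avoiding disaster times performance raised to a "competition intensity" exponent). It only illustrates the qualitative point that fiercer competition pushes a payoff-maximizing firm toward riskier systems, shrinking the safety gain from a breakthrough that shifts the frontier.

```python
# Toy illustration of risk compensation under competition.
# NOT the SPT model itself: frontier and payoff forms are assumptions.

def frontier(safety, breakthrough=1.0):
    """Maximum performance attainable at a given safety level in [0, 1].
    A breakthrough factor > 1 shifts the frontier, letting firms reach
    the same performance at a higher level of safety."""
    return max(0.0, 1.0 - safety / breakthrough)

def expected_payoff(safety, competition, breakthrough=1.0):
    """Expected payoff = P(no disaster) * performance ** competition.
    A larger `competition` exponent makes the reward more sensitive to
    performance, mimicking a winner-take-all race."""
    return safety * frontier(safety, breakthrough) ** competition

def chosen_safety(competition, breakthrough=1.0, grid=10_000):
    """Safety level a payoff-maximizing firm picks (simple grid search)."""
    candidates = [i / grid for i in range(grid + 1)]
    return max(candidates,
               key=lambda s: expected_payoff(s, competition, breakthrough))

for k in (1, 4, 16):
    before = chosen_safety(competition=k)
    after = chosen_safety(competition=k, breakthrough=1.5)
    print(f"competition={k:>2}: safety before={before:.2f}, "
          f"after breakthrough={after:.2f}")
```

In this toy setup, a breakthrough still raises the chosen safety level at any fixed competition intensity, but the absolute gain shrinks as competition intensifies, and if the breakthrough also intensifies the race, the benefit can be eroded entirely, in the spirit of the scenario described above.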
We now turn to a scenario where AI companies operate at such a high level of performance that attaining higher performance is prohibitively costly ...
