That's a really nice survey!! I really like the way Sec. 3.2.1 lays it out.
On rogue AI... yeah, I kind of didn't want to get into the discussion about AI agency / intrinsic motivations / etc. here, but tbh I've always struggled to imagine the mechanism of that risk pathway, and it's felt like the least substantiated one. Like, at some point someone gave the AI instructions with the goal of having it do something, right?
OK, I looked back at some writing on the topic, and (1) people often do imagine humans giving power to AI systems before the AI does anything bad to humans, but (2) the "instructions" still remain well-intentioned, with the harms arising from misaligned goals in the AI.
See Sec. 5 of https://arxiv.org/pdf/2306.12001.pdf, particularly the story on pg 41, and Sec. 4.3 of https://arxiv.org/pdf/2209.00626.pdf.
So I think the main concern is that instrumental goals, proxy gaming, and goal drift all make it likely that an AI ends up with some goals that diverge from human intentions, no matter what the instructions say. Add in some deception, and an AI that humans trust enough to cede some power to ends up turning on them.
This paper, like many others, assumes the AI has goals. But GPT-4 doesn't want anything, and there is no need to add volition to GPT-17 either.
GPT-4 doesn't want anything, but language models can and will be turned into agents.
See this related post: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality-is-the-tiger-and-agents-are-its-teeth
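For concreteness, here's roughly what I mean by "turned into agents", as a toy Python sketch (names like `call_llm` and `run_tool` are placeholders, not any real framework's API): the model itself just predicts text, but a simple outer loop supplies a goal and feeds the model's own outputs back in as actions.

```python
# Toy sketch of wrapping a goalless text model in an agent loop.
# `call_llm` and `run_tool` are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    # Placeholder for any text-completion model.
    return "DONE"

def run_tool(action: str) -> str:
    # Placeholder for executing the model's chosen action (search, code, ...).
    return f"(result of {action!r})"

def agent_loop(goal: str, max_steps: int = 10) -> str:
    history = f"Goal: {goal}\n"
    for _ in range(max_steps):
        # The model itself has no goals or memory; the loop supplies both
        # by re-prompting with the goal and past observations each step.
        action = call_llm(history + "Next action (or DONE):")
        if action.strip() == "DONE":
            break
        observation = run_tool(action)
        history += f"Action: {action}\nObservation: {observation}\n"
    return history
```

The point being: nothing about the model has to "want" anything for the combined system to behave like a goal-directed agent.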
You can use an LLM that works as an Oracle to verify the actions of the Agent AI. I described it here: https://medium.com/@jan.matusiewicz/autonomous-agi-with-solved-alignment-problem-49e6561b8295
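Roughly the shape of it (a toy sketch of my reading of the setup, not the exact design from the post; `agent_llm` and `oracle_llm` are placeholder callables): the Agent proposes an action, and a separate non-agentic Oracle model has to approve it before anything executes.

```python
# Toy sketch of the "Oracle checks the Agent" pattern: a second,
# question-answering-only model must approve each proposed action
# before it is executed. Placeholder callables, not a real API.
from typing import Callable

LLM = Callable[[str], str]

def propose_action(agent_llm: LLM, goal: str, history: str) -> str:
    return agent_llm(f"Goal: {goal}\nHistory:\n{history}\nPropose the next action:")

def oracle_approves(oracle_llm: LLM, goal: str, action: str) -> bool:
    # The Oracle never acts; it only answers a yes/no question about the action.
    verdict = oracle_llm(
        "Would executing this action be safe and consistent with the stated goal? "
        f"Answer YES or NO.\nGoal: {goal}\nAction: {action}"
    )
    return verdict.strip().upper().startswith("YES")

def guarded_step(agent_llm: LLM, oracle_llm: LLM, goal: str, history: str) -> str:
    action = propose_action(agent_llm, goal, history)
    if not oracle_approves(oracle_llm, goal, action):
        raise RuntimeError(f"Oracle vetoed action: {action}")
    return action  # only vetted actions are passed on to whatever executes them
```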