How are we actually “aligning AI with human values”?
So many great quotes here! I've been seeing so many different opinions on 'AI alignment' that it has been difficult to get an accurate picture of what's going on. Really appreciate this overview.
A nice essay!
Unfortunately, I don't see how alignment can be solved at all as long as there is more than one human who desires power. People are fundamentally not aligned with one another and will use AI to further their own agendas (not necessarily with bad intent). It is a little surprising that a philosopher would call it "a technical problem" when it seems to be part of the human condition.
The most we can hope for is risk mitigation, and for that we need a much better fundamental understanding of the principles of statistical learning than we currently have.
> It’s just a bit too coincidental to me that the major alignment research directions just so happen to be incredibly well-designed to building better products.
Agreed, this is also a big worry of mine. See also this paper critical of RLHF (slight self-plug): https://arxiv.org/pdf/2307.15217.pdf
> the pathways to AI x-risk ultimately require a society where relying on — and trusting — algorithms for making consequential decisions is not only commonplace, but encouraged and incentivized…All of the AI x-risk scenarios involve a world where we have decided to abdicate responsibility to an algorithm.
I think this is notably missing the argument from people worried about rogue AI. The worry there is about an unaligned AI breaking out, without us intentionally ceding power to it. I'm not saying that scenario is likely or unlikely; it's just an argument I often hear that isn't addressed here. Maybe even large-scale pretraining of an AI model, with the idea that it will be useful downstream for replacing human labor, counts as abdicating responsibility? Or consider the case where some tech person with developer access to GPT-7 decides to create the next AutoGPT and hook it up to the internet, and maybe that leads to x-risk. That isn't really the abdication of any significant societal responsibility to AI so much as OpenAI selling a useful email/code-first-draft machine that goes poorly for us when made into an agent.
> There is a rich and nuanced discussion to be had about when and whether algorithms can be used to improve human decision-making… And there is a large community of activists, academics, and community organizers who have been pushing this conversation for years.
True and important.