LukeProg wrote: The difficulty with this approach is that we don't know yet what a Friendly AI will do. We don't know what 'Friendliness' is, because we haven't yet solved metaethics and normative ethics and cognitive neuroscience.
On cognitive neuroscience that's clear, but on metaethics and normative ethics it isn't. Given that much of what modern ethicists propose is very similar to what Kant and Bentham were saying a couple of centuries ago, and some of it still basically maps onto Aristotle and Epicurus, it seems quite plausible that ethics has already been solved. If so, the primary challenge for whoever has it right is to persuade other people of that view's veracity rather than to keep investigating it themselves, and the primary challenge for everyone else is to recognise what causes most people's heavy bias away from the correct view and to adjust for it in themselves.
This relates to my own problem with SIAI/the LW community, in that they seem to hold a set of views which, if coherent, would present a basically unsolvable paradox. Starting from views something like these -
1) FAI should have values commensurate with a perfectly ethical person's views.
2) Ethics is solvable (where 'solving' means something like 'becoming able to derive the views of a perfectly ethical person').
3) Ethics is unsolved.
- the LW community seem to derive the view, roughly, that their goal is to solve ethics in time/well enough that we can derive ultimate values, programme them into the first AI, and still have time for tea. They therefore put a lot of effort into that first step.
My first problem with this is that there's no real evidence for 3. The argument for it seems to be that lots of people still disagree about it, but that's hardly telling. We know we haven't solved, e.g., neuroscience because we can't build a human brain; but just as there's no test for the success of non-natural philosophy (its perennial bane?), there's also no test for its failure. This means that, if ethics were solved, you'd expect to see a world much like today's, in which various intelligent humans still squabble over it and continue to burn resources looking for a solution that resonates with everyone. And if it isn't solved, there's no particular reason to suspect that anyone will be able to tell the difference when it is.
If I'm right, then, whether or not ethics is solved, the instructions we programme into an AI are unlikely ever to be universally agreed on, even among LW's best and brightest. So how will we know whom to trust with the controls?
My second problem is that a lot of LW commentary seems to miss the point of ethics in the sense relevant to FAI: ethics in that sense gives us motivation, and perhaps reason, to make one choice rather than another from moment to moment - possibly a different choice from the one we would have made had we not considered it. It is not a study of how people react to situations that someone else designates morally relevant.
The former is what we need to solve to make FAI; the latter, possible instrumental value aside, is completely irrelevant. I recently saw an LW post, which I can't find now, claiming that giving an AI the instruction to maximise happiness is naive because humans don't maximise happiness. This seems like a really basic category error - perhaps there are reasons not to set that as your instruction, but that post gave me no reason to suspect their existence.
The last problem is that LW/SIAI seem to be working from the semi-suppressed premise that FAI should be benign towards humans. As Mike pointed out above, this is obviously speciesist, but it's also at odds with the three premises listed earlier: if there is an ethic which is correct and unknown, we clearly can't conclude that it contains a prescription for human preservation.
From what I've seen of LW posters' comments, it's really this suppressed premise that generates the hostility towards a hedonistic utilitarian AI (UAI). No-one really thinks that a maximally efficient happiness generator would much resemble a human (more to the point, it's hard to imagine a maximum-anything generator that would resemble a human), so we suspect that a UAI would quickly go about using our matter and energy to create some sort of utilitronium shockwave without paying much attention to what happened to us in the process.
Even to me, death by utilitronium shockwave is a scary thought, one I intuitively recoil from - but I can't think of any reason not to support it, and I've never seen an LWer even try to criticise it directly.
This leads me to the view that the LW mission is fundamentally selfish, more about self-preservation than ethics. But again, we allegedly don’t know what ethics is and have little reason to think it includes self-preservation (whatever that even means in a universe with no privileged viewpoints). So now we’re talking about creating a near-perfectly logical entity that has (or is very likely to have) a basic contradiction in its programmed instructions. I can’t see any reason to support such a cause, or to have much hope for its success if I did…