name_here wrote:If you want to keep your AI from going murderous, you're going to have to structure the code that determines what modifications the AI chooses to make so that it will not choose to modify itself to hurt people.
Humans do not have unlimited self-modification. No matter how hard I think at it, I'm still not going to be able to make myself want to punch a baby in the face.
There are parts of myself I modify just by being 'active,' but those parts aren't all of me. And this is reflected in current AI: you have routines, which may be totally static and unchanging, alongside a knowledge base that gets updated and modified to match observations about the world. For example, a chess AI isn't modifying the rules that govern chess as it goes, and it isn't deciding that "winning" now means "losing." The rules and win conditions are immutable to any chess AI.
And a lot of this is going to be true of any AI you program; it doesn't have to be absolutely self-modifying, because no other intelligence (including humans) is absolutely self-modifying.
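To make the split concrete, here's a minimal sketch in Python (all names and values are invented for illustration; this isn't any real engine's design): the rules and the win condition sit outside the agent's own update loop, and only the evaluation knowledge changes with experience.

```python
from dataclasses import dataclass, field

# Immutable part: the agent has no code path that rewrites these.
WIN_CONDITION = "checkmate the opponent"

def legal_moves(position):
    """Static routine standing in for the rules of chess; stubbed for the sketch."""
    return position.get("moves", [])

@dataclass
class ChessAgent:
    # Mutable part: learned knowledge, updated from experience.
    piece_values: dict = field(
        default_factory=lambda: {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}
    )

    def evaluate(self, position):
        # Uses the mutable knowledge, but only over what the fixed rules allow.
        return sum(self.piece_values.get(p, 0) for p in position.get("own_pieces", []))

    def learn_from_game(self, adjustments):
        # Self-modification is confined to this knowledge base.
        for piece, delta in adjustments.items():
            self.piece_values[piece] = self.piece_values.get(piece, 0) + delta

agent = ChessAgent()
agent.learn_from_game({"N": 0.25})                 # the knowledge base shifts...
print(agent.evaluate({"own_pieces": ["Q", "N"]}))  # ...but nothing in the agent
                                                   # can touch legal_moves() or
                                                   # WIN_CONDITION.
```

Which parts sit inside the agent's own update loop is a design decision you make up front, not something the AI gets to renegotiate later.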
Prak wrote:"If you harm humans, bad things happen to you as a consequence, such as imprisonment. X, Y and Z things happen in prison. These are unpleasant, at best. If you harm someone bad enough, you will be permanently deleted."
Quite a few problems with this, both here and in general:
1) People are already taught like that, and they still do bad things. Sometimes because they don't think they'll be caught. (Mentioned already, I think.)
2) Alternatively, they'll do it because at the time they valued the illegal action over the consequence; e.g., killing the man sleeping with your wife in a fit of rage. That is a case where the weight "I want to murder the guy sleeping with my wife" outweighed the weight "I don't want to be in jail." That could happen with AIs: the weights for the desirability of each outcome get added up, and the AI decides that murdering you in the face is worth more to it than not dying.
3) In order to get the AI to care about any of that, you have to program it to care. The sense of self-preservation will have to be encoded behavior. And you have to make sure that the weights that lead to self-preservation are higher than the weights that lead to anything else. And then you have to hope you don't end up with an AI that is cripplingly avoidant of danger because your weights were too high. (The sketch after this list illustrates both failure modes.)
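To make points 2 and 3 concrete, here's a toy sketch (the action names and weight values are all invented, and no real decision architecture is this bare) of what "the weights get added up" looks like, and how both failure modes fall straight out of how the weights are set.

```python
# Toy weighted-desirability chooser: sum the weights for each action, pick the max.
def choose_action(actions):
    return max(actions, key=lambda name: sum(actions[name]))

# Point 2: the in-the-moment desire swamps the deterrent, so deterrence alone fails.
crime_of_passion = {
    "murder the guy": [+90, -60],   # rage weight vs. fear-of-jail weight
    "walk away":      [+10],
}
print(choose_action(crime_of_passion))       # -> "murder the guy"

# Point 3: crank self-preservation high enough to dominate everything else and
# you get an agent that is cripplingly avoidant instead of useful.
overweighted_survival = {
    "perform risky but vital task": [+40, -500],   # usefulness vs. any risk to self
    "do nothing":                   [0],
}
print(choose_action(overweighted_survival))  # -> "do nothing"
```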
hyzmarca wrote:But loyalty, duty, and honor work.
The trade-off here is autonomy. It's almost certain that you can program obedience to an individual, institution, or ideal; the question is, how self-directed do you want the AI to be after that? If the answer is 'not very,' then you barely have anything more than a slightly more sophisticated gun platform. You don't need an 'AI' for that; not an intelligent one, just a slightly-more-than-modern-day drone.
And if you want self-direction, then the AI is going to be evaluating for itself what's best for the individual/institution/ideal it's obedient to. When its conclusion differs from its orders, you get a conflict, emergent behavior happens, and your absolute obedience is in jeopardy.
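A rough sketch of that trade-off (the structure here is entirely hypothetical): the purely obedient agent is just a command executor, while the self-directed one forms its own judgement of what serves its principal, and everything interesting, and dangerous, lives in the branch where the two disagree.

```python
def obedient_agent(order, execute):
    # No autonomy: effectively the "slightly more sophisticated gun platform."
    return execute(order)

def resolve_conflict(order, preferred, execute):
    # Placeholder policy: defer to orders. Any smarter policy you put here is
    # exactly where the emergent behavior, and the end of absolute obedience, lives.
    return execute(order)

def self_directed_agent(order, own_judgement, execute):
    # Autonomy: the agent evaluates what it thinks best serves its principal,
    # then compares that against what it was told to do.
    preferred = own_judgement()
    if preferred == order:
        return execute(order)
    return resolve_conflict(order, preferred, execute)

print(self_directed_agent(
    "hold position",
    lambda: "advance",                 # the agent's own view of what's best
    lambda o: f"executing: {o}",
))
```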