The Alignment Problem

By Kenneth Browder

In November 2022, OpenAI released ChatGPT, a conversational application built on the GPT-3.5 series of its GPT language models. Because of its easy-to-use interface and its conversational, almost humanlike manner of speaking, ChatGPT has become enormously popular and has generated a great deal of interest in computer science, especially in the field of artificial intelligence. Many concerns have been raised about the moral and ethical issues with this system and its applications, but one of the most important from the perspective of computer science is the alignment problem in artificial intelligence.

The alignment problem refers to the challenge of ensuring that an artificial intelligence system, such as ChatGPT, continues to act in accordance with the values and objectives of its human creators as it becomes more powerful and sophisticated. This is crucial because an AI system whose objectives diverge from those of its creators could cause harm or act in counterproductive ways. The alignment problem is complex and requires careful collaboration among experts in computer science, philosophy, and ethics. As AI technology continues to advance at an unprecedented rate, addressing the alignment problem will be essential to the safe and beneficial development of AI systems.

Encoding goals

In the abstract, the alignment problem is both straightforward and intuitive (just make sure an AI system does what it’s designed to do), but it is considerably harder in practice, for a number of reasons. Even if an artificially intelligent system is designed perfectly, so that for any input it behaves exactly according to the intention of its designer, it can still end up misaligned, because the input data itself may be faulty. For example, if a Tesla operating autonomously mistakes graffiti for a pedestrian for just an instant (or mistakes a pedestrian for a pothole), it may respond exactly as designed, swerving or braking hard, yet still create new harm, such as whiplash for its passengers, because the perception feeding its decision was wrong.

The problem becomes even harder when an AI system is built with modern machine learning. In this case, rather than being given an explicitly encoded goal, or even a set of principles to follow under different conditions, the system makes decisions (classifies inputs, or calculates outputs) based on a set of examples on which it is trained. A model like GPT is trained on a huge corpus of text, and within this corpus it is given the task of predicting the next word in a sentence based on the sequence of words that comes before it. For example, given a text that includes a phrase like “life, liberty, and the pursuit of happiness,” GPT might learn to predict “liberty” after “life,” “and” after “life, liberty,” and so on. The “G” in GPT stands for Generative, referring to GPT’s ability to repeat this prediction step over and over to create longer, understandable pieces of text. This makes the alignment problem more difficult: although a model can be rewarded or penalized based on how it behaves in different situations, we cannot ensure that it truly “understands” the information or can apply what it has learned to new scenarios.
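To make the training objective concrete, here is a minimal sketch of next-word prediction in Python, using a toy bigram model that simply counts which word tends to follow which in a small example text. This is only an illustration of the idea; GPT itself is a large transformer neural network trained on billions of words, not a table of word counts.

```python
from collections import defaultdict, Counter

# A toy training corpus (GPT is trained on billions of words, not one phrase).
corpus = "life liberty and the pursuit of happiness".split()

# Count how often each word follows each other word (a "bigram" model).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the continuation seen most often in training, if any."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else None

# Generate text by repeatedly predicting the next word, as a generative model does.
word, generated = "life", ["life"]
for _ in range(6):
    word = predict_next(word)
    if word is None:
        break
    generated.append(word)

print(" ".join(generated))  # "life liberty and the pursuit of happiness"
```

Even this trivial model “generates” text without any notion of what the words mean, which is the heart of the concern about mimicry versus understanding.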

Neural networks

Another issue with machine learning is that it can be difficult to understand what a machine learning system is doing. Most complex machine learning systems are built as neural networks, which are essentially large mathematical functions loosely modeled after the brain: they are composed of “neurons,” much simpler mathematical operations, wired together into a network. GPT-3, the model from which ChatGPT’s underlying network descends, has 175 billion parameters, the adjustable weights that connect its neurons. With that many interconnected parts, it is extremely difficult to interpret how a neural network is achieving its goal. This is referred to as the problem of interpretability and is one of the more interesting emerging areas of research in computer science.
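As a rough illustration (assuming NumPy, and not representing GPT’s actual architecture), here is a tiny two-layer network: each neuron multiplies its inputs by weights, sums them, and applies a simple nonlinearity. Even in this miniature version, the network’s behavior lives entirely in its numeric weights rather than in any human-readable rule, which is why interpreting a model with billions of such weights is so difficult.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights would normally be learned from data; random values stand in here.
W1 = rng.normal(size=(3, 4))  # first layer: 3 inputs feeding 4 hidden neurons
W2 = rng.normal(size=(4, 2))  # second layer: 4 hidden neurons feeding 2 outputs

def layer(x, W):
    """Each column of W is one neuron: a weighted sum of the inputs
    followed by a simple nonlinearity (ReLU)."""
    return np.maximum(0.0, x @ W)

def tiny_network(x):
    """Compose simple neurons into a network: one layer's output
    becomes the next layer's input."""
    return layer(x, W1) @ W2  # final layer left linear for simplicity

x = np.array([0.5, -1.2, 3.0])  # an arbitrary three-number input
print(tiny_network(x))  # two output numbers, determined by 20 weights here
                        # (versus 175 billion parameters in GPT-3)
```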

What all of this means is that it is very difficult to get a firm grasp on the alignment of an AI system like ChatGPT. The only way we can really measure anything about how it “thinks” is by observing how it behaves, so its alignment can more or less only be judged by watching it act across many different scenarios. Although the model is trained on a huge corpus of text, and trained to respond plausibly in many different situations, there is no solid rule being enforced on the system. As ChatGPT stands today, it mimics understanding of text, but it is really just stringing words together to mirror the structure of text it has already seen. One of the clearest examples of GPT’s lack of understanding is its unreliability at simple arithmetic with decimals, a skill many children learn before leaving elementary school. Although we can be tempted to see a computer writing sentences as a thinking being, we have no guarantee that it is more than a souped-up autocorrect.
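Because behavior is all we can observe, evaluating a system like this in practice often comes down to probing it with test cases and checking its answers. The sketch below shows that idea using the OpenAI Python client; the model name, prompts, and crude scoring are illustrative assumptions rather than a real evaluation suite.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# A few behavioral probes: decimal arithmetic that language models often fumble.
probes = [
    ("What is 0.1 + 0.2? Answer with just the number.", "0.3"),
    ("What is 7.5 * 2.4? Answer with just the number.", "18"),
]

def ask(prompt):
    """Send one prompt to the chat model and return its text reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

for prompt, expected in probes:
    answer = ask(prompt)
    verdict = "OK" if expected in answer else "MISMATCH"
    print(f"{verdict}: {prompt} -> {answer} (expected {expected})")
```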

Kenneth Browder is a junior at Grove City College, double majoring in computer science and mathematics. Kenneth is the president of the Grove City chapter of the Association for Computing Machinery, a leadership team member of Impact outdoor ministries, a contributor to multiple computer science research projects on campus, and a frequent participant in intramural sports. During the summer of 2022, Kenneth interned at Lockheed Martin RMS, a private defense company, and at the Nuclear Regulatory Commission, an agency of the United States government tasked with protecting public health and safety related to nuclear energy.

Kenneth has trouble deciding where he’s from: he has spent the most time living in Oregon but has also lived half of his life overseas, including in Kazakhstan, the Philippines, and Jamaica. Kenneth values the nuanced perspective on culture he has gained through this exposure and hopes to use the skill of finding common ground to serve many through his work in computer science.