7.4 Principles of Operant Conditioning

Thousands of operant-conditioning studies have been conducted, many using animals. A favorite experimental tool is the Skinner box, a chamber equipped with a device that delivers a reinforcer, usually food, when an animal makes a desired response, or a punisher, such as a brief shock, when the animal makes an undesired response (see Figure 7.4). In modern versions, a computer records responses and charts the rate of responding and cumulative responses across time.

Figure 7.4

The Skinner Box

When a rat in a Skinner box presses a bar, a food pellet or drop of water is automatically released. The photo shows Skinner training one of his subjects.

Early in his career, Skinner (1938) used the Skinner box for a classic demonstration of operant conditioning. A rat that had previously learned to eat from the pellet-releasing device was placed in the box. The animal proceeded to scurry about the box, sniffing here and there, and randomly touching parts of the floor and walls. Quite by accident, it happened to press a lever mounted on one wall, and immediately a pellet of tasty rat food fell into the food dish. The rat continued its movements and again happened to press the bar, causing another pellet to fall into the dish. With additional repetitions of bar-pressing followed by food, the animal began to behave less randomly and to press the bar more consistently. Eventually, Skinner had the rat pressing the bar as fast as it could.

The Importance of Responses

Operant conditioning shares many terms with classical conditioning. Classical conditioning, however, places a premium on the association between two stimuli, whereas operant conditioning focuses on responses and their consequences. The centrality of responses is illustrated in the operant conditioning principles we now turn to.

Extinction In operant conditioning, as in classical conditioning, extinction is a procedure that causes a previously learned response to stop. In operant conditioning, however, extinction takes place when the reinforcer that maintained the response is withheld or is no longer available. At first, there may be a spurt of responding, but then the responses gradually taper off and eventually cease. Suppose you put a coin in a vending machine and get nothing back. You may throw in another coin, or perhaps even two, but then you will probably stop trying. The next day, you may put in yet another coin, an example of spontaneous recovery. Eventually, however, you will give up on that machine. Your response will have been extinguished.
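For readers who like to see ideas in code, here is a minimal Python sketch of extinction, assuming a toy model in which every reinforced response strengthens the tendency to respond and every unreinforced response weakens it. The function name and the learning rates are invented for illustration, not taken from any study.

    import random

    def simulate(trials, reinforced, p=0.9, up=0.05, down=0.15):
        # Track the probability of responding across trials. `reinforced`
        # says whether a response on trial t pays off; all numbers here
        # are illustrative.
        history = []
        for t in range(trials):
            if random.random() < p:          # the animal (or person) responds
                if reinforced(t):
                    p = min(1.0, p + up)     # reinforcement strengthens responding
                else:
                    p = max(0.0, p - down)   # withholding the reinforcer weakens it
            history.append(round(p, 2))
        return history

    print(simulate(12, lambda t: True))    # working vending machine: responding persists
    print(simulate(12, lambda t: False))   # broken machine: responding tapers off

Spontaneous recovery could be added to this sketch by nudging the probability back up after a simulated rest period.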

Stimulus Generalization and Discrimination In operant conditioning, as in classical conditioning, stimulus generalization may occur. That is, responses may generalize to stimuli that were not present during the original learning situation but resemble the original stimuli in some way. For example, a pigeon that has been trained to peck at a picture of a circle may also peck at a slightly oval figure. But if you wanted to train the bird to discriminate between the two shapes, you would present both the circle and the oval, giving reinforcers whenever the bird pecked at the circle and withholding reinforcers when it pecked at the oval. Eventually, stimulus discrimination would occur. Pigeons, in fact, have learned to make some extraordinary discriminations. They have learned to discriminate between two paintings by different artists, such as Vincent van Gogh and Marc Chagall (Watanabe, 2001). And then, when presented with a new pair of paintings by those same two artists, they have been able to tell the difference between them! Pigeons have even learned to discriminate beautiful paintings from ugly ones, a discrimination similar to what human beings would make (Watanabe, 2010).
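The discrimination training just described can be sketched the same way: a toy learner keeps a separate response strength for each stimulus, pecks at the circle are reinforced, and pecks at the oval are not. The shapes, strengths, and learning rates below are invented for illustration.

    import random

    strength = {"circle": 0.5, "oval": 0.5}   # initial tendency to peck at each shape

    for trial in range(300):
        shape = random.choice(["circle", "oval"])
        if random.random() < strength[shape]:   # the pigeon pecks at this shape
            if shape == "circle":               # pecks at the circle bring a reinforcer
                strength[shape] = min(1.0, strength[shape] + 0.05)
            else:                               # pecks at the oval bring nothing
                strength[shape] = max(0.0, strength[shape] - 0.05)

    print(strength)   # circle approaches 1.0, oval approaches 0.0: discrimination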

Sometimes an animal or person learns to respond to a stimulus only when some other stimulus, called a discriminative stimulus, is present. The discriminative stimulus signals whether a response, if made, will pay off. In a Skinner box containing a pigeon, a light may serve as a discriminative stimulus for pecking at a circle. When the light is on, pecking brings a reward; when it is off, pecking is futile. Human behavior is controlled by many discriminative stimuli, both verbal (“Store hours are 9 to 5”) and nonverbal (traffic lights, doorbells, the ring of your cell phone, other people's facial expressions). Learning to respond correctly when such stimuli are present allows us to get through the day efficiently and to get along with others.
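In programming terms, a discriminative stimulus acts like a gate on whether a response pays off. A two-line sketch, with the light and the function name invented for illustration:

    def reinforcer_delivered(light_on, pecked):
        # Pecking pays off only when the discriminative stimulus (the light) is present.
        return light_on and pecked

    print(reinforcer_delivered(light_on=True, pecked=True))    # True: the response pays off
    print(reinforcer_delivered(light_on=False, pecked=True))   # False: pecking is futile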

Learning on Schedule When a response is first acquired, learning is usually most rapid if the response is reinforced each time it occurs; this procedure is called continuous reinforcement. However, once a response has become reliable, it will be more resistant to extinction if it is rewarded on an intermittent (partial) schedule of reinforcement, which involves reinforcing only some responses, not all of them. Skinner (1956) happened on this fact when he ran short of food pellets for his rats and was forced to deliver reinforcers less often. (Not all scientific discoveries are planned.) On intermittent schedules, a reinforcer is delivered only after a certain number of responses occur or after a certain amount of time has passed since a response was last reinforced; these patterns affect the rate, form, and timing of behavior. (The details are beyond the scope of this book.)
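Although the details are beyond the scope of this book, the two broad families of intermittent schedules, those based on counts of responses and those based on elapsed time, can be sketched in a few lines. The class names and parameters below are invented for illustration.

    class RatioSchedule:
        # Deliver a reinforcer after every n-th response (a fixed-ratio sketch).
        def __init__(self, n):
            self.n, self.count = n, 0
        def respond(self):
            self.count += 1
            if self.count >= self.n:
                self.count = 0
                return True      # reinforcer delivered
            return False

    class IntervalSchedule:
        # Reinforce the first response after a fixed wait (a fixed-interval sketch).
        def __init__(self, wait_seconds):
            self.wait, self.last = wait_seconds, 0.0
        def respond(self, now):
            if now - self.last >= self.wait:
                self.last = now
                return True
            return False

    ratio = RatioSchedule(5)
    print([ratio.respond() for _ in range(10)])   # True on every fifth response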

Intermittent reinforcement helps explain why people often get attached to “lucky” hats, charms, and rituals. A batter pulls his earlobe, gets a home run, and from then on always pulls his earlobe before each pitch. A student takes an exam with a purple pen and gets an A, and from then on will not take an exam without a purple pen. Such rituals persist because sometimes they are followed, purely coincidentally, by a reinforcer—a home run, a good grade—and so they become resistant to extinction.

Skinner (1948) once demonstrated this phenomenon by creating eight “superstitious” pigeons in his laboratory. He rigged the pigeons' cages so that food was delivered every 15 seconds, even if the birds didn't lift a feather. Pigeons are often in motion, so when the food came, each animal was likely to be doing something. That something was then reinforced by delivery of the food. The behavior, of course, was reinforced entirely by chance, but it still became more likely to occur and thus to be reinforced again. Within a short time, six of the pigeons were practicing some sort of consistent ritual: turning in counterclockwise circles, bobbing their heads up and down, or swinging their heads to and fro. None of these activities had the least effect on the delivery of the reinforcer; the birds were behaving “superstitiously,” as if they thought their movements were responsible for bringing the food.
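Skinner's demonstration is easy to mimic in a toy simulation: food arrives on a timer no matter what the bird does, and whichever behavior happens to be under way is strengthened. The behaviors and numbers below are invented for illustration; run it a few times and some arbitrary "ritual" usually comes to dominate.

    import random

    behaviors = ["turn counterclockwise", "bob head", "swing head", "strut"]
    strength = {b: 1.0 for b in behaviors}   # how readily each behavior is emitted

    for second in range(600):
        doing = random.choices(behaviors, weights=list(strength.values()))[0]
        if second % 15 == 0:         # food every 15 seconds, regardless of behavior
            strength[doing] += 0.5   # the ongoing behavior is strengthened by sheer chance

    print(max(strength, key=strength.get))   # one accidental "ritual" has taken over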

Now listen up, because here comes one of the most useful things to know about operant conditioning: If you want a response to persist after it has been learned, you should reinforce it intermittently, not continuously. If you are giving Harry, your hamster, a treat every time he pushes a ball with his nose, and then you suddenly stop the reinforcement, Harry will soon stop pushing that ball. Because the change in reinforcement is large, from continuous to none at all, Harry will easily discern the change. But if you have been reinforcing Harry's behavior only every so often, the change will not be so dramatic, and your hungry hamster will keep responding for quite a while. Pigeons, rats, and people on intermittent schedules of reinforcement have responded in the laboratory thousands of times without reinforcement before throwing in the towel, especially when the timing of the reinforcer varies. Animals will sometimes work so hard for an unpredictable, infrequent bit of food that the energy they expend is greater than that gained from the reward; theoretically, they could actually work themselves to death.
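One way to see why intermittent reinforcement resists extinction is to model a learner that gives up only when the current dry spell clearly exceeds anything it experienced during training. The quitting rule and numbers below are invented for illustration.

    import random

    def responses_before_quitting(p_reward, training_trials=500, seed=2):
        # How many unreinforced responses the learner emits before giving up.
        rng = random.Random(seed)
        longest_dry_spell = dry = 0
        for _ in range(training_trials):                 # training phase
            dry = 0 if rng.random() < p_reward else dry + 1
            longest_dry_spell = max(longest_dry_spell, dry)
        # Extinction phase: reinforcement never comes again. The learner quits
        # once the dry spell exceeds the longest one it saw during training.
        return longest_dry_spell + 1

    print(responses_before_quitting(1.0))    # continuous reinforcement: quits after one failure
    print(responses_before_quitting(0.05))   # sparse, unpredictable reinforcement: persists far longer

Under continuous reinforcement, a single unrewarded response is a clear signal that the world has changed; under sparse reinforcement, long dry spells are normal, so the change is statistically hard to detect.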

It follows that if you want to get rid of a response, whether it's your own or someone else's, you should be careful not to reinforce it intermittently. If you are going to extinguish undesirable behavior by ignoring it—a child's tantrums, a friend's midnight phone calls, a parent's unwanted advice—you must be absolutely consistent in withholding reinforcement (your attention). Otherwise, the other person will learn that if he or she keeps up the screaming, calling, or advice-giving long enough, it will eventually be rewarded. From a behavioral point of view, one of the most common errors people make is to reward intermittently the very responses that they would like to eliminate.

Shaping For a response to be reinforced, it must first occur. But suppose you want to train cows to milk themselves, a child to use a knife and fork properly, or a friend to play terrific tennis. Such behaviors, and most others in everyday life, have almost no probability of appearing spontaneously. You could grow old and gray waiting for them to occur so that you could reinforce them. The operant solution is a procedure called shaping.

Behavioral techniques such as shaping have many useful applications.

In shaping, you start by reinforcing a tendency in the right direction, and then you gradually require responses that are more and more similar to the final desired response. The responses that you reinforce on the way to the final one are called successive approximations. Take the problem of teaching cows to milk themselves. How can cows possibly do that when they have no hands? Ah, but cows can be trained to use a milking robot. In several countries, psychologists have done just that (Stiles, Murray, & Kentish-Barnes, 2011). First, they give the cow crushed barley (the cow equivalent of a chocolate treat) for simply standing on a platform connected to the robot. After that response is established, they give her barley for turning her body toward the spot where the robot attaches the milking cups. After that, they reward her for being in the exact spot the robot requires for attaching the cups, and so on until the cow finally learns to milk herself. The key is that as each approximation is achieved, the next one becomes more likely, making it available for reinforcement. Cows allowed to milk themselves do so three or four times a day instead of the traditional twice a day, and show fewer signs of stress than other cows. Farmers show less stress too because they no longer have to get up at 5:00 A.M. for the early-morning milking!
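Shaping itself can be sketched as a loop in which the criterion for reinforcement gradually tightens. Everything below, the target, the learner's drifting "typical" response, the shrinking criterion, is invented for illustration; the point is only that the final response, vanishingly unlikely at the start, is reached through successive approximations.

    import random

    rng = random.Random(0)
    target = 100.0       # the final desired response on some behavioral dimension
    typical = 0.0        # the learner's typical response at the outset
    criterion = 100.0    # reinforce anything within this distance of the target

    for trial in range(3000):
        response = rng.gauss(typical, 5.0)         # behavior varies around its usual value
        if abs(response - target) <= criterion:    # a good-enough approximation?
            typical += 0.3 * (response - typical)  # reinforcement pulls behavior that way
            criterion = max(2.0, criterion - 0.1)  # then demand a slightly closer one

    print(round(typical, 1))   # ends near 100.0, a response the learner never emitted at first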

Using shaping and other techniques, Skinner was able to train pigeons to play table tennis with their beaks and to “bowl” in a miniature alley, complete with a wooden ball and tiny bowling pins. (Skinner had a great sense of humor.) Today, animal trainers routinely use shaping to teach animals their parts in movies and TV shows, and to act as the “eyes” of the blind and as the “limbs” of people with spinal cord injuries. These talented animal companions learn to turn on light switches, open refrigerator doors, and reach for boxes on shelves.

Biological Limits on Learning All principles of operant conditioning, like those of classical conditioning, are limited by an animal's genetic dispositions and physical characteristics. If you try to teach a fish to dance the samba, you're going to get pretty frustrated (and wear out the fish). Operant-conditioning procedures always work best when they capitalize on inborn tendencies.

Decades ago, two psychologists who became animal trainers, Keller and Marian Breland (1961), learned what happens when you ignore biological constraints on learning. They found that their animals were having trouble learning tasks that should have been easy. One animal, a pig, was supposed to drop large wooden coins in a box. Instead, the animal would drop the coin, push at it with its snout, throw it in the air, and push at it some more. This odd behavior actually delayed delivery of the reinforcer (food, which is very reinforcing to a pig), so it was hard to explain in terms of operant principles. The Brelands finally realized that the pig's rooting instinct—using its snout to uncover and dig up edible roots—was keeping it from learning the task. They called such a reversion to instinctive behavior instinctive drift.


In human beings, too, operant learning is affected by genetics, biology, and the evolutionary history of our species. Human children are biologically disposed to learn language, and they may be disposed to learn some arithmetic operations as well. Furthermore, temperaments and other inborn dispositions may affect how a person responds to reinforcers and punishments. It will be easier to shape belly-dancing behavior if a person is temperamentally disposed to be outgoing and extroverted than if the person is by nature shy.

Skinner: The Man and the Myth

Because of his groundbreaking work on operant conditioning, B. F. Skinner is one of the best known of American psychologists. He is also one of the most misunderstood. Many people (even some psychologists) think that Skinner denied the existence of human consciousness and the value of studying it. In reality, Skinner (1972, 1990) maintained that private internal events—what we call perceptions, emotions, and thoughts—are as real as any others, and we can study them by examining our own sensory responses, the verbal reports of others, and the conditions under which such events occur. But he insisted that thoughts and feelings cannot explain behavior. These components of consciousness, he said, are themselves simply behaviors that occur because of reinforcement and punishment.

Skinner aroused strong passions in both his supporters and his detractors. Perhaps the issue that most provoked and angered people was his insistence that free will is an illusion. In contrast to humanist and some religious doctrines that human beings have the power to shape their own destinies, his philosophy promoted the determinist view that our actions are determined by our environments and our genetic heritage.

Because Skinner thought the environment should be manipulated to alter behavior, some critics have portrayed him as cold-blooded. One famous controversy regarding Skinner occurred when he invented an enclosed “living space,” the Air Crib, for his younger daughter Deborah when she was an infant. This “baby box,” as it came to be known, had temperature and humidity controls to eliminate the usual discomforts that babies suffer: heat, cold, wetness, and confinement by blankets and clothing. Skinner believed that to reduce a baby's cries of discomfort and make infant care easier for the parents, you should fix the environment. But people imagined, incorrectly, that the Skinners were leaving their child in the baby box all the time without cuddling and holding her, and rumors circulated for years (and still do from time to time) that she had sued her father, gone insane, or killed herself. Actually, both of Skinner's daughters were cuddled and doted on, loved their parents deeply, and turned out to be successful, perfectly well-adjusted adults.

B. F. Skinner invented the Air Crib to provide a more comfortable, less restrictive infant bed than the traditional crib with its bars and blankets. Here the Skinners play with their 13-month-old daughter, Deborah.

Skinner, who was a kind and mild-mannered man, felt that it would be unethical not to try to improve human behavior by applying behavioral principles. And he practiced what he preached, proposing many ways to improve society and reduce human suffering. At the height of public criticism of Skinner's supposedly cold and inhumane approach to understanding behavior, the American Humanist Association recognized his efforts on behalf of humanity by honoring him with its Humanist of the Year Award.

Journal: Thinking Critically - Consider Other Interpretations
People cling to superstitious rituals because they think they work. Could this “effectiveness” be an illusion, explainable in terms of operant principles? How would extinction, stimulus generalization, stimulus discrimination, or schedules of reinforcement be involved in maintaining a superstitious belief?