Intelligent people disagree on just what intelligence is. Some equate it with the ability to reason abstractly, others with the ability to learn and profit from experience in daily life. Some emphasize the ability to think rationally, others the ability to act purposefully. Part of the disagreement lies in the fact that the topic under discussion can't be directly observed.
If a surgeon wanted to know how much a person's gall bladder weighed, the patient could be sliced open, the gall bladder removed and weighed, and the patient stitched back up again. If the surgeon wanted to know the weight of many people's gall bladders, that procedure could be repeated with as many patients as desired. Gall bladders are tangible parts of the human body, able to be observed and measured directly. Intelligence, on the other hand, has an invisible quality to it; no matter how much a surgeon prods, pokes, or hunts, intelligence can't be held in one's hands or weighed on a bathroom scale. Rather, we infer a certain degree of intelligence in a person based on characteristics we can see and measure directly, such as the outcomes of rational decisions, answers to standardized tests, or the purposefulness of behavior. These qualities are all part of what most people mean by intelligence, but theorists weigh them differently.
The traditional approach to intelligence, the psychometric approach, focuses on how well people perform on standardized aptitude tests, which are designed to measure the ability to acquire skills and knowledge. A typical intelligence test asks you to do several things: Provide a specific bit of information, notice similarities between objects, solve arithmetic problems, define words, fill in the missing parts of incomplete pictures, arrange pictures in a logical order, arrange blocks to resemble a design, assemble puzzles, use a coding scheme, or judge what behavior would be appropriate in a particular situation. A statistical method called factor analysis helps to identify which basic abilities underlie performance on the various items. This procedure identifies clusters of correlated items that seem to be measuring some common ability, or factor. For example, performance on vocabulary, spelling, and reading comprehension tests is typically correlated within an individual. The underlying intellectual factor that contributes to each of these specific skills might be identified as “verbal reasoning.”
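The logic behind factor analysis can be illustrated with a small simulation. The sketch below is hypothetical and not a full factor analysis: it simply generates scores in which three verbal items share one latent ability (and a block-design item does not), then shows that the shared factor produces the kind of correlation cluster described above. All variable names are invented for the example.

```python
import numpy as np

# Hypothetical data: three "verbal" items driven by one latent factor,
# plus one unrelated nonverbal item.
rng = np.random.default_rng(0)
n = 1000
verbal_factor = rng.normal(size=n)                    # shared latent ability
vocabulary = verbal_factor + rng.normal(scale=0.5, size=n)
spelling = verbal_factor + rng.normal(scale=0.5, size=n)
reading = verbal_factor + rng.normal(scale=0.5, size=n)
block_design = rng.normal(size=n)                     # no shared factor

scores = np.column_stack([vocabulary, spelling, reading, block_design])
corr = np.corrcoef(scores, rowvar=False)

# Items that tap the same underlying factor correlate strongly...
print(round(corr[0, 1], 2))   # vocabulary vs. spelling
# ...while the unrelated item correlates near zero with them.
print(round(corr[0, 3], 2))   # vocabulary vs. block design
```

Factor analysis proper goes further, extracting the latent factor from such a correlation matrix; the point here is only that correlated item clusters are the raw material it works from.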
More than a century of research has convinced most psychometric psychologists that a general ability, or g factor, underlies the various abilities and talents measured by intelligence tests (Bouchard, 2014; Gottfredson, 2002; Jensen, 1998; Lubinski, 2004; Spearman, 1927; Wechsler, 1955). This general ability has two components. Crystallized intelligence refers to knowledge and skills, the kind that allow you to do arithmetic, define words, and make political decisions. Fluid intelligence refers to the capacity to reason and use information to solve new problems (Horn & Cattell, 1966). Crystallized g is heavily dependent on education and tends to remain stable or even increase over a lifetime, whereas fluid g is relatively independent of education and tends to decrease in old age. Tests of g do a good job of predicting academic achievement, occupational success, and eminence in many fields (Kuncel, Hezlett, & Ones, 2004; Schmidt & Hunter, 2004; Simonton & Song, 2009). But, as we will see, some scientists dispute the existence of a global quality called “intelligence,” observing that a person can be smart in some areas and not in others (Gould, 1994; Guilford, 1988). For more perspectives on what intelligence is or how it should be defined, watch the video What Is Intelligence?
What Is Intelligence?
The first widely used intelligence test was devised in 1904, when the French Ministry of Education asked psychologist Alfred Binet (1857–1911) to find a way to identify children who were slow learners so they could be given remedial work. The ministry was reluctant to let teachers identify such children because the teachers might have prejudices about poor children, or might assume that shy or disruptive children were mentally impaired. The government wanted a more objective approach.
Binet's Brainstorm Wrestling with the problem, Binet had a great insight: In the classroom, the responses of “dull” children resembled those of ordinary children of younger ages. Bright children, in contrast, responded like children of older ages. The thing to measure, then, was a child's mental age (MA), or level of intellectual development relative to that of other children. Then instruction could be tailored to the child's capabilities.
The test devised by Binet and his colleague, Théodore Simon, measured memory, vocabulary, and perceptual discrimination. Items ranged from those that most young children could do easily to those that only older children could handle, as determined by the testing of large numbers of children. A scoring system developed later by others used a formula in which a child's mental age was divided by the child's actual age to yield an intelligence quotient (IQ) (a quotient is the result of division). With this formula, all average children, regardless of their age, would have an IQ of 100 because their mental age and their actual age would be the same. But a child of 8 who performed like the average 10-year-old would have a mental age of 10 and an IQ of 125 (10 divided by 8, times 100). For a brief history of the development of intelligence testing, watch the video Intelligence Testing, Then and Now 1.
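The original ratio formula can be stated compactly: IQ = (mental age / chronological age) × 100. A minimal sketch, using the chapter's own example:

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Classic ratio IQ: mental age divided by chronological age, times 100."""
    return (mental_age / chronological_age) * 100

# An 8-year-old who performs like the average 10-year-old:
print(ratio_iq(10, 8))   # 125.0
# An average child of any age scores 100, since mental age equals actual age:
print(ratio_iq(6, 6))    # 100.0
```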
Intelligence Testing, Then and Now 1
Unfortunately, this method of computing IQ had serious flaws. At one age, scores might cluster tightly around the average, whereas at another age they might be more dispersed. As a result, the score necessary to be in the top 10 or 20 or 30 percent of your age group varied, depending on your age. Also, the IQ formula did not make sense for adults; a 50-year-old who scores like a 30-year-old does not have low intelligence! Today, therefore, intelligence tests are scored differently. The mean (average) is usually set arbitrarily at 100; tests are constructed so that about two-thirds of all people score between 85 and 115; and individual scores are computed from tables based on established norms. These scores are still informally referred to as IQs, and they still reflect how a person compares with other people, either children of a particular age or adults in general. At all ages, the distribution of scores approximates a normal (bell-shaped) curve, with scores near the mean more common than high or low scores (see Figure 9.8).

The Expected Distribution of IQ Scores
On most tests, about 68 percent of all people will score between 85 and 115; about 95 percent will score between 70 and 130; and about 99.7 percent will score between 55 and 145. In any actual sample, however, the distribution will depart somewhat from the theoretical ideal.
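These percentages follow directly from the normal curve: with the mean set at 100 and (as on most modern tests) a standard deviation of 15, the bands 85–115, 70–130, and 55–145 correspond to one, two, and three standard deviations from the mean. A minimal sketch, assuming a perfectly normal distribution:

```python
from math import erf, sqrt

def fraction_within(low: float, high: float, mean: float = 100, sd: float = 15) -> float:
    """Proportion of a normal distribution with the given mean and
    standard deviation that falls between low and high."""
    def cdf(x: float) -> float:
        return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))
    return cdf(high) - cdf(low)

print(round(fraction_within(85, 115), 3))   # within 1 SD: about 0.683
print(round(fraction_within(70, 130), 3))   # within 2 SD: about 0.954
print(round(fraction_within(55, 145), 3))   # within 3 SD: about 0.997
```

As the figure caption notes, any real sample of test takers will depart somewhat from these theoretical proportions.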
The IQ Test Comes to America In the United States, Stanford psychologist Lewis Terman revised Binet's test and established norms for American children. His version, the Stanford–Binet Intelligence Scales, was first published in 1916, and has been updated several times since. It can be given to children as young as age 2 or adults as old as age 85. The test asks a person to perform a variety of tasks—to fill in missing words in sentences, answer questions requiring general knowledge, predict how a folded paper will look when unfolded, measure a quantity of water using two containers of different sizes, and distinguish between concepts that are similar but not exactly the same (such as, say, vigor and energy). The older the test taker is, the more the test requires in the way of verbal comprehension and fluency, spatial ability, and reasoning.
Two decades later, David Wechsler designed another test expressly for adults, which became the Wechsler Adult Intelligence Scale (WAIS); it was followed by the Wechsler Intelligence Scale for Children (WISC). These tests, too, have been updated several times. They produce a general IQ score and also separate scores for verbal comprehension, perceptual reasoning, processing speed, and working memory (the ability to hold information in mind so that it can be used for a task). Items measure a range of abilities, including vocabulary, arithmetic abilities, the ability to recognize similarities (e.g., “How are books and movies alike?”), general knowledge and comprehension (e.g., “Who was Thomas Jefferson?” “Why do people who want a divorce have to go to court?”), and nonverbal skills, such as the ability to re-create a block design within a specified time limit or to identify a part missing from a picture. (See Figure 9.9.) More information on these types of tests is provided in the video Intelligence Testing, Then and Now 2.
Wechsler Picture Arrangement Test
Nonverbal items such as these are particularly useful for measuring the abilities of those who have poor hearing, are not fluent in the tester’s language, have limited education, or resist doing classroom-type problems. A large gap between a person’s verbal score and performance on nonverbal tasks such as these sometimes indicates a specific learning problem.
Binet had emphasized that his test merely sampled intelligence and did not measure everything covered by that term. A test score, he said, could be useful, along with other information, for predicting school performance, but it should not be confused with intelligence itself. The tests were designed to be given individually, so that the test giver could tell when a child was ill or nervous, had poor vision, or was unmotivated. The purpose was to identify children with learning problems, not to rank all children. But when intelligence testing was brought from France to the United States, its original purpose got lost at sea. IQ tests became widely used not to bring slow learners up to the average, but to categorize people in school and in the armed services according to their presumed “natural ability.” The testers overlooked the fact that in America, with its many ethnic groups, people did not all share the same background and experience (Gould, 1996).
Intelligence Testing, Then and Now 2