Watch a video

View this video published on the CrashCourse channel on Youtube. Although the video begins with a summary of different theories of intelligence, the second half provides some historical context about why the study of intelligence has often been controversial.

While you’re likely familiar with the term “IQ” and associate it with the idea of intelligence, what does IQ really mean? IQ stands for intelligence quotient and describes a score earned on a test designed to measure intelligence. You’ve already learned that there are many ways psychologists describe intelligence (or more aptly, intelligences). Similarly, IQ tests—the tools designed to measure intelligence—have been the subject of debate throughout their development and use.

When might an IQ test be used? What do we learn from the results, and how might people use this information? IQ tests are expensive to administer and must be given by a licensed psychologist. Intelligence testing has been considered both a bane and a boon for education and social policy. In this section, we will explore what intelligence tests measure, how they are scored, and how they were developed.

It seems that the human understanding of intelligence is somewhat limited when we focus on traditional or academic-type intelligence. How then, can intelligence be measured? And when we measure intelligence, how do we ensure that we capture what we’re really trying to measure (in other words, that IQ tests function as valid measures of intelligence)? In the following paragraphs, we will explore the how intelligence tests were developed and the history of their use.

French psychologist Alfred Binet helped to develop intelligence testing. (b) This page is from a 1908 version of the Binet-Simon Intelligence Scale. Children being tested were asked which face, of each pair, was prettier.

The IQ test has been synonymous with intelligence for over a century. In the late 1800s, Sir Francis Galton developed the first broad test of intelligence (Flanagan & Kaufman, 2004[1]). Although he was not a psychologist, his contributions to the concepts of intelligence testing are still felt today (Gordon, 1995[2]). Reliable intelligence testing (you may recall from earlier chapters that reliability refers to a test’s ability to produce consistent results) began in earnest during the early 1900s with a researcher named Alfred Binet. Binet was asked by the French government to develop an intelligence test to use on children to determine which ones might have difficulty in school; it included many verbally based tasks. American researchers soon realized the value of such testing. Louis Terman, a Stanford professor, modified Binet’s work by standardizing the administration of the test and tested thousands of different-aged children to establish an average score for each age. As a result, the test was normed and standardized, which means that the test was administered consistently to a large enough representative sample of the population that the range of scores resulted in a bell curve (bell curves will be discussed later). Standardization means that the manner of administration, scoring, and interpretation of results is consistent. Norming involves giving a test to a large population so data can be collected comparing groups, such as age groups. The resulting data provide norms, or referential scores, by which to interpret future scores. Norms are not expectations of what a given group should know but a demonstration of what that group does know. Norming and standardizing the test ensures that new scores are reliable. This new version of the test was called the Stanford-Binet Intelligence Scale (Terman, 1916[3]). Remarkably, an updated version of this test is still widely used today.

In 1939, David Wechsler, a psychologist who spent part of his career working with World War I veterans, developed a new IQ test in the United States. Wechsler combined several subtests from other intelligence tests used between 1880 and World War I. These subtests tapped into a variety of verbal and nonverbal skills, because Wechsler believed that intelligence encompassed “the global capacity of a person to act purposefully, to think rationally, and to deal effectively with his environment” (Wechsler, 1958, p. 7[4]). He named the test the Wechsler-Bellevue Intelligence Scale (Wechsler, 1981[5]). This combination of subtests became one of the most extensively used intelligence tests in the history of psychology. Although its name was later changed to the Wechsler Adult Intelligence Scale (WAIS) and has been revised several times, the aims of the test remain virtually unchanged since its inception (Boake, 2002[6]). Today, there are three intelligence tests credited to Wechsler, the Wechsler Adult Intelligence Scale-fourth edition (WAIS-IV), the Wechsler Intelligence Scale for Children (WISC-V), and the Wechsler Preschool and Primary Scale of Intelligence—IV (WPPSI-IV) (Wechsler, 2002[7]). These tests are used widely in schools and communities throughout the United States, and they are periodically normed and standardized as a means of recalibration. Interestingly, the periodic recalibrations have led to an interesting observation known as the Flynn effect. Named after James Flynn, who was among the first to describe this trend, the Flynn effect refers to the observation that each generation has a significantly higher IQ than the last. Flynn himself argues, however, that increased IQ scores do not necessarily mean that younger generations are more intelligent per se (Flynn, Shaughnessy, & Fulgham, 2012[8]). As a part of the recalibration process, the WISC-V was given to thousands of children across the country, and children taking the test today are compared with their same-age peers.

The WISC-V is composed of 14 subtests, which comprise five indices, which then render an IQ score. The five indices are Verbal Comprehension, Visual Spatial, Fluid Reasoning, Working Memory, and Processing Speed. When the test is complete, individuals receive a score for each of the five indices and a Full Scale IQ score. The method of scoring reflects the understanding that intelligence is comprised of multiple abilities in several cognitive realms and focuses on the mental processes that the child used to arrive at his or her answers to each test item.

Ultimately, we are still left with the question of how valid intelligence tests are. Certainly, the most modern versions of these tests tap into more than verbal competencies, yet the specific skills that should be assessed in IQ testing, the degree to which any test can truly measure an individual’s intelligence, and the use of the results of IQ tests are still issues of debate (Gresham & Witt, 1997[9]; Flynn, Shaughnessy, & Fulgham, 2012[10]; Richardson, 2002[11]; Schlinger, 2003[12]).

What do you think?: Intellectually disabled criminals and capital punishment

The case of Atkins v. Virginia was a landmark case in the United States Supreme Court. On August 16, 1996, two men, Daryl Atkins and William Jones, robbed, kidnapped, and then shot and killed Eric Nesbitt, a local airman from the U.S. Air Force. A clinical psychologist evaluated Atkins and testified at the trial that Atkins had an IQ of 59. The mean IQ score is 100. The psychologist concluded that Atkins was mildly mentally retarded.

The jury found Atkins guilty, and he was sentenced to death. Atkins and his attorneys appealed to the Supreme Court. In June 2002, the Supreme Court reversed a previous decision and ruled that executions of mentally retarded criminals are ‘cruel and unusual punishments’ prohibited by the Eighth Amendment. The court wrote in their decision:

Clinical definitions of mental retardation require not only subaverage intellectual functioning, but also significant limitations in adaptive skills. Mentally retarded persons frequently know the difference between right and wrong and are competent to stand trial. Because of their impairments, however, by definition they have diminished capacities to understand and process information, to communicate, to abstract from mistakes and learn from experience, to engage in logical reasoning, to control impulses, and to understand others’ reactions. Their deficiencies do not warrant an exemption from criminal sanctions, but diminish their personal culpability (Atkins v. Virginia, 2002, par. 5[13]).

The court also decided that there was a state legislature consensus against the execution of the mentally retarded and that this consensus should stand for all of the states. The Supreme Court ruling left it up to the states to determine their own definitions of mental retardation and intellectual disability. The definitions vary among states as to who can be executed. In the Atkins case, a jury decided that because he had many contacts with his lawyers and thus was provided with intellectual stimulation, his IQ had reportedly increased, and he was now smart enough to be executed. He was given an execution date and then received a stay of execution after it was revealed that lawyers for co-defendant, William Jones, coached Jones to “produce a testimony against Mr. Atkins that did match the evidence” (Liptak, 2008[14]). After the revelation of this misconduct, Atkins was re-sentenced to life imprisonment.

Atkins v. Virginia (2002[15]) highlights several issues regarding society’s beliefs around intelligence. In the Atkins case, the Supreme Court decided that intellectual disability does affect decision making and therefore should affect the nature of the punishment such criminals receive. Where, however, should the lines of intellectual disability be drawn? In May 2014, the Supreme Court ruled in a related case (Hall v. Florida) that IQ scores cannot be used as a final determination of a prisoner’s eligibility for the death penalty (Roberts, 2014[16]).


Reflection question

Compare and contrast the benefits of the Stanford-Binet IQ test and Wechsler’s IQ tests.


  1. Flanagan, D., & Kaufman, A. (2004). Essentials of WISC-IV assessment. Hoboken: John Wiley and Sons, Inc.
  2. Gordon, O. E. (1995). Francis Galton (1822–1911). Retrieved from
  3. Terman, L. M. (1916). The measurement of intelligence. Boston: Houghton-Mifflin.
  4. Wechsler, D. (1958). The measurement of adult intelligence. Baltimore: Williams & Wilkins.
  5. Wechsler, D. (1981). Manual for the Wechsler Adult Intelligence Scale—revised. New York: Psychological Corporation.
  6. Boake, C. (2002, May 24). From the Binet-Simon to the Wechsler-Bellevue: Tracing the history of intelligence testing. Journal of Clinical and Experimental Neuropsychology, 24(3), 383–405.
  7. Wechsler, D. (2002 ). WPPSI-R manual. New York: Psychological Corporation.
  8. Flynn, J., Shaughnessy, M. F., & Fulgham, S. W. (2012) Interview with Jim Flynn about the Flynn effect. North American Journal of Psychology, 14(1), 25–38.
  9. Gresham, F. M., & Witt, J. C. (1997). Utility of intelligence tests for treatment planning, classification, and placement decisions: Recent empirical findings and future directions. School Psychology Quarterly, 12(3), 249–267.
  10. Flynn, J., Shaughnessy, M. F., & Fulgham, S. W. (2012) Interview with Jim Flynn about the Flynn effect. North American Journal of Psychology, 14(1), 25–38.
  11. Richardson, K. (2002). What IQ tests test. Theory & Psychology, 12(3), 283–314.
  12. Schlinger, H. D. (2003). The myth of intelligence. The Psychological Record, 53(1), 15–32.
  13. Atkins v. Virginia, 00-8452 (2002).
  14. Liptak, A. (2008, January 19). Lawyer reveals secret, toppling death sentence. New York Times. Retrieved from
  15. Atkins v. Virginia, 00-8452 (2002).
  16. Roberts, D. (2014, May 27). U.S. Supreme Court bars Florida from using IQ score cutoff for executions. The Guardian. Retrieved from


This page was proudly adapted from Psychology published by OpenStax CNX. Oct 31, 2016 under a Creative Commons Attribution 4.0 license. Download for free at