Robert Nosofsky
In a classic investigation into the nature of category learning, Shepard, Hovland, and Jenkins (1961) tested people's ability to learn six types of classification problems. This landmark study was pivotal in developing constraints for models of category learning, and it remains highly influential. Numerous modern theorists continue to use the Shepard et al. (1961) tasks as benchmarks for models of classification (Anderson, 1991; Estes, 1994; Gluck & Bower, 1988; Kruschke, 1992; Nosofsky, 1984).
A central contribution of Shepard et al.'s (1961) study was the demonstration that models based solely on elementary principles of stimulus generalization were inadequate to explain the nature of category learning. Rather, some abstract process of selective attention to dimensions appeared to be critically involved. Furthermore, this study, together with follow-ups reported by Shepard and Chang (1963) and Shepard (1964), was instrumental in sparking the distinction between "integral" and "separable" dimensions that is fundamental in modern thinking about perception and cognition.
The structures of Shepard et al.'s (1961) problems are shown in Figure 1. In all cases, there were eight stimuli varying along three binary-valued dimensions. For purposes of illustration, in the example in Figure 1, the dimensions are color, shape, and size. In all the problems, four stimuli were assigned to Category A, and the remaining four to Category B. Although there are 70 distinct ways of assigning 4 of 8 stimuli to 2 categories, only 6 types of problems arise for stimuli varying along 3 binary-valued dimensions. All problems within a type have the same abstract structure, with only the assignment of physical dimensions to the logical dimensions varying.
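The count of 70 assignments and their reduction to six types can be checked by brute force. The following is a minimal sketch, not part of the original study. It assumes that two assignments share a type when one can be mapped onto the other by permuting the three dimensions, flipping the values of any dimension, or swapping the category labels; the function and variable names are illustrative only.

```python
from itertools import combinations, permutations, product

# The 8 stimuli as points on the binary cube (three binary-valued dimensions).
STIMULI = list(product((0, 1), repeat=3))

def transform(stim, perm, flips):
    """Relabel one stimulus: permute the dimensions, then flip selected values."""
    return tuple(stim[perm[i]] ^ flips[i] for i in range(3))

def canonical(cat_a):
    """Smallest relabeled form of an assignment under the assumed equivalences."""
    cat_a = frozenset(cat_a)
    cat_b = frozenset(STIMULI) - cat_a
    variants = []
    for perm in permutations(range(3)):
        for flips in product((0, 1), repeat=3):
            for cat in (cat_a, cat_b):  # swapping the category labels
                variants.append(tuple(sorted(transform(s, perm, flips) for s in cat)))
    return min(variants)

# Every way of choosing the 4 stimuli that form Category A.
assignments = list(combinations(STIMULI, 4))

# Group the assignments by their canonical form; each group is one problem type.
types = {}
for assignment in assignments:
    types.setdefault(canonical(assignment), []).append(assignment)

print(len(assignments), len(types))  # 70 6
```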
In the Type I problem, only a single dimension is relevant. The example in Figure 1 is to classify all squares into Category A and all triangles into Category B. In the Type II problem, exactly two dimensions are relevant. The example in Figure 1 is to classify black squares and white triangles into Category A, and white squares and black triangles into Category B. In this example, the dimensions of color and shape are relevant, whereas size is irrelevant. In the Type VI problem, all three dimensions are equally relevant. Stating a rule for Type VI basically involves enumerating the stimuli in each of the categories. Finally, Types III, IV, and V are intermediate in structural complexity between Types II and VI. All three dimensions are relevant, but to differing extents. One way of thinking about these problems is as single-dimension-plus-exception structures. For instance, in the Type V example, squares belong to Category A, and triangles to Category B, except the small white triangle is switched with the small white square.
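The notion of a "relevant" dimension can be made concrete with a short sketch. The coding of the stimuli as 0/1 triples is a hypothetical convention chosen here for illustration, and the Type VI rule is written as a parity rule, which is an assumption about the structure in Figure 1 rather than something stated in the text. A dimension is counted as relevant if changing its value alone changes the category of at least one stimulus.

```python
from itertools import product

# Hypothetical coding: (shape, color, size) with
# shape 0 = square, 1 = triangle; color 0 = black, 1 = white; size 0 = large, 1 = small.
STIMULI = list(product((0, 1), repeat=3))

def type_i(s):
    # Squares go to Category A, triangles to Category B.
    return "A" if s[0] == 0 else "B"

def type_ii(s):
    # Black squares and white triangles go to A; the rest go to B.
    return "A" if s[0] == s[1] else "B"

def type_v(s):
    # Squares -> A and triangles -> B, except that the small white square
    # and the small white triangle are switched.
    if s[1] == 1 and s[2] == 1:
        return "B" if s[0] == 0 else "A"
    return "A" if s[0] == 0 else "B"

def type_vi(s):
    # Assumed parity structure: category flips whenever any one value changes.
    return "A" if sum(s) % 2 == 0 else "B"

def relevant_dimensions(rule):
    """Indices of dimensions whose value, changed alone, alters some stimulus's category."""
    relevant = []
    for d in range(3):
        for s in STIMULI:
            flipped = tuple(v ^ 1 if i == d else v for i, v in enumerate(s))
            if rule(s) != rule(flipped):
                relevant.append(d)
                break
    return relevant

for name, rule in [("I", type_i), ("II", type_ii), ("V", type_v), ("VI", type_vi)]:
    print(name, relevant_dimensions(rule))
```

Under this coding, Type I depends on one dimension, Type II on two, and Types V and VI on all three, in line with the description above.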
Shepard et al. (1961) found that upon initial exposure to each problem, people learned Type I most rapidly; followed by Type II; followed by Types III, IV, and V, which were approximately equal in difficulty; and finally Type VI. This ordering of difficulty was also observed by Nosofsky, Gluck, Palmeri, McKinley, and Glauthier (1994), who conducted a replication and extension of the original study. Shepard et al.'s (1961) result was important because a vast class of models based on elementary principles of stimulus generalization failed to predict this ordering. Most critically, these models predicted that the Type II problem should be learned more slowly than Types III, IV, and V.
Shepard et al. (1961) suggested that a process of selective attention was involved when people learned to solve the problems. To solve Type II, people need attend only to two of the three dimensions; the third dimension is irrelevant. But to solve Types III, IV, and V, one needs to spread attention across all three dimensions.
This interpretation about the influence of selective attention was corroborated in theoretical analyses of Shepard et al.'s tasks conducted by Nosofsky (1984) and Kruschke (1992). These investigators demonstrated that, without allowing for selective attention processes, modern exemplar models of category learning, which formalize in quantitative fashion the principles of stimulus generalization assumed by Shepard et al. (1961), predict that the Type II problem should be learned more slowly than Types III, IV and V. However, with principles of selective attention incorporated, these same models predict perfectly the ordering of difficulty of the six problem types.
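The effect of attention weights in an exemplar model can be illustrated with a static summed-similarity calculation. This is a simplified sketch under stated assumptions, not the learning models analyzed by Nosofsky (1984) or Kruschke (1992): similarity is an exponential decay of attention-weighted city-block distance, every stimulus is stored as an exemplar, and the sensitivity value, the attention weights, and the coding of Type IV as a prototype plus its three neighbors are all illustrative choices.

```python
import math
from itertools import product

STIMULI = list(product((0, 1), repeat=3))

def similarity(x, y, weights, c):
    """Exponential decay of attention-weighted city-block distance."""
    d = sum(w * abs(a - b) for w, a, b in zip(weights, x, y))
    return math.exp(-c * d)

def mean_accuracy(category_a, weights, c=3.0):
    """Average over stimuli of summed similarity to the own category, relative to all exemplars."""
    total = 0.0
    for x in STIMULI:
        own = [y for y in STIMULI if (y in category_a) == (x in category_a)]
        s_own = sum(similarity(x, y, weights, c) for y in own)
        s_all = sum(similarity(x, y, weights, c) for y in STIMULI)
        total += s_own / s_all
    return total / len(STIMULI)

# Category A members under a hypothetical 0/1 coding of the three dimensions.
TYPE_II = {s for s in STIMULI if s[0] == s[1]}   # third dimension irrelevant
TYPE_IV = {s for s in STIMULI if sum(s) <= 1}    # assumed prototype-plus-neighbors structure

equal = (1 / 3, 1 / 3, 1 / 3)   # no selective attention
focused = (0.5, 0.5, 0.0)       # attention removed from the irrelevant dimension

print("Type II, equal weights:  ", round(mean_accuracy(TYPE_II, equal), 3))
print("Type IV, equal weights:  ", round(mean_accuracy(TYPE_IV, equal), 3))
print("Type II, focused weights:", round(mean_accuracy(TYPE_II, focused), 3))
```

With these illustrative settings, Type II scores below Type IV when attention is spread equally, and withdrawing attention from the irrelevant dimension raises Type II performance, which is the qualitative pattern described above.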
The stimuli used in Shepard et al.'s (1961) original tasks, and in Nosofsky et al.'s (1994) replication, varied along highly "separable" dimensions. Separable dimensions remain psychologically distinct when in combination; an example is forms varying in shape and color. A vast amount of converging evidence suggests that people are highly efficient at selectively attending to separable dimensions. By contrast, "integral" dimensions combine into relatively unanalyzable, unitary wholes; an example is colors varying in hue, brightness and saturation. Although people can selectively attend to integral dimensions to some degree, the process is far less efficient than occurs for separable-dimension stimuli.
Shepard and Chang (1963) reasoned that, if people learned to classify integral-dimension stimuli, models based solely on principles of stimulus generalization might indeed capture the results, because the selective attention process that operates for separable-dimension stimuli would be largely precluded. Shepard and Chang (1963) confirmed this prediction by demonstrating that models of stimulus generalization, without selective attention incorporated, provided reasonably good fits to data from six new category-learning problems in which participants classified integral-dimension color stimuli.
A shortcoming of the seminal investigations of Shepard et al. (1961) and Shepard and Chang (1963), however, is that in addition to varying whether the dimensions were separable or integral, different category structures were tested in the two studies. Indeed, whereas the stimuli used by Shepard et al. (1961) varied along three binary-valued dimensions, in Shepard and Chang's (1963) studies the stimuli varied along two continuous dimensions. Furthermore, the critical qualitative contrasts in problem difficulty that were present in Shepard et al.'s (1961) studies (Type II versus Types III-V) did not arise for the structures tested by Shepard and Chang (1963).
Thus, after all these years, there is still a missing part of the picture. What is needed is a replication of the Shepard et al. (1961) tasks that uses integral-dimension stimuli instead of separable-dimension ones. The purpose of the present research was to conduct such an experiment. The prediction stemming from current exemplar models is that when integral-dimension stimuli are used, the Type II problem should be learned more slowly than Types III, IV, and V.
In the scaling study, there were four blocks of similarity judgments. In each block, all 28 unique pairs of the 8 colors were presented, one pair per trial, in a random order. These four blocks were preceded by 20 practice trials. The two color rectangles were presented simultaneously in the middle of the screen, separated by 3 cm. Participants made similarity ratings by using a 10-point scale (1 = very dissimilar, 10 = very similar).
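The block structure of the scaling study can be sketched as follows. This is an illustration only: the function name, the integer labels for the colors, and the seeding are assumptions, and the sketch does not model the practice trials or the on-screen placement of the two rectangles.

```python
import random
from itertools import combinations

def scaling_trials(n_colors=8, n_blocks=4, seed=None):
    """Return one independently shuffled list of all unique color pairs per block."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n_colors), 2))  # 28 pairs when n_colors is 8
    blocks = []
    for _ in range(n_blocks):
        block = pairs[:]
        rng.shuffle(block)  # fresh random order for every block
        blocks.append(block)
    return blocks

blocks = scaling_trials(seed=1)
print(len(blocks), [len(b) for b in blocks])  # 4 [28, 28, 28, 28]
```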
Classification Learning. The average probabilities of errors for each problem in each block of 16 trials are shown in Figure 3A. The means on late blocks reflect zero values for participants who had already reached criterion. Our assumption is that the participants who had reached criterion, and who thereby had already achieved between 32 and 40 consecutive correct responses, would have continued to respond without error if they maintained the same level of motivation.
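The zero-filling convention described above can be written out explicitly. In this sketch, each participant contributes a list of per-block error proportions that stops once criterion is reached; the function name and the example values are hypothetical and are not data from the experiment.

```python
def block_means(error_curves, n_blocks):
    """Average error proportions per block, treating post-criterion blocks as error-free."""
    padded = []
    for curve in error_curves:
        if len(curve) > n_blocks:
            raise ValueError("curve is longer than the number of blocks")
        # A participant who reached criterion early is assumed to make no further errors.
        padded.append(list(curve) + [0.0] * (n_blocks - len(curve)))
    return [sum(column) / len(padded) for column in zip(*padded)]

# Two hypothetical participants: the first reaches criterion after two blocks.
curves = [[0.5, 0.25], [0.5, 0.5, 0.25, 0.0]]
print(block_means(curves, n_blocks=4))  # [0.5, 0.375, 0.125, 0.0]
```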
The learning data confirm our critical prediction: Whereas the Type II problem was learned more quickly than Types III-V in previous studies in which separable-dimension stimuli were used (Nosofsky et al., 1994; Shepard et al., 1961), the reverse is observed for the present integral-dimension stimuli. The overall ordering of difficulty for the six problems in terms of average error probabilities is I, IV, III, V, II, and VI. Using the average error probability for each individual problem as the unit of analysis, pairwise t-tests indicate that Types IV and III were learned with significantly fewer errors than Type II, although the difference between Type V and Type II was not statistically significant.
More detailed analyses revealed that, regardless of which dimension was irrelevant (hue, saturation, or brightness), performance on the Type II problem was always worse than on Type IV. This result militates against concerns that poor performance on the Type II problem relative to Type IV arose solely from the discriminability differences that existed on the hue dimension. Even when hue was irrelevant, performance on Type II was worse than performance on Type IV.
The version of ALCOVE fitted by Nosofsky et al. (1994) had four free parameters: an overall sensitivity parameter (c), a background-noise constant (b), an association-weight learning rate (λw), and an attention-weight learning rate (λα) [see Kruschke (1992) and Nosofsky et al. (1994) for more detailed discussion]. The same model is now used to fit the current data. Because of the integral nature of the stimulus dimensions, however, the expectation is that the best-fitting value of the attention-learning parameter will be near zero.
We fitted ALCOVE by searching for the free parameters that minimized the sum of squared deviations between predicted and observed error probabilities. First, we constructed 120 random stimulus sequences. The characteristics of each sequence matched the constraints in our experimental design. For any given set of parameters, the model was used to generate predictions based on each random sequence. These 120 sets of predicted values were then averaged, and the averaged values constitute the predictions that were fitted to the observed data. A hill-climbing parameter search routine was used to find the best-fitting parameters. Finally, in fitting ALCOVE, the MDS solution derived for the colors (Figure 2B) was used for computing Euclidean distances among exemplars, and predictions of the model were averaged over balanced assignments of the dimensions of hue, brightness, and saturation to the logical structure of each problem.
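The fitting procedure (average the predictions over many random sequences, then search for parameters that minimize the sum of squared deviations) can be sketched generically. The code below is not the authors' routine and does not implement ALCOVE: `toy_model` is a hypothetical one-parameter stand-in that ignores the trial order, and the step-halving hill climber is one simple choice among many.

```python
import math
import random

def averaged_predictions(model, params, sequences):
    """Run the model once per stimulus sequence and average the predicted curves."""
    runs = [model(params, seq) for seq in sequences]
    return [sum(column) / len(runs) for column in zip(*runs)]

def ssd(predicted, observed):
    """Sum of squared deviations between predicted and observed values."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def hill_climb(loss, start, step=0.1, min_step=1e-4):
    """Coordinate-wise hill climbing; the step is halved when no move improves the loss."""
    best, best_loss = list(start), loss(start)
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] = max(0.0, trial[i] + delta)  # keep parameters non-negative
                trial_loss = loss(trial)
                if trial_loss < best_loss:
                    best, best_loss, improved = trial, trial_loss, True
        if not improved:
            step /= 2
    return best, best_loss

def toy_model(params, sequence, n_blocks=5):
    """Hypothetical stand-in: exponential decline in error probability across blocks.
    A real model such as ALCOVE would process the trials in `sequence` one by one."""
    (rate,) = params
    return [0.5 * math.exp(-rate * block) for block in range(n_blocks)]

rng = random.Random(0)
sequences = [rng.sample(range(8), 8) for _ in range(120)]  # 120 random stimulus orders
observed = toy_model([0.3], sequences[0])                  # synthetic data, not real results

best, best_loss = hill_climb(
    lambda p: ssd(averaged_predictions(toy_model, p, sequences), observed),
    start=[1.0],
)
print(best, best_loss)
```

Because the synthetic data were generated with a rate of 0.3, the search should recover a value close to 0.3; with a stochastic, sequence-dependent model, averaging over the sequences is what keeps the loss surface stable enough for a search of this kind.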
The predicted learning curves are shown in Figure 3B. The 4-parameter model provides a reasonably good quantitative fit to the data, accounting for 94.2% of the variance in the 150 error probabilities. The best-fitting parameters were .608, .018, .338, and 0. Of most importance, the best-fitting attention-weight learning rate turned out to be zero. ALCOVE predicts perfectly the ordering of difficulty of the six problems for these integral-dimension stimuli. Most critically, it predicts that the Type II problem is learned more slowly than Types IV, III, and V.
The model also characterizes the subtle differences among Types III-V. Although each of these problems can be characterized as a single-dimension-plus-exception structure, the nature of the exception varies. For instance, similarity relations among exemplars are more favorable in the Type IV structure than in Type V. Type IV is linearly separable, in the sense that the categories can be partitioned by drawing a single oblique plane through the cube. The exemplars in each category form a similarity cluster on each side of the plane. By contrast, in Type V the exception in each category is isolated from the remaining three exemplars, which leads to relatively inefficient learning.
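The contrast in linear separability can be checked with a small search for a separating plane. This is an illustrative sketch: the 0/1 coding is hypothetical, Type IV is assumed to consist of one corner of the cube plus its three neighbors, Type V follows the example given earlier (squares in Category A except that the small white square and small white triangle are switched), and the search covers only small integer weights.

```python
from itertools import product

STIMULI = list(product((0, 1), repeat=3))

# Category A members under a hypothetical (shape, color, size) coding.
TYPE_IV = {s for s in STIMULI if sum(s) <= 1}             # assumed structure
TYPE_V = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)}     # single dimension plus exception

def find_separating_plane(category_a, max_weight=3):
    """Search small integer weights for a plane with Category A strictly on one side.
    Returns (weights, threshold), or None if no plane is found in the searched grid."""
    for w in product(range(-max_weight, max_weight + 1), repeat=3):
        scores = {s: sum(wi * xi for wi, xi in zip(w, s)) for s in STIMULI}
        a_scores = [scores[s] for s in STIMULI if s in category_a]
        b_scores = [scores[s] for s in STIMULI if s not in category_a]
        if max(a_scores) < min(b_scores):
            return w, (max(a_scores) + min(b_scores)) / 2
    return None

print("Type IV:", find_separating_plane(TYPE_IV))
print("Type V: ", find_separating_plane(TYPE_V))
```

A plane is found for Type IV, whereas the search finds none for Type V. The failure for Type V is not an artifact of the limited grid: on the face of the cube containing the exception, the two categories occupy opposite corners, an exclusive-or arrangement that no single plane can split.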
No single study can be considered definitive, and caution is needed in making cross-experiment comparisons. Conceivably, other differences between the experiments could also account for the dramatically different patterns of results. For example, the rate of learning on all problems was slower in the present study than in the previous studies that used separable-dimension stimuli. Perhaps differences in overall discriminability among members of the stimulus sets affected selective-attention processes. Although further research is needed, the present study nevertheless goes a long way towards completing the picture conceived long ago by Shepard et al. (1961) and Shepard and Chang (1963) regarding the role of integral and separable dimensions in classification learning.
Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98, 409-429.
Estes, W. K. (1994). Classification and cognition. Oxford University Press.
Gluck, M. A., & Bower, G. H. (1988). Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166-195.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.
Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 104-114.
Nosofsky, R. M., Gluck, M. A., Palmeri, T. J., McKinley, S. C., & Glauthier, P. (1994). Comparing models of rule-based classification learning: A replication and extension of Shepard, Hovland, and Jenkins (1961). Memory & Cognition, 22, 352-369.
Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.
Shepard, R. N., & Chang, J. J. (1963). Stimulus generalization in the learning of classifications. Journal of Experimental Psychology, 65, 94-102.
Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of classifications. Psychological Monographs, 75 (13, Whole No. 517).