Ellen M. Markman
Stanford University
To begin, I briefly review the evidence for the three word-learning constraints. I then address misconceptions about the nature of biological constraints that pervade recent discussions of constraints on word learning, where word-learning biases are interpreted as implying rigid, hardwired, innate mechanisms that are immune from input. I argue that such constraints should be thought of as default assumptions, as probabilistic biases that provide good first guesses but not final solutions. Another misconception is to interpret these biases as necessarily being language specific. Analyses of other domains reveal, however, that all three assumptions appear in contexts other than word learning. This is not to say that they are completely general because, although some domains are governed by very similar principles, clear, important exceptions can readily be found. Domain specificity bears on questions about the origins of these constraints in that if comparable principles are found in other domains they may well be recruited for word learning. As for questions about the origins of the constraints, this complex and subtle set of issues is sometimes reduced to a simple contrast between innate and learned. I show how this dichotomy can obscure rather than clarify the issues. For one, claims about the universality of constraints have been conflated with claims about innateness. Moreover, the innate-learned formulation seems to presuppose that a given constraint is a single homogeneous ability with a simple developmental history. Insights from an ethology of learning reveal that the innate-learned dichotomy further oversimplifies the issues in failing to acknowledge that learning itself is an adaptation. I conclude by suggesting that this ethological perspective be applied to the problem of word-learning.
The claim to be examined is that constraints on hypotheses are needed to help children solve the inductive problem that word-learning poses. On the most extreme formulation of this hypothesis, a baby would be unable to learn even a single word by any other means. However, there is reason to believe that word-learning constraints could be necessary for language acquisition, and yet still appear only after some language has been acquired. The reason for this claim is that somewhere around 18 months of age, the character of children's language learning appears to change dramatically (Bloom et al., 1985; Corrigan, 1983; Dromi, 1987; Goldfield & Resnick, 1990; Halliday, 1975; McShane, 1979; Nelson, 1973). At this point, children start acquiring words at a very fast pace--in some cases several new words a day. This "naming explosion" or "vocabulary spurt" may mark a qualitatively new way of acquiring language. Such fast learning must be a constrained form of learning. Before the onset of the naming explosion, however, "word" learning might occur through a more brute force paired-associate kind of learning. Children may well acquire the first 50 or so words in their vocabulary by some slow associative mechanism, but this would account for only a tiny fraction of their language. As a working hypothesis, then, the prediction is that word-learning constraints should be available to babies by the time they are capable of fast word-learning--at least by 18 months of age on the average.
Once children decide a term refers to the whole object, they still need to decide how to extend it to other objects. The term could refer to some external relation between two objects. Spatial relations, causal relations, and possessor-possessed relations are some examples of common relations between objects that a term could in principle label. More generally, objects can be related through the variety of ways in which they participate in the same event or theme (e.g., cats eat mice; people read books; birds build nests). Many studies of classification in children demonstrate that children often find thematic relations particularly salient and interesting (see Gelman & Baillargeon, 1983; Markman, 1989; and Markman & Callanan, 1983, for discussions).
If children are attending to thematic relations between objects, how is it that they so readily learn labels for kinds of objects instead? To answer this question, Markman and Hutchinson (1984) proposed that children constrain the possible meanings of words to refer to objects of like kind. This taxonomic assumption leads children to rule out thematic meanings. That is, children reject thematic relations as a first hypothesis about what a novel label might refer to, despite finding such relations to be salient and interesting. Markman and Hutchinson conducted a series of studies that compared how children would organize objects when an object was referred to with a novel label versus when it was not. When presented with two objects, such as a dog and a cat, and a third object that was thematically related, such as dog food, children would often select a dog and dog food as being the same kind of thing. If, however, the dog was called by an unfamiliar label such as dax and children were told to find another dax, they now were more likely to select the cat. This illustrates the basic phenomenon: When children believe they are learning a new word, they focus on taxonomic, not thematic, relations. These findings have been extended and refined in a number of studies (Baldwin, 1989; Hutchinson, 1984; Landau, Smith, & Jones, 1988), and Waxman has a series of studies documenting both the effectiveness and the limits of the taxonomic assumption and how it interacts with the hierarchical level of the category being named (Waxman, 1990, in press; Waxman & Gelman, 1986; Waxman & Kosowski, 1990).
The main limitation of these studies is that most of them provide evidence only for children two and older. One exception is a study that Backscheider and I conducted with 18- to 24-month-olds (Backscheider & Markman, 1990). Our results replicated the original Markman and Hutchinson (1984) findings with these younger children. In the absence of a label, the children tended to select thematic associates to the target. In marked contrast, when an object was given a novel label these 18- to 24-month-old children interpreted the novel label as referring to objects of the same taxonomic category the clear majority of the time. Thus the taxonomic assumption is used by children by 18 months of age.
Huttenlocher and Smiley (1987) have evidence from children's very early word use that further confirms that early language learners honor the taxonomic assumption. They examined the language use of children they followed from the time of their first word (around 13 months for most of the children) until the children were 2 or 2 1/2 years old. Their goal was to determine on what basis children extend words beyond their original context and to test whether early on children extend words complexively. A complexive use of a word would be tantamount to what we referred to as a thematic use--extending the word to a spatial, temporal, or causal associate of an object, rather than to objects of like kind.
Previous researchers have reported finding that children's early word meanings were sometimes complexive (Nelson, 1974; Snyder, Bates, & Bretherton, 1981), but Huttenlocher and Smiley argued that some of the previously reported instances of apparent complexive extensions of words by children may actually have been nonreferential uses of language. For example, a child who says "cookie" while reaching towards a cookie jar isn't necessarily labeling the jar as "cookie." Instead the child might know that cookies are kept in the jar and, being in the one-word stage, about the only way to formulate a request for a cookie when no cookie is visible would be to say "cookie." Huttenlocher and Smiley set forth criteria to differentiate that and other communicative uses of language from genuine complexive extensions. They found that even from the onset of language production, children were not using words to refer to complexively (thematically) organized objects. Instead, early language learners generalized object labels in ways that fit the whole object and taxonomic assumptions.
Baldwin and Markman (1989) looked at what might be considered a precursor of the whole object assumption, namely, whether labeling an object for a baby causes the baby to attend more to that object than if it were not labeled. We argued that if infants are biased to attend more to objects when they hear them labeled, then that could help them to notice word-object pairings. To test this, a first study compared how long 10- to 14-month-old infants looked at unfamiliar toys when a novel label was provided versus when no label was offered. As predicted, labeling the toys increased infants' attention to them.
A second study examined whether labeling increased infants' attention to objects over and above what pointing, a powerful nonlinguistic method for directing infants' attention, can accomplish on its own. Infants ranging in age from 10 to 20 months were shown pairs of unfamiliar toys in two situations: (a) in a pointing alone condition, where the experimenter pointed a number of times at one of the toys, and (b) in a labeling and pointing condition, where the experimenter labeled the target toy while pointing to it. While pointing occurred, infants looked just as long at the target toy whether or not it was labeled. However, during a subsequent play period in which no labels were uttered, infants gazed longer at the target toys that had been labeled than at those that had not. Thus labeling can increase infants' attention to objects beyond the time that the labeling actually occurs. This tendency of language to sustain infants' attention to objects may help them learn the mappings between words and objects. It could also serve as a precursor or component of the full-blown whole object assumption.
To see how mutual exclusivity overrides the whole object assumption and helps children acquire property terms, suppose a novel term is applied to an object for which a child already has a label. In order to adhere to the principle of mutual exclusivity, the child would have to reject the novel term as a label for the object. The child could simply reject the term as a label for the object without coming up with an alternative meaning. Rejecting one meaning for the term, however, leaves the child with a term without a referent. This in itself may motivate children to try to find some meaning for the novel term. The mutual exclusivity principle does not speak to how children select among the potential meanings, but they might analyze the object for some interesting part or property and interpret the novel term as applying to it. Markman and Wachtel (1988) demonstrated that 3- and 4-year-old children can use mutual exclusivity to learn terms for parts and for substances. When a novel label was mentioned in the presence of an object with a known label, children rejected the term as a second label for the object and interpreted it instead as a label for a part of the object or its substance.
Mutual exclusivity could further contribute to word learning by helping children to narrow overextensions (Barrett, 1978; Clark, 1983, 1987; Merriman & Bowman, 1989). Suppose a child has overextended dog to apply to sheep as well as dogs, but then learns the correct name for sheep. The child would then need to stop calling sheep dog in order to avoid having two names for the same object.
Clark (1983, 1987) postulates another related principle to help account for semantic acquisition. (See Markman, 1989 for a comparison of lexical contrast and mutual exclusivity). She argues, following Bolinger (1977), that every word in a dictionary contrasts with every other word and that to acquire words children must assume that word meanings are contrastive. Mutual exclusivity is one kind of contrast, but it is a more specific and stronger assumption: many terms that contrast in meaning are not mutually exclusive. Terms at different levels of a class-inclusion hierarchy, such as dog and animal, contrast in meaning in Clark's sense, since obviously the meaning of animal is different from that of dog. Yet, these terms violate mutual exclusivity. In fact, this points out one disadvantage of the mutual exclusivity bias: it impedes children's ability to learn class-inclusion relations (Markman, 1987; Markman, 1989).
Merriman and Bowman summarize the literature by outlining four ways in which children can act in accord with mutual exclusivity: (1) If a new term is used in a context in which it could either refer to an object with a known label or one whose label is not yet known, children should avoid interpreting the term as a second label for the known object and interpret it instead as referring to the object they cannot name. (2) Alternatively, when presented with a second label for an object, a child could correct the old label, replacing it with the new one. (3) Another option would be to simply reject the second label, either by explicitly denying that the term is appropriate (e.g. "No, that's not a ...") or by just ignoring the second label. (4) Finally, in order to preserve mutual exclusivity, children should avoid generalizing a new label to already named items. Which option is selected for maintaining mutual exclusivity is argued to depend on the situation. If the reference of the second label is ambiguous, the child is likely to map the label onto an object without a known name. If the child is uncertain about the old name, he or she may correct it, replacing it with the new one. In a series of studies, children 2 1/2 and over were found to use mutual exclusivity in accord with these predictions.
As for children closer in age to the beginning of the naming explosion, Merriman and Bowman failed to find any evidence for the use of mutual exclusivity in two-year-olds. However, Woodward and Markman (in press) point out several methodological flaws in Merriman and Bowman's (1989) procedures. Their measures were overly taxing and thus overly conservative tests of very young children's use of mutual exclusivity. Although two-and-a-half-year-olds were capable of maintaining mutual exclusivity despite these problems, two-year-olds were not.
I now describe two lines of work in progress that are providing evidence for mutual exclusivity in children younger than 18 months of age. The first addresses whether mutual exclusivity can be used by babies about the age of the naming explosion to infer the referent of a novel label. One advantage of mutual exclusivity is that it allows children to acquire words for objects through indirect means, without anyone actually pointing to and labeling an object. Suppose, for example, a child sees two objects, one of which has a known label, say a ball, and another whose label is unknown, say a whisk, and hears someone say "Can you hand me the whisk?" A child who attempts to preserve the mutual exclusivity of terms should reject a second label for the ball and thus infer that "whisk" must refer to the whisk, given that it is the only other object around. This ability to infer the appropriate referent has been documented in studies of children 2 or 2 1/2 and older (Au & Glusman, 1990; Dockrell & Campbell, 1986; Golinkoff et al., 1985; Hutchinson, 1986; Markman & Wachtel, 1988; Merriman & Bowman, 1989).
Recently, however, Merriman and Bowman (1989) argued that these results could be obtained without recourse to mutual exclusivity, if children had a bias to fill lexical gaps. The lexical gap hypothesis states that in the presence of an object that as yet has no known label, children are motivated to discover its name (Clark, 1983; 1987). Thus in the tests of mutual exclusivity just described children could map the novel word to the novel object because they have a novel object that they want to name rather than because they were reluctant to acquire a second label.
Work in progress (Markman & Wasow, in preparation) is designed to address both whether mutual exclusivity is available to children around 18 months of age and whether it can be differentiated from a propensity to fill lexical gaps. The lexical gap explanation is that upon seeing an object whose label is unknown, children are motivated to find out what it is called. To rule this out, we did not allow babies to view novel objects at the time the novel labels were heard. Babies heard a novel label in the presence of a familiar object with a known label, but no novel object was visible. Mutual exclusivity should lead children to reject second labels for objects and to search for an object whose name they don't yet know as a referent for a novel label. Both of these predictions were supported. Babies as young as 15 months old were found to honor the mutual exclusivity assumption, unaided by a bias to fill lexical gaps. Use of mutual exclusivity in these babies is particularly impressive because the situation was stripped of all other cues to a word's meaning. There was no object visible as a potential referent, nor were there any of the typical cues that the speaker provides, such as eye gaze towards the relevant object, pointing, or touching, or any other contextual cue as to the intended referent. Fifteen-month-olds thus relied heavily on mutual exclusivity to guide their search for an appropriate referent.
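The inference just described, in which a novel label is withheld from an already-named object and mapped onto an object whose name is not yet known, can be illustrated with a small computational sketch. This is only an illustration of the logic of the mutual exclusivity default, not a model proposed in the studies cited. The function name, the object identifiers, and the representation of the child's lexicon as a label-to-object mapping are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def infer_referent(
    label: str,
    visible_objects: List[str],
    lexicon: Dict[str, str],
) -> Optional[str]:
    """Pick a referent for a label under a mutual exclusivity default.

    `lexicon` is a hypothetical stand-in for the child's vocabulary:
    it maps each known label to the object it names. A return value
    of None stands for "no acceptable referent in view; keep searching".
    """
    # A familiar label picks out its known object, if that object is visible.
    if label in lexicon:
        known_object = lexicon[label]
        return known_object if known_object in visible_objects else None

    # Mutual exclusivity: objects that already have a name are rejected
    # as candidate referents of a new label.
    already_named = set(lexicon.values())
    unnamed = [obj for obj in visible_objects if obj not in already_named]

    # With exactly one unnamed candidate in view, map the label onto it.
    if len(unnamed) == 1:
        return unnamed[0]

    # No candidate, or several: this default alone cannot decide.
    return None


child_lexicon = {"ball": "round_toy"}
print(infer_referent("whisk", ["round_toy", "wire_utensil"], child_lexicon))
print(infer_referent("whisk", ["round_toy"], child_lexicon))
```

The second call corresponds to the situation in which only a familiar, already-named object is visible: the sketch returns no referent, which is the point at which a child honoring mutual exclusivity would be expected to search elsewhere.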
A second source of evidence that mutual exclusivity is used by very young children comes from studies of second label learning (Liittschwager & Markman, 1991). Mutual exclusivity should lead children to reject second labels for objects, thereby interfering with their ability to learn such labels. There have been several experimental studies of second label learning in young children, but all of these studies were undertaken for other theoretical reasons and did not systematically compare first versus second label learning (Banigan & Mervis, 1988; Taylor & Gelman, 1989; Tomasello, Mannle & Werdenschlag, 1988; Waxman & Senghas, 1990). We focus on Banigan and Mervis' (1988) study, which was designed to compare the effectiveness of several training methods for teaching 2-year-old children second labels for objects. Because some of the methods were successful, Banigan and Mervis conclude that children this young do not yet expect terms to be mutually exclusive.
Although Banigan and Mervis have shown that children are capable of violating mutual exclusivity, this is not tantamount to demonstrating that children lack the mutual exclusivity bias. In fact, closer inspection of Banigan and Mervis' findings suggests that their study might be providing evidence in favor of mutual exclusivity. In particular, they found that it was rather difficult to teach children second labels for things. They included four different kinds of teaching methods. In one condition they simply pointed to the object, e.g., a unicorn that children called "horse," and said "this is a unicorn." There was no learning at all in this condition. Yet this is a very common way of teaching object labels to young children. Had this been a first label for the object, children may well have readily learned the new label. There was still no learning when labeling was supplemented by a description (e.g., "see it has a horn"). There was a small amount of learning when labeling was supplemented by a demonstration (having the unicorn butt with its horn). It was only when labeling, description, and demonstration were all included that appreciable learning took place. Again, this is unlikely to be needed for first label learning. It may have been because children are reluctant to accept second labels for objects that such extensive training was needed.
To test this, Liittschwager and I attempted to teach 16-month-olds a novel label and then assessed their comprehension of the newly taught term. There were two conditions. In the first-label condition, babies were taught a term for an object for which they had no known label. In the second-label condition, babies were taught a term for an object for which they already knew a label. As predicted by the mutual exclusivity hypothesis, the babies readily learned first labels for objects but failed to learn the second labels.
These very recent findings, then, document the use of mutual exclusivity by babies 15-18 months of age. As predicted, these young babies are led by mutual exclusivity to reject a novel label as a second label for an object, which (1) impedes their ability to learn second labels (Liittschwager & Markman, 1991) and (2) allows them to infer that a novel object is the more likely referent of a novel label (Markman & Wasow, in preparation).
In sum, although there are disadvantages to assuming that object labels are mutually exclusive, the advantages are that by assuming mutual exclusivity, children could avoid redundant hypotheses about the meanings of category terms, narrow overgeneralizations of terms, infer the correct referent of a term without anyone explicitly pointing it out, and override the whole-object assumption. At least some of these uses of mutual exclusivity are available to children from 15 months of age, and possibly earlier. Thus, there is evidence suggesting that all three postulated word-learning constraints could contribute to very young children's ability to quickly figure out a word's meaning. With this as background, I now turn to consider questions about the nature of these constraints.
There has been a good deal of controversy about whether postulating constraints on word learning is a useful way to conceptualize the problem (Gathercole, 1989; MacWhinney, 1989; Nelson, 1988). One source of the debate is disagreement as to the nature of the postulated constraints. Nelson (1988) argues that constraints must be absolute and that any deviation in a child's performance is evidence against a constraint operating. For example, she criticizes the Markman and Hutchinson (1984) evidence for the taxonomic assumption on the grounds that children in these studies were not scoring 100% correct. Nelson's view is that to argue there may be constraints on word learning requires that these biases be absolute, admitting of no variance. This is certainly not the position of researchers who have proposed constraints on learning for domains such as conceptual development (Keil, 1979), causal reasoning (Brown, 1990; Gelman, 1990), counting (Gelman, 1990), object knowledge (Spelke, 1990), and language acquisition (Markman, 1987, 1989, 1990, in press; Markman & Hutchinson, 1984; Markman & Wachtel, 1988; Merriman & Bowman, 1989; Newport, in press; Pinker, 1984; Waxman, in press) (see also Keil, 1981, 1990, and Gelman, 1990). Moreover, the notion of constraints as default assumptions is widely held by ethologists arguing for biological bases of learning (cf. Marler & Terrace, 1984).
One way in which biases are not absolute is that they may be ordered into a hierarchy such that one bias overrides another. The extraordinary ability of migratory birds exemplifies such a case. Keeton (1974) summarizes some of the most impressive of the documented feats of such birds. A Manx shearwater, for example, migrated over 3000 miles in twelve and a half days to return to its burrow (Matthews, 1953, as cited in Keeton, 1974). In studying homing pigeons, Keeton concludes that when the sun is visible the pigeons use it as a compass. On overcast days, however, the pigeons are still able to find their way home. Thus, the birds have some alternative means of navigation that serves as a back-up system. Keeton reviews the controversy about whether pigeons could be using the earth's magnetic field as one such system. Although this hypothesis was first put forward in 1882 and revived in 1947, there was so much contradictory evidence that it fell into disrepute. The reason for the failures to find that pigeons could navigate by geomagnetism is that it is not the birds' preferred strategy. Only when the preferred cue for navigation (the position of the sun) is unavailable do pigeons resort to relying on magnetism.
Imprinting provides a good example of a system of substantial plasticity that is nevertheless governed to some extent by innate predispositions. A given species of bird can be sexually imprinted onto a different species or even, in the case of hand-reared birds, onto humans. The birds will later show mating displays towards the foster species. On the other hand, Immelmann (1972) documents that, despite this plasticity, there are preferences for a member of a bird's own species. In a test of whether zebra finches imprint most easily on their own species, male zebra finches were raised by a mixed pair of foster parents, a zebra finch and a Bengalese finch. Although there was equal opportunity for imprinting on either species, the birds nearly always had a sexual preference for their own species. Furthermore, imprinting onto a member of a bird's own species occurs more quickly, is more rigid, and is less likely to be reversed than imprinting onto a different species. Thus the ease of learning and the quality of learning through imprinting are governed in part by the species-specific biases of the animal.
In a recent conference designed to consider issues of constraints on learning in biology, this point that constraints should be thought of as probabilistic biases was made repeatedly (Marler and Terrace, 1984). Here is an example from Gould and Marler (1984) who argue:
"Indeed, it is tempting to place a default value interpretation on the associative biases of animals. Although bees, for instance, can learn that a flower is any color from yellow to ultraviolet, it learns the color of purple flowers far more quickly than any other color of flowers (Gould, 1984). At the same time, bees prefer purple silhouettes to all other colors on a spontaneous preference test. It is as though purple is the default parameter--a probabilistic bias which helps guide bees when they experiment with various flowers while searching for food." (p. 65)
And from Gould (1984):
"In a very real sense, many cases of selective learning should be thought of as mechanisms by which experience serves to tune an animals' behavior from the default distribution of alternatives to the actual odds in the world around it." (p. 153)
Among ethologists, constraints are postulated as one means of helping the organisms to solve the inductive problems they face. In many cases, these biases do not and could not provide absolute guarantees of correct answers. The environment is too unpredictable for absolute biases to be adaptive. Rather the organism must be capable of learning--of extracting information flexibly from the environment. These biases give the organism a good first guess--a head start in solving the problem, compared to if it were sampling randomly from an extraordinarily large number of options.
It is in this way that constraints may be useful for young children trying to figure out what words in their language mean. The constraints that have been postulated, such as the whole-object, taxonomic, and mutual exclusivity assumptions, give the young child good first guesses about the meaning of a novel term. They provide powerful means to begin word learning, but not at all the final solutions.
Take the whole object assumption, for example: without evidence to the contrary, children should interpret a novel term as a label for an object--rather than a part or substance of the object or its color, size, shape, weight, action, etc. Children must, however, be able to override this assumption in order to learn terms for parts, substances, properties, and events. Several different kinds of information could provide evidence to the contrary. If there were no salient object around at the time a novel term was introduced, the absence of a candidate object could override the whole object assumption. For example, Soja et al. (in press) found that when presented with a blob of stuff rather than a discrete object, children will interpret a novel label as a substance term.
Baldwin's (1989) work on how infants contribute to the establishment of joint reference points out another interesting factor that affects how children apply the whole object assumption. Even if children honor the whole-object assumption, they still need to decide which object in their environment is the appropriate referent of a novel term. One solution could be to treat the label as referring to whatever object they happen to be viewing at the time they hear the label. If so, then errors would be expected on those occasions when the adult labels something other than what the child is attending to. Although vocabulary is acquired more quickly when parents tend to label what their children are attending to (Tomasello & Farrar, 1986), there is no evidence suggesting that wrong mappings occur when joint focus of attention is not achieved. One way that such errors could be avoided is if children monitor the speaker's focus of attention, through eye gaze, posture, or some other cues. To investigate this, Baldwin (1989) taught 16- to 19-month-olds a novel word in two conditions. In the joint labeling condition, the experimenter labeled a toy at which the baby was looking. In the discrepant labeling condition, the experimenter labeled a toy other than the one at which the baby was looking. This was accomplished by having the experimenter look into a bucket, presumably at an object contained there, while she provided the novel label. The results indicate that babies do not simply map a novel label onto the object they are attending to. Even though the babies heard the label as they were looking at and playing with a novel toy, they only treated the term as a label for that toy when the speaker was also looking at the same toy. Children did not mistakenly treat the term as referring to the toy they were looking at if the experimenter was looking in the bucket. Instead, children throughout this age range avoided errors.
The 16- to 17-month-olds simply failed to learn the new term, while the 18- to 19-month-olds correctly inferred that the label referred to the (unseen) toy in the bucket. When that toy was later revealed, the older babies treated it as the referent of the novel label. Thus, children monitor the speaker's focus of attention before concluding that a given object is the referent of the term. The whole-object assumption is moderated by babies' requirement for a joint focus of attention.
Lack of joint focus of attention might, therefore, provide another means of overriding the whole-object assumption. Moreover, the absence of a joint focus of attention coupled with certain rhythmic cues might further discourage treating a term as an object label. Imagine, for example, an adult repeatedly lifting a baby overhead while saying "whee!". A child could, of course, find some object in the room to label. But the absence of ostensive cues from the adult along with the synchronizing of the "whee" with the swinging motion would lead the child to interpret "whee" as part of the event rather than as a label for an object. This rhythmic and synchronous use of "whee" or peek-a-boo and other games becomes part of the activity itself. This coordination of language with an activity is very different from saying, for example, "Let's do whee" and then silently lifting a baby. The absence of a salient object, the absence of joint focus of attention, and the rhythmic synchrony of language and events could work together to block the whole object assumption in such cases.
As Markman and Wachtel (1988) demonstrated, the mutual exclusivity assumption can be another source of information in conflict with the whole object assumption. By rejecting a novel term as a second label for an object, children will then search for a part, or substance, or other attribute of the object to label. Thus children will violate the whole object assumption in order to preserve mutual exclusivity.
As children learn more about their language, grammatical form class can serve as further means of overriding the whole object assumption. If for example the novel word is clearly recognizable as a verb, that would cause children to override the whole object assumption.
In sum, the whole object constraint serves as a first hypothesis that can be overridden in a variety of different ways ranging from lack of environmental support (e.g. when there are no salient objects around), to lack of joint focus of attention on an object (as when an adult fails to attend to a candidate object), to its coming into conflict with other word learning constraints (e.g., mutual exclusivity) to its conflict with other aspects of the linguistic system (e.g. grammatical form class).
One implication of viewing constraints as default assumptions is that violations of a constraint found in a child's lexicon are not necessarily evidence against the existence of the constraint. Yet such counterexamples constitute much of what has been taken as evidence against constraints (Banigan & Mervis, 1988; Gathercole, 1987, 1989; Merriman, 1987; Mervis, 1987, 1989; Nelson, 1988). Instead of treating such violations simply as negative evidence, we could look to such violations as information about how children go about overriding the constraint when needed. Merriman and Bowman (1989) approach the literature on mutual exclusivity from this perspective and argue that there is flexibility in how it can be manifested. Even when mutual exclusivity is preserved, children are not restricted to one set response, but, rather, are able to make use of different aspects of the situation to maintain mutual exclusivity.
This view of word-learning biases as default assumptions implies that violations of a constraint in a child's lexicon do not necessarily invalidate the constraint. The existence of violations is not sufficient to show that children lack the bias. How the interpretation was arrived at is what is at issue, not only what was acquired. For example, by postulating the whole object assumption, one is not committed to a position that says children are incapable of learning property terms and that if one finds an adjective in a child's vocabulary the constraint is disproved. Rather, the argument is that object labels will typically constitute children's initial hypotheses upon hearing a novel word, and in order to learn property terms children must override that initial bias. Similarly, to claim that children are biased to treat object labels as mutually exclusive is not to claim that they can never learn more than one label for the same object. The test of the hypothesis requires examining the order of hypotheses children consider to see whether they resist violating mutual exclusivity on first hearing a novel word. If a child's initial hypothesis reveals an attempt to preserve mutual exclusivity then that would argue in favor of mutual exclusivity as a constraint on word learning even if the child is ultimately successful at overriding the constraint. For this reason, examining children's lexicons for counterexamples is not the appropriate test of whether children are guided by these assumptions. The lexicon reflects the conclusion of some process of word learning and not the process itself. We cannot judge what hypotheses children may have begun with when the only record is the end product, the words they have learned.
From a large body of such experimental evidence, Spelke (1990) concludes that infants possess a primitive theory of the physical world that is guided by three constraints on the behavior of physical bodies: a cohesion constraint which states that objects move as wholes; a boundedness constraint which states that objects move independently of each other; and a spatiotemporal continuity constraint which states that objects move on connected paths. These constraints guide babies' perception or interpretation of scenes by serving as criteria against which to identify objects found embedded in the complex, cluttered, and changing arrangements that are typical of real-world scenes. Thus, objects may have a privileged conceptual as well as perceptual status in very young babies. The whole object assumption in word learning might, then, directly reflect this nonlinguistic status of objects.
In this vein, Gentner (1981; 1982) argues that the reason children learn nouns before verbs stems from conceptual differences between object reference and relational meaning. She argues that the representation of nouns is more "dense" than that of verbs, where density refers to the ratio of the number of internal links to the number of components linked, including external links. This implies that the meanings of nouns are more redundant and overdetermined than the meanings of verbs. Objects are argued to be conceptually cohesive and readily perceived as wholes. Thus determining the boundary of object reference should be straightforward. In contrast, relational terms such as verbs refer to concepts that are more abstract, less cohesive, and whose boundaries are less clear. Verbs are more likely to alter their meanings as a function of context than are object terms, whose meanings are more stable. Children about to learn language will have already parsed the world into objects. According to Gentner "since the language they are about to learn will have been constrained to make the same mapping between perceptual field and linguistic description, the child need only match these preconceived objects with co-occurring words." In contrast, "for verbs and other predicate terms, the child must come to discover which elements of the perceptual field can be combined and lexicalized."
The evidence from which Gentner draws these conclusions comes from a variety of different studies. In one study, the number of word senses was tallied for each dictionary entry for the 20 most frequent nouns and verbs. As predicted, there were more senses per verb than per noun. In a second study, subjects were asked to paraphrase metaphorical sentences such as "The lizard worshiped." In their paraphrases, subjects were more likely to change the meaning of the verbs than the nouns. In a third study, bilingual subjects were asked to translate English sentences into their second language, and other bilingual subjects then translated these sentences back into English. More of the original nouns than the original verbs reappeared in these second translations. On a variety of measures, nouns were found to be recalled more accurately than verbs. Most importantly, there is cross-linguistic evidence from languages as diverse as English, Mandarin Chinese, and Kaluli, that nouns are learned more readily than verbs and comprise more of children's vocabulary (Gentner, 1982).
Maratsos (in press) elaborates Gentner's position while considering how children acquire grammatical form class. He argues that nouns are a family resemblance category partly defined by semantic properties and partly defined by structural properties, and that concrete objects comprise the semantic core of the family resemblance definition. Maratsos further argues that nouns are the only candidate for a universal form class.
Adjectives may contrast with nouns in some of the same ways that verbs do. Adjectives may be more prone to adjust their meaning according to context than are nouns. For example, the meaning of "good" is adjusted to fit the category it modifies ("good person" versus "good knife") (Katz, 1964). "Large" interacts with what it modifies ("large house" versus "large mouse"), as does "red" ("red hair" versus "red apple"). Bolinger (1967) argues that, in general, the interpretation of an adjective varies, sometimes dramatically, depending on the noun it modifies. For example, "criminal" means roughly "defending criminals" in "criminal lawyer" but "committed by criminals" in "criminal act." In these examples the noun refers to an object; the adjective presupposes that object for its interpretation.
Analogously, adjectives adjust their form in some languages, depending on the noun they modify. For example, adjectives in French must agree in number and gender with the noun they modify. In fact, it is a language universal that, of all languages in which the adjective follows the noun, "the adjective expresses all the inflectional categories of the noun whereas the noun may lack overt expression of one or all of these categories" (Greenberg, 1966). Again, this implies that nouns have a fixed form independent of any modifier they receive, whereas adjectives presuppose a noun and must adjust their form to correspond to the inflections of that noun. In Markman (1989) I summarize some experimental findings with nouns and adjectives that parallel some of those of Gentner's with nouns and verbs.
Another way in which object reference might be primary can be derived from work on holistic vs analytic strategies for determining category membership (Kemler, 1983; Smith, 1979; and Kemler-Nelson, 1984). Children have been shown to be less capable of analyzing objects into their component features. Yet, this difficulty in analyzing objects could benefit children in a number of ways (see Markman, 1989). One is that it could simplify the induction problem in word learning. The problem is to understand how children can so quickly settle on object labels as the referent for terms, given that a label could, in principle, refer either to an object or to its color, size, shape and so on. To think of hypotheses such as color, size and shape, however, requires analyzing the object into those dimensions in the first place. Thus, these competing hypotheses may not be so readily available to the young child. Another way of stating this is that a limitation on children's information-processing abilities may actually provide part of the solution to the induction problem. This is reminiscent of an argument put forward by Newport (1984) to explain how it is that children are so competent at acquiring language. Newport raises the possibility that some cognitive limitations of children may actually work to their advantage in learning language. Here too, in the domain of acquiring object labels, a limitation may work to children's advantage. Their limited analytic abilities may effectively narrow down the hypothesis space that they need to consider. The hypothesis space that children at first generate would not expand to encompass all of the possible features and Boolean combinations of features. By not generating the hypotheses in the first place, children do not have to subsequently rule them out. Thus children's holistic processing could also contribute to the whole object assumption.
There is a variety of evidence, then, suggesting that the conceptual status of objects is privileged relative to that of properties or relations. The whole object assumption could then be a direct reflection of the non-linguistic status of objects. In other words, children treat labels as referring to objects because typically objects are most salient. On the other hand, imagine a child watching a colorful pulsating neon object twirling around in an interesting way. Under these conditions, it is very likely that the color and motion of the object would be attended to as well as the object per se. If so, what would happen when the object was labeled? Would the child interpret the label as referring to the brilliant color, the pulsating rhythm, the interesting motion, or to the object itself? The whole object assumption predicts that children should interpret the novel label as a label for the object even in cases where the nonlinguistic salience of properties is greater than that of the object. In other words, the constraint is presumed to operate in language learning even when it fails to coincide with what is salient nonlinguistically. Although the whole object assumption has not yet been subjected to this stringent a test, Baldwin's (1989) and Backscheider and Markman's (1990) studies suggest that in those cases where color or a dynamic activity is made salient to children, children will still interpret the label as a label for objects. Perhaps the whole object assumption in word learning capitalizes on a cognitive bias to parse the world in terms of objects, and labeling objects may exaggerate this more general tendency, strengthening it enough to promote objects to the preferred interpretation of a novel label even in those cases where properties are otherwise more salient.
The widespread use of the taxonomic principle in governing inductive projections might provide a possible account of the origins of the taxonomic constraint in word-learning. Word-learning might exploit the basic principles of inductive generalization. Here is a schematic description of inductive projection that might help to draw the parallel to word learning. (This sketch simply stipulates that generalization occurs and begs the central problem of induction, which is why only a small set of the logically possible inductions are ever made (Goodman, 1955; Quine, 1960).)
In inductive projection, a property is attributed to an object. Other objects of like kind are then assumed, with some degree of confidence, to also possess the property, while strong associates of the target object are not assumed to have the property.
In word learning, an object is named. Other objects of like kind are then assumed, with some degree of confidence, to also have the same name, while strong associates of the first object are not assumed to have the same name.
If the taxonomic constraint in word learning is derived from the taxonomic basis of inductive projection, then that might help explain Premack's (1990) finding that language trained chimps also show the taxonomic assumption. Chimps (language trained and normal) were given a match-to-sample task where they could choose either a taxonomic or thematic associate of a target object. For example, if the target were a lime, then the two choices would be another lime (the taxonomic associate) and a knife (the thematic associate since these chimps enjoy cutting limes with knives). Juvenile chimps select the thematic associate (e.g., the knife) at above chance levels. Young chimps, then, show the thematic bias seen in young humans. When the target object is labeled, either with a well-known word (e.g., the plastic symbol for lime) or a novel word (a novel plastic symbol), then the language trained chimps now select the object of like kind (e.g., another lime) rather than the thematic associate. The chimps without language continue to select thematically.
One interpretation of these findings is that the taxonomic assumption is tied to word-learning but that it is not species specific. Premack rejects this interpretation however. He argues that these chimps, who were only at the beginning stages of word learning, could not be said to have a real word, in the sense of understanding reference. Instead, they have learned some non-linguistic contingency, such as "in the presence of this symbol, one can obtain a lime." This then becomes generalized to something like "when a symbol is presented along with an object, one can obtain that kind of object in the presence of the symbol." Premack concludes that if a weaker explanation is possible for young chimps, it may also be possible for children and that this therefore undermines the evidence that children use word-learning constraints.
Suppose Premack is correct and that these chimps do not know words per se. One way of interpreting these findings is that the chimps have solved this discriminative learning task following more general inductive principles. The discriminative stimulus signals that objects of like kind will be available rather than associates. In other words, this could be an instance of the taxonomic basis for inductive generalization. What about Premack's conclusion that this kind of task could not, then, provide evidence for word-learning constraints in children? Here I don't agree unless Premack means only that the taxonomic assumption is not limited to word-learning. More general constraints may enable children to figure out what words refer to as well as domain specific constraints (if any exist). Word learning is governed by the taxonomic bias whether the bias is domain specific or not.
Although the taxonomic assumption is seen both in word learning and in inductive projection of properties from one object to another, there are important domains where taxonomic inferences are avoided. One of these is causal reasoning. In general causes and effects are not thought to be similar. In inferring a cause from an effect or an effect from a cause, one would typically avoid looking for objects or events of like kind. Suppose for example, that someone saw a popped balloon and needed to infer what caused it to pop. Another balloon, popped or otherwise, would be a very unlikely candidate for the cause. Conversely if someone viewed a pair of scissors opening and closing and had to predict what they might do, causing another pair of scissors to open and close would be a very unlikely inference. Thus, one fundamental and pervasive domain, causal reasoning, eschews generalization based on the taxonomic assumption.
Classical conditioning provides an example of a kind of learning that relies on thematic rather than taxonomic associations. In classical conditioning, after repeated pairings with an unconditioned stimulus, a conditioned stimulus becomes capable of eliciting a conditioned response. Dogs salivate at the sound of a bell or the sight of a bowl after the bell or bowl has become associated with food. The conditioned stimulus is a temporal or spatial associate of the unconditioned stimulus. Although a bowl and meat, for example, are strongly associated, they are not things of like kind.
Petitto, in this volume, compares the principles that govern the extension of signs that are words in sign language to those that govern nonlinguistic gestures. She reports that signs conform to the taxonomic principle, as expected, but that nonlinguistic gestures do not.
In sum, properties learned of one object are generalized to objects of like kind rather than to strongly associated objects. This widespread use of the taxonomic assumption in governing inductive inferences might be the origin of its use in word learning. Although the taxonomic assumption is clearly not limited to word learning, there are important domains, such as causal reasoning and classical conditioning, where inferences and associations are more likely to be based on spatial or temporal contiguity and causal relations rather than taxonomic relations.
Other Linguistic Constraints. One possibility is that mutual exclusivity is not limited only to word learning but is used more broadly in language acquisition. Or to be more accurate, mutual exclusivity would result from a more general principle, either a one-to-one mapping principle (Slobin, 1973; 1977) or the uniqueness principle (Pinker, 1984; Wexler & Culicover, 1980) applied to word learning. Slobin's (1973, 1977) one-to-one operating principle of language acquisition is that children expect the organization of language to be clear. One way of accomplishing this is for languages to establish a one-to-one mapping between the underlying semantic structures and the surface forms. Although this principle was formulated for morphemes in a sentence, if it were extended to category terms it would result in the mutual exclusivity of the terms. That is, each category would be referred to by only one category term.
The uniqueness principle (Pinker, 1984; Wexler & Culicover, 1980) is another related principle that has been hypothesized to help account for the acquisition of syntax. The motivation for this principle is to help account for how children can acquire grammatical rules in the absence of negative feedback (Pinker, 1984; Wexler & Culicover, 1980). If children are not informed that a given grammatical rule they have hypothesized is wrong, how could they reject erroneous hypotheses and settle on the correct grammar for their language? Pinker (1984) argues following Wexler and Culicover (1980) that, in some cases, the need for negative evidence can be eliminated if the child assumes the uniqueness principle. That is, when the child is faced with a set of alternative structures fulfilling the same function, he or she should assume that only one of the structures is correct unless there is direct evidence that more than one is necessary. The principle allows the child to reject structures even when there is no negative feedback indicating that they are ungrammatical. Languages do violate the uniqueness assumption to some extent. According to Pinker, the child requires more evidence to accept a construction with a uniqueness violation than a construction that does not violate the assumption.
Mutual exclusivity is consistent with the uniqueness principle as applied to category terms. As in the domain of syntax, if children start out biased to assume that terms are mutually exclusive, then they should require more evidence to accept a construction, such as class inclusion, that violates mutual exclusivity than a construction that is consistent with it. There is one major difference between the rationale for the uniqueness principle and that for mutual exclusivity. The major impetus behind postulating the uniqueness principle is the problem of lack of negative evidence in the acquisition of syntax. There is little evidence that parents or other adults explicitly correct children's ungrammatical sentences and even some evidence that they do not (Brown & Hanlon, 1970). A theory of the acquisition of syntax cannot depend on children getting explicit correction when their constructions are wrong. Moreover, even when children do receive negative feedback, there is still a serious problem left of interpreting what aspect of an utterance is being criticized (Bowerman, 1987; Gleitman, 1981). The situation is different for the acquisition of the vocabulary of a language. Children are frequently provided with corrections to wrong labels for objects or other mistaken use of terms: e.g., "It looks like a dog but it is a wolf." "That's not a dump-truck it's a fire-truck." How widespread such corrections are for children, and how children make use of these corrections, is not yet known, but the situation is clearly different from that of the acquisition of grammar. Nevertheless, children may still be able to make use of the mutual exclusivity assumption in a way analogous to the uniqueness principle: it would allow them to reject certain hypotheses about a word's meaning because they would violate mutual exclusivity, even if no negative evidence were provided.
Essentialist Biases. Mutual exclusivity may not be limited to language, however. It could derive from children's beliefs about objects and not just their beliefs about object labels. Children might believe that an object has only one identity, that is, it can be only one kind of thing, and that its identity is revealed by object labels. The mutual exclusivity of object labels would then be derived from this more basic essentialist belief about objects. Similarly, Flavell (1988) argues that young children assume that each thing in the world has only one identity, an assumption that adults may share. Unlike adults, however, children do not understand that each thing may, nevertheless, be represented in more than one way. According to Flavell (1988), this limitation on multiple representation is revealed in a number of diverse tasks, including visual and conceptual perspective taking and understanding the appearance reality distinction, along with assuming mutual exclusivity of category terms.
A domain general constraint on systematization. A final possibility is that mutual exclusivity or some more general principle that subsumes mutual exclusivity is a domain general constraint--appearing widely in various manifestations across many diverse domains. Here are some examples from classical conditioning and social psychology.
Blocking and overshadowing in Classical Conditioning. Two well-known phenomena in classical conditioning, blocking and overshadowing, are governed by some principle that appears related to mutual exclusivity in that animals seem best prepared to learn only one conditioned stimulus (CS) for a given unconditioned stimulus (US). Whether blocking and overshadowing are the results of attention, learning, or performance is still under debate (see Gallistel, 1990; Kehoe, 1987; and Mackintosh, 1975 for reviews and discussion). Whatever the process, its result suggests something akin to mutual exclusivity in that a previously learned or salient CS seems to preempt an animal's learning or use of another CS. In both blocking and overshadowing, a candidate CS, say a tone, that would ordinarily serve as a perfectly learnable CS is rendered ineffective by the presence of another CS, say a light. In blocking, an animal is first presented with one CS until conditioning occurs. For example, a light would be repeatedly paired with a US until the animal shows signs of anticipating the US on seeing the light. In the blocking phase of the experiment, the first CS, the light, is now presented along with a second CS, for example a tone. The light and tone are then repeatedly paired with the US. In this situation, despite the repeated pairing of the tone with the US, the animal does not become conditioned to the tone. That is, the animal does not appear to anticipate the US when the tone alone is presented. Yet, animals have no trouble becoming conditioned to tones in a simple conditioning procedure. Having learned one CS appears to block learning of a second CS.
In an overshadowing procedure, two stimuli, for example a light and a tone, are presented together from the start of conditioning. Either one of these stimuli presented alone would readily be conditioned to the US. However, when both CSs are presented together and thus both paired equally often with the US, animals become conditioned to only one of them and show no signs of conditioning, or diminished conditioning, to the second. For example, animals who experienced repeated pairings of light and tone with a US might learn that the light predicted the US but not show any signs of learning that the tone did. Presumably the more salient of the two CSs overshadows the other and reduces or prevents learning of a second CS to the US.
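The two procedures just described can be made concrete with a small simulation. The sketch below uses the Rescorla-Wagner learning rule, which is not discussed in this chapter and is only one of several candidate accounts; as noted above, whether these effects reflect attention, learning, or performance remains under debate. The function name, the salience values, the learning rate, and the number of trials are all illustrative assumptions rather than values taken from any study cited here.

```python
def rescorla_wagner(trials, salience, beta=0.3, lam=1.0, start=None):
    """Return associative strengths after a series of reinforced trials.

    trials:   list of tuples naming the CSs present on each trial.
    salience: dict mapping each CS name to its salience (assumed values).
    beta:     learning rate for the US (assumed value).
    lam:      maximum strength the US can support.
    start:    optional dict of strengths carried over from earlier training.
    """
    strengths = {cs: 0.0 for cs in salience}
    if start:
        strengths.update(start)
    for present in trials:
        # All CSs present on a trial share a single prediction error.
        error = lam - sum(strengths[cs] for cs in present)
        for cs in present:
            strengths[cs] += salience[cs] * beta * error
    return strengths


# Blocking: the light is trained alone, then light and tone are paired with the US.
equal = {"light": 0.5, "tone": 0.5}
pretrained = rescorla_wagner([("light",)] * 50, equal)
blocked = rescorla_wagner([("light", "tone")] * 50, equal, start=pretrained)
control = rescorla_wagner([("light", "tone")] * 50, equal)  # no pretraining

# Overshadowing: a more salient light and a less salient tone, together from the start.
unequal = {"light": 0.8, "tone": 0.2}
overshadowed = rescorla_wagner([("light", "tone")] * 50, unequal)
tone_alone = rescorla_wagner([("tone",)] * 50, unequal)

print("tone after blocking:      ", round(blocked["tone"], 3))
print("tone without pretraining: ", round(control["tone"], 3))
print("tone when overshadowed:   ", round(overshadowed["tone"], 3))
print("tone trained alone:       ", round(tone_alone["tone"], 3))
```

Under these assumed parameters the pretrained light leaves almost no prediction error for the tone to absorb, so the tone gains next to nothing in the blocking condition, while the less salient tone in the overshadowing condition ends up with much weaker strength than it reaches when trained alone. This corresponds to diminished rather than absent conditioning, one of the two outcomes described above.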
One extremely detailed and general model has recently been proposed by Gallistel (1990) to account for a wide variety of phenomena in classical conditioning including overshadowing and blocking. One component of the model consists of three constraining principles: "additivity," "inertia," and "uncertainty minimization". According to Gallistel: "The first two principles may be summarized by saying that unless it has to, the system does not entertain solutions in which CSs act interactively (conjointly) or in which different rates are ascribed to a CS during different epoches." The Uncertainty Minimization principle "comes into play in situations in which there is more than one way (usually an infinite number of ways) to apportion the observed rates of US occurrence to the influence of different CSs in a manner consistent with the first two constraints." The uncertainty minimization principle "drives the ascriptions to the solution that minimizes the number of CSs to which a nonzero influence on the rate is ascribed." Together then, these principles yield a bias towards simplifying the possible associations.
Attribution theory: the discounting principle. The discounting principle in attribution theory provides an example of a principle similar to mutual exclusivity operating in yet another domain. Here the problem domain is how people come to infer causes of their own or others' behavior. Kelley (1973) proposed that people reason about causes of behavior in accord with the Discounting Principle: "The role of a given cause in producing a given effect is discounted if other possible causes are also present." Similarly Kanouse (1971) concludes that "individuals may be primarily motivated to seek a single sufficient or satisfactory explanation for any given event, rather than one that is the best of all possible explanations. That is, individuals may exert more cognitive effort in seeking an adequate explanation when none has yet come to mind than they do in seeking for further (and possibly better) explanations when an adequate one is already available. This bias may reflect a tendency to think of unitary events and actions as having unitary (rather than multiple) causes...." This analysis led Kanouse to conclude there is a "primacy effect" in the formation of attributions such that one stable attribution tends to preempt or preclude others by making individuals relatively unresponsive to new information (Kanouse, 1971).
The best known example of the discounting principle at work in children comes from the studies comparing intrinsic and extrinsic motivation and studies of overjustification (Kassin & Lepper, 1984; Lepper, Sagotsky, Dafoe, & Greene, 1982; Lepper, Greene & Nisbett, 1973). This work has documented that extrinsic rewards can undermine a child's intrinsic interest in an activity. Under some circumstances, rewarding children for performing tasks they enjoy can reduce children's interest in the activity. The idea here is that children seek a cause for their behavior. When engaged in an activity, for example, coloring with markers, they could assume they color because they like to or that they color for some extrinsic reason. If children are provided with a salient, compelling external reason for engaging in an activity, then they will accept that as the cause for their behavior and discount the intrinsic value of the activity. For example, Lepper et al. (1973) selected children who showed a high level of interest in a given activity, such as coloring with markers, and assigned them to one of three conditions. In one condition, children were informed that if they performed the activity they would receive a "Good Player's Award." This was the experimental condition. There were two control conditions. In both of the control conditions children were invited to engage in the activity and no mention was made of a reward. In one control condition children were unexpectedly given the Good Player's Award at the conclusion of the session and in the other they were not. Two weeks later the children were again observed at play in the classroom and the amount of time they spent engaged in the various activities was measured. The children who had expected a reward engaged in the activity significantly less than children in the other two conditions. Thus, providing children with a compelling external cause for their behavior reduces their intrinsic interest in the task.
Although the discounting principle is revealed in children's loss of interest in a task, young children may not have an explicit awareness of the principle and may not predict or explain other children's performance in accord with the principle (see discussion by Kassin & Lepper, 1984). Given that many parents would find this principle counter-intuitive and continue to reward their children in ways that often backfire, it is not surprising that young children fail on some tasks that require explicit knowledge of discounting.
Similarities and differences across domains. One similarity between mutual exclusivity in word learning and similar principles in classical conditioning and social attribution is that they are fallible. One objection raised against the argument that mutual exclusivity serves as a guiding principle of word learning is that languages do not completely conform to mutual exclusivity. Investigators have questioned why a child should be equipped with a principle that is wrong. Languages have synonyms, hypernyms, and overlapping terms all of which violate mutual exclusivity. I have argued that the advantages of mutual exclusivity outweigh these disadvantages. Although mutual exclusivity is not an infallible assumption, it is a useful one. Like mutual exclusivity in word learning, the simplifying principles used in classical conditioning and social attribution, though useful to the organism, are sometimes wrong. An animal in a blocking or overshadowing procedure is failing to learn about another reliable predictor of an event. People's motivation for doing things is multiply determined, yet given one sufficient explanation for someone's behavior we ignore others. In these domains as well as word learning the simplification may effectively discover and maintain simple relations at the expense of some loss of information.
Another similarity between mutual exclusivity in word learning and these related principles in other domains is that none of the principles is absolute or inviolate. When the evidence is irreconcilable with the simplifying assumption, animals and people will violate the assumption and construct a more complicated solution to the problem.
There are several differences in the way these simplifying principles are stated in other domains that suggest possible reformulations of the mutual exclusivity assumption. Mutual exclusivity in word learning leads children to prefer only one object label for a given object. Gallistel's model of classical conditioning leads animals to seek the fewest possible solutions, but not exactly one. Moreover, in both classical conditioning and in social attribution, the difference in learning is sometimes a matter of degree rather than complete rejection of a second solution. Thus one may discount a second cause for behavior by underestimating it rather than ignoring it completely. Overshadowing and blocking sometimes diminish the strength of the association that is learned rather than preventing it completely. We have suggested that mutual exclusivity is a default assumption in the sense that it is a probabilistic assumption that can be overridden, but have proposed that children expect each object to have only one label (rather than having a bias to minimize the number of labels) and that children will reject a second label, rather than learn it less well. These alternative formulations might be worth considering, however.
In sum, mutual exclusivity, or some more general principle that subsumes it, may be a widespread strategy for first systematizing a new domain of knowledge. Karmiloff-Smith (1975, 1979) and Carey (1978) have argued that children may begin acquiring knowledge in a domain by learning basic concepts in relative isolation, but after a while are driven to try to organize and systematize their knowledge. Mutual exclusivity is a simple, primitive form of systematization. Basically it works to keep relations between elements distinct and to maximize predictability from one element to another. Perhaps there is some very general oversimplification principle that operates across domains and serves to exaggerate regularities. Given limited resources, one way to impose order on a complex domain would be to ignore subtleties and complexity in favor of establishing some order and predictability. This would lead the learner to expect strong correlations between elements in a domain and to ignore or reject counterexamples and exceptions. Once regularities are discovered, they can serve as a scaffolding which can be elaborated and modified to incorporate exceptions and inconsistencies. The mutual exclusivity bias in language, then, could be one instantiation of a widespread attempt to find simple, regular relations between elements in a domain. Mutual exclusivity, attribute discounting, blocking, and overshadowing all cause animals and humans to overlook some regularity, to fail to learn something that could be useful. They are not flawless, but given limited resources may be highly adaptive.
Overall, the whole-object, taxonomic, and mutual exclusivity assumptions vary in their degree of generality across domains, but none of them is specific to word learning. Knowing which other domains appear to be governed by principles similar to the ones postulated to guide word learning may provide some insight into the possible origins of these constraints. I turn now to discussion of issues that arise in considering the possible origins of word-learning assumptions.
Take the taxonomic assumption as an example. There are many possible accounts of its origins as a word-learning constraint besides the extreme views of either being hardwired or being acquired by an unconstrained learning mechanism. As argued earlier, inductive projection of properties is also governed by a taxonomic principle of generalization. Word-learning might be interpreted by the child as just another instance of inductive projection and thus be governed by the taxonomic constraint. On this model, a more general constraint is simply applied to the specific case of word learning. The question of the origins of the constraint arises here too, but now with respect to the taxonomic principle of generalization in inductive projection. There is surely a biological basis for this kind of generalization in animal and human learning. But this account would not require any specially evolved mechanism for word learning per se. It does require, however, that children construe word-learning as another case of inductive projection. Another possibility is that enough exposure to language is required to enable the system to trigger the taxonomic assumption. That is, once some words are learned on the basis of a few examples of generalization that honor the taxonomic principle, it could then function as a general assumption about how words are extended.
Innate vs. Learned reduced to age of onset. Second, the innate-learned debate sometimes becomes treated as an argument only about age of onset. Although age of onset may serve as a rough guide as to what kinds of experience are likely, it hardly resolves this issue. Innate abilities can be late emerging, and learned abilities can be acquired early. No one would seriously question the biological basis of puberty even though it does not occur early in life. Some constraints might appear later than others because some preconditions need to be met before they are relevant, rather than because they need to be learned. Mutual exclusivity might be one such case. It may be a simple way to begin to systematize a lexicon. As such, one could imagine a delay in the appearance of mutual exclusivity until enough of a lexicon has been acquired to warrant systematization. On this account, mutual exclusivity would not be expected to be used by babies acquiring their first few words. Such a delay in onset, however, would not be evidence that mutual exclusivity is learned.
Assumption of Homogeneous Abilities. A third way that the innate-learned dichotomy is problematic is that it seems to presuppose that a given constraint is a simple homogeneous ability. In contrast to this view, a given constraint might instead recruit components from a variety of sources. The resulting constraint in word learning would then reflect the convergence of several components, each with its own developmental history. For example, consider the whole-object and taxonomic assumptions, which operate at first in ostensive definition. Someone points to an object and labels it. The child takes the label to refer to the whole object and extends it to other objects of like kind. Ostensive definition has as one component ostension. Someone points to an object and the child's attention is drawn to that object. Ostension itself may be an important component of these constraints. Although it is well-documented that babies can follow the direction of pointing and eye-gaze (Churcher & Scaife, 1981; Murphy & Messer, 1977; Scaife & Bruner, 1975), there are as yet no studies of exactly what the baby attends to or infers the adult is attending to. Perhaps the only information contained in pointing per se is information about direction. The baby would then attend to whatever of interest was found in that location--an event, interesting pattern, noise, or even the location itself. Or it could be that something like the whole-object assumption operates in ostension, leading babies to search for an object per se as the referent of the point. If so, then to understand the origins of the whole-object assumption in word learning, we would need to understand its development in the context of nonlinguistic ostensive acts. Similarly, ostensive definition involves labeling at the time of pointing, and labeling takes the form of some auditory stimulation.
We know from studies of intersensory stimulation that babies' attention to what they are looking at is heightened if a moderate noise is heard at the same time (Mendelsohn & Haith, 1976; Paden, 1975; Self, 1975). Maybe some of the power of the whole-object assumption to direct and sustain babies' attention to objects comes from its being an instance of intersensory stimulation (see Baldwin & Markman, 1989). If so, then the origins of intersensory development would be relevant to the question of the origins of the whole-object assumption. Similarly, labeling phrases addressed to babies are carried by distinctive motherese intonation contours (see Fernald, in press-a, for a review). At the start of language learning babies may be interpreting these intonation contours per se rather than understanding any of the words. According to Fernald, the distinctive pattern that accompanies labeling serves to arouse babies and help sustain their attention. How human babies come to respond to differences in intonation would then be relevant to this question about the origins of the whole-object assumption as well. In sum, questions about the origins of word-learning constraints may require a subtle and complex analysis of their components. To formulate this question as a simple innate-learned dichotomy obscures rather than clarifies the issue.
Learning as one of many possible solutions. Ethologists view learning as one possible solution to a biological problem. It will be a successful biological adaptation in some circumstances but not others. One example used to illustrate this (Rozin & Schull, 19xx) is to consider how animals come to recognize other members of their species--an ability essential for mating. Imprinting, a learning mechanism that solves this problem for some birds, would not work for cuckoos who deposit their eggs in the nests of other birds and so are reared by other species. Imprinting, or any other learning mechanism that depends upon early exposure to one's own species, is impossible under these circumstances. Thus species recognition in the cuckoo is largely innate.
Foraging provides many examples of the range of solutions that different species have found for the same biological problem. Animals need mechanisms for extracting information from the environment about the location and abundance of food. They need to know where and how to search for food, and what and how much to eat. These answers often cannot be hardwired--the location of food in the environment often changes, the availability of different foods fluctuates, and so forth. Different species have solved these problems using a wide range of mechanisms. These range from simple reflexes where, for example, how much food is in the stomach of a mantid determines how near a fly has to be before the mantid will strike (Shettleworth, 1984), to sensitivity to operant reinforcement schedules, to elaborate communication systems such as the dance of the bees that specify the direction, distance, and kind of food (von Frisch, 1967). Abilities needed to forage range from simple memory or taste of what an animal has just eaten to long term memory of where food has been stored, and from simple reflexes to strike at fixed perceptual patterns to complex spatial representations of a changing environment (Rozin & Schull, 19xx; Shettleworth, 1984). In sum, there are a variety of mechanisms available to serve the same biological function of ensuring adequate nutrition. Learning may have evolved to ensure successful foraging for some animals and not others.
Redundancy of Mechanisms. Another characteristic of biological solutions to important problems is that there tend to be multiple mechanisms for achieving the same end (Rozin, 1976; Rozin & Schull, 19xx; Shettleworth, 1983). Equipping an animal with redundant means of achieving the same outcome offers a greater likelihood of success because any single mechanism has some probability of failing. Several mechanisms can work together to achieve a given end, as when motivational, attentional, and learning abilities converge on a given solution. Alternatively, some mechanisms can serve as backup when dominant means fail. The case described earlier of migratory birds, which navigate by magnetism only when the position of the sun is unavailable, is one such example (Keeton, 1974; Shettleworth, 19xx).
Questions raised by an ethology of learning. An analysis of learning from an ethological perspective raises many issues not normally addressed by psychologists. In arguing that learning itself should be examined from an evolutionary perspective, Shettleworth suggests the following issues be considered:
Why learn in a specific case as opposed to relying on other solutions, including that of not making the adjustment at all?
What to learn? For example, when an individual must learn to recognize individual conspecifics or the approach of a predator, what cues does he use? Are they the optimal ones in the sense of being the most reliable predictors in the situation?
When to learn? Does learning begin immediately on first exposure to a situation? Does general learning about the environment occur during periods of 'sampling' or play and get put to use when it is needed?....
How to learn? Trial and error, imprinting, observation, association: is a particular process the only one that can solve the problem, or if several might serve the purpose, what determines which one a given species uses?
How fast to learn and how long to remember? (Shettleworth, 1984, p. 429)
Why Learn? Following Shettleworth's suggestion, we should consider why words should be learned at all. Why not have an innately determined vocabulary? For such an evolutionary change to take place, there would have to be extraordinary stability and regularity in the environment in what kinds of things exist and are worth naming. Such things do exist: the sun and the moon, day and night, earth and sky, man, woman, boy, and girl, food and water, walking and running, laughing and crying are universal and certainly stable over long periods even in evolutionary time. Other natural kinds are possibilities, although only when specified very generally: plants, animals, and food are universal, but the specific kinds of plants or animals found in a given location vary. So "plant" could be an innately determined word but not "tree" or "cactus." Thus, in principle, there are candidate words that could have been innately determined. Obviously, such a limited vocabulary would be drastically impoverished compared to the vocabulary of natural languages. One of the hallmarks of human languages is the fantastic range, richness, and diversity of the concepts that can be expressed. The vocabulary readily expands to include cultural artifacts and inventions and abstract concepts, and to distinguish between concepts with great subtlety and precision. As innovations occur, or ideas develop, they can be incorporated into the language. Another problem is that if languages were limited to forming propositions only about the absolutely universal concepts, such as sun and moon, walk, run, sleep, and so on, it would be a waste of the enormous expressive power of language. Given such limitations on the notions that could be expressed, there would seem to be little use for the complexity and richness of the grammars that characterize human languages. A number of much more limited signal systems might suffice if the range of concepts to be expressed were so limited.
Further, to allow languages to contain even such basic words as "apple," "tiger," "cup," or "arrow" requires that words be learned. If the majority of the vocabulary, or even some of it, is learned, then a learning mechanism must be available. A mixed system might be possible with some words innately determined and others learned. But if an effective learning mechanism existed, it would likely obviate any need to have vocabulary innately specified. (See Pinker and Bloom (1989) for other ideas about why sound-meaning correspondences should be learned.)
How to learn. A second question one might speculate about is how to learn. Why would one kind of learning mechanism be preferred, or more effective, or selected over another? It is important to address this issue without falling into the trap of reasoning from how languages are as to how they must be. Facile ad hoc evolutionary speculations are easy to generate (see Fernald, in press-b, and Rozin and Schull, 19xx, for interesting discussions of this problem). Given what we know about language we could speculate about how it could be learned, which is very different from arguing from basic principles about why one mechanism would be selected over another. The vocabulary of natural languages as we know them has two fundamental requirements that present conflicting goals. First, as mentioned earlier, one of the most distinctive characteristics of human language is the enormous range of ideas and concepts that can be expressed as well as the potential to expand to express as yet undiscovered or uninvented objects, events, and ideas. So one goal of a learning mechanism would be to enable humans to acquire this variety of words. On the other hand, there is the problem that if the hypothesis space for what a given word might mean were unconstrained, word-learning would not be possible--especially for 18-month-olds. There is tension, then, between the need for openness and flexibility in learning words and the need to solve the inductive problem. Certainly there are many ways, in principle, of resolving this tension. In other words, these goals do not dictate a single evolutionary solution. Treating constraints on word learning as default assumptions is one possible solution and fits with the existing evidence. Default assumptions strike some balance between constraining the hypotheses to simplify the task, especially at first, and still allowing flexibility.
Begin with what is fundamental. Another principle that might help in deciding which word-learning biases would be most useful is to build in a bias to begin with what is likely to be most fundamental. Objects are likely candidates for a variety of reasons. As I have already argued, because objects are so salient and basic to human perception and cognition, they are fundamental to any vocabulary and provide anchor points from which to elaborate the lexicon. Once object labels are known, they, along with mutual exclusivity, provide a focus from which to expand to terms that refer to parts, substance, color, and so on. Another related point is that objects may be presupposed in many ways by other concepts. This can be seen in analyses of grammatical structure which incorporate a predicate-argument structure (Maratsos, in press). Predicates contain terms which refer to properties and relations while arguments consist of the things to which the properties and relations apply. Thus, predicates presuppose arguments: properties and actions are most naturally conceived as properties or actions of something. One can think of a man without thinking of running but it is hard to think of running without a runner. On Maratsos' (in press) argument, "concrete objects terms are natural argument head terms. Argument head terms are in turn a universal form class because they contain the concrete object terms, which give the class much of its resultant unity and distinctiveness." If learning mechanisms are established to begin with what is most fundamental, objects would make natural candidates as good first guesses about what words mean.
Redundancy of mechanisms. Some of the controversy surrounding the arguments and evidence for word-learning constraints involves claims either that such constraints are unnecessary because the input to the child could accomplish the same ends (Nelson, 1988) or that the constraints are better conceptualized as pragmatic (Clark, 1990; Gathercole, 1987) or social (Nelson, 1988) than as lexical. For example, Gathercole (1987) has argued that the mutual exclusivity assumption falls out of pragmatic rather than lexical principles. On this view, children reject second labels for things based on assumptions about what speakers intend rather than because of a bias against second labels. A child who views a whisk and a ball as an adult says "Please hand me the whisk" would reason "If he meant the ball, he would have said 'ball,' so he must mean the other thing." This would explain why children infer that a novel label refers to a novel rather than a familiar object. However, several investigators report children who deny that, e.g., a poodle is a poodle, insisting that it is a "dog," even when adults repeatedly clarify their communicative intent, stating, for example, "Yes it is a poodle." In such cases, the adults make their intent to refer quite plain but children nevertheless persist in rejecting the label. So a pragmatic constraint is not sufficient to account for the mutual exclusivity bias.
In concluding that there is a lexical constraint, however, I do not mean to preclude the possibility of a pragmatic constraint as well. Although I have argued that input or pragmatics is not sufficient to account for the evidence and that constraints on word meaning are needed, this is not to say that other sources of information are irrelevant or unnecessary. Pragmatic and lexical constraints should often converge on the same hypothesis, providing redundant sources of information about the meaning of a term. First, the argument for word-learning constraints is an argument for how words are learned--not how they are hardwired. Input is essential. Second, the lesson from the ethology of learning is that multiple mechanisms are to be expected for solving important problems. In the example just provided about mapping novel labels to novel objects, there are at least three sources of information a child could use to arrive at the same conclusion: the mutual exclusivity assumption leads children to reject a second label for the familiar object; a bias to fill lexical gaps leads children to prefer a label for an object without a known label; and a pragmatic assumption leads children to infer that if the speaker had intended to refer to the familiar object he or she would have used the familiar label. A fourth converging source of information would be found in more naturalistic situations where the speaker's eye gaze or other ostensive cues indicate the appropriate referent.
In word-learning not only would we expect to find multiple mechanisms but also some degree of coordination among them. Children as listeners would be unlikely to depend heavily on input that speakers fail to provide. Learnability theorists emphasize one aspect of this problem in considering the problem of negative evidence in acquiring syntax. The claim is not that children are never corrected for ungrammatical sentences. The negative evidence argument is that a learning mechanism that relied heavily on feedback about ungrammatical sentences would fail to acquire grammar because such feedback is sporadic, undependable, and in some cases nonexistent.
When considering evolution of learning mechanisms for problems that involve social coordination or communication, there must be some degree of correspondence between the partners of the interaction. This is obvious in courtship displays: the ostentatious display of the peacock would not do much good if the hen wasn't impressed. Bird-song learning provides an example of the importance of coordinating the timing of receptivity to information with the timing of input (Shettleworth, 1984). The adult birds must be singing at the time the young bird is sensitive to the song. To take an example from human communication, consider how mothers use vocalizations to soothe a distressed infant. Fernald (in press-b) reviews evidence documenting three types of acoustic signals which reduce crying: low rather than high frequency sounds, continuous rather than intermittent signals, and white noise. Vocalizations mothers use to comfort an infant are typically continuous, low in pitch, and even provide white noise with "shhhh". Coordination is expected, then, between caregivers and offspring, between males and females in courtship rituals, between adults as providers of information and children as recipients of the information. Fernald (in press-b) presents a very thoughtful analysis of this problem in considering an evolutionary perspective on human maternal vocalizations, documenting the parallels between adult speech to human infants and what is known about the evolution of vocal communication systems in other species.
In the case of word-learning, some degree of coordination between parental input to children and word-learning biases would be expected. Finding, for example, that parents tend to label objects for children does not in itself weaken the claim that children are predisposed to consider objects as the referents of novel labels. Such coordination between learning mechanisms and input is likely. Redundancy poses an experimental problem, however, in that alternative explanations for the same phenomena become possible. Experimental rigor requires that to document that a given word-learning bias guides children's hypotheses, one must rule out alternative explanations (including input) as possible contributing factors in a given experiment. But this should not be confused with a claim that in naturally occurring word-learning these factors are irrelevant or should systematically conflict with the postulated word-learning assumptions. On the contrary, redundancy and coordination of word-learning biases and other sources of information should be common. Acquiring vocabulary is essential for language learning, and any abilities children have to infer the communicative intent of the speaker, to retain information about past uses of words, or to analyze the social situation in which a word is used will be exploited along with word-learning constraints to solve this problem.