Paper presented at UQàM Summer Institute in Cognitive Categorisation
http://www.unites.uqam.ca/sccog/liens/program.html
http://www.ecs.soton.ac.uk/~harnad/Temp/catconf.ppt
To Cognize is to Categorize: Cognition is Categorization
Stevan Harnad
1. Sensorimotor Systems. Organisms are sensorimotor systems. The things in the
world come in contact with our sensory surfaces, and we interact
with them based on what that sensorimotor contact "affords."

2. Invariant Sensorimotor Features ("Affordances"). To say this is not to
declare oneself a Gibsonian, whatever that means. It is merely to point out
that what a sensorimotor system can do is determined by what can be extracted
from its motor interactions with its sensory input. If you lack sonar
sensors, then your sensorimotor system cannot do what a bat's can do, at
least not without the help of instruments. Light stimulation affords color
vision for those of us with the right sensory apparatus, but not
for those of us who are color-blind. The geometric fact that,
when we move, the "shadows" cast on our retina by nearby objects
move faster than the shadows of further objects means that, for
those of us with normal vision, our visual input affords depth
perception. From more complicated facts of projective and solid
geometry it follows that a 3-dimensional shape, such as, say,
a boomerang, can be recognized as being the same shape -- and
the same size -- even though the size and shape of its shadow on
our retinas change as we move in relation to it or it moves in
relation to us. Its shape is said to be invariant under these
sensorimotor transformations, and our visual systems can detect and
extract that invariance, and translate it into a visual constancy. So we
keep seeing a boomerang of the same shape and size even though the shape
and size of its retinal shadows keep changing.
3. What is Categorization? So far, the affordances I've mentioned have
depended on having either the right sensors, as in the case of sonar and
color, or the right invariance-detectors, as in the case of depth perception
and shape/size constancy. Having the ability to detect the stimulation or
to detect the invariants in the stimulation is not trivial; this is
confirmed by the fact that sensorimotor robotics and sensorimotor
physiology have so far managed to duplicate and explain only a small
portion of this subset of our sensorimotor capacity. But we are already
squarely in the territory of categorization here, for, to put it most
simply and generally, categorization is any systematic differential
interaction between an autonomous, adaptive sensorimotor system and its
world: Systematic, because we don't want arbitrary interactions like the
effects of the wind blowing on the sand in the desert to be counted as
categorization (though perhaps there are still some inherent similarities
there worth noting). Neither the wind
nor the sand is an autonomous sensorimotor system; they are, jointly,
simply dynamical systems, systems that interact and change according
to the laws of physics.
Everything in nature is a dynamical system, of course, but some things
are not only dynamical systems, and categorization refers to a special
kind of dynamical system. Sand also interacts "differentially" with
wind: Blow it this way and it goes this way; blow it that way and
it goes that way. But that is neither the right kind of systematicity
nor the right kind of differentiality. It also isn't the right kind
of adaptivity (though again, categorization theory probably has a lot to
learn from ordinary dynamical interactions too, even though they do not
count as categorization).
Dynamical systems are
systems that change in time. So it is already clear that categorization
too will have to have something to do with changes across
time. But adaptive changes in autonomous systems are those in
which internal states within the autonomous system systematically
change with time, so that, to put it simply, the exact same input
will not produce the exact same output across time, every time, the
way it does in the interaction between wind and sand (whenever the wind
blows in exactly the same direction and the sand is in exactly the same
configuration). Categorization is accordingly not about exactly the same
output occurring whenever there is exactly the same input. Categories are
kinds, and categorization occurs when the same output occurs with the same
kind of input, rather than the exact same input. And a different output
occurs with a different kind of input. So that's where the "differential"
comes from.
4. Learning. The adaptiveness
comes in with the real-time history. Autonomous, adaptive
sensorimotor systems categorize when they respond differentially to
different kinds of input, but the way to show that they are indeed
adaptive systems -- rather than just akin to very peculiar and
complex configurations of sand that merely respond (and have
always responded) differentially to different kinds of input in the way
ordinary sand responds (and has always responded) to wind from different
directions -- is to show that at one time it was not so: that it did not
always respond differentially as it does now. In other words (although
it is easy to see it as exactly the opposite): categorization is
intimately tied to learning.
Why might we have
seen it as the opposite? Because if instead of being designers and
explainers of sensorimotor systems and their capacities we
had simply been concerned with what kinds of things there
are in the world, we might have mistaken the categorization
problem as merely being the problem of identifying what exists
(that sensorimotor systems can then go on to categorize). But that is
the ontic side of categories, concerned with what does and
does not exist, and that's probably best left to the respective
specialists in the various kinds of things there are (specialists
in animals, vegetables, or minerals, to put it simply). The kinds
of things there are in the world are, if you like, the sum total of
the world's potential affordances to sensorimotor systems like
ourselves. But the categorization problem is not determining what
kinds of things there are, but how it is that
sensorimotor systems like ourselves manage to detect those they
can and do detect: how they manage to respond differentially
to them.
5. Innate Categories. Now it might have turned out that we were all born
with the capacity to respond differentially to all the kinds
of things that we do respond to differentially, without ever
having to learn to do so (and there are some, like Jerry Fodor (1975,
1981, 1998), who sometimes write as if they believe this is
actually the case). Learning might all be trivial; all the
invariances we can detect, we could already detect innately,
without the need of any internal changes that depend on time
or any more complicated differential interaction of the sort
we call learning. This kind of extreme nativism about categories is
usually not far away from something even more extreme than nativism,
which is the view that our categories were not even "learned"
through evolutionary adaptation: The capacity to categorize
comes somehow prestructured in our brains in the same way that
the structure of the carbon atom came prestructured from the Big
Bang, without needing anything like "learning" to shape it. (Fodor's
might well be dubbed a "Big Bang" theory
of the origin of our categorization capacity.)
(Chomsky [e.g., 1976] has made a similar conjecture -- about a very special
subset of our
categorization capacity, namely, the capacity to generate and
detect all and only those strings of words that are grammatical
according to the Universal Grammar underlying all possible natural
languages: UG-compliance is the underlying invariant in question,
and, according to Chomsky, our capacity to detect and generate
UG-compliant strings of words is shaped neither by learning nor
by evolution; it is instead somehow inherent in the structure of
our brains as a matter of structural inevitability, directly from the
Big Bang. This specific theory, about UG in particular, is not to be
confused with Fodor's general theory that all categories
are unlearnt and unevolved; in the case of UG there is considerable
"poverty-of-the-stimulus" evidence to suggest that UG is not
learnable by children on the basis of the data they hear and produce
within the time they take to learn their first language; in the
case of most of the rest of our categories, however, there is no
such evidence.)
6. Learned Categories. All evidence suggests that most of our
categories are learned. To get a sense of this, open a dictionary at random
and pick out a half dozen "content" words (skipping function
words such as "if," "not" or "the"). What you will find is nouns,
verbs, adjectives and adverbs all designating categories (kinds
of objects, events, states, features, actions). The question to ask
yourself is: Was I born knowing what are and are not in these
categories, or did I have to learn it?
You can also ask
the same question about proper names, even though they don't appear in
dictionaries: Proper names name individuals rather than kinds, but for
a sensorimotor system, an individual is effectively just
as much of a kind as the thing a content word designates: Whether
it is Jerry Fodor or a boomerang, my visual system still has to
be able to sort out which of its shadows are shadows of Jerry
Fodor and which are
shadows of a boomerang.
How?
7. Supervised Learning. Nor is it all as easy as that case. Consider
the more famous and challenging problem of sorting newborn chicks as
males or females. I'm not sure whether Fodor thinks this capacity could
be innate, but the grandmaster, 8th-degree black-belt chicken-sexers on
this planet -- of which there are few, most of them in Japan -- say
that it takes years
and years of trial and error training under the supervision of
masters to reach black-belt level; there are no short-cuts,
and most aspirants never get past brown-belt level. (We will
return to this.) Categorization, it seems, is a sensorimotor skill,
though most of the weight is on the sensory part (and the output is
usually categorical, i.e., discrete, rather than continuous);
and like all skills, it must be learned.
So what is learning? It is easier to say what a system does when it learns
than to say how it does it: Learning occurs when a system samples inputs and
generates outputs in response to them on the basis of trial and error,
its performance guided by corrective feedback. Things happen, we
do something in response; if what we did was the right thing, there
is one sort of consequence; if it was the wrong thing there is another
sort of consequence. If our performance shows no improvement with time,
then we are like the sand in the wind. If our performance improves --
more correct outputs, fewer errors -- then we are learning. (Note
that this presupposes that there is such a thing as an error,
or miscategorization: No such thing comes up in the case of the wind
blowing the sand.)
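
The input/output description of learning given here can be illustrated with a minimal sketch. It assumes a perceptron-style update rule, which is only one of many possible error-correcting mechanisms and is not claimed to be how any organism does it; the function names, the brightness coding, and the sample values are all hypothetical. The point is only that the count of errors falls across trials, which is what distinguishes a learner from sand in the wind.

```python
import random

def respond(weight, bias, brightness):
    """Discrete (categorical) output: 1 for one key, 0 for the other."""
    return 1 if weight * brightness + bias > 0 else 0

def learn_by_trial_and_error(samples, epochs=50, rate=0.1, seed=0):
    """Sample inputs, respond, and adjust internal state only after an error."""
    rng = random.Random(seed)
    weight, bias = 0.0, 0.0          # internal state before any learning
    errors_per_epoch = []
    for _ in range(epochs):
        trials = list(samples)
        rng.shuffle(trials)          # inputs arrive in no particular order
        errors = 0
        for brightness, correct in trials:
            # Corrective feedback: 0 if the response was right, +1 or -1 if wrong.
            feedback = correct - respond(weight, bias, brightness)
            if feedback != 0:
                errors += 1
                weight += rate * feedback * brightness
                bias += rate * feedback
        errors_per_epoch.append(errors)
    return weight, bias, errors_per_epoch

# Hypothetical inputs: (brightness in [0, 1], correct key), dark -> 0, light -> 1.
samples = [(0.05, 0), (0.10, 0), (0.20, 0), (0.80, 1), (0.90, 1), (0.95, 1)]
weight, bias, errors_per_epoch = learn_by_trial_and_error(samples)
print("errors per epoch:", errors_per_epoch)
```

The system starts out responding the same way to everything, makes errors, and ends up making none on these inputs: at one time it did not respond differentially, and now it does.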
This sketch of learning should remind us of BF Skinner,
behaviorism, and schedules of reward and punishment. For it was
Skinner who pointed out that we learn on the basis of feedback
from the consequences of our behavior. But what Skinner did not
provide was the internal mechanism for this sensorimotor capacity
we and so many of our fellow-creatures have, just as Gibson did not
provide the mechanism for picking up affordances. Both these thinkers
thought that providing internal mechanisms was either not necessary or
not the responsibility of their discipline. They were concerned only
with describing the input and the sensorimotor interactions, not how
a sensorimotor system could actually do those things. So whereas they
were already beginning to scratch the surface of the "what" of our
categorization capacity, in input/output terms, neither was interested
in the "how."
8. Instrumental (Operant) Learning. Let us, too, set aside the "how"
question for the moment, and note that so-called operant or instrumental
learning -- in which, for example, a pigeon is trained to peck at
one key whenever it sees a black circle and another key whenever
it sees a white circle (with food as the feedback for doing the
right thing and no-food as the feedback for doing the wrong thing) --
is already a primitive case of categorization. It is a systematic
differential response to different kinds of input, performed by an
autonomous adaptive system that responded randomly at first, but
learned to adapt its responses under the guidance of error-correcting
feedback (thanks, presumably, to some sort of adaptive change in
its internal state). The case of black vs. white is relatively trivial,
because the animal's sensory apparatus already has those two kinds of
inputs well-segregated in advance, although if, after training on just
black and white, we began to "morph" them gradually into one another as
shades of gray, and tested those intermediate shades without feedback,
the pigeon would show a smooth "generalization gradient," pecking more on
the "black" key the closer the input was to black, more on the "white"
key the closer the input was to white, and approaching a level of chance
performance midway between the two. The same would be true for a human
being in this situation.
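
The generalization gradient can be sketched with a toy learner. The sketch assumes a logistic response rule trained by error-driven updates on the two endpoint stimuli only, with brightness coded as -1.0 for black and +1.0 for white; the coding, the learning rate, and all names are illustrative assumptions, not a model of the pigeon.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_key_preference(steps=50, rate=0.5):
    """Fit P(peck 'white' key) = sigmoid(w * x + b) on black and white only."""
    training = [(-1.0, 0.0), (1.0, 1.0)]   # (brightness, 1.0 if 'white' key is correct)
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, target in training:
            error = target - sigmoid(w * x + b)   # corrective feedback
            w += rate * error * x
            b += rate * error
    return w, b

w, b = train_key_preference()

# Test intermediate grays without any further feedback.
grays = [-1.0, -0.5, 0.0, 0.5, 1.0]
gradient = [sigmoid(w * x + b) for x in grays]
for x, p in zip(grays, gradient):
    print(f"brightness {x:+.1f}: P('white' key) = {p:.3f}")
```

Under these assumptions the preference for the "white" key rises smoothly with brightness and sits at chance for the midpoint gray, with no abrupt boundary anywhere along the continuum.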
9. Color Categories. But if the animal had color vision, and we used
blue and green as our inputs, the pattern would be different. There
would still be maximal confusion at the blue-green midpoint, but
on either side of that boundary the correct choice of key and the
amount of pressing would increase much more abruptly -- one might
even say "categorically" -- than with shades of gray. The reason is that
between black and white there is no innate category boundary, whereas
between green and blue there is (in animals with normal green/blue
color vision). The situation is rather similar to hot and cold, where
there is a neutral point midway between the two poles, feeling neither
cold nor hot, and then a relatively abrupt qualitative difference
between the "warm" range and the "cool" range in either direction.

10. Categorical Perception. This effect is called "categorical perception"
(CP) and in the case of color perception, the CP is innate. Light
waves vary in frequency. We are blind to wavelengths longer than red
(infrared, wavelength about 800 nm) or shorter than violet (ultraviolet,
wavelength about 400 nm), but if we did not have color CP then the
continuum from red to violet would look very much like shades of gray,
with none of those qualitative "bands" separated by neutral mixtures
in between that we all see in the rainbow or the spectrum. Our color
categories are detected by a complicated sensory receptor mechanism, not
yet fully understood, whose components include not just light
frequency, but other properties of light, such as brightness and saturation,
and an internal mechanism of three specialized detectors selectively
tuned to certain regions of the frequency spectrum (red, green, and
blue), with an "opponent-process" relation between their activities
(red being opposed to green and blue being opposed to yellow).
The outcome of this innate invariance-extracting mechanism is that
some frequency ranges are automatically "compressed": we see them all as
just varying shades of the same qualitative color. These compressed ranges
are then separated from adjacent qualitative regions, also compressed, by
small boundary regions that look like indefinite mixtures, neutral
between the two adjacent categories. And just as there is compression
within each color range there is expansion between them: Equal-sized
frequency differences look much smaller and are harder to detect when
they are within one color category than when they cross the boundary
from one category to the other.
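
Within-category compression and between-category expansion can be shown with a toy warping of a one-dimensional continuum. The logistic warp below, its boundary at 0.5, and its steepness are purely illustrative assumptions; the actual color mechanism is, as noted, more complicated and not yet fully understood.

```python
import math

def perceived(physical, boundary=0.5, steepness=12.0):
    """Toy warp of a physical continuum in [0, 1] around one category boundary."""
    return 1.0 / (1.0 + math.exp(-steepness * (physical - boundary)))

def perceived_difference(a, b):
    return abs(perceived(a) - perceived(b))

step = 0.1                                          # same physical difference both times
within = perceived_difference(0.10, 0.10 + step)    # both inputs in one category
across = perceived_difference(0.45, 0.45 + step)    # inputs straddle the boundary

print(f"physical step:        {step:.3f}")
print(f"perceived, within:    {within:.3f}")
print(f"perceived, across:    {across:.3f}")
```

With these settings the same physical step is shrunk when both inputs fall inside one category and stretched when they straddle the boundary, which is the signature of categorical perception described above.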
Although basic color CP is inborn rather than a result
of learning, it still meets our definition of categorization
because the real-time trial and error process that "shaped" CP
through error-corrective feedback from adaptive consequences was
Darwinian evolution. Those of our ancestors who could make rapid, accurate
distinctions based on color out-survived and out-reproduced those who
could not. That natural selection served as the "error-correcting"
feedback on the genetic trial-and-error variation. There are probably
more lessons to be learned from the analogy between categories acquired
through learning and through evolution, as well as from the specific
features of the mechanism underlying color CP -- but this brings us back
to the "how" question raised earlier, to which we promised to return.
Learning Algorithms. Machine learning algorithms from artificial
intelligence research, genetic algorithms from artificial life research
and connectionist algorithms from neural network research have all
been providing candidate mechanisms for performing the "how" of
categorization. There are in general two kinds of models, so-called
"supervised" and "unsupervised" ones. The unsupervised models are generally
designed on the assumption that the input "affordances" are already quite
salient, so that the right categorization mechanism will be able to pick
them up on the basis of the shape of the input from repeated exposure and
internal analysis alone, with no need of external error-correcting
feedback. By way of an exaggerated example, if the world of shapes
consisted of nothing but boomerangs and Jerry Fodor shapes, an
unsupervised learning mechanism could easily sort out their retinal
shadows on the basis of their intrinsic structure alone (including their
projective geometric invariants). But with the shadows of new-born
chick abdomens, sorting them out as males and females would probably
need the help of error-corrective feedback. Not only would the attempt
to sort them on the basis of their intrinsic structural landscape alone
be like looking for a needle in a haystack, but there is also the much
more general problem that the very same things can often be categorized
in many different ways. It would be impossible, without supervision, to
determine which way was correct (in a given context, for the right
categorization can vary with the context: sometimes we may want
to sort baby chicks by gender, sometimes by species, or something else)
(Harnad 1987).
In general, a nontrivial categorization problem will be
"underdetermined." Even if there is only one correct solution, and even if
it can be found by an unsupervised mechanism, it will first require
a lot of exposure and processing. The figure/ground distinction
might be something like this: How, in general, does our visual
system manage to process the retinal shadows of real-world scenes
in such a way as to sort out what is figure and what is ground? In the
case of ambiguous figures such as Escher drawings there may be more
than one way to do this, but in general, there is a default way
to do it that works, and our visual systems usually manage to find it,
quickly and reliably for most scenes. It is unlikely that they learned
to do this on the basis of having had supervisory feedback on samples of
all the possible combinations of scenes and their shadows.
11. Unsupervised Learning. There are both morphological and geometric
invariants in the sensory shadows of objects, highlighted especially when
we move relative to them or vice versa; these can be extracted by
unsupervised learning mechanisms that sample the structure and the
correlations (including covariance and invariance under dynamic
sensorimotor transformations). Such mechanisms cluster things
according to their internal similarities and dissimilarities, enhancing
both the similarities and the contrasts. An example of an unsupervised
contrast-enhancing and boundary-finding mechanism is "reciprocal
inhibition," in which activity from one point in visual space inhibits
activity from surrounding points and vice-versa. This kind of internal
competition tends to bring into focus the structure inherent in and
afforded by the input.
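
A minimal sketch of this kind of contrast enhancement follows. It assumes a one-dimensional row of units, a single feedforward pass in which each unit is inhibited by the mean activity of its two immediate neighbors, and an arbitrary inhibition strength of 0.5; real reciprocal inhibition is recurrent and two-dimensional, so this is a simplification, and the names are hypothetical.

```python
def lateral_inhibition(activity, strength=0.5):
    """Each unit's output is its input minus a fraction of its neighbors' mean input."""
    n = len(activity)
    output = []
    for i, value in enumerate(activity):
        left = activity[max(i - 1, 0)]        # end units reuse their own value
        right = activity[min(i + 1, n - 1)]
        output.append(value - strength * (left + right) / 2.0)
    return output

# A step edge: a dim region next to a bright region.
edge = [1.0] * 5 + [3.0] * 5
sharpened = lateral_inhibition(edge)
print(sharpened)
```

No feedback about correct or incorrect outputs is involved: the dip on the dim side of the edge and the peak on the bright side come from the structure of the input alone, which is what makes the mechanism unsupervised.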
12. Context-Dependent Categorization. This kind of unsupervised
clustering based on enhancing structural similarities and correlations
will not work, however, if different ways of clustering the very same
sensory shadows are correct, depending on other circumstances. To sort
this out, supervision by error-corrective feedback is needed too; the
sensorimotor structure and its affordances alone are not enough. We might
say that supervised categories are even more underdetermined than
unsupervised ones. Both kinds of category are underdetermined, because the
sensory shadows of their members are made up of a high
number of dimensions and features, their possible combinations
yielding an infinity of potential shadows, making the subset of
them that will afford correct categorization hard to find. But
supervised categories have the further difficulty that there are many
correct categorizations (sometimes an infinite number) for the very same
set of shadows.

If you doubt this, open a dictionary again, pick any
content word, say, "table," then think of an actual table, and
think of all the other things you could have called it (thing,
object, vegetable, handiwork, furniture, hardwood, Biedermeier,
even "Charlie"). The other names you could have given it correspond to
other ways you could have categorized it. Every category
has both an "extension" (the set of
things that are members of that category) and an "intension" (the
features that make things members of that category rather than
another). Not only are all things the members of an infinite number
of different categories, but each of their features, and each combination
of features, is a potential basis (affordance) for assigning them
to still more categories. So far, this is again just ontology; but let
us return to sensory inputs, and the problem facing the theorist trying
to explain how sensorimotor systems can do what they do: sensory inputs
are the shadows of a potentially infinite number of different kinds
of things. Categorization is the problem of sorting them correctly,
depending on the demands of the situation.
Supervised learning can
help; if unsupervised learning cannot find the winning features,
perhaps feedback-guided trial and error training will do it, as with
the pigeon's black/white sorting and the chicken-sexing. There
are some supervised learning algorithms so powerful that they
are guaranteed to find the needle in the haystack, no matter
how underdetermined it is -- as long as it is just underdetermined,
not indeterminate (like the exact midpoint between black and white)
or NP-complete -- and as long as there is enough data and feedback and
time (as, for the language-learning child, there is not, hence the
"poverty of the stimulus"). Our categorization algorithms have to be able
to do what we can do; so if we can categorize a set of inputs
correctly, then those inputs must not only have the features that can
afford correct categorization, but there must also be a way to find
and use those affordances. (Figure 1 shows how a supervised neural net
learns to sort a set of shapes into three categories by compressing
and separating their internal representations in hidden-unit space;
Tijsseling & Harnad 1997.)
Figure 1. Left: 3 sets of stimuli
presented to neural net: vertical arm of L much longer, vertical and
horizontal about equal, horizontal much longer. Middle:
Position of the hidden-unit representations of each
of the three categories after auto-association but before learning
(cubes represent Ls with long vertical arms,
pyramids Ls with near-equal arms, spheres Ls with long horizontal
arms). Right: Within-category compression and between-category
separation when the net has learned
to separate the three kinds of input. (From Tijsseling &
Harnad 1997.)
13. Vanishing Intersections? Fodor and others have sometimes suggested
otherwise: They have suggested that one of the reasons most categories can
be neither learned nor evolved (and hence must be "innate" in some deeper
sense than merely being a Darwinian adaptation) is the "vanishing
intersections" problem: If you go back to the dictionary again, pick some
content words, and then look for the "invariance" shared by all the sensory
shadows of just about any of the things designated by those words, you will
find there is none: their "intersection" is empty. What do all the shadows
of boomerangs or tables -- let alone Jerry Fodors or chicken-bottoms -- have
in common (even allowing dynamic sensorimotor interactions with them)? And
if that doesn't convince you, then what is the sensory shadow of categories
like "goodness," "truth," or "beauty"?
14. Direct Sensorimotor Invariants. There is no reason for invariance
theorists to back down from this challenge. First, it has to be pointed
out that since we do manage to categorize correctly all those
things designated by our dictionaries, there is indeed a capacity of ours
that needs to be accounted for. To say that these categories are "innate"
in a Cartesian, Platonic, or cosmogonic sense rather than just a Darwinian
sense is simply to say that they are an unexplained, unexplainable mystery.
So let us reject that. Let us assume that if organisms can categorize,
then there must be a sensorimotor basis for that skill of theirs, and its
source must be either evolution, learning, or both. Which means that there
must be enough in those shadows to afford all of our categorization
capacity.

15. Abstraction and Hearsay. Does it all have to be a
matter of direct sensorimotor invariants, always? No, but the path to
goodness, truth and beauty requires us to trace the chain of
abstraction that takes us from categories
acquired through direct sensory experience to those acquired through
linguistic "hearsay": Consider the five sensorimotor ways we can
interact differentially with things: the five kinds of things we can
do with things. We can see them, recognize them, manipulate them,
name them or describe them. "Manipulate" in a sense
already covers all five, because manipulating is something we
do with things; but let us reserve the word "manipulate"
for our more direct physical interactions with objects, such as
touching, lifting, pushing, building, destroying, eating, mating
with, and fleeing from them. Naming them and describing them is also a
thing we do with them, but let us not subsume those two acts
under manipulation. Seeing and recognizing are likewise things
we do with things, but these too are better treated separately,
rather than as forms of manipulation. And "seeing" is meant to stand
in for all modes of sensory contact with things (hearing,
smelling, tasting, touching), not just vision.
Recognizing is special, because it is not
just a passive sensory event. When we recognize something,
we see it as a kind of thing (or an individual) that we have
seen before. And it is a small step from recognizing a thing as a kind
or an individual to giving it a name. Seeing requires sensorimotor
equipment, but recognizing requires more. It requires the capacity
to abstract. To abstract is to single out some subset
of the sensory input, and ignore the rest. For example, we may
see many flowers in a scene, but we must abstract to recognize
some of them as being primroses. Of course, seeing them as flowers is
itself abstraction. Even distinguishing figure from ground is
abstraction. Is any sensorimotor event not abstraction?
16. Abstraction and Amnesia. To answer,
we have to turn to fiction. Borges, in his 1944 short story, "Funes
the Memorious," describes a person who cannot abstract. One day
Funes fell off a horse, and from then onward he could no longer forget
anything. He had an infinite rote memory. Every successive instant
of his experience was stored forever; he could mentally replay the
"tapes" of his daily experience afterwards, and it would take even
longer to keep re-experiencing them than it had to experience them in
the first place. His memory was so good that he gave proper names
or descriptions to all the numbers -- "Luis Melián Lafinur, Olimar,
azufre, los bastos, la ballena, el gas, la caldera, Napoléon,
Agustín de Vedía" -- from 1 all the way up to enormous
numbers. Each was a unique individual for him. But, as a consequence,
he could not do arithmetic; could not even grasp the concepts. The
same puzzlement accompanied his everyday perception. He could
not understand why we people with ordinary, frail memories insisted
on calling a particular dog, at a particular moment, in a particular
place, in a particular position, by the same name that we call it at
another moment, a different time, place, position. For Funes, every
instant was infinitely unique, and different instants were incomparable,
incommensurable.

Funes's infinite rote memory was hence a handicap, not an advantage. He
was unable to forget, yet forgetting,
or at least ignoring, is what is required in order to recognize
and name things. Strictly speaking, a true Funes could not even
exist, or if he did, he could only be a passive sensorimotor
system, buffeted about by its surroundings (like the sand by the wind).
Borges portrayed Funes as having difficulties in grasping
abstractions, yet if he had really had the infinite memory and
incapacity for selective forgetting that Borges ascribed to him,
Funes should have been unable to speak at all, for our words all
pick out categories based on abstraction. He should not have been
able to grasp the concept of a dog, let alone any particular dog,
or anything else, whether an individual or a kind. He should have
been unable to name numbers, even with proper names, for a numerosity
(or a numeral shape) is itself an abstraction. There should be
the same problem of recognizing either a numerosity or numeral
as being the same numerosity (numeral) on another occasion as
there was in recognizing a dog as the same dog, or as a dog
at all.
17. Invariance and Recurrence. Funes was a fiction, but Luria described a
real person who had handicaps that went in the same direction,
though not all the way to an infinite rote memory. In "The Mind
of a Mnemonist" (1968) Luria describes a stage memory-artist,
"S," whom he had noticed when S was a journalist because he never
took notes. S did not have an infinite rote memory like Funes's, but
a far more powerful and persistent rote memory than a normal
person. When he performed as a memory artist he would memorize long
strings of numbers heard only once, or all of the objects in the purse
of an audience member. He could remember the exact details of scenes,
or long sequences. He also had synaesthesia, which means that
sensory events for him were richer, polysensory experiences: sounds and
numbers had colors and smells; these would help him remember. But his
powerful rote memory was a handicap too. He had trouble reading novels,
because when a scene was described, he would visualize a corresponding
scene he had once actually seen, and soon he was lost in reliving his
vivid eidetic memory, unable to follow the content of the novel. And
he had trouble with abstract concepts, such as numbers, or even
ordinary generalizations that we all make with no difficulty.

What the stories of Funes
and S show is that living in the world requires the capacity to
detect recurrences, and that that in turn requires the capacity to
forget or at least ignore what makes every instant infinitely unique,
and hence incapable of exactly recurring. As noted earlier, Gibson's
(1979) concept of an "affordance" captures the requisite capacity
nicely: Objects afford certain sensorimotor interactions with
them: A chair affords sitting-upon; flowers afford sorting by color,
or by species. These affordances are all invariant features of
the sensory input, or of the sensorimotor
interaction with the input, and the organism has to be
capable of detecting these invariants selectively -- of abstracting
them. If all sensorimotor features are somehow on a par, and every
variation is infinitely unique, then there can be no abstraction of
the invariants that allow us to recognize sameness, or similarity, or
identity, whether of kinds or of individuals.
18. Feature Selection and Weighting. Watanabe's (1985)
"Ugly Duckling Theorem" captures the same insight. He describes
how,
considered only logically, there
is no basis for saying that the "ugly duckling" -- the odd swanlet
among
the several ducklings
in the Hans Christian Andersen fable -- can be
said to be any less similar
to any of the ducklings than the ducklings
are to one another. The only reason it looks as if the ducklings are
more similar to one another than
to the swanlet is that our visual
system "weights" certain features more heavily than others --
in other words, it is selective, it
abstracts
certain features
as privileged. For if all features are given equal weight
and there
are, say, two ducklings and a swanlet, in the spatial position D1, S,
D2, then although D1 and D2 do share the feature that they are both
yellow, and S is not, it is equally true that
D1 and S share the feature that they are both to the left of
D2 spatially, a
feature they do not share with D2. Watanabe pointed
out that if we made a
list of all the (physical and logical) features
of D1, D2, and S, and we did
not preferentially weight any of the
features relative to the others, then
S would share exactly as many
features with D1 as D1 shared with D2 (and
as D2
shared with S). This is an exact analogue
of Borges's and Luria's
memory effect, for the feature list is in fact infinite
(it includes
either/or features too, as well as negative ones, such
as "not
bigger than a breadbox," not double, not triple, etc.), so unless
some
features are arbitrarily selected and given extra weight, everything
is equally (and infinitely) similar to everything else.
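
Watanabe's point can be checked by brute force on a toy version of the D1, S, D2 example. The sketch below is only an illustration under stated assumptions: the two binary features (yellow, left-of-D2), the names `objects`, `predicates`, `shared_predicates` and `weighted_similarity`, and the hand-picked weights are all hypothetical, not taken from Watanabe. A "predicate" here is any Boolean function of the two features, so the list includes conjunctions, either/or features and negations.

```python
from itertools import combinations, product

# Hypothetical encoding of the example: (is_yellow, is_left_of_D2)
objects = {"D1": (1, 1), "S": (0, 1), "D2": (1, 0)}

# Every possible combination of the two binary features (4 "cells").
cells = list(product([0, 1], repeat=2))

# A predicate is any Boolean function over the cells, represented by the set
# of cells on which it is true. With 4 cells there are 2**4 = 16 predicates.
predicates = [
    {cell for cell, bit in zip(cells, bits) if bit}
    for bits in product([0, 1], repeat=len(cells))
]

def shared_predicates(a, b):
    """Count the predicates true of both objects, all weighted equally."""
    return sum(1 for p in predicates if objects[a] in p and objects[b] in p)

def weighted_similarity(a, b, weights):
    """Similarity when only the two raw features count, each with a weight."""
    return sum(w for w, x, y in zip(weights, objects[a], objects[b]) if x == y)

# Unweighted: every pair shares the same number of predicates.
shared = {pair: shared_predicates(*pair) for pair in combinations(objects, 2)}
for pair, count in shared.items():
    print(pair, count)

# Weighted (color counts more than position): the ducklings pull together.
color_over_position = (1.0, 0.1)
print(weighted_similarity("D1", "D2", color_over_position))
print(weighted_similarity("D1", "S", color_over_position))
```

With equal weights every pair comes out tied; only when color is (arbitrarily) weighted above position does D1 become more similar to D2 than to S.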

But of course our sensorimotor systems do not
give equal weight to all features; they do not even detect all
features. And among
the features they do detect, some (such as
shape and color) are more
salient than others (such as spatial position and number of feathers).
And not only are detected features
finite and differentially weighted,
but our memory for them is
even more finite: We can see, while they are present,
far more features than we can remember afterward.
19. Discrimination Versus
Categorization. The best illustration of this is the difference
between relative and absolute discrimination that was pointed
out by George Miller in his famous 1956 paper on our brains'
information-processing limits: "The Magical Number 7+/-2". If you
show someone an unfamiliar, random shape, and immediately afterward
show either the same shape again or a slightly different shape, they
will
be able to tell you whether the two successive shapes were the same or
different. That is a
relative discrimination, based on a
simultaneous or rapid successive pairwise comparison. But if instead
one shows only one of the two shapes, in isolation, and asks which
of the two it is, and if the difference between them is small enough,
then the viewer will be unable to say which one it is. How small
does the difference have to be? The "just-noticeable-difference"
or JND is the smallest difference that we can detect in pairwise
relative comparisons. But to
identify a shape in isolation is
to make an
absolute discrimination (i.e., a categorization),
and Miller showed that the limits on absolute discrimination were
far narrower than those on relative discrimination.

Let us call relative discrimination "discrimination"
and absolute discrimination "categorization." Differences
have to be far greater for identifying what
kind or individual something is than for telling it apart
from something else that is simultaneously present or viewed in rapid
succession. Miller pointed out that if the differences are along only
one sensory dimension, such as size, then the number of JNDs we can
discriminate is very large, and the size of the JND is very small,
and depends on the dimension in question. In contrast, the number of
values
along the dimension for which we can categorize the object in
isolation is
approximately seven. If we try to subdivide any dimension
more finely than
that, categorization errors grow.
This limit on categorization capacity
has its counterpart in memory too: If we are given a string of
digits to remember we -- unlike Luria's S, who can remember a very
large number of them -- can recall only about 7. If the string is
longer, errors and interference grow.
20. Recoding and Feature Selection.
Is there any way to increase our capacity to make categorizations? One
way is
to add more dimensions of variation; presumably this
is one of the
ways in which S's synaesthesia helped him. But even
higher dimensionality has its limits, and never approaches
the resolution power of the JND of sensory discrimination.
Another way of increasing memory is by recoding.
Miller showed
that if we have to remember a string of 0's and 1's, then
a string of 7 items is about our limit. But if we first learn to
recode the
digits into, say, triplets in binary code, using their
decimal names -- so that 001 is called "one", 010 is called
"two," 011 is called "three" etc., and we overlearn that code,
so that we can read the strings automatically in the new code,
then we can remember three times as many of the digits. The 7-limit
is still there, but it is now operating on the binary triplets into
which we have recoded the digits: 101 is no longer three items: it is
recoded into one "chunk," "five." We have learned to see the strings in
terms of bigger chunks -- and it is these new chunks that are
now subject
to the 7-limit, not the single binary digits.
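
The triplet recoding described above can be sketched in a few lines. This is a minimal illustration, not Miller's procedure: the names `NAMES`, `recode` and `decode` are hypothetical, and the 21-digit string is made up for the example.

```python
# Overlearned code: each binary triplet gets the name of its decimal value.
NAMES = ["zero", "one", "two", "three", "four", "five", "six", "seven"]

def recode(bits):
    """Recode a string of 0s and 1s into named triplet chunks."""
    if len(bits) % 3 != 0:
        raise ValueError("this sketch only handles lengths divisible by 3")
    return [NAMES[int(bits[i:i + 3], 2)] for i in range(0, len(bits), 3)]

def decode(chunks):
    """Unpack the chunks back into the original binary digits."""
    return "".join(format(NAMES.index(name), "03b") for name in chunks)

digits = "101001011010111000110"  # 21 binary digits, far beyond the 7-limit
chunks = recode(digits)           # 7 chunks, back within the 7-limit

print(chunks)
print(decode(chunks) == digits)
```

The 21 digits become 7 chunks, and nothing is lost: decoding the chunks recovers the original string.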

Recoding by overlearning
bigger chunks is a way to enhance rote memory for sequences, but
something similar operates at the level of features of objects:
Although the number of features our sensory systems can detect
in an object is not infinite, it is large enough so that if we
see two different objects, sharing one or
a few features, we will
not necessarily be able to detect that they share features, hence
that they are the same kind of object. (This is again a symptom of
the "underdetermination" mentioned earlier, and is related to the
so-called "credit assignment problem": How to find the winning feature
or rule among many possibilities?) To be able to abstract the
shared features, we need supervised categorization training,
with trial
and error and corrective feedback based on a large
enough sample to
allow our brains to solve the credit-assignment
problem and abstract the invariants underlying the variation. The
result, if the learning is successful, is that the inputs are recoded,
just as they are in the
digit string memorization; the features are
re-weighted. The objects that are of the same kind, because they
share invariant features, are consequently seen as more similar
to one another; and objects of different kinds, not sharing the
invariants, are seen as more different.
This within-category enhancement
of perceived
similarity and between-category enhancement of
perceived differences
is again the categorical perception (CP) described
earlier in the case of
color. The sensory "shadows" of light frequency,
intensity and saturation
were recoded and reweighted by our evolved
color receptors so as to selectively
detect and enhance the spectral
ranges that we consequently see as red, yellow, etc.
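
The effect of re-weighting on perceived similarity can be illustrated with a toy calculation. The sketch below makes several assumptions that are not in the text: items have just two features, the first is the category invariant and the second is irrelevant variation, similarity is a weighted Euclidean distance, and the "after learning" weights are set by hand rather than learned. It shows only what re-weighting does to distances, not how a learner would find the weights.

```python
from itertools import combinations, product
from math import sqrt

# Hypothetical items: (invariant feature, irrelevant feature).
# Category A has invariant values below 0.5, category B above 0.5.
category_a = [(0.40, 0.1), (0.45, 0.9), (0.35, 0.5)]
category_b = [(0.60, 0.2), (0.65, 0.8), (0.55, 0.5)]

def distance(x, y, weights):
    """Weighted Euclidean distance between two items."""
    return sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def within_and_between(weights):
    """Mean within-category and between-category distance under given weights."""
    within_pairs = list(combinations(category_a, 2)) + list(combinations(category_b, 2))
    between_pairs = product(category_a, category_b)
    within = mean(distance(x, y, weights) for x, y in within_pairs)
    between = mean(distance(x, y, weights) for x, y in between_pairs)
    return within, between

# Before learning: both features weighted equally.
within_before, between_before = within_and_between((1.0, 1.0))
# After learning (weights chosen by hand): invariant enhanced, the rest suppressed.
within_after, between_after = within_and_between((9.0, 0.04))

print(within_before, between_before)
print(within_after, between_after)
```

Under the hand-set weights the mean within-category distance shrinks and the mean between-category distance grows, which is the compression/separation pattern that defines CP.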
21. Learned Categorical Perception and
the Whorf Hypothesis.
When CP is an effect of learning, it
is a kind of a Whorfian effect.
Whorf (1956) suggested that how
objects look to us depends on how
we sort and name them. He cited colors as an example of how language
and culture shape the way things
look to us, but the evidence suggests
that the qualitative
color-boundaries along the visible spectrum are
a result of
inborn feature detectors rather than of learning to sort
and name
colors in particular ways. Learned CP effects do occur, but
they
are subtler than color CP, and can only be demonstrated in the
psychophysical laboratory (Goldstone 1994, 2001; Livingston et al.
1998).
Figure 2 below illustrates this for a task in which
subjects learned
texture categorization. For an easy categorization task,
there was no difference
before and after learning, but for a hard one,
learning caused within-category
compression and between-category
separation. (From Pevtzow & Harnad 1997).
Yet learned CP works much the way inborn CP does: Some features are
selectively enhanced, others are suppressed, thereby bringing out
the commonalities underlying categories or kinds. This works like a
kind of input filter, siphoning out the categories on the basis of
their
invariant features, and ignoring or
reducing the salience of non-invariant
features. The supervised and unsupervised
learning mechanisms discussed
earlier have been proposed as the potential
mechanisms for this
abstracting capacity, with sensorimotor interactions
also helping us
to converge on the right affordances, resolving the underdetermination
and solving the credit-assignment problem.

Where does this leave the concrete/abstract
distinction and the vanishing-intersections problem, then? In
what sense is a primrose concrete and a prime number abstract? And how
is "roundness" more abstract than "round," and "property" more abstract
still? Identifying any category is always based on abstraction, as
the example of Funes shows us. To
recognize a wall as a wall rather than,
say, a floor, requires us to abstract
some of its features, of which
verticality, as opposed to horizontality, is a critical one here (and
sensorimotor interactions and affordances obviously help narrow
the options). But in the harder, more underdetermined cases like
chicken-sexing, what determines which features are critical? (We
are back to the Maine joke again: "How's your wife?" "Compared to
what?")
Uncertainty Reduction. Although categorization is
an absolute judgment, in that it is based on identifying an object
in isolation, it is relative in another sense: Which invariant
features need to be selectively abstracted depends entirely on what the
alternatives are. "Compared to what?" The invariance is relative to
the variance. Information, as we learn from formal information theory,
is something that reduces the uncertainty among alternatives. So
when we learn to categorize things, we are learning to sort the
alternatives that might be confused with one another. Sorting walls
from floors is rather trivial, because the affordance difference is
so obvious already, but sorting the sex of newborn chicks is
harder,
and it is even rumoured that the invariant features are ineffable
in that case: They cannot be described in words. That's why
the only
way to learn them is through months or years of
trial-and-error training, guided by feedback under the
supervision of masters.
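
The dependence of information on the set of alternatives can be made concrete with a small information-theoretic sketch. It assumes, for simplicity, that the alternatives are equally likely; the feature names (`vertical`, `overhead`), the added `ceiling` alternative and the function `bits_from_feature` are hypothetical illustrations, not part of the argument above.

```python
from collections import Counter
from math import log2

def bits_from_feature(alternatives, feature):
    """Uncertainty (in bits) removed by observing one feature,
    assuming the alternatives are equally likely."""
    n = len(alternatives)
    # Group the alternatives by the value they have on this feature.
    groups = Counter(features[feature] for features in alternatives.values())
    # Uncertainty still left, on average, once the feature value is known.
    remaining = sum((size / n) * log2(size) for size in groups.values())
    return log2(n) - remaining

# Context 1: the only alternatives are walls and floors.
narrow = {
    "wall": {"vertical": True, "overhead": False},
    "floor": {"vertical": False, "overhead": False},
}
# Context 2: the same two alternatives plus a ceiling.
wide = dict(narrow, ceiling={"vertical": False, "overhead": True})

print(bits_from_feature(narrow, "vertical"))  # resolves wall vs. floor
print(bits_from_feature(narrow, "overhead"))  # does not vary here
print(bits_from_feature(wide, "vertical"), log2(len(wide)))
```

Among walls and floors, verticality alone removes all the uncertainty and the non-varying feature removes none; once ceilings are among the alternatives, verticality no longer suffices. The same feature is or is not the critical invariant depending on what it is being compared to.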
22. Explicit Learning. But let us not mistake the fact that it
is difficult
to make them explicit verbally for the fact that there is anything
invisible or mysterious about the features underlying chicken-sexing --
or any other subtle categorization. Biederman did a computer-analysis
of newborn chick-abdomens and identified the winning invariants in
terms
of his "geon" features (Biederman & Shiffrar 1987). He was then
able
to
teach the features and rules through explicit instruction to
a sample
of novices so that within a short time they were able to
sex chicks
at the brown-belt level, if not the black belt level. This
progress should
have taken them months of supervised trial-and-error training,
according
to the grandmasters.
So if we accept that all categorization, great and
small, depends on abstracting some features and ignoring others,
then
all categories are abstract. Only Funes lives in the world of
the
concrete, and that is the world of mere passive experiential
flow from
one infinitely unique instant to the next (like the sand in the wind).
For to do anything systematic or adaptive with the
input would require
abstraction, whether innate or learned: the
detection of the recurrence
of a thing of the same kind.
Categorization Is Abstraction.
What about degrees of abstractness? (Having, with G.B. Shaw,
identified the profession, we are now merely haggling about the
price.) When I am sorting things as
instances of a round-thing
and a non-round-thing, I am sorting things. This thing is round,
that thing is non-round. When I am sorting things as instances of
roundness and non-roundness, I am sorting features of
things. Or
rather, the things I am sorting are features (also known as
properties, when we are not just speaking about them in a sensorimotor
sense). And features themselves are things too: roundness is
a feature, an apple is not (although any thing, even an apple,
can also be a part, hence a feature, of another thing).
Direct and Derivative
Grounding. In principle, all this sorting and naming could be
applied directly to sensorimotor inputs; but much of the sorting and
naming of what we consider more abstract things, such as numbers,
is applied to symbols rather than to sensorimotor interactions with
objects. I name or describe an object, and then I categorize it: "A
number is an invariant numerosity" (ignoring the variation in the kinds
or individuals involved). This simple proposition already illustrates
the adaptive value of language: Language allows us to acquire
new categories without
having to go through the time-consuming and
risky process of direct trial-and-error
learning. Someone who already
knows can just
tell me the features
of an X that will allow me
to recognize it as an X. (This is rather like
what Biederman did for
his experimental subjects, in telling them what features to use to sex
chickens, except that his method was hybrid: It was show-and-telling,
not just telling, because he did not merely
describe
the critical features verbally, but also pointed them out and
illustrated them
visually. He did not pretrain his subjects on
geon-naming, as Miller's
subjects were pretrained on naming binary
triplets.)

The Adaptive
Advantage of Language. If Biederman had done it all with words,
through pure hearsay, he would have demonstrated the full and
unique category-conveying power of language: In sensorimotor
learning, the abstraction usually occurs implicitly. The neural net
in the learner's brain does all the hard work, and the learner
is
merely the beneficiary of the outcome. The evidence for this
is that
people who are perfectly capable of sorting and naming things correctly
usually
cannot tell you
how they
do it. They may try to tell you what features and
rules they are using, but
as often as not their explanation is
incomplete, or even just plain wrong. This is what makes cognitive
science a science; for if we could all make it explicit, merely by
introspecting, how it is that we are able to do all that we can do,
then our introspection would have done all of cognitive science's work
for it. But we usually cannot make our implicit knowledge explicit,
just
as the master chicken-sexers could not. Yet what explicit knowledge we
do
have, we can convey to one another much more efficiently by hearsay
than
if we had to learn it all the hard way, through trial-and-error
experience. This is what gave language the powerful adaptive advantage
that it had for our species (Cangelosi & Harnad 2001; see
Figure 3).
Figure 3. An artificial-life simulation of mushroom foragers.
Mushroom categories could be learned in two
different ways, by sensorimotor “toil”
(trial-and-error learning with feedback from the consequences
of errors) or linguistic “theft” (learning from overhearing the
category described; hearsay). Within a very few generations
the linguistic “thieves” out-survive and out-reproduce the
sensorimotor toilers. (But note that the linguistically based
categories must be grounded in sensorimotor categories: it cannot be
theft all the way down.) (Cangelosi & Harnad 2001.)
Absolute Discriminables and Affordances. Where
does this leave prime numbers then, relative to primroses? Pretty
much on a par, really. I,
for one, do not happen to know what
primroses are. I am not even sure
they are roses. But I am sure I
could find out, either through direct
trial and error experience,
my guesses corrected by feedback from
the masters, and my internal neural nets busily and implicitly solving
the credit-assignment
problem for me, converging eventually on the
winning invariants;
or, if the grandmasters are willing and able to
make the invariants
explicit for me in words, I could find out what
primroses are
through hearsay. It can't be hearsay all the way down, though.
I will
have had to learn some things the hard, sensorimotor way, if
the
words used by the grandmasters are to have any sense for me. The words
would have to name categories I already have.
Is it any different with prime numbers? I know
they
are a kind of number. I will have to be told about factoring,
and
will probably have to try it out on some numbers to see what
it affords,
before recognizing that some kinds of numbers do afford
factoring and
others do not. The same is true for finding out what
deductive proof
affords, when they tell me more about further features of prime
numbers.
Numbers themselves I will have had to learn at
first hand, supervised
by feedback in absolutely discriminating
numerosities, as provided by yellow-belt arithmeticians -- for
here too it cannot be hearsay all the way down. (I will also need
to experience counting at first hand, and especially what "adding
one" to something, over and over again, affords.)
But is there any sense in which primroses or their
features are "realer" than prime numbers and their features? Any more
basis for doubting whether one is really "out there" than the other?
The
sense in which either of them is out there is that they are both
absolute discriminables: Both have sensorimotor affordances that I can
detect, either implicitly, through
concrete trial-and-error experience,
guided by corrective feedback (not necessarily
from a live teacher, by
the way: if, for example, primroses were edible,
and all other flowers
toxic, or prime numerosities were fungible, and all
others worthless,
feedback from the consequences of the sensorimotor interactions
would
be supervision enough); or explicitly, through verbal descriptions (as
long as the words used are already grounded, directly or recursively,
in
concrete trial-and-error experience; Harnad 1990). The affordances
are not
imposed by me; they are "external" constraints, properties of
the outside world, if you like, governing its sensorimotor interactions
with me. And what
I do know of the outside world is only through what
it affords (to my senses,
and to any sensory prostheses I can
use to augment them). That 2+2
is 4 rather than 5 is hence as much of a sensorimotor constraint as
that projections of nearer objects
move faster along my retina than
those of farther ones.
Mere cognitive scientists
(sensorimotor roboticists, really) should not presume to do
ontology at all, or should at least restrict their ontic claims to
their own variables and terms of art -- in this case, sensorimotor
systems and their inputs and outputs. By this token, whatever
it is that "subtends" absolute discriminations -- whatever
distal objects, events or states are the sources of the proximal
projections on our sensory surfaces that afford us the capacity
to see, recognize, manipulate, name and describe them -- are all on an
ontological par; and subtler discriminations are unaffordable.
Where does this leave goodness,
truth and beauty, and their sensorimotor invariants? Like prime
numbers, these categories are acquired largely by hearsay. The
ethicists, jurists and theologians (not to mention our parents) tell us
explicitly what kinds of acts and
people are good and what kind
are not, and why (but the words in their
explicit descriptions
must themselves be grounded, either directly, or recursively,
in sensorimotor invariants: again, categories cannot be hearsay
all the
way down). We can also taste what's good and what's not good directly
with our senses, of course, in sampling some of their consequences.
We perhaps
rely more on our own sensory tastes in the case of beauty,
rather than on
hearsay from aestheticians or critics, though we are no
doubt influenced by them and by their theories too. The categories
"true"
and "false" we sample amply through direct sensory experience,
but there too, how we cognize them is influenced by hearsay; and of
course the formal theory of truth looks more and more like the theory
of
prime numbers, with both constrained by the affordances of formal
consistency.
Cognition Is Categorization. But, at bottom, all of our categories
consist in ways we behave differently toward different kinds of things,
whether it be the things we do or don't eat, mate with, or flee from,
or the things that we describe, through our language, as prime numbers,
affordances, or absolute discriminables. And isn't that all that
cognition is for -- and about?

References
Biederman, I. &
Shiffrar, M. M. (1987) Sexing day-old chicks: A case study and expert
systems analysis of a difficult perceptual-learning task.
http://www.phon.ucl.ac.uk/home/richardh/chicken.htm
Borges, J.L. (1962) Funes el memorioso
http://www.bridgewater.edu/~atrupe/GEC101/Funes.html
Cangelosi, A. & Harnad, S. (2001) The Adaptive Advantage of Symbolic Theft
Over Sensorimotor Toil: Grounding Language in Perceptual
Categories. Evolution of Communication 4(1): 117-142
http://cogprints.soton.ac.uk/documents/disk0/00/00/20/36/index.htm
Chomsky, N. (1976) In Harnad, Stevan and Steklis, Horst D. and
Lancaster, Jane B., Eds. Origins and Evolution of Language and Speech,
page 58. Annals of the New York Academy of Sciences.
Fodor, J. A. (1975) The language of thought. New York: Thomas Y. Crowell.
Fodor, J. A. (1981) RePresentations. Cambridge MA: MIT/Bradford.
Fodor, J. A. (1998) In critical condition: Polemical essays on cognitive
science and the philosophy of mind. Cambridge, MA: MIT Press.
http://cognet.mit.edu/MITECS/Entry/gibson1
Goldstone, R.L. (1994) Influences of categorization on perceptual discrimination.
Goldstone, R.L. (2001) The Sensitization and Differentiation of
Dimensions
During Category Learning. Journal of
Experimental Psychology: General 130: 116-139
Harnad, S. (1987)
Category Induction and Representation, In: Harnad, S. (ed.) (1987)
Categorical Perception:
The Groundwork of Cognition. New York: Cambridge University
Press.
http://cogprints.soton.ac.uk/documents/disk0/00/00/15/72/index.html
Harnad, S. (1990) The Symbol Grounding Problem. Physica D
Journal of Logic, Language, and Information 9(4):
425-445. (special issue on "Alan Turing and Artificial Intelligence")
The Sciences
Harnad, S. (2003) Categorical Perception. Encyclopedia of
Cognitive Science. Nature Publishing Group. Macmillan.
Livingston, Kenneth and Andrews, Janet
and Harnad, Stevan (1998) Categorical
Perception Effects Induced
by Category Learning. Journal of Experimental Psychology: Learning,
Memory and Cognition
http://eprints.ecs.soton.ac.uk/archive/00006883/
Luria, A. R. (1968) The Mind of a Mnemonist
Psychological Review 63:81-97
Rosch, E. & Lloyd, B. B. (1978) Cognition and categorization. Hillsdale NJ:
Erlbaum Associates
(Harnad, Stevan,
Steklis, Horst Dieter and Lancaster,
Jane B., Eds.), 445-455. Annals of the New York Academy
of Sciences 280.
Watanabe, S. (1985) "Theorem of the Ugly Duckling". Wiley
http://www.kamalnigam.com/papers/thesis-nigam.pdf
Whorf, B.L. (1956) Language, Thought and Reality. (J.B. Carroll, Ed.)
Cambridge: MIT
http://www.mtsu.edu/~dlavery/Whorf/blwquotes.html
Appendix 1.
There is
nothing wrong with the "classical theory" of categorization.
Eleanor
Rosch has suggested that because we cannot state the basis on
which we categorize, that basis must not exist (Rosch & Lloyd
1978). It follows that there is something wrong with the so-called
"classical" theory of categorization, which is that we categorize on
the basis of the features that are necessary and sufficient to
afford categorization.
Not only do I think there's
nothing
the least bit wrong with that "classical theory," but I
am pretty confident that there is no
non-magic
alternative to it. Rosch's alternative was to vacillate
rather vaguely
between the idea that we categorize on the basis of
prototypes or
on the basis of "family resemblances". Let's consider
each of these candidate
mechanisms in turn:
To categorize on the basis of prototypes would be to identify a bird as
a
bird because it looks more like the template for a typical bird
than
the template for a typical fish. This would be fine if all,
many, or
most of the things we categorize indeed had templates,
and our internal
categorization mechanism could sort their sensory
shadows by seeing
which template they are closest to; in other
words, it would be fine
if such a mechanism could actually generate our categorization
capacity.
Unfortunately it cannot. Template-matching
is not very successful among
the many candidate machine-learning
models, and one of the reasons is
that it is simply not the case
that
everything is a member of
every category, to
different degrees. It is not true ontologically that a bird is a
fish (or a table) to a certain degree; nor is it true
functionally
that sensory shadows of birds can be sorted on the basis
of
their degree of similarity to prototype birds, fish or tables. So
prototype theory is a non-starter as a mechanism for our categorization
capacity. It might explain our typicality judgments -- is
this a more typical bird than that? -- but being able to make a
typicality judgment
presupposes
being
able to categorize;
it does not explain it: Before I can say how typical
a bird this
is, I first need to identify it as a bird!
So if
not prototypes, what about family-resemblances, then? What are family
resemblances? They are merely a cluster of either/or features:
This
is an X, if it has feature A or B or not C. Either/or features
(disjunctive
invariants) are perfectly classical (so forget about
thinking of family-resemblances
as alternatives to classical theories
of categorization). The problem is
that saying that some features are
either/or features leaves us no closer
to answering "how" than we were
before we were informed of this. Yes, some
of the affordances of sensory
shadows will be either/or features, but what we need to know is what
mechanism will be able to find them!
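
That an either/or rule is perfectly classical can be seen by writing one out. The sketch below takes the placeholder rule from the text ("feature A or B or not C") literally; the function name `is_x` and the exhaustive enumeration are illustrative assumptions. It shows that the rule sorts every case all-or-none, even though no single feature value is common to all members.

```python
from itertools import product

def is_x(has_a, has_b, has_c):
    """Disjunctive (either/or) membership rule: A or B or not C."""
    return has_a or has_b or not has_c

# Enumerate every combination of the three binary features.
cases = list(product([False, True], repeat=3))
members = [case for case in cases if is_x(*case)]
non_members = [case for case in cases if not is_x(*case)]

# Feature positions on which all members agree (none, for this rule).
common = [i for i in range(3) if len({m[i] for m in members}) == 1]

print(len(members), len(non_members))
print(common)
```

Membership is still necessary-and-sufficient in the classical sense: the disjunction itself is the invariant. What the sketch does not supply, and what still needs explaining, is the mechanism that could find such a rule from examples.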
The last Roschian
legacy to category theory is the "basic object" level, vs. the
superordinate or subordinate level. Here too it is difficult to see
what, if anything, we have learned. If you point to an object,
say, a table, and ask me what it is, chances are that I will
say it's a table, rather than a Biedermeyer, or furniture, or
"Charlie". So what?
As mentioned earlier, there are many ways to
categorize the same objects,
depending on context. A context is
simply a set of alternatives among
which the object's name is meant
to resolve the uncertainty (in perfectly
classical information-theoretic terms). So when you point to a table
and ask me what it is, I
pick "table" as the uncertainty-resolver in
the default context
(I may imagine that the room contains one chair, one computer,
one waste-basket and one table). If I imagine that it contains
four
tables, I might have to identify this one as the Biedermeyer;
and
if there are four Biedermeyers, I may have to hope you know
I've dubbed
this one "Charlie." So much for subordinates. The same path can be
taken for superordinates. It all devolves on the old
Maine joke, which comes close to revealing a profound truth
about categories: "How's your
wife?" Reply: "Compared to what?" If
we were just discussing the relative amount you should invest
in furniture in your new apartment, as opposed
to accessories,
and you forgot what was in the adjacent room and asked
what was
in there (when there was just a table), I might reply "furniture." If we
were discussing ontology, I might say "vegetable" (as opposed to
animal or mineral). Etc.
So citing the
"basic object level" does not help; that's just what one arbitrarily
assumes the default context of interconfusable alternatives
to be, given no further information. The only sense in which
"concrete" objects, directly accessible to our senses, are
somehow more basic, insofar as categorization is concerned,
than more "abstract" objects, such as goodness, truth or beauty is that
sensorimotor categories must
be grounded in sensory
experience and that the content of that experience
is fairly
predictable from most members of our species.
Appendix 2.
Associationism begs the question of
categorization.
The problem
of association is the problem of rote pairing: an object with an
object, a name with a name, a name with an object. Categorization
is the problem of recognizing and sorting objects as
kinds based
on finding the invariants underlying sensorimotor interactions with
their
shadows. Associationism had suggested that this was just a matter of
learning to associate tokens (instances, shadows)
of an object-type with tokens of its type-name -- as indeed it
is, if only we can first figure out which object-tokens are tokens
(shadows) of the same object-type! Which is in turn the problem of
categorization. Associationism simply bypassed the
real problem, and
reduced learning to the trivial process of rote association, governed
by how often two tokens co-occurred (plus an unexplicated influence of
how "similar" they were to one another).