Cangelosi/Harnad Symbols

From: Jelasity Mark (jelasity@amadea.inf.u-szeged.hu)
Date: Tue Dec 07 1999 - 11:09:05 GMT


> Cangelosi, A. & Harnad, S. (2000) The Adaptive Advantage of
> SymbolicTheft Over Sensorimotor Toil: Grounding Language in
> Perceptual Categories. Evolution of Communication (Special Issue
> on Grounding Language)
> http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad00.language.html

I had a hard time writing this commentary: the paper does not contain
enough information to reproduce the model it discusses, so I had to
discuss the missing details with Angelo Cangelosi. Since the paper itself
is not comprehensive (in its description of the model and experiments),
I'm afraid my relevant comments will be very hard to follow for those
who have not read it. As a guide: those who are not interested in the
presented model should skip the middle of this commentary.

================== section 0 =================================

The explanation of the title: toil and theft:

> Human beings clearly become capable of doing many things in their
> world, and from what they can do, it can also be inferred that they
> know a lot about that world. Without too much loss of generality,
> the Martian could describe that knowledge as being about the kinds
> of things there are in the world, and what to do with them. In
> other words, the knowledge is knowledge of categories: objects,
> events, states, properties and actions.
>
> [...] By analogy with the concept of wealth, the Martian
> might describe the categories acquired through the efforts of a
> lifetime to be those that are earned through honest toil, whereas
> those that we are born with and hence not required to earn he might
> be tempted to regard as ill-gotten gains -- unless his database
> was really very long-range, in which case he would notice that
> even our inborn categories had to be earned through honest toil:
> not our own individual toil, nor even that of our ancestors,
> but that of a more complicated, collective phenomenon that our
> (ingenious) Martian anthropologist might want to call evolution.
>
> [...] but upon closer inspection of his data he [the Martian]
> deduces that we must in fact be stealing them from one
> another somehow. [...]
>
> [...] A proposition is just a series of symbols that can be
> interpreted as making a claim that can be either true or false. The
> Martian knows that propositions can always be interpreted as
> statements about category membership. He quickly deduces that
> propositions make it possible to acquire new categories in the
> form of recombinations of old ones, as long as all the symbols
> for the old categories are already grounded in Toil (individual
> or evolutionary). He accordingly conjectures that the adaptive
> advantage of language is specifically the advantage of Symbolic
> Theft over Sensorimotor Toil, a victimless crime that allows
> knowledge to be acquired without the risks or costs of direct
> trial and error experience.

So far, what is said is that the main adaptive advantage of language
is the boosting of concept learning. The exact way language boosts
learning is not described yet. The above ideas (i.e. theft) can be
interpreted in several ways. One fixed point in the interpretations
is that language somehow helps in learning new categories; this is
rather uncontroversial. Another fixed point is that the main adaptive
advantage of language lies in this first fixed point. That may be more
controversial, but it is true that, at least NOW, this is one of the
main functions of language.

The motivation of the paper:

> Can the adaptive advantage of Symbolic Theft over Sensorimotor Toil
> be demonstrated without the benefit of the Martian Anthropologist's
> evolutionary database (in which he can review at leisure the
> videotape of the real-time origins of language)? We will try to
> demonstrate them in a computer simulated toy world considerably
> more impoverished than the one studied by the Martian. It will
> be a world consisting of mushrooms and mushroom foragers who must
> learn what to do with which kind of mushroom in order to survive
> and reproduce (Parisi, Cecconi & Nolfi 1990; Cangelosi & Parisi

In the following I will show that the model and the experiments
presented here support only a rather trivial statement: it is easier
to learn a category if the learning samples do not contain irrelevant
features. I will also argue that this fact does not prove the adaptive
advantage of language; the suggested mechanism is not the way concept
learning works through language.

===================== the model ===========================

I cite the part of the paper about the model almost in full here, with
some short comments. For the figures, refer to the HTML version. After
this citation I give my own version of the summary, to which Angelo
Cangelosi responded:

> Your description is about OK. Sorry I cannot continue to comment it
> in detail, but I am busy with other things.

Nevertheless, we had several exchanges of messages before this. Now
the model and experiment descriptions, as in the paper.
(A good piece of advice: "eat, mark, and return" does not mean anything
like eat, mark, return; read them as a1, a2, a3. Only motion is real.)

> 2. The mushroom world
>
> Our simulations take place in a mushroom world (Cangelosi & Parisi,
> 1998; Harnad 1987) in which little virtual organisms forage among
> the mushrooms, learning what to do with them (eat or don't eat,
> mark or don't mark, return or don't return). The foragers feed,
> reproduce and die. Mushrooms with feature A (i.e. those with
> black spots on their tops, as illustrated in Figure 1) are to be
> eaten; mushrooms with feature B (i.e. a dark stalk) are to have
> their location marked, and mushrooms with both features A and B
> (i.e. both black-spotted top and dark stalk) are to be eaten,
> marked and returned to. All mushrooms also have three irrelevant
> features, C, D and E, which the foragers must learn to ignore.
>
> Apart from being able to move around in the environment and to
> learn to categorize the mushrooms they encounter, the foragers
> also have the ability to vocalize. When they approach a mushroom,
> they emit a call associated with what they are about to do to that
> mushroom (EAT, MARK). Both the correct action pattern (eat, mark)
> and the correct call (EAT, MARK) are learned during the foragers'
> lifetime through supervised learning (Sensorimotor Toil). Under
> some conditions, the foragers also receive as input, over and
> above the features of the mushroom itself (+/-A, +/-B, +/-C,
> +/-D, +/-E), the call of another forager. This will be used to
> test the adaptive role of the Theft strategy. (Note, however,
> that except in special cases -- reported and analyzed elsewhere
> (Cangelosi & Harnad, in preparation) -- in the present simulations
> the thief steals only the knowledge, not the mushroom.)
>
> The foragers' world is a 2-dimensional (2D) grid of 400
> cells (20x20). The environment contains 40 randomly located
> mushrooms. Mushrooms are grouped in four categories according
> to the presence/absence of features A and B: 00, A0, 0B, and AB
> (Figure 1). In each world there are 40 mushrooms: 10 instances of
> each of the four categories.
>
> Feature A is the black-spotted top and feature B is the dark
> stalk. Mushroom position is encoded as the normalized relative
> angle between the direction the forager is facing and the direction
> of the closest mushroom. In this simulation, the foraging is done
> by only one forager at a time. As it moves, the forager perceives
> only the closest mushroom. For each mushroom, the input to the
> forager consists of the 5 +/- features plus its location relative
> to the forager, expressed as the angle a, between its position and
> the direction the forager is facing. The angle is then normalized
> to the interval [0, 1]. The five visual features A, B, C, D,
> E are encoded in a binary localist representation consisting
> of five units each of which encodes the presence/absence of one
> feature. An A0 mushroom would be encoded as 10***, with 1 standing
> for the presence of feature A, 0 for the absence of feature B
> and *** being either 0 or 1 for the 3 irrelevant features, C, D,
> and E. 0B mushrooms are encoded as 01***, and AB as 11***. The
> calls that can be produced in the presence of the mushroom are
> also encoded in a localist binary system. There are 3 units for
> each of the three calls: 1** EAT, *1* MARK and **1 RETURN, so
> EAT+MARK+RETURN would be 111. Like the Calls, the three actions
> of eating, marking and returning are encoded localistically.
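
The localist encodings just described can be sketched in a few lines
(the helper names here are my own, purely illustrative):

```python
import random

def encode_mushroom(has_a, has_b):
    """Localist visual encoding [A, B, C, D, E]: A and B determine the
    category; C, D, E are random irrelevant features."""
    return [int(has_a), int(has_b)] + [random.randint(0, 1) for _ in range(3)]

def encode_call(eat=False, mark=False, ret=False):
    """Localist call encoding [EAT, MARK, RETURN]."""
    return [int(eat), int(mark), int(ret)]

# An AB mushroom is 11***; its full call EAT+MARK+RETURN is 111.
ab = encode_mushroom(True, True)
assert ab[:2] == [1, 1] and len(ab) == 5
assert encode_call(eat=True, mark=True, ret=True) == [1, 1, 1]
```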
>
> 3. The Neural Network and Genetic Algorithm
>
> The forager's neural network processes the sensory information
> about the closest mushroom and activates the output units
> corresponding to the movement, action and call patterns. The
> net has a feedforward architecture (Figure 2) with 8 input,
> 5 hidden and 8 output units. The first input unit encodes the
> angle to the closest mushroom. Five input units encode the visual
> features and three input units encode incoming calls (if any). Two
> output units encode the four possible movements (one step forward,
> turn 90 degrees right, turn 90 degrees left, or stay in place)
> in binary. Three action units encode the action patterns eat,
> mark, and return, and three call units encode the corresponding
> three calls, EAT, MARK, and RETURN.
>
> A forager's lifetime lasts for 2000 actions (100 actions in
> 20 epochs, each of them sampling a different distribution of 40
> mushrooms). For each epoch there are two spreads of activation, one
> for the action (movement and action/call) and one for an imitation
> task. The forager first produces a movement and an action/call
> output using the input information from the physical features of
> the mushroom. The forager's neural network then undergoes a cycle
> of learning based on the backpropagation algorithm (Rumelhart,
> Hinton, & Williams, 1986).
>
> The net's action and call outputs are compared with what they
> should have been; this difference is then backpropagated so as to
> weaken incorrect connections and strengthen correct ones. In this
> way the forager learns to categorize the mushrooms by performing
> the correct action and call. In the second spread of activation
> the forager also learns to imitate the call. It receives as
> input only the correct call for that kind of mushroom, which it
> must imitate in its call output units. This learning is likewise
> supervised by backpropagation.

Note that imitation learning is independent of the observed
mushroom features. The imitation of any call can be learned
in the presence of any mushroom, with the same overall effects.
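
A one-layer sketch makes this concrete: during the imitation spread the
feature units are silent (zero), so the weights fed by them receive no
update, whatever mushroom is in view. (The single-layer net and delta
rule are my simplification; the paper's 5-hidden-unit net is subject to
the same argument at its input weights.)

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def imitation_update(W, call_in, lr=0.5):
    """One imitation spread: the only input is the correct call (the 5
    feature units are zero), and the target is that same call."""
    x = [0.0] * 5 + [float(c) for c in call_in]   # 5 feature + 3 call units
    out = [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in W]
    # Delta-rule update; any weight fed by a zero input is left untouched.
    return [[wi + lr * (t - o) * o * (1 - o) * xi
             for wi, xi in zip(row, x)]
            for row, t, o in zip(W, call_in, out)]

rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
W2 = imitation_update(W, [1, 0, 1])
# The weights from the 5 feature units are unchanged: the update is the
# same no matter which mushroom happens to be observed.
assert all(W2[i][j] == W[i][j] for i in range(3) for j in range(5))
```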

>
> The population of foragers is also subject to selection and
> reproduction as generated by the genetic algorithm (Goldberg,
> 1989). The population size is 100 foragers and remains constant
> across generations. The initial population consists of 100 neural
> nets with a random weight matrix. During the forager's lifetime its
> individual fitness is computed according to a formula that assigns
> points for each time a forager reaches a mushroom and performs
> the right action on it (eat/mark/return) according to features A
> and B. At the beginning of its life, a forager does not become
> much fitter from the first mushrooms it encounters because it
> takes some time to learn to categorize them correctly. As errors
> decrease, the forager's fitness increases. At the end of their
> life-cycles, the 20 foragers with the highest fitness in each
> generation are selected and allowed to reproduce by engendering
> 5 offspring each. The new population of 100 (20x5) newborns is
> subject to random mutation of their initial connection weights
> for the motor behavior, as well as for the actions and calls
> (thick arrows in Figure 2); in other words, there is neither
> any Lamarckian inheritance of learned weights nor any Baldwinian
> evolution of initial weights to set them closer to the final stage
> of the learning of 00, A0, 0B and AB categories. This selection
> cycle is repeated until the final generation.

Of course, there IS Baldwinian evolution, as Cangelosi admitted.

>
> 4. Grounding Eat and Mark Directly Through Toil.
>
> Two experimental conditions were compared: Toil and Theft. Foragers
> live for two life-stages of 2000 actions each. The first
> life-stage is identical for both populations: they all learn,
> through sensorimotor Toil, to eat mushrooms with feature A and
> to mark mushrooms with feature B. (AB mushrooms are accordingly
> both eaten and marked.) Return is not taught during the first
> life-stage. The input is always the mushroom's position and
> features, as shown in Table 1. In the second life-stage, foragers
> in the Toil condition go on to learn to return to AB mushrooms
> in the same way they had learned to eat and mark them through
> honest toil: trial and error supervised by the consequences of
> returning or not returning (Catania & Harnad 1988). In contrast,
> foragers in the Theft condition learn to return on the basis of
> hearing the vocalization of the mushrooms' names.
>
>
> Feature Call Behavior Call
> Condition Input Input Backprop Backprop
>
> TOIL EAT-MARK YES NO YES YES
>
> TOIL RETURN YES NO YES YES
>
> THEFT RETURN NO YES YES YES
>
> IMITATION NO YES NO YES
>
> Table 1 - Input and backpropagation for Toil and Theft learning
> and for
> imitation learning
>
> We ran ten replications for each of the two conditions. In the
> first 200 generations, the foragers only live for the first
> life-stage. From generation 200 to generation 210 they live on
> for a second life-stage and must learn the return behavior. The
> first 200 generations are necessary to evolve and stabilize the
> ability to explore the world and to approach mushrooms. After the
> foragers are able to move in the 2D environment and to approach
> mushrooms, they learn the basic categories plus their names, EAT
> and MARK. The average fitness of the ten replications is shown
> in Figure 3. The populations that evolve in these 10 runs are
> the same ones that are then used in the Toil and Theft conditions
> from generations 200 to 210.
>
> In the next runs, the second life-stage differs for the Toil and
> Theft groups: The Toil group learns to return and to vocalize
> RETURN on the basis of the feature input alone, as in the previous
> life-stage. Their input and supervision conditions are shown
> in Table 1. In the Theft condition the foragers rely on other
> foragers' calls to learn to return. They do not receive the
> feature input, only the vocalization input.
>
> Our hypothesis is that the Theft strategy is more adaptive
> (i.e. results in greater fitness and more mushroom collection)
> than the Toil strategy. To test this, we compare foragers'
> behavior for the two conditions statistically. For our purposes
> we count the number of AB mushrooms that are correctly returned
> to. The average of the best 20 foragers in all 10 replications is
> 54.7 AB mushrooms for Theft and 44.1 for Toil. That is, Thieves
> successfully return to more AB mushrooms than do Toilers. This
> means that learning to return from the grounded names EAT and
> MARK is more adaptive than learning it through direct toil based
> on sampling the physical features of the mushrooms. To compare
> the two conditions, we performed a repeated measures analysis of
> variance (MANOVA) on the 10 seeds. The dependent variables were
> the number of AB mushrooms collected at generation 210 averaged
> over the 20 fittest individuals in all 10 generations. The
> independent variable was Theft vs. Toil. The difference between
> the two conditions was significant [F(1,9)=136.7 p<0.0001]. Means
> and standard deviations are shown in Figure 4.

Now, my extended version of the model and experiments, as I sent it
to Angelo Cangelosi:

The ANN architecture of an organism is clear. The description of the world
is clear.

We have a population of 100 organisms.

The learning procedure for every organism is the following:

Every organism is put in a new world 20 times; in each world it makes
100 steps. In each step, the following is done:

the closest mushroom is perceived, and God teaches the organism what
to do (eat, mark), but not how to move around, which makes sure that
the organism continues to do a random walk through the space; "return"
is not taught either, because it will be used to test the theft effect.
This teaching is based on the mushroom description only, through a
backprop cycle, and in an additional backprop cycle God teaches the
organism what to say (based on the correct call only). If an organism
is in the cell of a mushroom and happens to do the right thing, it
gets a reward.

When we have the sum of the rewards for all 100 organisms, we choose
the top 20, making 5 new organisms from each by randomly changing 10%
of the original's weights. Thus we have a new population of 100.
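
The selection cycle just described can be sketched as follows (a
minimal sketch; the 93 parameters correspond to the paper's 8-5-8 net
with biases, and the mutation scheme is my reading of "changing 10% of
the weights randomly"):

```python
import random

def next_generation(population, fitness, rng, mutation_rate=0.10):
    """One selection cycle: keep the 20 fittest of 100, give each 5
    offspring, and randomly reset about 10% of each offspring's initial
    weights.  A genome here is just a flat list of connection weights."""
    ranked = sorted(population, key=fitness, reverse=True)
    offspring = []
    for parent in ranked[:20]:
        for _ in range(5):
            child = [rng.uniform(-1, 1) if rng.random() < mutation_rate else w
                     for w in parent]
            offspring.append(child)
    return offspring  # the new population of 100 = 20 x 5

rng = random.Random(0)
# 93 parameters: 8x5 + 5x8 weights plus 5 + 8 biases of the 8-5-8 net
population = [[rng.uniform(-1, 1) for _ in range(93)] for _ in range(100)]
population = next_generation(population, fitness=sum, rng=rng)
assert len(population) == 100
```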

The above population cycle is repeated 200 times. Then, in the
following 10 generations, we can proceed in 2 ways:

1: everything is the same, except that after the usual 2000 steps the
   organisms live on, and return is included in the teaching (or is
   ONLY return taught?) for an additional 2000 steps.
2: the first 2000 steps are the usual ones, as in 1, but then every
   organism learns return (both call and action) (ONLY return?) based
   on a correct call.

Now let's make some simplifications that leave the predictive power of
the model intact.

First, observe that the motion of an organism is in fact a random
walk, since motion is never taught, at least by backprop. The genetic
algorithm may have some effect, and it may well be that fitness
increases because organisms learn to approach mushrooms more
efficiently. However, this effect is irrelevant from the point of view
of learning concepts about mushrooms through toil. The results of the
paper would not change if motion, and the concept of "the world of
mushrooms", were discarded altogether. Fitness could be calculated via
any measure of learning accuracy over an example set of different
mushrooms.

Second, observe that the call and action outputs are in fact identical.
The call output of the organisms is never used in any experiment.
The mysterious "imitation learning" phase seems to be useless, since
it teaches a function that is never used. Its only possible effect is
that it somehow "clears the ground" for theft learning, exploiting
the fact that the action and call outputs are identical, so that in
theft learning the organism has to IMITATE the call input in its action
output. If this is right, then it is cheating. If this is not right,
then the call output is useless. Either way, the call output and the
imitation learning phase can be discarded.

Third, evolution and learning both adjust the very same weights of the
organism. The combination of evolution and backprop is virtually a
single learning algorithm that has to find good weights for the given
task (the genetic algorithm is typically used to find structure, not
weights, in which case this would not be true). So we can think of the
model as containing only one organism, taught by some algorithm based
on a set of learning examples.

Here, "theft" organisms learn return based on the call, and "toil"
organisms learn based on the mushroom. This means that "theft"
organisms receive the very same input as toil organisms, except that
they don't receive garbage (the C, D, E features).

This means that the model only proves that learning is more effective
without garbage in the input examples.
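
This triviality is easy to demonstrate with almost any learner. Here is
a minimal sketch using an online perceptron (my own illustration, not
the paper's network): the "toil" learner sees the three irrelevant
bits, the "theft" learner sees only the grounded A and B bits, and the
garbage-free learner should make fewer mistakes along the way.

```python
import random

def make_sample(rng, with_garbage):
    """One training example.  The 'return' category is correct exactly
    for AB mushrooms; C, D, E are irrelevant random bits."""
    a, b = rng.randint(0, 1), rng.randint(0, 1)
    x = [a, b] + ([rng.randint(0, 1) for _ in range(3)] if with_garbage else [])
    return x, int(a and b)

def count_mistakes(with_garbage, steps=300, lr=0.2, seed=1):
    """Online perceptron learning of 'return'; returns how many errors
    are made during training."""
    rng = random.Random(seed)
    w = [0.0] * (5 if with_garbage else 2)
    bias = 0.0
    mistakes = 0
    for _ in range(steps):
        x, y = make_sample(rng, with_garbage)
        pred = int(sum(wi * xi for wi, xi in zip(w, x)) + bias > 0)
        if pred != y:
            mistakes += 1
            w = [wi + lr * (y - pred) * xi for wi, xi in zip(w, x)]
            bias += lr * (y - pred)
    return mistakes

toil_errors = count_mistakes(with_garbage=True)    # sees C, D, E too
theft_errors = count_mistakes(with_garbage=False)  # grounded A, B only
```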

Now let's move on to the conclusions.

===================== conclusions ===============================

> 8. Conclusions
>
> We have shown that a strategy of acquiring new categories by
> Symbolic Theft completely outperforms a strategy of acquiring them
> by Sensorimotor Toil as long as it is grounded in categories
> acquired by Toil.

If theft means learning without garbage, yes.

> That's how it proceeded on our planet until one species discovered
> a better way: First acquire an entry-level set of categories the
> honest way, like everyone else, but then assign them arbitrary
> names. (Those names could start as nonarbitrary functional
> or imitative gestures at first, by-products of practical,
> collective social actions or even deliberate mimicry, but their
> nonarbitrary features would be irrelevant once they were used
> just to name; and vocal gestures would be least encumbered with
> other practical tasks, hence most readily available for arbitrary
> naming, especially across distances, out of eye-shot, and in the
> dark.) Once the entry-level categories had accompanying names,
> the whole world of combinatory possibilities opened up and a
> lively trade in new categories could begin (probably more in
> the spirit of barter than theft, and, within a kin-line, one of
> sharing categories along with other goods). In trading categories
> as they traded combinations of symbols, our species also traded
> "world-views," for each category acquired by hearsay also brought
> with it some rearrangement of the internal representation of
> categories, a "warping" that was Whorfian, whether merely the
> subtle compression that results from learning that A is always
> conjoined with B, or the fundamental restructuring dictated by
> a radical scientific discovery.
>
> Can results from a 3-bit toy world really cast light on the
> rich and complex phenomenon of the origin and adaptive value of
> natural language? This is really a question about whether such
> findings will "scale up" to human size in the real world.

The problem is not with the scaling up. I will mention two serious
problems with this approach.

The first is that in the paper's example, the organisms have to
learn a concept (return) that depends only on two other concepts,
eat and mark. When the theft phase begins, the organism has already
acquired eat and mark. Instead of relying on the call input, a third
strategy would be to use the organism's own output as input, i.e. to
base the learning of new categories on old ones. It would provide the
same advantage, and indeed it does. The frog's eye recognises concepts
connected to size and motion, and its concept "eat" depends on these
primitive ones, forming a hierarchy. Though the frog doesn't have
names for its concepts, neither do the foragers. Or, in the other
direction, if the foragers do, then frogs do as well.

The second objection is the other side of the first one: if the
concept return depended on C, D or E (why not?), then the theft
strategy would be doomed to failure. Furthermore, observe that
"theft" organisms never learn to recognise a mushroom to return to;
they can recognise only the call for such mushrooms. Though the basic
concepts are grounded, they are not connected to the concept "return"
if the theft strategy is used. In other words, "return" is grounded
in the perception of calls. This is not how language works: I can
recognise things without somebody telling me the name of the thing,
once I have learned the concept.

In other words, we can see the world AND hear the names of things.
In my view, theft is done the following way: we hear a new name first,
and AFTER THAT we figure out how to ground it in PERCEPTUAL INPUT and
OUR OWN old concepts.
The basic intuition of the paper is interesting, but the model is
not relevant to language evolution.



This archive was generated by hypermail 2b30 : Tue Feb 13 2001 - 16:23:06 GMT