Dear Dr. Steels,

Appended below are the 5 BBS referee reports on your manuscript, "Computational Simulations of Colour Categorisation and Colour Naming". The reports indicate that your manuscript is potentially acceptable if you can successfully revise it in accordance with the referees' recommendations (which I will summarize below). BBS policy under these conditions is that the revised manuscript must be re-refereed and must be accompanied by a detailed, itemized statement of how and where in the revised draft each referee's specific points (as boldfaced in the referee reports below) have been accommodated.

To focus your revision, here is a summary of the boldface points in the reports below that particularly call for attention, according to the categories in the composite ratings that follow the last referee report below.

The responses of referees 1-3 and 5 all converge, from their different disciplinary viewpoints, on a similar conclusion. Most are sympathetic to the modeling enterprise in this domain, and see this as a quite valuable contribution to the field. In order for this manuscript to succeed with a BBS audience, however, it must bring in more of the empirical evidence and debate to do with color naming, whether that be psychophysical, neurophysiological or ecological, to move itself away from "toy worlds". It is not necessary that you bring in every domain of empirical information, but the manuscript should be amplified where relevant to situate itself with regard to the known literature and current debates. This will make a longer manuscript. Methodological issues in modelling must be addressed. Reviewer 5 brings up a central point regarding the use of Munsell color space; it is possible you may choose to run another simulation that would address his important objection, but at a minimum your choice of a homogeneous, not "real world", color space must be justified and your results qualified.

In regard to Reviewer #4, I do not feel it necessary that your paper justify reductionist science and all modeling, but you could profitably use this review as a sample of the kind of responses your article may receive.

I also direct your attention to a recent target article in the commentary phase, "Color Realism and Color Science" by Alex Byrne & David R. Hilbert, that addresses a number of the issues raised here. You can see it at:

For the details, please see the 5 thoughtful referee reports. [Please make sure you read these reports in an HTML reader, otherwise the boldface will not be visible. If the word "boldface" does not appear boldface in this sentence, please contact BBS.]

I hope you will accept the challenge to revise your paper. Most ultimately accepted BBS papers first undergo major revision. This is necessary not only to ensure the quality of BBS target articles, but also to protect authors from running the gauntlet of open peer commentary before being adequately forearmed. The "mini-treatment" consisting of the BBS referee reports tends to provide a fair sample of what a paper is likely to encounter in Commentary; and so experience has dictated that to elicit commentary that is constructive and useful to the author as well as to the field, a paper must fully accommodate these prima facie criticisms in advance.

Please let us know how you propose to proceed, and according to what timetable.




Barbara L. Finlay

Editor, BBS


Author : Luc Steels and Tony Belpaeme

Title : Computational Simulations of Colour Categorisation and Colour Naming


Referee #1 Anonymous

The paper presents simulation studies to determine possible influences

of language on the acquisition of color categories. Models of

populations consisting of individuals interacting to different

degrees are used to investigate the emergence of common color

categories within and across populations. Environmental influences,

evolutionary processes, and the impact of language are investigated

in these model systems.

It is an interesting and, to my knowledge, new idea to approach

issues of color category formation by pure model studies. I could

imagine that in principle such an approach could lead to valuable

contributions. Regarding the complexities of the issues involved,

however, the methodological demands are exceptional, and the current

paper in my opinion does not meet the requirements.

The main problem of the paper is that such model studies can only

constitute a proof of principle. When successful, we know that the

assumed mechanisms are one possibility. From failure we learn almost

nothing. The problem may not be as severe if there is good reason

to believe that certain aspects of the system under investigation

are sufficiently well understood and appropriately modeled. This is

clearly not the case here.

What interests BBS readers and commentators are the real systems,

not toy worlds. Adequately designed model studies can be fruitful

in giving us relevant insights, and I would be all in favor of a

systematic approach along these lines. But I don't see sufficient

potential to address such questions with the models presented here.

From these kinds of studies, virtually no conclusions regarding the

development of color categories in humans can be drawn.

In this context, "proofs" 5-7 (section 5.1) have to be taken with

extreme caution. Conclusions such as some mechanisms being "more

plausible" or "less plausible" than others are not justified. This is

clearly, if late (sections 5.2-5.3), acknowledged by the authors,

but addressed vaguely and with weak arguments.

Consequently, it is not clear what the issues are that should be

commented on. Comments would most likely amount to discussions of

methodological approach and specifics of the models and assumptions.

Apart from the requirement that this should be done in main part

by the authors, such discussion would be more appropriate for a

specialist journal. It would also lack any interdisciplinary aspect.

What remains then is the claim that it is possible in principle

that genetics, learning, and language have certain influences on the

acquisition of color categories. This, in my view, is far too vague

for BBS commentaries.

In conclusion, it may have helped if the authors had tried to implement

more plausible constraints and investigated the conditions under

which language may be particularly helpful or not. This is just one

possibility, but may illustrate a kind of approach that I may have

found easier to recommend for commentary. As it stands, the paper

is not up to the standards of BBS.

Further comments:

Overall methodology seems sound, although I didn't check details.

Section 2.3.1 could be substantially abbreviated (e.g. omit eq. (2)),

since most of it is standard and irrelevant for the studies.

As in any modeling study, issues of model evaluation and robustness

must be addressed at least to some degree. Most of the following

points are acknowledged by the authors in sections 5.2 and 5.3.

- Effects of noise, at various stages of the models

- Convergence

- Robustness with respect to simulation parameters, e.g. learning


- Consideration of environmental constraints, distribution and/or

selection of topics and contexts; physiological constraints. This

would give the possibility to relate the results to other

discussions (e.g. BBS (1997) 20:167-228).

Data: Test sets such as the Munsell colors may be used as standards

to test and compare color categorization, because they essentially

cover the set of perceptual colors. But they are not a good model

of "real world" color distributions. The fact that spectral data

are used and converted to CIE Lab values is, as far as I can see,

completely irrelevant. Any set of random tristimulus values could

have been used, since no attempt is made to relate the obtained color

categories to experimental data.

The same applies to the argument about contexts (section 5.3). Color

categories are usually tested "without" context, but nevertheless have

been acquired with different contexts. After all it is the acquisition

of categories that is modeled here.

The first sentence of the Conclusions (section 5) makes claims that

are far too strong and needs to be toned down.

Check grammar/typos



Referee #2 Anonymous

In this work the authors tested some hypotheses on colour categorisation by

using computational simulations. The authors first reviewed some controversies

in the debate over how colour categorisation arises, particularly whether

it is a universal or cultural character, whether it is genetically determined

or empirically learned, and whether it is influenced by language or not. Then, to test

different possibilities, they simulated the acquisition of colour

categorization in four different scenarios. Firstly, they tested two scenarios

of learning without language: individualistic learning or genetic evolution.

Then, they also tested two scenarios of learning with language: cultural

learning and influence of language on the genetic evolution of colour

categorization. The simulation involved two tasks called discrimination game

and guessing game. Munsell colours were used. The measures were the average

discriminative success and average number of colour categories of several

populations. The results were also shown as the focal point and extent of colour

categories plotted in a chart of saturated Munsell colours.

There were three main results. i) In all four scenarios, an

adequate repertoire of colour categories developed. ii) In the genetic evolution scenario and

in the two learning-with-language scenarios, the colour categories were

completely shared among individuals within a population, but the individualistic

learning scenario resulted only in incomplete sharing of colour categories. iii)

In all four scenarios, the colour categories were not shared across populations

evolving independently.

This work is well suited to publication in BBS. It concerns an

interesting and controversial issue, shows interesting results, and is well



1. My main concern is related to the way how the results are discussed. From

my point of view, the results were not enough to make a useful distinction

between the possibilities presented in the Introduction, and a comparison with

real data collected by anthropologists, psychologists, and others on the

subject would be most welcome. Thus, I recommend that the authors make an effort to

bring real data into the discussion of their own results. In particular, it is

intriguing that the main difference in the results for the four scenarios was

the absence of complete sharing of colour categories in the individualistic

learning (without language). This could usefully be brought in register with

available real data.

There are some minor points that the authors might wish to consider:

2. Section 1.1. Colour is an interesting subject for categorization

paradigms, but certainly not because "it is ease to gather data" about colour

categorization. There are so many physical and physiological problems

surrounding stimulus presentation and subject response in experiments involving

colour, that it might turn out to be one of the most difficult subjects on which to gather

reliable data.

3. Section 1.1. The scarce citation of the literature on colour vision

neurobiology worries me very much. Zeki has a very particular approach to the

subject and a lot more has been done since De Valois' important contribution to

the understanding of colour coding in neuronal populations. The authors might

wish to expand the Introduction in this topic. A reference source can be found

in {Gegenfurtner KR, Sharpe LT (editors) Color Vision: From Molecular Genetics

to Perception. Cambridge, England: Cambridge University Press, 492 pp.}, among


4. Section 1.1. The choice of English and French to illustrate how different

languages use different words for colour can only be justified if the idea is

to select a pair of very similar languages. But it is always very dangerous to

play with languages without some statement about their position in

the "language tree". {The night/eight joke only makes sense in Indo-European

languages and so on.}.

5. Section 1.2. The nativism case is obscurely presented. It is said that

the "process is triggered by normal colour stimuli" and then that "poor

stimuli or the absence of feedback is not a problem". It is also said that all

human beings share the same "colour genes". It is now largely known that there

is at least one important case of polymorphism in the normal human population,

the alanine/serine replacement at codon 180 of the L and M pigments.

6. Section 2.3.1. Attention to the definition of S(λ), E(λ), and f(x).

7. Section 3.3. Four hundred years is not a long period in terms of human


8. Section 5.3. It would be interesting to exploit the possibilities of

learning colour categories by using chromatic stimuli embedded in natural

contexts instead of isolated chromatic stimuli, since in "real life" tasks the

context is also important.


Referee #3 Davidoff, Jules

Steels & Belpaeme: Computational simulations of colour categorisation and

colour naming

In a very nice account the authors present what at first reading could appear

to be a surprising conclusion. They argue for a cultural specificity for colour

categories. Despite the current interest in Whorfian ideas (Bowerman &

Levinson, 2001; Davidoff, 2001, TICS), these views do not (yet) represent the

majority within cognitive psychology. (I remember having a hard job convincing

you of its worth in Montreal this summer). However, they are certainly current

and ought to attract a lot of commentary especially given the strong claims

made by Steels and Belpaeme. I would definitely recommend it for publication.

The authors admit to limitations and the omission of hybrid accounts that would

merely restrict the role of language. One such could be the proposal that a

couple of discontinuities are innate and seen in the clear minima for threshold

sensitivity at certain wavelengths. Perhaps these minima are the natural

boundaries (say, red/yellow) and, from these, with the advantage of language

all others are created. I don't think this is a convincing analysis because other

minima that do not correspond to our colour category boundaries must be

ignored. Nevertheless, I can imagine the universalist taking up that case.

As for other points that could be further addressed: 1. More might be said to

counter the Berlin and Kay argument about the similarities of colour categories

across languages. Paul Kay, Larry Hardin might want to address that one. 2.

What about categorical perception that has been reported in the prelingual

child and in non-humans? It would be very nice to hear what Bornstein thought

now of his data from 25 years ago. 3. The argument by Lindsey and Brown

recently put forward in Psychological Science. They argue against the

neo-Whorfian position, claiming that unconventional colour categories derive from

defective colour vision combined with the need to communicate to that sample

despite it being a minority. While not an argument that seems likely to be

correct, I have encountered it recently being grasped by the universalist camp.

Lindsey and Brown might like to defend it. 4. There is the logical

impossibility of purely perceptual categories in the philosophical paradox of

Sorites (see Roberson et al, 1999, Cognition).


Referee #4 Saunders, Barbara

Referee Report on Steels and Belpaeme 'Computational Simulations of Colour

Categorisation and Colour Naming'


Steels and Belpaeme use the romantic notion of 'culture' which since Herder and

von Humboldt has been associated with 'language'. This assumption

equates 'culture' with Volksgeist - the soul of the people, a hidden essence

that manifests itself in every feature of a people's life - but most especially

in the uniqueness of 'language' - the life principle. 'Culture' is the totality

of prepolitical deep and emotional bonds that the nation discovers in its own

vital nature. 'Culture' is, as it were, 'in the genes', constituting the hard-

wiring of 'difference.' This has been described as the 'happy face' of racism

and buttresses the politics of the extreme right.


The notion of 'language' that is operative in Steels and Belpaeme's article is

that of an ideal machine language 'expressing' the Romantic notion

of 'culture'. It is based on Leibnizean notions of universals. In addition it

draws on knowledge of human linguistic diversity that owes much to descriptions

written over the last four centuries under the aegis of European colonial

regimes around the world. This knowledge is relative to its conditions of

production. Authorial interests shaped it as well as the authoritative

character that 'language' took on as the natural symbol of colonial difference.

Evolutionist biology then shaped European and colonial theories of language.

The positivist vision of language progress derived from Locke and worked up

both by Spencer and Darwin, produced a measure of the inequality

(or 'relativism') between languages which emphasized conceptual precision and

communicative efficiency. This was buttressed by rigorous descriptions

of 'linguistic categories' - notably of colour. The upshot is a scientized

version of the difference between modern European languages and speakers of

less evolved languages. Suitably objectified 'language' becomes a powerful

naturalized instrument for colonial power.

Steels and Belpaeme erect their model of culture and language on these

presuppositions. On the use of such arguments in Belgium and in wider

nationalist ideologies, they might consult Blommaert and Verschueren (1998)

Debating Diversity. Analysing the Discourse of Tolerance, London: Routledge.

No principle of demarcation between humans and machines

While there are some

disclaimers and denials about talking about human colour vision, throughout

there is what I would call the 'blurring of genres'. Machine simulation and

human colour perception are run together.

What is the difference between a thermostat that turns on the furnace when the

temperature drops to 17 degrees, or a parrot trained to say 'that's red' in the

presence of red things, and a genuine, inferential reporter of those

circumstances? Each classifies particular stimuli as being of a general kind -

the kind namely that elicits a repeatable response of a certain kind. But what

is needed is the distinction between merely responsive classification

(thermostats, parrots, automata) and specifically conceptual classification.

The human reporter must, unlike the thermostat and the parrot and the

automaton, have the concept of heat or colour. For colour, what we need is the

principled distinction between wavelength responders and colour-see-ers. Steels

and Belpaeme do not provide such a distinction.

In contrast to responsive classification ('categories'), to grasp or understand

a 'concept' is to have practical mastery over the inferences involved - to know

in the practical sense of being able to distinguish (as a kind of know-how)

what follows from the applicability of a concept, and what it follows from. The

parrot does not treat 'that's red' as incompatible with 'that's green', nor for

it does the entailment 'that's coloured' follow from 'that's scarlet'. Insofar

as the repeatable response is not for the parrot caught up in practical

proprieties of inference and justification, and so of the making of further

judgements, it's not a conceptual or a cognitive matter at all.

Concepts and categories

Steels and Belpaeme's master concept is 'representation', in terms of which

everything is understood. It is also arguably the dominant research program of

this time and holds in place a general conception the simpler forms of which

are exhibited in the activity of non-concept-using creatures. On that basis

representations are held to elaborate ever more complex forms until something

recognisable as specifically conceptual is reached. Steels and Belpaeme's colour

categories hover somewhere here between non-conceptual and conceptual beings.

In their simulations no know-how is involved, and there is no recognition of

incompatibilities or entailments. The simulations are like the parrot and

thermostat which also have the capacity to be trained up.

The simulations don't work for humans because there are no inferential concepts

at work, to give know-how or normative proprieties for use in practice. It is

doubtful if normativities and their proprieties could be simulated.

Wittgenstein's parable of the slabs illustrates this. In his story Wittgenstein

parodies the Augustinian theory of language and meaning, that is of labelling

objects and their properties in a kind of 'code'. It's a non-language exactly

as Steels and Belpaeme describe word-forms labelling colour categories. It's

this that Wittgenstein vilifies as the project he's attacking - that is, the

Fregean, Russellian, Carnapian ideal language, the proto-machine language of

his day.

Examples of confusing simulations and humans are: the notion of culture used

indiscriminately for what automata and humans supposedly do; the reference to

language and meaning (entailing concepts that automata don't have); the

subsuming of both automata and humans under the reference to memetic evolution;

ditto for 'thought', 'learning', 'memory' and 'game'. References to

neurobiology and to human colour perception, categorisation and naming,

contrive to mislead the reader.


All pretence that the simulations are related to humans should be dropped. This

is not a model of humans but of machines. There are fundamental differences

between them which need to be addressed.

The three models of the origin of categories (universal/culture specific,

genetically determined or innate, causal influence on language or not) are

different facets of one model - that of the innateness of cognitive capacity -

not three different models. For a discussion of the different ways the idea of

innateness unpacks to cover all three models, see Khalidi, M.A. (2002) Nature

and nurture in cognition, Brit. J. Phil. Sci. 53: 251-272.

A principle of demarcation is needed to separate wavelength responders from

colour see-ers (stimuli responders such as worms, birds and fish, as distinct

from human concept-users).

Steels and Belpaeme should rethink their notions of 'culture' and 'language'.


Referee #5 Endrikhovski, Serguei

This paper is an important step forward in understanding and modelling

the problem of color categorization. The authors' approach to this

problem is sufficiently novel and, in my view, it is suitable for Open

Peer Commentary (with minor revisions) because it addresses a general

computational framework, which is relevant to various behavioral and

brain science disciplines. The following list presents specific comments

and suggestions to the authors.


- It would be beneficial if the authors explicitly define the terms

"color categorization" and "color naming" in their paper. The validity of

the authors' logic and conclusions sometimes depends on the exact

definitions of these terms. For example, one can define color

categorization as the process by which distinct color entities are

treated as equivalent. Note that such a definition implies that the

process of color categorization involves not only a discrimination task but

also a grouping task. From this perspective, the simple Discrimination

Game used by the authors to simulate the process of color categorization

is not sufficient, and the proper simulation of color categorization

should also include some kind of Grouping Game -- the process which

was not addressed by the authors in their model.

- In Section 2.2, the authors describe their method of simulating the

environment. There is one problem with the proposed simulation. The

Munsell color chips used by the authors are distributed more or less

uniformly in the perceptual color space (this was one of the goals of the

Munsell system). Therefore, each part of the color space has an equal

probability to be chosen as a part of the environment. However, colors in

the real world environment are not distributed uniformly. Data reported

by Burton and Moorhead (1987), Hendley and Hecht (1949), Howard and

Burnidge (1994), and Yendrikhovskij (2001) showed that naturally

occurring colors are distributed within a relatively restricted area of

the chromaticity diagram. Basically, there are three important groups of

colors in nature: water, sky, and distant objects fall within a blue

region; green plants fall within a yellow-green region; earth and dried

vegetation are yellow to orange-red. There are very few colors in the

green-blue and red-blue regions.

- In Section 3.2 the authors claim that "physiological constraints

and the environmental and ecological constraints, are not enough

to drive the agents to the same solution space. Different solutions are

possible for the same task in the same environment". Similar conclusions

are made in Sections 5.1 and 5.2. It is important to realize that this

conclusion can be valid only if colors in the environment are distributed

uniformly. As indicated earlier, this is probably not the case with

actual colors in the real world. There appear to be more "blue", "green",

"yellow", and "red" colors than, for example, "blue-green" and "blue-red"

colors. So if an agent would play the Discrimination Game with samples

distributed similar to the real world, the probability of being the topic

of the game would be higher for the "blue", "green", "yellow", and "red"

colors than for "blue-green" and "blue-red" colors. Consequently, more

agents would include "blue", "green", "yellow" and "red" colors in their

repertoire, which would drive the agents to a similar solution space. It

seems that the authors consider this possibility when claiming (in

Section 5.2) that "it might be possible to tighten physiological and

environmental constraints in the model so that they inevitably drive

genetic evolution (or learning) to a universal set of color categories."

But then the authors immediately conclude that "none of this is very

likely". This is a very disputable conclusion.

- In Section 2.3.1, the authors claim "opponent channels yellow-blue (a*)

and red-green (b*)". The correct statement should be "opponent channels

yellow-blue (b*) and red-green (a*)".

- The computational simulations presented in Section 3.2 have a few

arbitrarily chosen parameters. For example, why was the context (i.e.,

environment) of the games represented by 4 items, and why were these items

at a minimum Euclidean distance of 50 from each other in the CIELAB

color space? What would happen to the results of the simulations if

there were a smaller/larger number of samples and smaller/larger

distances between them? What would happen with the context of the game

containing a mixture of "similar" items (e.g., at a distance less than 50

from each other) and "different" items (e.g., at a distance more than 50

from each other) as usually occurs in the real world? It would be

useful to discuss or at least briefly address these issues in the paper.

- In Section 5.2, the authors claim that "So far no one has been able to

identify which physiological and environmental constraints could explain

universal colour categories-- and neither are we aware of

successful attempts to explain the universal colour categories based on

human ecological conditions". One of the recent attempts in this

direction was done by Yendrikhovskij (2001), who proposed a computational

model of color categorization and provided preliminary evidence that the

process of color categorization can be explained by a non-uniform

distribution of colors in the perceived environment.

- The inter- and intra- population category variance tables presented by

the authors are very interesting. However, the scientific value of the

paper would be significantly increased if these tables were compared with

actual data from psycholinguistic studies such as the World Color Survey.

Such a comparison would be the

crucial validity test of the computational simulations proposed by the


- It could be useful to broaden the scope of the paper by discussing

possible applications of the proposed simulations for other

perceptually-based and cognitively-based categories. One of the

interesting questions in this respect is why many languages have "basic"

color terms (e.g., red, green, blue, etc.) and "basic" taste terms (e.g.,

sweet, bitter, etc.) but no "basic" sound terms?
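
As an illustration of the questions raised above about the context parameters (4 items at a minimum Euclidean distance of 50 in CIELAB), the following is a minimal sketch of how such a context could be drawn, so that the number of items and the distance threshold can be varied. It is not the authors' implementation: the function names, the rejection-sampling strategy, and the uniformly random placeholder palette (standing in for Munsell chips) are all assumptions made for illustration only.

```python
import math
import random


def lab_distance(c1, c2):
    """Euclidean distance between two (L*, a*, b*) triples."""
    return math.dist(c1, c2)


def sample_context(palette, n_items=4, min_dist=50.0, rng=None, max_tries=10_000):
    """Draw n_items colours from palette that are pairwise >= min_dist apart.

    Hypothetical helper using simple rejection sampling. Returns None if no
    valid context is found within max_tries attempts.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        context = rng.sample(palette, n_items)
        if all(lab_distance(a, b) >= min_dist
               for i, a in enumerate(context)
               for b in context[i + 1:]):
            return context
    return None


def random_palette(size, rng):
    """Placeholder palette: uniform random points in a rough CIELAB box.

    This is an assumption for illustration, not Munsell or natural-scene data.
    """
    return [(rng.uniform(0, 100), rng.uniform(-100, 100), rng.uniform(-100, 100))
            for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)
    palette = random_palette(500, rng)
    # Vary the two parameters questioned in the comment on Section 3.2.
    for n_items, min_dist in [(4, 50.0), (4, 20.0), (8, 50.0)]:
        context = sample_context(palette, n_items, min_dist, rng)
        print(n_items, min_dist, context is not None)
```

Varying n_items and min_dist, or replacing the uniform placeholder palette with one weighted toward naturally frequent colours, would be one way to probe the robustness questions raised in the comments above.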


- G. J. Burton, and R. Moorhead, "Color and spatial structure in natural

scenes," Applied Optics, 26, pp. 157-170, 1987.

- C. D. Hendley, and S. Hecht, "The colors of natural objects and

terrains, and their relation to visual color deficiency," Journal of the

Optical Society of America, 39, pp. 870-873, 1949.

- C. M. Howard, and J. A. Burnidge, "Colors in natural landscapes,"

Journal of the Society for Information Display, 2, pp. 47-55, 1994.

- S.N. Yendrikhovskij, "Computing Color Categories from Statistics of

Natural Images," The Journal of Imaging Science and Technology, 45, pp.

409-417, 2001.


ACC 2 3






ACC 1 3 4

MIN 2 5





ACC 3 4

MIN 1 2 5





ACC 2 3 4






ACC 2 3





[6] THEORY :

ACC 2 3


MAJ 1 4



[7] LENGTH :

ACC 1 2 3 4 5







MIN 2 5