
> From: Whitehouse Chantal <cw495@psy.soton.ac.uk>
>
> Commentary 1 - Abdi, Valentin, and Edelman
>
> > the principal component approach used on objects described by a low
> > level code (i.e., pixels, Gabor jets).
>
> but I haven't a clue what that is.

A pixel code is one that represents an image as a huge array of dark and
light dots, like a newspaper photo blown up enormously (pixel = picture
cell). You can think of it as a matrix of dots the width and height of the
image. Each pixel would be a dimension. So if the matrix is 100 x 100
pixels, it could be stretched out into a 10,000-element "vector," and each
image would then be a "point" in pixel space.

A simple 2-D example is this: if you have a 3 x 3 matrix of pixels, let's
assume each is either a zero (= nothing) or has a * in it. Visualise a
cross in this matrix, consisting of a vertical column and a horizontal row
of pixels:

0*0
***
0*0

That's the 3x3 pixel representation; the stretched-out 9-bit vector would
then be:

0*0***0*0

Replacing each * with a 1 gives

010111010
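If it helps to see this concretely, here is a tiny Python sketch (the array and variable names are just my own illustration) of stretching the 3x3 cross into its 9-dimensional vector:

```python
import numpy as np

# The 3x3 "retina": 1 = lit pixel (*), 0 = dark pixel
cross = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]])

# Stretch the matrix out row by row into a 9-dimensional vector
vector = cross.flatten()
print(vector)   # [0 1 0 1 1 1 0 1 0] -- the point "010111010" in 9-D pixel space
```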

You can now think of ANY pattern in the image as being some point in
9-dimensional space, much the way you can think of any point in real space
as a point in 3-D X x Y x Z space (X for height, Y for width, Z for depth).
The difference is that each dimension in this simple space has only two
values, 0 and 1; if the pixels could vary in lightness from 0% to 100%,
you'd get a space more like the one you are used to, but in 9 dimensions
instead of 3.

So every possible pattern of *'s in a 3x3 pixel image becomes a point in
9-dimensional pixel space. Think of the *'s as points of light hitting
your retina, and think of the pixels as the cones in your retina. If a
cone is hit by light, it fires; if not, then not. So each cone is a
light-detector (= a very simple feature detector); the 9-D vector is a
complex feature detector consisting of the firing or not-firing of each of
the nine cones in that little 3x3 cone patch of your retina.

Now 9 dimensions of fire/not-fire is not a lot, but imagine your whole
retina now, and suppose your cones can not just fire vs. not-fire, but can
fire to any degree, from 0% to 100%. The matrix there would be even bigger
than 1000 x 1000, and the stretched vector would be one million-dimensional.
And the possible patterns that could fall on your retina would be not just
patterns of 0's and 1's (light vs. no-light; fire vs. no-fire) but all
combinations of degrees of firing, anywhere from 0% to 100%.

Points in a one-million-dimensional space are just too much for any system
to handle and recognise patterns in. Something has to be done to reduce the
number of dimensions to something small enough that features, and hence
patterns, can be recognised.

The simplest feature is an on/off pixel (cone). In the 3x3 cross,

0*0
***
0*0

a useful higher-order feature would be 3 *'s in a horizontal row, and three
*'s in a vertical column. As you know, there is evidence for the existence
of such higher-order detectors of lines -- not just vertical and
horizontal, but all angles.

With a vertical and horizontal line detector you can reduce the
representation of the cross from the full 9 dimensions

010111010

to just two dimensions

11

in a two-dimensional SUBspace of the full 9D space. That subspace consists
of only the patterns that have either a horizontal or a vertical row of
three ON pixels. (How many of those are there?)

But just an ON from the vertical line detector and an ON from the
horizontal line detector still isn't enough to pick out a cross, because
you would get the same pattern for

*00
***
*00

as well. So you need another detector for when the vertical and the
horizontal share their MIDDLE bit, as in:

0*0
***
0*0

Those three detectors together would be able to recognise crosses in this
simple space, and would reduce the representation from 9 dimensions to
just 3. (There are of course other ways of doing this simple
pattern-recognition problem.)
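Here is a rough sketch of what those three detectors might look like in Python (the function names and the exact tests are just my own illustration, not anything from the target article); each image is thereby reduced from a 9-dimensional point to a 3-dimensional code:

```python
import numpy as np

def has_vertical(img):
    # fires (1) if some column is all ON
    return int(any(img[:, c].all() for c in range(img.shape[1])))

def has_horizontal(img):
    # fires (1) if some row is all ON
    return int(any(img[r, :].all() for r in range(img.shape[0])))

def shares_middle(img):
    # fires (1) only if a full row and a full column cross at the centre pixel
    r, c = img.shape[0] // 2, img.shape[1] // 2
    return int(img[r, :].all() and img[:, c].all())

cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
offset = np.array([[1, 0, 0], [1, 1, 1], [1, 0, 0]])

for img in (cross, offset):
    code = (has_vertical(img), has_horizontal(img), shares_middle(img))
    print(code)   # cross -> (1, 1, 1); the off-centre shape -> (1, 1, 0)
```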

In such a higher-order vertical/horizontal/centre-intersection detector
space, the representation of the cross would be 111, because the vertical
detector, the horizontal detector, and the intersection detector would all
fire. The representation of any other shape with both a horizontal and a
vertical but no centre intersection would be 110. That would not be
enough, of course, if you needed to recognise a T too:

***
0*0
0*0

A lot of the simple geometric shapes would need to have detectors of their
own, including even "H":

*0*
***
*0*

which such a small array could not distinguish from "N" (try it). So you
would need a bigger matrix, plus detectors for lines at other angles, and
in other combinations. The good news, though, is that once you have all
your local feature detectors, they can generalise or "scale up" to much
bigger images -- maybe even the 1000 x 1000 one.

The reason they scale up easily is that an "N" is an N no matter where it
appears in your visual field. It's also an N whether it's small or large.

Gabor filters make use of this. They detect lines at a particular angle
anywhere in the visual field, and they detect them whether they are small
and narrow or big and wide.

So think of the Gabor-filtered representation as something that reduces the
1000 x 1000 pixel matrix of your retina, and the associated
one-million-dimensional vector space, down to a much smaller number of
SUBspaces, based on higher-order detectors for lines and angles
irrespective of their size or position on your retina.
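If you want to see roughly what a Gabor filter is, here is a toy sketch (my own construction, with arbitrary size, frequency and orientation values, not the commentators' code): a sinusoidal grating windowed by a Gaussian, slid over the whole image so that it responds to a line of its preferred orientation wherever that line happens to fall:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0):
    """A sinusoidal grating at orientation theta, windowed by a Gaussian."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_rot / wavelength)
    return envelope * carrier

# A toy "retina" with one vertical bright line in it
image = np.zeros((32, 32))
image[:, 16] = 1.0

vertical_filter = gabor_kernel(theta=0.0)          # prefers vertical structure
horizontal_filter = gabor_kernel(theta=np.pi / 2)  # prefers horizontal structure

for name, kern in [("vertical", vertical_filter), ("horizontal", horizontal_filter)]:
    response = convolve2d(image, kern, mode="same")
    print(name, round(np.abs(response).max(), 2))
# The vertical filter responds much more strongly to the vertical line,
# no matter where in the image the line happens to be.
```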

Now PCA (principal components analysis) is a statistical technique for
taking a big matrix and reducing its dimensionality down to only the
subspaces and dimensions that matter. It is similar to factor analysis, in
which a lot of mental tests are analysed, and if people's scores on a lot
of them are correlated, then it is assumed that to some extent they measure
the same thing. So that's the first "factor." That correlation is
partialled out (in a very simple and natural way you might remember from
stats -- by using the "residuals" [ask me if you want that explained]) and
then you see whether what's left over, after you have partialled out that
first correlation, has any correlations left. If yes, you partial those out
too, for the second factor, and so on until there is very little variation
left.

(Usually 2-3 factors underlie a huge collection of test scores, and instead
of giving you a score on every test, you can be given a score on the three
factors or "subscales," such as the verbal, the quantitative and the
spatial subscales; conversely, the tests themselves can be seen as having
high or low "loadings" on each of the factors, so some tests are more
verbal and less spatial, etc.)

That was just an analogy, but dimensionality is reduced in a similar way
in both cases.

Suppose for some reason we did not know that in the world points appear in
3D space. So we tag them not only with their height, width, and depth, but
with the time of day, the temperature, the humidity, the sound level, and
the date. Points would then be in an 8D space instead of a 3D space. But we
notice that they have certain properties -- spatial, geometric properties,
actually -- for which 5 of the dimensions do not matter: the shapes are
INVARIANT under any changes in the 5 irrelevant dimensions. (A cube is a
cube, regardless of the temperature or time of day.) So we could collapse
all those irrelevant parts of the 8D space and reduce it all to the 3D
subspace that is relevant for geometric shape recognition.

PCA works a bit like that, except that the way you reduce the
dimensionality is based, as with the mental tests, on correlation. The
correlated dimensions are collapsed, and the uncorrelated ones are left. As
with tests, if you have several that are highly correlated, you don't need
them all, because they are redundant. The same is true of dimensions. PCA
finds the few dimensions that matter. The correlation between them should
be near zero, because they are really all giving us independent, relevant
information. In this case, they would turn out to be the 3D space of
X/Y/Z: height, width, depth, which are at right angles (orthogonal) to one
another, so could not possibly be more uncorrelated!

If you actually did the experiment of using the 8D data (including
temperature) to classify geometric shapes, a PCA would quickly find that 3
dimensions were enough, and would collapse the 5 irrelevant directions.
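Here is a toy version of that experiment in Python (my own made-up data; PCA is done with a plain singular value decomposition rather than any particular statistics package): points that really vary in only 3 dimensions are padded out with five redundant extra dimensions, and PCA finds that three components carry essentially all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# 500 points that genuinely vary in 3 dimensions (height, width, depth)
xyz = rng.normal(size=(500, 3))

# Five extra dimensions: mixtures of the first three plus a little noise,
# so they are redundant (highly correlated with the real ones)
mixing = rng.normal(size=(3, 5))
extra = xyz @ mixing + 0.01 * rng.normal(size=(500, 5))

data = np.hstack([xyz, extra])          # each point now lives in 8-D

# PCA via SVD on the centred data
centred = data - data.mean(axis=0)
_, singular_values, _ = np.linalg.svd(centred, full_matrices=False)
explained = singular_values**2 / (singular_values**2).sum()

print(np.round(explained, 3))
# The first 3 components explain ~100% of the variance; the other 5 are negligible.
```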

But of course 3D is still a lot, and even the 2D projection of all the 3D
objects in the world is a lot, especially once it is stretched into the
million-dimensional vector. But PCA can be used to reduce that
million-dimensional vector to a smaller and more tractable size, again by
rooting out what is correlated and redundant or irrelevant in the shadows
cast on our retinas by the things we need to recognise.

This qualitative tour of dimension reduction on which I have just taken you
is really all you need in order to understand the Schyns paper and
commentaries.

> > The pca approach represents faces by their projections on a set of
> > orthogonal features (principal components, eigenvectors, "eigenfaces")
> > epitomizing the statistical structure of the set of faces from which
> > they are extracted. These orthogonal features are ordered according to
> > the amount of variance (or eigenvalue) they explain, and are often
> > referred to as "macro-features" (Anderson & Mozer, 1981) or
> > eigenfeatures by opposition with the high level features traditionally
> > used to describe a face (e.g., nose, eyes, mouth).
>
> Unfortunately this doesn't help me much. How are these eigenfeatures
> represented? Are they a set of codes, written descriptions of features,
> or an actual visual display of facial features?

You start with the shape of a face on the retina. Then you stretch that out
into a long vector specifying a point in a high-dimensional vector space.
Then you do PCA dimension reduction to reduce the dimensionality of the
representation from a point in the huge space to a point in a much smaller
feature subspace. One of the features might be the distance between two
circular parts (the eyes), just as the features of the crosses I described
were vertical and horizontal lines. The PCA can find a minimal set of
features that will sort the faces as they need to be sorted (by facial
expression? by identity? by age? by gender? by family? by race?).
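As a rough sketch of that "eigenface" idea (toy random arrays standing in for real face images, and the variable names are mine): each face is stretched into a long vector, PCA extracts a handful of orthogonal "eigenfaces," and each face is then coded by just its projections onto them:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend we have 50 face images, each 64 x 64 pixels (toy random data here)
faces = rng.random((50, 64, 64))

# Stretch each face into a 4096-dimensional vector
X = faces.reshape(50, -1)
mean_face = X.mean(axis=0)
X_centred = X - mean_face

# PCA via SVD: the rows of Vt are the orthogonal "eigenfaces"
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
eigenfaces = Vt[:10]                      # keep only the top 10 components

# A face is now coded by just 10 numbers: its projections on the eigenfaces
codes = X_centred @ eigenfaces.T
print(codes.shape)                        # (50, 10): 4096 dimensions reduced to 10
```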

Features are really parts of the original, raw shadow that the face cast
onto your retina (matrix). The feature detector picks them out, and ignores
all the rest of the variation in the rest of the dimensions: it is just
interested in what is INVARIANT in the category you are trying to pick out.

Yes, they are a kind of code; not a written code -- though they can usually
be described verbally. I think the feature detectors are best thought of as
FILTERS, through which the retinal shadow passes, losing most of its
details and preserving only the invariant ones (for whatever categories
need to be sorted).

> > Because they are optimal for the set of faces from which they are
> > extracted, eigenfeatures are less efficient for representing faces
> > from a different population and thus generate class-specific effects
> > such as the other race effect

This is the issue of generalisation and scaling: the features extracted by
PCA from this set of faces may or may not work for another set of faces.

> This appears to be saying that if we see lots of Caucasian
> faces and not many Japanese faces then we won't be able to
> efficiently represent Japanese faces and so label them as "other
> race" faces. However earlier in the commentary they said that,

Yes, and I think that's true: it's what's behind the joke (which in China
they of course tell exactly in reverse) about the Caucasian customer in the
Chinese restaurant who asks how come all Chinese waiters look the same,
whereas all Caucasian ones are so easy to tell apart...

> > Eigenfeatures are flexible in that they evolve with the faces
> > encountered (Valentin, Abdi, & Edelman, 1996).
>
> This seems to imply that if a person sees more Japanese faces over time
> then they will be able to represent them more efficiently and be less
> likely to label them as "other race" faces. This doesn't seem to make
> sense logically. If we see lots of Japanese faces we don't start to
> think of them as more Caucasian.

No, but we do begin to find the invariant features that are more
characteristic of Japanese faces than of Caucasian ones. We either develop
a specific set of Japanese feature detectors, or, more likely, we enlarge
or refine our existing set of facial feature detectors till they are able
to distinguish Japanese faces as well as Caucasian ones.

> Commentary 2 - Benson
>
> > Assuming primary visual cortex (V1) is necessary for object
> > recognition strongly suggests the geniculostriate pathway is
> > fundamental in bootstrapping the dimensionality reduction process.
>
> The dimensionality reduction process is the idea that our environment
> is made up of hundreds of dimensions that we need to condense in some
> way in order to make sense of the "blooming, buzzing confusion".
> Benson is saying that this condensation process is done in part by the
> actual visual process, i.e. as the information is being taken from the
> retina to the visual cortex, some sort of coding is occurring which
> allows the information to be condensed.

Exactly.

> > For every relevant (detected) feature of a homogeneous class,
> > experience dictates either continuous or discrete measurement. In the
> > former, this leads naturally to a feature vector which includes
> > population sample variance information (variance may be asymmetric
> > about the mean). Identification of a discrete feature immediately
> > enhances categorisability.
>
> A feature can be given either a continuous or a discrete measurement,
> e.g. either a value from the continuous scale 1-100, or an either-or
> value such as 0 or 1.

Correct.

> Benson is saying that a discrete value for a
> feature helps categorization of an object made from many such features
> because it already has a discrete category itself. But maybe the degree
> of a feature is important for categorization. Imagine someone was
> describing two different animals to you in terms of features such as
> whether it had fur or not. One animal is very furry and the other has
> little fur. If you are giving features discrete measurements (with 1
> being "fur" and 0 being "no fur") then both animals would be given 1
> for the fur feature. If you were using continuous values, the very furry
> animal could be given 80, and the animal with little fur could be given
> the value 10 for the fur feature. The second case would help you to
> categorise the animals more easily.

You're right. 0/1 furry would be too coarse-grained to sort these animals.
But notice that the highly furry and the minimally furry values along the
continuous dimension are still pretty far apart; the dimension would not be
much use if the animals varied in furriness along the whole dimension.
Either you would have to get very good at telling apart tiny differences in
furriness at the boundary, or you would need to use other features instead.
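Your fur example, in miniature (made-up numbers):

```python
# Furriness measured two ways (toy numbers on a 0-100 scale)
very_furry, barely_furry = 80, 10

discrete = [int(v > 0) for v in (very_furry, barely_furry)]
continuous = [very_furry, barely_furry]

print(discrete)     # [1, 1]  -- the 0/1 code cannot tell the animals apart
print(continuous)   # [80, 10] -- the continuous code separates them easily
```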

> Commentary 3 - Braisby and Bradley
>
> > Schyns et al. argue that flexibility in categorisation implies
> > 'feature creation'. We argue that this notion is flawed, that
> > flexibility can be explained by combinations over fixed feature sets,
> > and that 'feature creation' would anyway fail to explain
> > categorisation. We suggest that flexibility in categorisation is due
> > to pragmatic factors influencing feature combination, rendering
> > 'feature creation' unnecessary.
>
> > Schyns et al. argue that fixed feature sets limit the representational
> > (and classificatory) capacity of a conceptual system. However, they
> > incorrectly claim that "Any functionally important difference between
> > objects must be representable as differences in their building blocks"
> > (Section 1.1, paragraph 3). However, this ignores the modes of
> > combination of those building blocks
>
> True. As we know we are born with the ability to identify and make
> sense of certain features such as those that make up the human face.
> It seems to make more sense that we are born with a fixed set of
> features which we learn to combine in different ways to make sense of
> new things rather than somehow actually learn new features. Why
> should we not come equipped with all the necessary building blocks?

Perhaps we do. But do you remember the dimensionality problem? There might
(just might -- I'm not saying it's so) be too many possible ways to
categorise things to make it economical to be born with the features for
doing all of them. B & B add that we can always make new combinations of
fixed features, and that may be all the extra flexibility we need, and they
may be right. But here there is room for two possibilities: if features are
added together the way they are in a rule expressed in a sentence, "It's
round and green and bigger than a breadbox," then we are really just
explicitly combining features we already have detectors for. But it's
possible that using a combination of detectors eventually creates a unitary
detector that no longer needs to do it by explicitly combining simpler
detectors. It may become automatised just as the simple detectors are, so
that it picks out "round-green-bigger-than-a-breadbox" things (let's call
them "Ragbatabs") as quickly and directly as it picks out green things.
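Schematically (this is entirely my own toy illustration, and "Ragbatab" is just the made-up label from above), the difference between the explicit rule-like combination and an automatised unitary detector might look like this:

```python
# Three prior feature detectors we already have (toy predicates on a dict of properties)
def is_round(x):             return x["shape"] == "round"
def is_green(x):             return x["colour"] == "green"
def bigger_than_breadbox(x): return x["size_cm"] > 40

# Explicit, rule-like combination: evaluate each prior detector and conjoin them
def ragbatab_by_rule(x):
    return is_round(x) and is_green(x) and bigger_than_breadbox(x)

# An "automatised" unitary detector: the same test compiled into one step,
# no longer expressed as a combination of the named parts
def ragbatab_unitary(x):
    return (x["shape"], x["colour"]) == ("round", "green") and x["size_cm"] > 40

thing = {"shape": "round", "colour": "green", "size_cm": 55}
print(ragbatab_by_rule(thing), ragbatab_unitary(thing))   # True True
```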

One way of interpreting Schyns, Goldstone & Thibaut's "created" features is
as just this: an automatised combination of prior features that has been
put together in the service of a new categorisation task. But it could
become more "creative" than that: we may have detectors for lines and
angles and even squares, circles and triangles, but we certainly don't and
can't have detectors for every possible shape of blob. Yet some specific
blob shapes might turn out to be very important to identify (say, in cancer
screening). If we can construct dedicated detectors to pick out and
identify those quickly, reliably and automatically, would that too just be
a combination of prior fixed features?

> > Fodor argues that systems cannot increase
> > their logical power (acquire wholly new features) via learning: the
> > system's vocabulary and mechanisms must already be able to express the
> > 'new' feature, and so that feature has not been 'created'.
>
> The whole idea of creating new features provides such a puzzle. It
> appears to be much simpler for the system to come ready prepared
> with the necessary features and a flexible set of rules for combining
> these features. It would be easier and make more sense for the rules
> to be developed rather than the actual features.

You're right if (1) the fixed set out of which everything else can be built
is not too big (i.e., if there is no dimensionality-reduction problem) and
if (2) all the features we will ever need in a life of categorisation are
just combinations of those fixed ones. But would a dedicated blob-detector
-- one that could be constructed for any possible 2-dimensional shape it
might turn out to be important to identify -- just be a combination of
prior features? Could we be born with fixed detectors for all possible
blobs?

And at what point does putting together special combinations of dimensions
become such a demanding (creative?) task that it deserves to be called
feature creation?

> > Despite this being a critical problem, Schyns et al. fail to address
> > it properly. They state that "...categorisations, rather than being
> > based on existing perceptual features, also determine the features
> > that enter the representation of objects" (Section 1.2.4, paragraph
> > 1). Their position appears circular, since they employ 'feature
> > creation' to explain categorisation, but claim that categorisation
> > itself determines 'feature creation'.

What they mean is that how we approach a new categorisation problem is
determined by the repertoire of feature detectors we already have -- not
just the fixed ones, but the "created" ones too. First we will try to see
new things with our existing feature detectors. If this works, fine; if
not, if the prior detectors produce a "bias" that does not result in
correct sorting, then we may have to "create" a new detector. It, in turn,
will influence our future categorisations... No circularity, just a cycle:
(1) try to fit everything with existing features; (2) succeed? fine; fail?
(3) create new feature detectors; (4) go to (1)...
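Schematically (my own paraphrase of that cycle, not an algorithm from the target article):

```python
def categorise(items, detectors, create_new_detector, accurate_enough):
    """Keep sorting with the detectors we have; create new ones only when they fail."""
    while True:
        sorting = [[d(item) for d in detectors] for item in items]     # (1) try existing features
        if accurate_enough(sorting):                                   # (2) succeed? fine
            return detectors
        detectors = detectors + [create_new_detector(items, sorting)]  # (3) create a new detector
        # (4) go back to (1) with the enlarged repertoire
```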


This archive was generated by hypermail 2b30 : Tue Feb 13 2001 - 16:23:20 GMT