Using associative memory principles to
enhance perceptual ability of vision systems (by
Dmitry
O. Gorodnichy et al)
Presented at the First IEEE CVPR Workshop on Face
Processing in Video (FPIV'04),
Washington DC, June 28, 2004
Abstract: So-called associative thinking,
which humans are known to perform on an everyday basis, is
attributed to the fact that the human brain memorizes
information using a dynamical system made of
interconnected neurons. Retrieval of information in such a
system is accomplished in an associative sense: starting from
an arbitrary state, which might be an encoded representation
of a visual image, the brain activity converges to another
state, which is stable and which is what the brain
remembers. In this paper we explore the possibility of using
an associative memory for the purpose of enhancing the
interactive capability of perceptual vision systems. By
following the biological memory principles, we show how
vision systems can be designed to recognize faces, facial
gestures and orientations, using low-end video-cameras and
little computational power. In doing so, we use public-domain
associative memory code.
Paper: pdf,
Talk slides: 2.7Mb
-
The gist of this approach: The
binary representations of faces are stored as global attractors of
a fully connected binary neural network. Starting from an
arbitrary (unseen) state, such as a new video image, the network
converges to one of those attractors. That is, the attractors represent the
memories; there are as many attractors as faces shown in the training
stage. NB: for a network of size N, there are only about 50% of N good
attractors.
Another approach, which is more biologically justified and does not
limit the number of presented training images, is to have a few extra
neurons, the states of which are used to encode different faces (or
classes). See Further Developments below.
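As a rough illustration of the attractor idea in the gist above, here is a minimal Python sketch of a small fully connected binary network. It is not the public-domain associative memory code mentioned in the abstract: it uses the plain Hebbian outer-product rule as a stand-in, and the learning rule in the authors' code may differ. The names train_hebbian, recall, face_a and face_b are hypothetical, the 16-element patterns are toy data rather than real face encodings, and neuron states are written as +1/-1.

```python
def train_hebbian(patterns):
    """Build a symmetric weight matrix from +1/-1 patterns.

    Assumption: simple Hebbian outer-product rule, used here only
    as a stand-in for whatever rule the original code applies.
    """
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, probe, max_sweeps=100):
    """Update neurons one at a time until a full sweep changes nothing.

    Returns the final state and the number of sweeps performed.
    """
    state = list(probe)
    n = len(state)
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for i in range(n):
            field = sum(w[i][j] * state[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            return state, sweep  # stable state (attractor) reached
    return state, max_sweeps


# Two toy "faces" (hypothetical data), stored as attractors.
face_a = [1] * 8 + [-1] * 8
face_b = [1, -1] * 8
weights = train_hebbian([face_a, face_b])

# An unseen state: face_a with two elements flipped.
noisy = list(face_a)
noisy[0], noisy[9] = -noisy[0], -noisy[9]

recovered, sweeps = recall(weights, noisy)
print(recovered == face_a, sweeps)
```

With these two toy patterns, the corrupted probe settles back onto face_a. Storing more patterns than the network can hold would produce unstable or spurious states, which is why the number of good attractors is bounded by the network size.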
-
The videos of the experiments
described in the paper: (also downloadable from this
AVI directory)
Demo 1: Memorizing/recognizing user's face orientation:
demo-fr-rot-diff-lighting-3fps.avi
Demo 2: Memorizing/recognizing user's facial
expressions: memorizing-expressions-2fps.avi
Demo 3: Memorizing/recognizing user identities:
demo-fr-m-d-2fps.avi
A few other recorded videos:
Demo 4: Shows how to memorize a new face
(expressions):
memorizing-A-10rot.avi
Demo 5: Several runs with 30 faces shown at left (stored in this
directory and loaded from this face-names.txt
file). Last four pictures are from the photograph:
rot-exp-id-30.avi,
rot-exp-id-31.avi
-
A simple program which you can use
to test the technology yourself:
(also downloadable from this BIN directory)
video-memory-may04.exe
To run this program on your PC, you need only
a webcam and the following .dll files: CV, cvaux, ilp,
plpx, Msvcrtd (downloadable from here), placed in the directory from which
you run the program.
Description of the program:
-
It runs in either 0) Memorize or 1)
Recognize mode. What is memorized is determined by the Video
channels selected. In the current version, only the Luminance channel is
used: faces are detected by Haar-like wavelets using the OpenCV library,
transformed to the canonical 24x24 representation (described in the paper),
and then memorized. Colour and Motion channels are included
for completeness only.
-
Faces can be taken either from 0)
video (by selecting Video Source Device as shown in the figure) or
1) the hard drive (from the location specified in the face-names.txt
file, which is read by default at the start of the program, or from
the file selected through the menu). After a face is memorized from
video, the
program switches automatically to recognition mode, so that the memory
does not get saturated.
-
The black-and-white image at right
shows the contents of the memory (as described in the paper). To view
the entire 574x574 synaptic weight matrix, select View-> Video
Stream-> Extended Memory Contents.
At any point, you can clear the memory by checking "Clear
Memory!"
-
To store faces from video on the
hard drive (so that you can make your own list of faces), select
View-> Video Stream-> Trace Mode.
-
The result of recognition is shown as
follows:
- The face (out of all stored) closest to the attractor into which
the network converged is shown in red.
- The result which is consistent over time is also shown as
Response, where the number of video frames used in consistency
verification is set by the Temporal filter slider.
- The Hamming distances from the Response image to the original
(Stimulus) image and to the attractor-converged image are shown as
Correction ratio. The number of network iterations
before reaching an attractor is shown in brackets.
- 30(#2) refers to the description
of the response image as given in the face-names.txt
file, such as a
person's name
or a code for facial expression/orientation; (2) means the result was consistent over 2 frames.
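Two of the quantities in the list above, the Hamming distance between binary images and the temporal consistency check, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's actual code: the names hamming and TemporalFilter are hypothetical, the 8-element vectors are toy data, and the filter assumes that "consistent" means the same label was returned for every one of the last `window` frames. How the program combines the two distances into the displayed Correction ratio is not specified here, so the sketch only computes the distances themselves.

```python
from collections import deque


def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


class TemporalFilter:
    """Report a label only after it has been seen in `window` consecutive frames.

    Assumption: this is one plausible reading of the Temporal filter slider.
    """

    def __init__(self, window):
        self.recent = deque(maxlen=window)

    def update(self, label):
        self.recent.append(label)
        full = len(self.recent) == self.recent.maxlen
        if full and all(x == label for x in self.recent):
            return label
        return None  # not yet consistent over the whole window


# Toy binary vectors (hypothetical data) standing in for the three images.
stimulus = [1, 0, 1, 1, 0, 0, 1, 0]
attractor = [1, 0, 1, 0, 0, 0, 1, 1]
response = [1, 0, 1, 0, 0, 1, 1, 1]

d_stimulus = hamming(response, stimulus)
d_attractor = hamming(response, attractor)

# Per-frame recognition labels fed through a 2-frame filter.
tf = TemporalFilter(window=2)
responses = [tf.update(label) for label in [30, 30, 7, 30, 30]]
print(d_stimulus, d_attractor, responses)
```

A longer window suppresses single-frame misrecognitions at the cost of a slower response when the face in front of the camera actually changes.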