Chapter 7: Pattern Sensitivity


In this chapter we will consider measurements and models of human visual sensitivity to spatial and temporal patterns. We have covered topics relevant to pattern sensitivity in earlier chapters as we reviewed image formation and the receptor mosaic. Here, we will extend our analysis by reviewing a collection of behavioral studies designed to reveal how the visual system as a whole detects and discriminates spatio-temporal patterns.

The spatial pattern vision literature is dominated by detection and discrimination experiments, not by experiments on what things look like. There are probably two reasons why these measurements make up such a large part of the literature. First, many visual technologies (e.g., televisions and printers) are capable of reproducing images that appear similar to the original, but not exactly the same. The question of which approximation to the original is {\em visually best} is important and often guides the engineering development of the device. As a result, there is considerable interest in developing a complete theory to predict when two different images will appear very similar. When we cannot reproduce the image exactly, a theory of discrimination helps the device designer select, from among the images the device can reproduce, the one that appears most similar to the original. Threshold and discrimination experiments are indispensable to the design of discrimination theories.

Second, many authors believe that threshold and discrimination tasks can play a special role in analyzing the neurophysiological mechanisms of vision. The rationale for using threshold and discrimination to analyze the physiological mechanisms of vision is rarely stated and thus rarely debated, but the argument can be put something like this. Suppose the nervous system is built from a set of components, or {\em mechanisms}, that analyze the spatial pattern of light on the retina. Then we should identify and analyze these putative mechanisms to understand how they contribute to perception of spatial patterns. Threshold performance offers us the best chance of isolating the mechanisms because, at threshold, only the most sensitive mechanisms contribute to visibility. If threshold performance depends upon the stimulation of a single mechanism or small number of mechanisms, then threshold studies can serve as a psychologist’s dissecting instrument: At threshold we can isolate different parts of the visual pathways. After understanding the component mechanisms, we can seek a unified theory of the visual system’s operation.

I am not sure whether this rationale in terms of visual mechanisms adequately justifies the startling emphasis on threshold measurements. But, I think it is plain that by now we have learned a lot from detection and discrimination experiments (DeValois and DeValois, 1988; Graham, 1989). Many of the basic ideas used in image representation and computer vision were derived from the work on detection and discrimination of spatial patterns. This chapter is devoted to an exposition of some of those experiments and ideas. In Chapters 9, 10, and 11 we will take up the question of what things look like.

A Single Resolution Theory

The problem of predicting human sensitivity to spatial contrast patterns has much in common with other problems we have studied: image formation, color-matching, or single unit neurophysiological responses. We want to make a small number of measurements, say sensitivity to a small number of spatial contrast patterns, and then use these measurements to predict sensitivity to all other spatial contrast patterns.

In 1956, Otto Schade had some ideas about how to make these predictions and he set out to build a photoelectric analog of the visual system in order to predict visual sensitivity. His idea was to use the device to predict whether small changes in the design parameters of a display would be noticeable to a human observer. For example, in designing a new display monitor an engineer might want to alter the spatial resolution of the device. To answer whether this difference is noticeable to a human observer, Schade needed a way to predict the sensitivity of human observers to the effect of the engineering change.



Figure 7.1: A computational model of the human visual system. Otto Schade designed a visual simulator to predict human visual sensitivity to patterns. His model incorporated many features of the visual pathway, and it may be the earliest computational model of human vision (From Schade, 1956).


Figure 7.1 shows a schematic diagram of Schade’s computational eye. I include the diagram to show that Schade created a very extensive model that incorporated many visual functions, such as optical image formation, transduction, how to combine signals from the different cone types, adaptation, spatial integration, negative feedback, and even some thoughts about correction, interpretation and correlation of the signals with stored information (i.e. memory). In this sense, his work was a precursor to the computational vision models set forth by Marr (1980) and his colleagues.



Figure 7.2: The neural image is a psychophysical construct. The activity of a hypothetical array of neurons, whose properties are selected to mimic some collection of neurons in the visual pathway, is represented as an image. The intensity at each location in the neural image is proportional to the response of a neuron whose receptive field is centered at that image point.


The neural image

A fundamental part of Schade’s computation — and one that has been retained by the field of spatial pattern vision — is his suggestion that we can summarize the effects of the many visual components using a single representation now called the {\em neural image}\footnote{ To the best of my knowledge, John Robson suggested the phrase neural image. }. A real image and several neural images are drawn suggestively in Figure 7.2. The idea is that there is a collection of neurons whose responses, taken as a population, capture the image information available to the observer. The responses of a collection of neurons with similar receptive fields, differing only in that the receptive fields are centered at various positions, make up a neural image.

Figure 7.2 shows how the neural image concept permits us to visualize the neural response to an input image. At several places within the figure, I have represented the neural responses as an image. The intensity at each point in these neural images represents the response of a single neuron whose receptive field is centered at the corresponding image point. A bright value represents a neuron whose response is increased by the stimulus and a dark value a neuron whose response is decreased.

The assumptions we make concerning the receptive field properties of neurons comprising the neural image permit us to calculate the neural image using linear methods. For example, suppose the receptive fields of a collection of neurons are identical except for the position of the receptive field centers; further, suppose these are uniformly spaced. In that case, we can calculate the mapping from the real image to the responses of these neurons using a shift-invariant linear mapping, i.e. convolution.

The neural images shown in Figure 7.2 illustrate the idea that different populations of neurons may represent different types of information. One neural image is shown near the optic nerve; this neural image is drawn to represent the responses of the midget ganglion cells in a small region of the retina near the fovea. Near the fovea, the receptive fields of the midget cells are all about the same, except for displacements of the center of the receptive field; this neural image is a shift-invariant linear transformation of the input image. The neural images located near the cortical areas are transformed using oriented receptive fields. Of course, the information represented in the neural image at cortical locations depends on transformations of the signal that take place all along the visual pathway, including lens defocus, sampling by the photoreceptor mosaic, and noisy signaling by visual neurons.

Schade’s single resolution theory

Schade’s theory of pattern sensitivity is formulated mainly for foveal vision. Schade assumed that foveal pattern sensitivity could be predicted by the information available in a single neural image. He assumed that for this portion of the visual field, the relevant neural image could be represented by a shift-invariant transformation of the retinal image, much like the neural image shown near the optic nerve in Figure 7.2. In this section we review the significance of this hypothesis and also some empirical tests of it.

We know that a neural image spanning the entire visual field cannot really be shift-invariant. In earlier chapters, we reviewed measurements showing that the fovea contains many more photoreceptors and retinal ganglion cells than the periphery, and also that there is much more cortical area devoted to the fovea than the periphery. Consequently, a neural image can have a shift-invariant representation only over a relatively small portion of the visual field, say within the fovea or a small patch of the peripheral visual field.

Still, a model of pattern discrimination in the fovea is a good place to begin. First, the theory will be much simpler because we can avoid the complexities of visual field inhomogeneities. Second, because we continually move our eyes during normal viewing, the fovea is our main source of pattern information. So, we begin by reviewing theory and measurements of foveal pattern sensitivity. Later in this chapter, we will consider how acuity varies across the visual field.

There are several ways the shift-invariant neural image hypothesis helps us predict contrast sensitivity. Perhaps the most important is an idea we have seen several times before: If the mapping from image to neural image is shift-invariant, then the mapping is fully determined by the shape of a single receptive field. In a shift-invariant neural image there is only one basic receptive field shape. Neurons that make up the neural image differ only with respect to their receptive field positions.

The analogy between shift-invariant calculations and neural receptive fields is useful. But, we should remember that we are reasoning about behavioral measurements, not real neural receptive fields. Hence, it is useful to phrase our measurements using the slightly more abstract language of linear computation. In these calculations, the linear receptive field is equivalent to the {\em convolution kernel} of the shift-invariant mapping. The shift-invariance hypothesis tells us that to understand the neural image, we must estimate the convolution kernel. Its properties determine which information is represented by the neural image and which information is not.



Figure 7.3: A shift-invariant linear neural image is formed by the responses of neurons whose receptive fields are the same except for their spatial position. The matrix tableau illustrates the computation of a shift-invariant linear transformation of a one-dimensional image. The rows of the system matrix are the one-dimensional spatial receptive fields of the neurons.


As we have done several times earlier in this book, we will begin our analysis using one-dimensional stimuli: vertical sinusoids varying only in the x-direction. If we use only one-dimensional stimuli as inputs, then we can estimate only the one-dimensional receptive field of the transformation. We can write the shift-invariant transformation that maps the one-dimensional contrast stimulus, \contrasti{x}, to the one-dimensional neural image, \neuralx{x}, using the summation formula,

(1)   \begin{equation*} \neuralx{x} = \sum_{y} \lspreadi{y} \contrasti{x-y} . \end{equation*}

where \lspreadi{x} is the one-dimensional receptive field. We can also express the transformation as a matrix tableau (see Figure 7.3). The matrix tableau makes clear that the system matrix is very simple; the rows and columns are essentially all copies of the receptive field (i.e., the convolution kernel), differing only by a shift or a reversal. Hence, by estimating the convolution kernel, we will be able to predict the transformation from contrast image to neural image.
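To make the link between Equation 1 and the matrix tableau of Figure 7.3 concrete, here is a minimal NumPy sketch. It assumes periodic (circular) boundary conditions, so that the index x - y wraps around the ends of the image, and it uses a small hypothetical center-surround kernel; neither choice comes from Schade's work. The system matrix built this way has columns that are shifted copies of the kernel and rows that are shifted, reversed copies, and multiplying by it gives the same neural image as the summation formula.

```python
import numpy as np

def neural_image(contrast, kernel):
    """Equation 1 with circular indexing: n[x] = sum_y l[y] * c[x - y]."""
    N = len(contrast)
    n = np.zeros(N)
    for x in range(N):
        for y in range(N):
            n[x] += kernel[y] * contrast[(x - y) % N]
    return n

def system_matrix(kernel):
    """Matrix tableau: entry [x, z] = l[(x - z) mod N]."""
    N = len(kernel)
    return np.array([[kernel[(x - z) % N] for z in range(N)] for x in range(N)])

N = 16
rng = np.random.default_rng(0)
kernel = np.zeros(N)
kernel[[N - 1, 0, 1]] = [-0.25, 1.0, -0.25]   # hypothetical center-surround kernel, centered at index 0
contrast = rng.uniform(-1, 1, N)              # an arbitrary 1-D contrast pattern

M = system_matrix(kernel)
print(np.allclose(M @ contrast, neural_image(contrast, kernel)))  # True: matrix form equals Equation 1
print(np.allclose(M[:, 3], np.roll(kernel, 3)))                   # True: a column is a shifted kernel
print(np.allclose(M[5, :], np.roll(kernel[::-1], 6)))             # True: a row is a shifted, reversed kernel
```

Because every row and column is determined by the same kernel, estimating the N kernel values is enough to fill in all N x N entries of the matrix.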

The overall plan for predicting an observer’s pattern sensitivity is this: First, we will measure sensitivity to a collection of sinusoidal contrast patterns. These measurements will define the observer’s contrast sensitivity function (see Chapters 5 and 6). Because of the special relationship between harmonic functions and shift-invariant linear systems described in the earlier chapters and the appendix, we can use the contrast sensitivity function to estimate the convolution kernel of the shift-invariant linear transformation from image to neural image, \lspreadi{x}. Finally, we will use the estimated kernel to calculate the neural image and predict the observer’s sensitivity to other one-dimensional contrast patterns. This final step will provide a test of the theory.

Shortly, it will become clear that we must make a few additional assumptions before we can use the observer’s contrast sensitivity measurements to estimate the convolution kernel. But, first, let’s review some measurements of the human spatial contrast sensitivity function.

Spatial contrast sensitivity functions

Schade measured the contrast sensitivity function by asking observers to judge the visibility of sinusoidal patterns of varying contrast. The observer’s task was to decide what contrast was necessary to render the pattern just barely visible. Because of optical and neural factors, observers are not equally sensitive to all spatial frequency patterns; the threshold contrast depends upon the pattern’s spatial frequency.

To get a sense of the informal nature of Schade’s experiments, it is interesting to read his description of the methods.

The test pattern is faded in by increasing the electrical modulation at a fixed rate and observed on the modulation meter; the observer under test gives a signal at the instant he recognizes the line test pattern, and the person conducting the test reads and remembers the corresponding modulation reading. The modulation is returned to zero, and within seconds it is increased again at the same fixed rate to make a new observation. By averaging 10 to 15 readings mentally and recording the average reading directly on graph paper, the [contrast sensitivity] function \ldots can be observed in a short time and inconsistencies are discovered immediately and checked by additional observations.

Figure 7.4: Contrast threshold and contrast sensitivity measurements of a human observer. The contrast thresholds are plotted with respect to spatial frequency on the display rather than cycles per degree of visual angle (Source: Schade, 1956).


The contrast sensitivity function he measured is shown in Figure 7.4. The horizontal axis is spatial frequency as measured in terms of the display device. The vertical axis is contrast sensitivity, namely \log ( 1 / c ) = - \log c where c is the contrast of the pattern at detection threshold. The contrast sensitivity function has two striking features. First, sensitivity falls off as the spatial frequency of the test pattern increases. This effect is large, but it should not surprise you, since we already know that many components of the visual pathways are insensitive to high spatial frequency targets: the optical blurring of the lens reduces the contrast of high spatial frequency targets, and retinal ganglion cells with center-surround receptive fields are less sensitive to them.

Second, and somewhat more surprising, sensitivity does not improve at low spatial frequencies; there is even a small loss of contrast sensitivity at the lowest spatial frequency. The eye’s optical image formation does not reduce sensitivity at low frequencies, so the fall in contrast sensitivity at low spatial frequencies is due to neural factors. Center-surround receptive fields are one possible reason for this low frequency fall-off.

Figure 7.5: Temporal variations change the shape of the human spatial contrast sensitivity function. The contrast sensitivity functions shown here were measured with contrast-reversing targets at several different temporal frequencies. At low temporal frequencies the contrast sensitivity function is bandpass. At high temporal frequencies the function is lowpass (Source: Robson, 1966).


Schade’s measurements were made using a steadily presented test pattern or a drifting pattern. Robson (1966; see also Kelly, 1961) made additional measurements using flickering {\em contrast-reversing} gratings. Contrast-reversing patterns are harmonic spatial patterns whose amplitude varies harmonically over time (see Chapter 5). For example, suppose the mean illumination is \mean. Then the {\bf intensity} of the contrast-reversing stimulus at spatial frequency f_x, temporal frequency f_t, and contrast a is (cf. Equation~??)

    \[ ( ~ 1.0 + a \cos (2 \pi f_t t) \cos (2 \pi f_x x ) ~ ) \mean . \]

The intensity is always positive. The spatiotemporal {\bf contrast} of the pattern is

    \[ a \cos(2 \pi f_t t) \cos ( 2 \pi f_x x ). \]

The contrast can be both positive and negative.
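The two formulas above can be checked numerically. The sketch below samples the intensity and contrast of a contrast-reversing grating on a space-time grid; the mean level, contrast amplitude, and frequencies are hypothetical values chosen for illustration. It confirms that the intensity stays positive for a \leq 1 while the contrast takes both signs. It also checks the product-to-sum identity \cos A \cos B = \frac{1}{2}[\cos(A - B) + \cos(A + B)], which shows that the same contrast pattern equals two gratings of half the amplitude drifting in opposite directions.

```python
import numpy as np

mean = 50.0          # hypothetical mean illumination (arbitrary units)
a = 0.8              # contrast amplitude; intensity stays positive when a <= 1
fx, ft = 3.0, 6.0    # hypothetical spatial (cycles/deg) and temporal (Hz) frequencies

x = np.linspace(0, 1, 201)        # position in degrees
t = np.linspace(0, 0.5, 101)      # time in seconds
X, T = np.meshgrid(x, t)          # rows index time, columns index position

contrast = a * np.cos(2 * np.pi * ft * T) * np.cos(2 * np.pi * fx * X)
intensity = (1.0 + contrast) * mean

print(intensity.min() > 0)                    # True: intensity is always positive
print(contrast.min() < 0 < contrast.max())    # True: contrast takes both signs

# Product-to-sum identity: two half-amplitude gratings drifting in opposite directions
drifting = 0.5 * a * (np.cos(2 * np.pi * (fx * X - ft * T)) + np.cos(2 * np.pi * (fx * X + ft * T)))
print(np.allclose(contrast, drifting))        # True
```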

As the data in Figure 7.5 show, when we measure at a low temporal frequency (1 Hz), spatial contrast sensitivity falls at low spatial frequencies. At high temporal frequencies, such as one might encounter during a series of rapid eye movements, there is no low-frequency sensitivity loss. As we shall see later, the contrast sensitivity function also varies with other stimulus parameters such as the mean illumination level and the wavelength composition of the stimulus.

The psychophysical linespread function

Now, let’s return to the problem of using the contrast sensitivity data to calculate the convolution kernel, \lspreadi{x}. Because this kernel defines both the rows and the columns of the shift-invariant linear transformation, it is also called the {\em psychophysical linespread function}, in analogy with the optical linespread function (see Chapter 2). By now you have noticed that each time we apply linear systems theory, some special feature of the measurement situation requires us to devise some slightly different approach to calculating the system properties. The calculations involved in using contrast sensitivity measurements to predict sensitivity to all contrast patterns are no exception.

Let’s work out what we need to do to estimate the psychophysical linespread function. The general linear problem is illustrated in the matrix tableau in Figure 7.3. The input stimulus is shown as a column vector, specifying the one-dimensional spatial contrast pattern. The matrix describes how the stimulus is transformed into the neural image. We want to make a small number of measurements in order to estimate the entries of the system matrix.

We have solved this problem before, but in this case we face a special challenge. In our previous attempts to estimate linear transformations we were able to specify both the stimulus and the response. When we obtain contrast sensitivity measurements, however, we never measure the output neural image. We only measure the input stimulus at the observer’s detection threshold. Hence, we have fairly limited information available.

Because we assumed that the neural image is a shift-invariant mapping, we do know something about the neural image: when the input stimulus is a harmonic function, the output must be a harmonic function at the same frequency. But, we do not know the amplitude or phase of the harmonic function in the hypothetical neural image. To estimate the psychophysical linespread function, we must make additional assumptions about the properties of a neural image that render it at detection threshold. From these assumptions, we will then specify the phase and amplitude of the neural image at detection threshold.

Two additional assumptions are commonly made. First, we assume that the spatial phase of the neural image is the same as the spatial phase of the input spatial contrast pattern\footnote{ We used the same assumption to infer the properties of the lens in Chapter 2. Specifically, when the input pattern is a one-dimensional cosinusoid, \cos ( 2 \pi f {i / N } ), we assume the neural image output pattern is a scaled copy of the input pattern, \neuralx{i} = a_{f} \cos ( 2 \pi f {i / N } ). The scale factor, a_{f}, depends on the frequency of the input signal. }.

Second, we must specify the amplitude of the neural image at detection threshold. The amplitude of the neural image should be related to the visibility of the pattern, and we can list a few properties that should be associated with pattern visibility. For example, whether the signal increases or decreases the firing rate should be irrelevant; any change from the spontaneous rate ought to be detectable. Also, detectability should depend on responses pooled across the neural image rather than on the response of a single neuron. The squared {\em vector-length} of the neural image has both of these properties. The squared vector-length of the neural image, \length^{2}, is defined by the formula

(2)   \begin{equation*} \length^2 = \sum_{i=1}^{N} \neuralx{i}^{2} . \end{equation*}

This formula satisfies both of our requirements since (a) the signs of the individual neural responses, \neuralx{i}, are not important because the neural image entry is squared, and (b) the formula incorporates the responses from different neurons. Other measures are possible. For example, one might assume that at detection threshold the sum of the absolute values of the neural image is equal to a constant, or one might make up a completely different rule. But, one must make some assumption and the vector-length rule is a useful place to begin.
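Equation 2 is simple to compute. The sketch below, with made-up response values, shows the two properties listed above: flipping the sign of the responses leaves the squared vector-length unchanged, and the measure pools over the whole neural image, so one strong response and four weaker ones can yield the same value. For comparison, it also computes the sum-of-absolute-values alternative mentioned in the text, which ranks the same two neural images differently. This illustrates why the choice of rule is a genuine assumption.

```python
import numpy as np

def squared_vector_length(neural):
    """Equation 2: sum of the squared responses across the neural image."""
    r = np.asarray(neural, dtype=float)
    return float(np.sum(r ** 2))

def sum_of_absolute_values(neural):
    """One alternative detection rule mentioned in the text."""
    return float(np.sum(np.abs(np.asarray(neural, dtype=float))))

r = np.array([0.3, -0.5, 0.0, 0.2])       # hypothetical neural-image responses
print(squared_vector_length(r))           # about 0.38
print(squared_vector_length(-r))          # same value: the sign of each response is irrelevant

# Pooling: one strong response and four weaker responses give the same squared length ...
concentrated = np.array([1.0, 0.0, 0.0, 0.0])
spread = np.array([0.5, 0.5, 0.5, 0.5])
print(squared_vector_length(concentrated), squared_vector_length(spread))    # 1.0 1.0
# ... while the sum-of-absolute-values rule treats them differently.
print(sum_of_absolute_values(concentrated), sum_of_absolute_values(spread))  # 1.0 2.0
```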




Figure 7.6: {\em The psychophysical linespread function} can be estimated from the contrast sensitivity function. (a) A linespread function estimated from Schade’s measurements. The horizontal axis is in arbitrary units because the spatial frequency of the contrast sensitivity function was reported in arbitrary units. (b,c) Linespread functions for contrast-reversing targets at 1 and 6 Hz derived from Robson’s measurements. The horizontal axis is in degrees of visual angle.


If we assume that at contrast detection threshold all neural images have the same vector-length, then we can specify the amplitude of the harmonic functions in the neural image. Because the vector-length of a harmonic neural image is proportional to its amplitude, this assumption implies that the system’s gain at each spatial frequency is proportional to the contrast sensitivity at that frequency. At this point we have made enough assumptions to specify the complete neural image and solve for the psychophysical linespread function. Figure 7.6 shows three linespread functions estimated using these assumptions. Figure 7.6a shows a psychophysical linespread computed from Schade’s measurements (note that the spatial dimension is uncalibrated). Figures 7.6b and 7.6c show psychophysical linespreads, plotted in degrees of visual angle, derived from Robson’s measurements using 1 Hz and 6 Hz contrast-reversing gratings. No single linespread function applies to all stimulus conditions. We will consider how the linespread function changes with the stimulus conditions later in this chapter.
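Here is a minimal sketch of the estimation procedure under assumptions (b) zero phase shift and (c) equal vector-length at threshold. The contrast sensitivity curve is an invented bandpass function, not Schade’s or Robson’s data, and the display is assumed periodic with N samples so that we can use NumPy’s discrete Fourier transform. Under these assumptions the kernel’s gain at each frequency is proportional to sensitivity with no phase shift, so the psychophysical linespread is the inverse transform of the scaled sensitivity curve. The final check confirms that a cosine presented at its own threshold contrast yields the same vector-length at every frequency.

```python
import numpy as np

N = 256                                 # samples in one period of a 1-D display (hypothetical)
freqs = np.arange(N // 2 + 1)           # spatial frequencies in cycles per period, 0 .. N/2

# Hypothetical bandpass contrast sensitivity (1 / threshold contrast); illustrative, not real data
sensitivity = 10 + 300 * (freqs / 8.0) * np.exp(-freqs / 8.0)
threshold_contrast = 1.0 / sensitivity

# Zero phase + equal vector-length at threshold: the gain at each frequency is proportional
# to sensitivity (the overall scale is arbitrary) and there is no phase shift.
gain = sensitivity / sensitivity.max()
kernel = np.fft.irfft(gain, n=N)        # real, even convolution kernel (center at index 0)
linespread = np.fft.fftshift(kernel)    # move the center to index N // 2 for plotting

def neural_image(stimulus):
    """Circular convolution of a 1-D contrast pattern with the estimated kernel."""
    return np.fft.irfft(np.fft.rfft(kernel) * np.fft.rfft(stimulus), n=N)

# Check: every cosine presented at its threshold contrast yields the same vector-length.
x = np.arange(N)
lengths = [np.linalg.norm(neural_image(threshold_contrast[f] * np.cos(2 * np.pi * f * x / N)))
           for f in range(1, N // 2)]
print(np.ptp(lengths) < 1e-9)           # True
```

The overall scale of the kernel cannot be recovered from threshold data alone, which is why the sketch normalizes the gain to a peak of one.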

Schade (1956) suggested that the general shape of the psychophysical linespread function can be described using the difference of two Gaussian functions. This description is the same one used by Rodieck (1965) and Enroth-Cugell and Robson (1966) to model retinal ganglion cell receptive fields. The correspondence between the psychophysical linespread function, derived from the behavioral measurement of contrast sensitivity, and the receptive field functions of retinal ganglion cells, derived from retinal physiology, is encouraging.
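One reason the difference-of-Gaussians description fits these data is that the Fourier transform of a Gaussian is another Gaussian, so a narrow center minus a broad surround yields a bandpass transfer function. The sketch below uses hypothetical amplitudes and widths in degrees, not values fitted to Schade’s or Robson’s measurements. It computes the analytic transfer function, confirms that it is attenuated at low spatial frequencies, and checks one value against a direct numerical integral of the linespread.

```python
import numpy as np

def dog_linespread(x, a_c, s_c, a_s, s_s):
    """Difference of Gaussians: a narrow excitatory center minus a broad inhibitory surround."""
    return a_c * np.exp(-x**2 / (2 * s_c**2)) - a_s * np.exp(-x**2 / (2 * s_s**2))

def dog_transfer(f, a_c, s_c, a_s, s_s):
    """Fourier transform of dog_linespread (x in degrees, f in cycles/degree).
    Each Gaussian transforms into a Gaussian: a * s * sqrt(2 pi) * exp(-2 pi^2 s^2 f^2)."""
    def gauss_ft(a, s):
        return a * s * np.sqrt(2 * np.pi) * np.exp(-2 * np.pi**2 * s**2 * f**2)
    return gauss_ft(a_c, s_c) - gauss_ft(a_s, s_s)

params = dict(a_c=1.0, s_c=0.05, a_s=0.2, s_s=0.2)   # hypothetical values, in degrees
f = np.linspace(0, 30, 601)
h = dog_transfer(f, **params)
print(h[0] < h.max(), f[np.argmax(h)] > 0)   # True True: attenuated at low frequencies (bandpass)

# Numerical check of the analytic transform at 5 cycles/degree (the linespread is even)
x = np.linspace(-3, 3, 6001)
numeric = np.sum(dog_linespread(x, **params) * np.cos(2 * np.pi * 5 * x)) * (x[1] - x[0])
print(np.isclose(numeric, dog_transfer(5.0, **params)))   # True
```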

Discussion of the theory: Static nonlinearities

To estimate the convolution kernel of Schade’s hypothetical neural image, without being able to measure the neural image directly, we have been forced to make several assumptions. It is wise to remember the three strong assumptions we have made:

(a) the neural image is a shift-invariant linear encoding, \\ (b) the linear encoding introduces no phase shift, and \\ (c) the vector-length of the neural image determines visibility.

Taken as a whole, this is a nonlinear theory of pattern sensitivity. Although the neural image is a linear representation of the input, indeed it is even a shift-invariant representation, the vector-length rule linking the neural image to performance is nonlinear. You can verify this by noting that when a stimulus vector \stimi{1} has length \lengthi{1} and vector \stimi{2} has length \lengthi{2}, the vector \stimi{1} + \stimi{2} need not have length \lengthi{1} + \lengthi{2}. Thus, even when \stimi{1} and \stimi{2} are at one half threshold, \stimi{1} + \stimi{2} may not be at threshold\footnote{ The only case in which the lengths will add is when the vectors representing the neural images point in the same direction.}.
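A short numerical example makes the nonlinearity concrete. In the sketch below, two hypothetical neural images (cosines at different frequencies on an N-sample grid) are each scaled to half of an assumed threshold vector-length of one. Because they are orthogonal, their sum has length \sqrt{0.5} \approx 0.707, which is still below threshold. Adding an image to itself, the parallel case mentioned in the footnote, is the only way the lengths add to exactly one.

```python
import numpy as np

N = 128
i = np.arange(N)

# Two hypothetical neural images, each scaled to half the threshold vector-length (threshold = 1)
img1 = np.cos(2 * np.pi * 2 * i / N)
img1 *= 0.5 / np.linalg.norm(img1)
img2 = np.cos(2 * np.pi * 6 * i / N)          # a different frequency, so orthogonal to img1
img2 *= 0.5 / np.linalg.norm(img2)

print(np.linalg.norm(img1 + img2))            # about 0.707: lengths do not add, sum is below threshold
print(np.linalg.norm(img1 + img1))            # 1.0 (up to rounding): parallel images are the case where lengths add
```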

The vector-length calculation is a static nonlinearity applied after a linear calculation (see Chapter 4). This is a relatively simple nonlinearity, so that it is straightforward to make certain general predictions about performance even though the theory is nonlinear. In the next section, we consider some of these predictions as well as experimental tests of them.

Experimental Tests



Figure 7.7: Mixtures of spatial contrast patterns can be used to test theories about pattern sensitivity. Panels (a) and (b) show the contrast of cosinusoidal stimuli at 1 and 3 cpd. The spatial contrasts of these two stimuli are shown added together in peaks-add spatial phase (c) and peaks-subtract spatial phase (d).


The contrast sensitivity function by itself offers no test of Schade’s theory other than reasonableness: do the inferred linespread functions seem plausible? We have seen that the linespread functions are plausible since they are quite similar to the receptive fields of visual neurons. But, because we have made so many assumptions, it is important to find general properties of the theory that we can test experimentally and in that way gain confidence in the theory’s usefulness.

Harmonic functions will play a special role in testing the theory. There are two separate reasons why harmonics are important for our new test. (1) Given the assumed shift-invariance, the neural image of a harmonic is also a harmonic. We have seen this property many times before and it will be important again. (2) Harmonic functions at different frequencies are orthogonal to one another. Geometrically, orthogonality means that the vectors are perpendicular to one another. Algebraically, we say two vectors, a_{x} and b_{x}, are orthogonal when \sum_{x} a_{x} b_{x} = 0. Sinusoids and cosinusoids of the same frequency are orthogonal to one another, and any pair of harmonic functions at different frequencies are orthogonal to one another. We will use these two properties, combined with the vector-length rule, to test Schade’s basic theory.
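The orthogonality property is easy to verify for sampled harmonics. The sketch below assumes each harmonic completes a whole number of cycles across the N samples, with frequencies between 0 and N/2; that is the discrete setting in which these inner products vanish exactly. A harmonic is, of course, not orthogonal to itself: its squared length is N/2.

```python
import numpy as np

N = 64
i = np.arange(N)

def harmonic(f, phase="sin"):
    """Sampled harmonic with f whole cycles across the N samples (0 < f < N/2 assumed)."""
    fn = np.sin if phase == "sin" else np.cos
    return fn(2 * np.pi * f * i / N)

print(np.dot(harmonic(1), harmonic(3)))                # about 0: different frequencies
print(np.dot(harmonic(2, "sin"), harmonic(2, "cos")))  # about 0: sine vs. cosine, same frequency
print(np.dot(harmonic(2), harmonic(2)))                # about N/2 = 32: its own squared length
```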

Suppose we create a stimulus equal to the sum of two sinusoids at frequencies, f_i, and contrasts, c_i, for i = 1,2. According to the shift-invariant theory, the neural image of these two sinusoids is the weighted sum of two sinusoids. Each sinusoid is scaled by a factor, s_i, that defines how well the stimulus is passed by the shift-invariant system. The squared vector-length of the neural image created by the sum of the two sinusoids is

(3)   \begin{equation*} \length^{2} = \sum_{i = 1}^{N} ( c_1 s_1 \sin ( 2 \pi f_1 { i / N } ) + c_2 s_2 \sin ( 2 \pi f_2 { i / N } ) )^{2}. \end{equation*}

The squared term in the summation can be expanded into three terms

(4)   \begin{eqnarray*} \length^2 & = & c_1^2 s_1^2 \sum_{i=1}^{N} \sin^2 ( 2 \pi f_1 {i/N} ) + c_2^2 s_2^2 \sum_{i=1}^{N} \sin^2 ( 2 \pi f_2 {i/N} ) \nonumber \\ & + & 2 c_1 c_2 s_1 s_2 \sum_{i=1}^{N} \sin ( 2 \pi f_1 {i/N} ) \sin ( 2 \pi f_2 {i/N} ) . \end{eqnarray*}

Because sinusoids at different frequencies are orthogonal functions, the third term is zero, leaving only

(5)   \begin{equation*} \length^{2} = c_1^2 s_1^2 \sum_{i = 1}^{N} \sin^2 ( 2 \pi f_1 { i / N } ) + c_2^2 s_2^2 \sum_{i = 1}^{N} \sin^2 ( 2 \pi f_2 { i / N } ). \end{equation*}

We can group some terms to define a new equation,

(6)   \begin{equation*} \length^{2} = (c_1 \sinlengthi{1})^{2} + (c_2 \sinlengthi{2})^{2} . \end{equation*}

where \sinlengthi{i} is a constant, namely \sinlengthi{i} = s_i ( \sum_{j=1}^{N} \sin^2 ( 2 \pi f_i { j / N} ) )^{1/2}.

Equation~6 tells us when a pair of contrasts of the two sinusoids, (c_1, c_2), should be at detection threshold: at threshold, \length takes the same fixed value for every mixture. Figure 7.8 is a graphical representation of these predictions. The axes of the graph represent the contrast levels of the two sinusoidal components used in the mixture. The solutions to Equation~6 sweep out a curve called a {\em detection contour}. As shown in Figure 7.8, Equation~6 is the equation of an ellipse whose principal axes are aligned with the axes of the graph. The two unknown quantities, the scale factors \sinlengthi{i}, are related to the lengths of the principal axes. Hence, if we scale the contrast of each sinusoidal component so that its threshold contrast is set to one, the predicted detection contour will fall on a circle (Graham and Nachmias, 1971; Nielsen and Wandell, 1988).

Many alternative theories are possible. Had we supposed that threshold is determined by the peak contrast of the pattern, then the detection contour would fall on the diamond shape shown in Figure 7.8. The important point is that the shape of the detection contour depends on the basic theory. The prediction using Schade’s theory is clear, so that we can use the prediction to test the theory.
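The two predictions in Figure 7.8 can be compared numerically along the 45 degree direction, where both components are mixed in equal normalized amounts. The sketch below assumes hypothetical gains s_1 and s_2, integer frequencies in a 1:3 ratio on an N-sample grid, and cosine phase so that the peaks add. It applies either the vector-length rule or a peak rule, our illustrative variant that uses the peak of the neural image. The vector-length rule predicts that each component’s threshold drops by \sqrt{2} \approx 1.414 (a point on the circle), while the peak rule predicts a drop by a factor of 2 (a point on the diamond).

```python
import numpy as np

N = 512
i = np.arange(N)
f1, f2 = 4, 12            # hypothetical integer frequencies with a 1:3 ratio (like 1 and 3 cpd)
s1, s2 = 0.8, 0.3         # hypothetical gains of the shift-invariant system
criterion = 1.0           # neural-image response at threshold (same value for either rule)

def neural_image(c1, c2):
    # Zero-phase assumption: each cosine component is only scaled by its gain.
    # Cosine phase puts both peaks at i = 0 ("peaks add").
    return c1 * s1 * np.cos(2 * np.pi * f1 * i / N) + c2 * s2 * np.cos(2 * np.pi * f2 * i / N)

def vector_length(r):
    return np.linalg.norm(r)

def peak(r):
    return np.max(np.abs(r))

def reduction_at_45_degrees(rule):
    """Factor by which each component's contrast falls below its own threshold
    when the two components are mixed in equal normalized amounts."""
    t1 = criterion / rule(neural_image(1.0, 0.0))   # threshold of component 1 alone
    t2 = criterion / rule(neural_image(0.0, 1.0))   # threshold of component 2 alone
    k = criterion / rule(neural_image(t1, t2))      # both rules scale linearly with contrast
    return 1.0 / k

print(reduction_at_45_degrees(vector_length))   # about 1.414 (circle)
print(reduction_at_45_degrees(peak))            # about 2.0 (diamond)
```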



Figure 7.8: The spatial test-mixture experiment provides a test of contrast sensitivity models. We measure the visibility of a test-mixture whose sinusoidal components have contrasts c_1 and c_2. The set of contrast pairs for which the mixture stimulus is at detection threshold defines the detection contour. Schade’s hypothesis predicts that the detection contour is an ellipse aligned with the axes of the graph, as shown by the solid curve. If the peak stimulus contrast determines sensitivity to the mixture, then the detection contour should fall along the contour indicated by the dotted lines.


Graham, Robson and Nachmias (1978; Graham and Nachmias, 1971) measured sensitivity to mixtures of sinusoidal gratings at 1 cpd and 3 cpd. They measured thresholds using a careful psychophysical threshold estimation procedure called a {\em two-interval, forced-choice} design. In this procedure each trial is divided into two temporal periods, usually indicated by a tone that defines the onset of the first temporal period, a second tone that defines the onset of the second temporal period, and a final tone that indicates the end of the trial. A test stimulus is presented during one of the two temporal intervals, and the observer must watch the display and decide which interval contained the test stimulus. When the contrast of the test pattern is very low, the observer is forced to guess and so performance is at chance. When the contrast is very high, the observer will nearly always identify the correct temporal interval. Hence, as the test pattern contrast increases, performance varies from 0.5 to 1.0. The threshold performance level is arbitrary, but for technical reasons described in their paper, Graham et al. defined threshold to be the contrast level at which the observer was correct with probability 0.81.
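To see how a threshold is read off a two-interval forced-choice psychometric function, here is a sketch that assumes a Weibull-shaped function, a form often used for such data. The scale alpha and slope beta are hypothetical; we do not claim this is the function Graham et al. fitted. The function starts at 0.5 (guessing) at zero contrast and approaches 1.0 at high contrast. Inverting it gives the contrast at which the observer is correct with probability 0.81.

```python
import numpy as np

def p_correct_2ifc(c, alpha, beta):
    """Assumed Weibull psychometric function for two-interval forced choice.
    Rises from 0.5 (guessing) at zero contrast toward 1.0 at high contrast."""
    return 1.0 - 0.5 * np.exp(-(np.asarray(c) / alpha) ** beta)

def threshold_contrast(p, alpha, beta):
    """Contrast at which p_correct_2ifc equals p (requires 0.5 < p < 1)."""
    return alpha * np.log(0.5 / (1.0 - p)) ** (1.0 / beta)

alpha, beta = 0.01, 3.0                    # hypothetical scale (contrast) and slope
c81 = threshold_contrast(0.81, alpha, beta)
print(c81)                                 # threshold contrast at 81% correct
print(p_correct_2ifc(c81, alpha, beta))    # 0.81, up to rounding
print(p_correct_2ifc(0.0, alpha, beta))    # 0.5: chance performance
```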



Figure 7.9: Spatial test-mixture thresholds measured using 1 cpd and 3 cpd gratings. The thresholds fall outside the detection contour predicted by the shift-invariant hypothesis and vector-length rule (Source: Graham, et al. 1978).


The test-mixture data in Figure 7.9 do not fall precisely along the circular detection contour predicted by Schade’s single-resolution theory. Specifically, thresholds measured in the 45 degree direction tend to fall just outside the predicted detection contour; thresholds are a little too high compared to the prediction. The theory predicts that, in this direction, the threshold contrast of each component should be reduced by a factor of 1.414 (\sqrt{2}), but thresholds are reduced by only a factor of about 1.2. These data are typical for these types of experimental measurements.
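The factor of 1.414 follows from the vector-length rule once each component's contrast is expressed in units of its own threshold. The minimal Python sketch below makes the arithmetic explicit; the function names and the example thresholds t1 and t2 are hypothetical. It also shows why the contour is a circle in threshold units but an axis-aligned ellipse when plotted in raw contrasts, as in Figure 7.8.

```python
import math

def pooled_response(units, exponent=2.0):
    """Pool component contrasts expressed in threshold units (contrast / own threshold).

    exponent=2 is the vector-length rule; the mixture is at threshold when the
    pooled value equals 1.
    """
    return sum(abs(u) ** exponent for u in units) ** (1.0 / exponent)

def predicted_reduction(n_components, exponent=2.0):
    """Factor by which each of n equally detectable components can be reduced at threshold."""
    return n_components ** (1.0 / exponent)

def contour_point(angle_rad, t1, t2):
    """Threshold contrasts (c1, c2) along one direction of the normalized plane.

    In threshold units the vector-length contour is the unit circle; scaling by
    the component thresholds t1, t2 turns it into an axis-aligned ellipse.
    """
    return t1 * math.cos(angle_rad), t2 * math.sin(angle_rad)

# 45 degree direction: each component at 1/sqrt(2) of its own threshold.
u = 1.0 / math.sqrt(2.0)
print(pooled_response([u, u]))   # approximately 1.0: the mixture is at threshold
print(predicted_reduction(2))    # 1.414..., compared with about 1.2 observed
print(contour_point(math.pi / 4, 0.004, 0.008))  # hypothetical thresholds t1, t2
```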

Is this an important difference? The point of this theory is to measure sensitivity to a small number of spatial patterns and to use these measurements to predict sensitivity to all other spatial patterns. The theory must be precise enough to tolerate decomposition of an arbitrary pattern into a sum of many sinusoidal patterns, and then predict sensitivity to mixtures of the multiple components. If we already see failures with a mixture of only two components, we should be concerned about how well the theory will do with three or more.



Figure 7.10: A three component spatial test-mixture experiment. The probability of correct detection in a two-interval forced choice is shown as a function of normalized contrast. The dashed lines on the right show detection for three simple sinusoidal gratings at frequencies of 1.33, 4, and 12 cpd. The two solid lines show the probability of detecting mixtures of these three components in cosine phase and sine phase. The dot-dash curve on the left shows the predicted sensitivity using the shift-invariant model and the vector-length rule (Source: Graham, 1989).


The data in Figure 7.10 illustrate sensitivity to the combination of three sinusoidal gratings. In this case, the data are plotted as {\em psychometric} functions, a format that makes the dependence of performance on contrast explicit. The data points connected by the dashed curves show the observer’s probability of correctly detecting the individual sinusoidal grating patterns at 1.33, 4.0, or 12 cpd. The horizontal axis measures the scaled contrast of the sinusoids, with the scale factor chosen to make the three curves align.

The solid lines show the visibility of two patterns, each formed by mixing all three sinusoids with contrast ratios adjusted to make the three components equally visible. One sum was formed with the peak contrasts all aligned (cosine phase) and the other with their zero-crossings aligned (sine phase).

Again, because the input signals are sinusoids or sums of sinusoids, we can predict performance based on the shift-invariant neural image and the vector-length rule. The neural image of the sum of three sinusoidal gratings will be the weighted sum of three sinusoidal neural images. The predicted threshold to the mixture of the three patterns is shown by the dot-dashed curve at the left of Figure 7.10. The vector-length rule also predicts that the probability correct will be the same in both sine and cosine phase.
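Why should phase not matter? The vector length of a sum of sinusoids at distinct frequencies depends only on their amplitudes, not their phases, while the peak contrast depends strongly on phase. The minimal NumPy sketch below checks this for the 1.33, 4 and 12 cpd components. It assumes equal neural-image amplitudes (a hypothetical choice) and uses the root-mean-square value, which is proportional to the vector length of the sampled image.

```python
import numpy as np

# Sample 3 degrees of visual angle; 4/3, 4 and 12 cpd all complete whole cycles there.
n_samples = 3000
x = np.arange(n_samples) * (3.0 / n_samples)   # position in degrees
freqs = np.array([4.0 / 3.0, 4.0, 12.0])         # 1.33, 4 and 12 cpd
amps = np.ones(3)                                # hypothetical neural-image amplitudes

cos_phase = sum(a * np.cos(2 * np.pi * f * x) for a, f in zip(amps, freqs))
sin_phase = sum(a * np.sin(2 * np.pi * f * x) for a, f in zip(amps, freqs))

def vector_length(signal):
    """RMS value: the vector length of the sampled image divided by sqrt(n_samples)."""
    return np.sqrt(np.mean(signal ** 2))

print(vector_length(cos_phase), vector_length(sin_phase))  # equal, sqrt(3/2)
print(cos_phase.max(), sin_phase.max())                     # peaks differ
```

A peak-contrast rule would therefore predict different thresholds for the two phases, whereas the vector-length rule predicts the same threshold, which matches the data in this respect.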

The model prediction is not completely wrong; the phase relationship of the gratings does not have a significant influence on detection threshold for the mixture of three targets. But the mixture patterns are less visible than predicted by the theory: in the three-component mixtures, the contrast of each component at threshold is reduced by a factor of about 1.4 compared to its individual threshold, while the theory predicts a reduction of 1.73 (\sqrt{3}). The basic theory has some good features, but its quantitative predictions fail more and more as we apply it to increasingly complex patterns.
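One way to summarize the size of the failure is to ask which pooling exponent would reproduce the observed reductions. The sketch below assumes Minkowski summation, a generalization of the vector-length rule (exponent 2) that is common in the psychophysical literature but is not the rule proposed in this section. Under that assumption, n equally detectable components are each reduced by n^(1/beta) at threshold. The reductions used are the rounded values quoted in the text, so the two estimates of beta should be read as rough.

```python
import math

def implied_exponent(n_components, observed_reduction):
    """Minkowski exponent beta that solves n ** (1 / beta) == observed_reduction."""
    return math.log(n_components) / math.log(observed_reduction)

# Reductions of each component's threshold contrast, as quoted in the text.
observed = {2: 1.2, 3: 1.4}
for n, reduction in observed.items():
    vector_length_prediction = math.sqrt(n)   # beta = 2
    beta = implied_exponent(n, reduction)
    print(n, round(vector_length_prediction, 3), round(beta, 2))
# 2 1.414 3.8
# 3 1.732 3.27
```

Both estimates lie well above 2, which is another way of saying that the mixtures gain less from their extra components than the vector-length rule predicts.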

Intermediate Summary

We have begun formulating psychophysical theory using the simple computational ideas of shift-invariance followed by a static nonlinearity. These ideas are reminiscent of the properties of certain neurons in the visual pathway. While this theoretical formulation is a vast simplification of what we know about the nervous system, it is a reasonable place to begin. The nervous system is complex and contains many different types of computational elements. While Schade’s effort to capture all of the nuances of the neural representation is inspiring, it was perhaps a bit premature. Much of the neural representation must be irrelevant to the tasks we are studying. By beginning with simpler formulations, we can use psychophysical models to discover those aspects of the neural representation that are essential for predicting the behavior. By comparing and contrasting the behavioral data and the neural data, we can discern the important functional elements of the neural representation for different types of visual tasks.

While the shift-invariant theory did not succeed, it has served the useful purpose of organizing some of our thinking and suggesting some experiments we should try. And, for those of us who need to make approximate predictions quickly rather than precise predictions slowly, there are some good aspects of the calculation. For example, the inferred psychophysical linespread is similar to the receptive fields of some peripheral neurons. Also, for simple mixtures the theoretical predictions are off by only a modest factor. Still, the shift-invariant theory plainly does not fit the data very well, and its performance will only deteriorate when we apply it to complex stimuli, such as natural images. We need to find new insights and experiments that might suggest how to elaborate the theory.

Multiresolution Theory

Schade’s single resolution theory of pattern sensitivity does not predict the pattern sensitivity data accurately. But the theory is not so far wrong that we should abandon it entirely. The question we consider now is how to generalize the single resolution theory while keeping the good parts.

The most widely adopted generalization expands the initial linear encoding. Modern theories generally use an initial linear encoding consisting of a collection of shift-invariant linear transformations, not just a single one. Each shift-invariant linear transformation has its own convolution kernel and hence forms its own neural image. We will refer to the data represented by each individual shift-invariant transformation as a {\em component-image} of the full theory.

To fully specify the properties of the more general theory, we need to select the convolution kernel associated with each of the shift-invariant linear transformations and the static nonlinearities that follow. We also need to specify how the outputs of the different component-images are combined to form a single detection decision. For reasons I will explain next, the convolution kernels of the component-images are usually selected so that the expanded theory forms a {\em multiresolution} representation of the image.

Pattern Adaptation



Figure 7.11: A size illusion and an orientation illusion based on visual pattern adaptation. The bar widths and orientations of the two squarewave patterns in the middle are the same. Stare at the fixation point between the two patterns in (a) for a minute, adapting to the two patterns in your upper and lower visual fields. When you shift your gaze to the patterns in (b), the patterns will appear to have different bar widths. Then stare at the fixation point between the two patterns in (c); when you shift your gaze to (b), the patterns will appear to have different orientations (After Blakemore and Sutton, 1969; see also DeValois, 1977).


The motivation for building a multiresolution theory comes from a collection of empirical observations, such as the one illustrated in Figure 7.11. That figure demonstrates a phenomenon called {\em pattern adaptation}. To see the illusion, first notice that the bars in the patterns on top and bottom of panel (b) are the same width. Next, stare at the fixation target between the patterns in panel (a) for a minute or so. These patterns are called the {\em adapting patterns}. While you stare, let your gaze move slightly around the dot between the patterns, but do not let it wander too far. After you have spent a minute or so examining the adapting patterns, look at the patterns in panel (b) again. You will notice, particularly at first, that the bars at the top and bottom of the middle pattern appear to have different widths. You can try the same experiment by fixating between the adapting patterns in panel (c) for a minute or so. When you then examine the bars in the middle, the top and bottom will appear to have different orientations (Blakemore and Campbell, 1969; Blakemore and Sutton, 1969; Blakemore, Nachmias and Sutton, 1970; Gilinsky, 1968; Pantle and Sekuler, 1968).



Figure 7.12: The effect of pattern adaptation on the contrast sensitivity function. (a) The curve through the open circles shows the observer’s contrast sensitivity function before pattern adaptation. The plus symbols show contrast sensitivity following adaptation to a 7.1 cpd sinusoidal pattern. (b) Threshold elevation, that is, the ratio of contrast sensitivity before and after adaptation, is plotted as a function of spatial frequency. Threshold elevation is largest for test frequencies near the frequency of the adapting stimulus (Source: Blakemore and Campbell, 1969).


The effect of pattern adaptation can be measured by comparing the contrast sensitivity function before and after adaptation. The curve through the open symbols in Figure 7.12a shows the contrast sensitivity function prior to pattern adaptation. After adapting for several minutes to a sinusoidal contrast pattern, much as you adapted to the patterns in Figure 7.11a, the observer’s contrast sensitivity to stimuli near the frequency of the adapting pattern is reduced, while contrast sensitivity to patterns at other spatial frequencies remains unchanged (plus symbols). The ratio of contrast sensitivity before and after adaptation is shown in Figure 7.12b. When this experiment is repeated using adapting patterns at other spatial frequencies, contrast sensitivity falls for test patterns whose spatial frequency is similar to that of the adapting pattern (Blakemore and Campbell, 1969).

The results of the pattern adaptation measurements suggest one way to generalize the neural image from a single resolution theory to a multiresolution theory: Use a neural representation that consists of a collection of component-images, each sensitive to a narrow band of spatial frequencies and orientations. This separation of the visual image information can be achieved by using a variety of convolution kernels, each of which emphasizes a different spatial frequency range in the image. This calculation might be implemented in the nervous system by creating neurons with a variety of receptive field properties, much as we have found in the variety of receptive fields of linear simple cells in the visual cortex; these cells have both orientation and spatial frequency preferences (Chapter 6). Because the individual component-images are assumed to represent different spatial frequency resolutions, we say that the neural image is a {\em multiresolution} representation.
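A minimal NumPy sketch of such a collection of component-images, in one spatial dimension, follows. The band-pass gains (Gaussian on a log2 frequency axis), the peak frequencies and the bandwidth are all hypothetical choices, not measured channel properties. Each filter is applied by multiplying the Fourier spectrum, which is equivalent to circular convolution with the corresponding kernel, so every row of the result is one shift-invariant neural image.

```python
import numpy as np

def channel_gain(freqs, peak, sigma_octaves=0.5):
    """Hypothetical band-pass gain: Gaussian in log2 frequency, zero at DC."""
    gain = np.zeros_like(freqs, dtype=float)
    positive = freqs > 0
    octaves = np.log2(freqs[positive] / peak)
    gain[positive] = np.exp(-0.5 * (octaves / sigma_octaves) ** 2)
    return gain

def component_images(signal, dx_deg, peaks_cpd, sigma_octaves=0.5):
    """Filter a 1D contrast signal with one kernel per peak frequency.

    Multiplying the spectrum by a gain is equivalent to (circular) convolution
    with the corresponding kernel, so each row is one shift-invariant neural image.
    """
    spectrum = np.fft.fft(signal)
    freqs = np.abs(np.fft.fftfreq(signal.size, d=dx_deg))   # cycles per degree
    return np.array([np.fft.ifft(spectrum * channel_gain(freqs, p, sigma_octaves)).real
                     for p in peaks_cpd])

n, extent_deg = 1024, 4.0
dx = extent_deg / n
x = np.arange(n) * dx
peaks = [1.0, 2.0, 4.0, 8.0, 16.0]            # hypothetical channel peaks (cpd)
test = 0.05 * np.sin(2 * np.pi * 4.0 * x)      # 4 cpd sinusoid, 5% contrast
images = component_images(test, dx, peaks)
rms = np.sqrt(np.mean(images ** 2, axis=1))
print(peaks[int(np.argmax(rms))])              # 4.0: the matched channel responds most
```

Octave spacing of the peaks is a convenient assumption here; a full model would also vary orientation.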



Figure 7.13: A multiresolution model can explain certain aspects of pattern adaptation. (a) In normal viewing, the bar width is inferred from the relative responses of a collection of component-images, each responding best to a selected spatial frequency band. The spatial frequency selectivity of each component-image is shown above and the amplitude of the component-image encoding of the test stimulus is shown in the bar graph below. (b) Following adaptation to a low frequency stimulus (shown in inset), the sensitivity of the neurons comprising certain component-images is reduced. Considering the responses of all the component-images, the response to the test is similar to the unadapted response to a high frequency target. (c) Following adaptation to a high frequency pattern (shown in inset), the neural representation is consistent with the unadapted response to a low frequency target.


Multiresolution representations provide a simple framework to explain pattern adaptation (see Figure 7.13). The visual system ordinarily encodes the image using a collection of shift-invariant component-images whose contrast sensitivity curves are shown at the top of Figure 7.13a. Before adaptation, each component-image represents the squarewave at an amplitude that depends on the squarewave frequency and the channel sensitivity. The amplitudes of the component-image representations of the test pattern before adaptation are plotted as the bar graph at the bottom of part (a).

Adaptation to a low frequency squarewave suppresses the sensitivity of some of the component-images, as shown at the top of part (b). Consequently, the responses to the test pattern change following adaptation, as shown at the bottom of part (b). The new pattern of responses is consistent with the unadapted response to a finer squarewave pattern. This explains the observation that, following adaptation to a low frequency squarewave, the test pattern appears to shift to a higher spatial frequency. Figure 7.13c illustrates the component-image sensitivities following adaptation to a high frequency squarewave (top) and how the amplitudes of the component-image responses are altered (bottom). In this case, the pattern of responses is consistent with the unadapted encoding of a lower frequency target.

According to the multiresolution model, pattern adaptation is much like a lesion experiment. Adaptation reduces or eliminates the contribution of one set of neurons, altering the balance of activity and producing a change in the perceptual response. Following adaptation to a low frequency target, the excitation in component-images at higher spatial frequencies is relatively greater, giving the test bars a narrower appearance. Conversely, following adaptation to a high frequency target, the excitation in component-images representing low spatial frequencies is relatively greater, giving the test bars a wider appearance.
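The lesion-like account can be simulated in a few lines. The sketch below is hypothetical in every detail: the channel peaks, the tuning width, the adaptation depth, and especially the decoding rule. The text does not specify how apparent bar width is read out, so we assume that apparent frequency is the response-weighted mean of the channels' log2 peak frequencies. Adaptation scales each channel's sensitivity by how strongly that channel responded to the adapter.

```python
import numpy as np

PEAKS = 2.0 ** np.arange(-1.0, 6.0, 0.5)   # hypothetical channel peaks, 0.5 to about 45 cpd
SIGMA_OCT = 0.7                            # hypothetical tuning width (octaves)

def tuning(f_cpd):
    """Response of every channel to a grating at f_cpd (Gaussian in log2 frequency)."""
    return np.exp(-0.5 * (np.log2(f_cpd / PEAKS) / SIGMA_OCT) ** 2)

def perceived_frequency(test_cpd, adapt_cpd=None, depth=0.5):
    """Decode apparent frequency as the response-weighted mean of log2 channel peaks.

    Adaptation multiplies each channel's sensitivity by 1 - depth * (its response to
    the adapter), mimicking the selective loss of sensitivity in Figure 7.13.
    """
    sensitivity = np.ones_like(PEAKS)
    if adapt_cpd is not None:
        sensitivity = 1.0 - depth * tuning(adapt_cpd)
    responses = sensitivity * tuning(test_cpd)
    return 2.0 ** (np.sum(responses * np.log2(PEAKS)) / np.sum(responses))

print(round(perceived_frequency(4.0), 2))              # about 4.0 before adaptation
print(perceived_frequency(4.0, adapt_cpd=2.0) > 4.0)   # True: bars look finer
print(perceived_frequency(4.0, adapt_cpd=8.0) < 4.0)   # True: bars look wider
```

The direction of the shift does not depend on the particular numbers: whichever side of the test frequency the adapter lies on, the channels on that side lose more sensitivity, so the weighted estimate moves away from the adapter.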

In summary, the empirical observations using pattern adaptation suggest that squarewave or sinusoidal adapting patterns only influence the contrast sensitivity of patterns of roughly the same spatial frequency. This observation suggests that the component-images might be organized at multiple spatial resolutions.

Pattern Discrimination and Masking

There are several other experimental observations, in addition to pattern adaptation, that can be used to support multiresolution representations for human perception. Historically, one of the most important papers on this point was Campbell and Robson’s (1968) detection and discrimination measurements using squarewave gratings and other periodic spatial patterns.

Squarewaves, like all periodic stimuli, can be expressed as the weighted sum of sinusoidal components using the Fourier series. A squarewave, sq(x), that oscillates between plus and minus one, with a frequency of f, can be expressed in terms of sinusoidal components as

(7)   \begin{equation*} sq(x) = \frac{ 4 }{ \pi } \sum_{ n = 0}^{ \infty } \frac{1}{ 2n + 1} \sin ( 2 \pi (2n + 1) f x ) ~~~. \end{equation*}

A squarewave at frequency f is equal to the sum of a series of sinusoids at the odd multiples of f: f, 3f, 5f, and so forth. The amplitude of the sinusoids declines with increasing frequency; the amplitude of the 3f sinusoid is one-third the amplitude of the component at f, the amplitude of the 5f sinusoid is one-fifth, and so forth. When the overall contrast of the squarewave is very low, the amplitude of the higher order terms is extremely small and they can be ignored. At low squarewave contrasts only one or two sinusoidal terms are necessary to generate a pattern that is very similar in appearance to the true squarewave. For low contrast values, then, the squarewave pattern can be well-approximated by the pattern

(8)   \begin{equation*} sq(x) \approx \frac{ 4 }{ \pi } [ \sin ( 2 \pi f x ) + \frac{ 1 }{ 3 } \sin ( 2 \pi ( 3 f ) x ) + \frac{ 1 }{ 5 } \sin ( 2 \pi ( 5 f ) x ) ] \end{equation*}
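The coefficients in Equations (7) and (8) are easy to check numerically. The NumPy sketch below samples one period of a unit squarewave, projects it onto sinusoids at integer multiples of f = 1, and compares the results with 4/(\pi k) for odd k and zero for even k. It then asks what fraction of the squarewave's power the three terms of Equation (8) carry.

```python
import numpy as np

n = 4096
x = np.arange(n) / n                      # one period of a squarewave with f = 1
square = np.where(x < 0.5, 1.0, -1.0)     # oscillates between +1 and -1

def sine_coefficient(signal, k):
    """Amplitude of sin(2*pi*k*x) in one sampled period (projection onto the sinusoid)."""
    return 2.0 / n * np.sum(signal * np.sin(2 * np.pi * k * x))

for k in range(1, 6):
    expected = 4 / (np.pi * k) if k % 2 else 0.0
    print(k, round(sine_coefficient(square, k), 4), round(expected, 4))

# Fraction of the squarewave's power carried by the three terms of Equation (8).
coeffs = [sine_coefficient(square, k) for k in (1, 3, 5)]
power_fraction = sum(c ** 2 / 2 for c in coeffs) / np.mean(square ** 2)
print(round(power_fraction, 3))           # about 0.933
```

About 93% of the power lies in the first three terms, and the remainder is spread over harmonics whose amplitudes are smaller still.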



Figure 7.14: Contrast sensitivity measured using squarewave gratings above 1 cpd can be predicted from the contrast of the squarewave fundamental frequency. The plus signs and open circles show contrast sensitivity to squarewaves and sinewaves, respectively. The filled circles show the ratio of contrast sensitivities at each spatial frequency. The solid line is drawn at a value of 4/\pi \approx 1.273, the amplitude of the fundamental in a unit contrast squarewave (Source: Campbell and Robson, 1968).

Campbell and Robson used squarewaves (and other periodic patterns) to test the multiresolution hypothesis in several ways. First, they measured the smallest contrast level at which observers could detect the squarewave grating. Notice that the amplitude of the lowest frequency component, which is called the {\em fundamental}, is 4 / \pi times the squarewave contrast. Since the fundamental has the largest contrast, and for patterns above 1 cpd sensitivity begins to decrease, Campbell and Robson argued that the neurons whose receptive field sizes are well-matched to the fundamental component will signal the presence of the squarewave first. If this is the most important term in defining the visibility of the squarewave, then the squarewave should reach threshold when its fundamental reaches the threshold of a sinusoid at the same frequency. Hence, the threshold contrast of the squarewave should be \pi / 4 times the threshold contrast of the sinusoid; equivalently, contrast sensitivity to the squarewave should be 4 / \pi times contrast sensitivity to the sinusoid. The data in Figure 7.14 show contrast sensitivity functions to both sinusoidal and squarewave targets, and the ratios of the contrast sensitivities. As predicted\footnote{ Campbell and Robson’s squarewave detection experiment is a special case of the test-mixture experiment we reviewed earlier. The results show that the 3f and higher frequency components do not help the observer detect the squarewave grating. The more general question of how different frequency components combine is answered by test-mixture experiments, such as those performed by Graham and Nachmias (1971), in which the relative contrasts of all the components are varied freely.} for patterns above 1 cpd the ratio of squarewave to sinewave contrast sensitivity is 1.28 \approx 4 / \pi.
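The prediction reduces to one line of arithmetic, sketched below in Python. The sinusoid thresholds in the dictionary are hypothetical numbers, not Campbell and Robson's data. Whatever values are used, the sensitivity ratio comes out at 4/\pi.

```python
import math

def squarewave_threshold(sine_threshold):
    """Predicted squarewave detection threshold if only the fundamental matters.

    A squarewave of contrast C has a fundamental of amplitude (4 / pi) * C, so it
    reaches threshold when (4 / pi) * C equals the sinusoid's threshold.
    """
    return (math.pi / 4.0) * sine_threshold

# Hypothetical sinusoid thresholds (contrast) at a few frequencies above 1 cpd.
sine_thresholds = {2.0: 0.004, 4.0: 0.003, 8.0: 0.006}
for f, t_sine in sine_thresholds.items():
    t_square = squarewave_threshold(t_sine)
    print(f, round(t_square, 5), round(t_sine / t_square, 3))   # sensitivity ratio 1.273
```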


Figure 7.15: Discrimination of sinusoidal and squarewave gratings becomes possible when the third harmonic in the squarewave reaches its own independent threshold. The open circles plot the contrast sensitivity function. The plus signs show the contrast level at which a squarewave can be discriminated from its fundamental frequency. The filled circles show the squarewave discrimination data shifted by a factor of 3 in both frequency and contrast. The alignment of the shifted curve with the contrast sensitivity function suggests that squarewaves are discriminated when the third harmonic reaches its own threshold level (Source: Campbell and Robson, 1968).

In addition to detection thresholds, Campbell and Robson also measured how well observers can discriminate between squarewaves and sinusoids. In these experiments, observers were presented with a squarewave and a sinewave at frequency f. The contrast of the sinusoid was set to 4 / \pi times the contrast of the squarewave, ensuring that the fundamental component of the squarewave and the sinusoid had equal contrast. The observers adjusted the contrast of the two patterns, maintaining this fixed contrast ratio, until the squarewave and sinusoid were barely discriminable. Since the contrast of the squarewave fundamental was held equal to the contrast of the sinusoid, the stimuli could only be discriminated based on the frequency components at 3f and higher.

Campbell and Robson found that observers discriminated between the sinusoid and the squarewave when the contrast of the third harmonic reached its own threshold level. Their conclusions are based on the measurements shown in Figure 7.15. The filled circles show the contrast sensitivity function. The open circles show the contrast of the squarewave when it is just discriminable from the sinusoid. Evidently, the squarewave contrast needed to discriminate the two patterns exceeds the contrast needed to detect the squarewave. But we can explain the increased contrast by considering the 3f component of the squarewave. Recall that this component has three times the spatial frequency and one-third the amplitude of the fundamental. By multiplying the spatial frequency of the squarewave discrimination data (open circles) by three and dividing their contrast by three, we compensate for these two factors. The plus signs show the open circles shifted in this way. The plus signs align with the original contrast sensitivity measurements. From this alignment, we can conclude that the squarewave can be discriminated from the sinusoid when the 3f component reaches its own detection threshold.
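The logic of the shift can be written out directly. The sketch below assumes a hypothetical contrast sensitivity function (200 f e^{-0.3 f}, not measured data) and the discrimination criterion described above: the 3f harmonic, one-third the amplitude of the shared fundamental, must reach the sinusoid threshold at 3f. Each discrimination point (f, c) then moves to (3f, c/3), which lies exactly on the detection curve.

```python
import math

def sine_threshold(f_cpd):
    """Hypothetical sinusoid detection threshold (contrast) at f_cpd; not measured data."""
    sensitivity = 200.0 * f_cpd * math.exp(-0.3 * f_cpd)
    return 1.0 / sensitivity

def discrimination_fundamental_contrast(f_cpd):
    """Contrast of the shared fundamental at which square and sine become discriminable,
    assuming the 3f harmonic (1/3 the fundamental's amplitude) must reach its own threshold."""
    return 3.0 * sine_threshold(3.0 * f_cpd)

def discrimination_square_contrast(f_cpd):
    """The same criterion expressed as squarewave contrast: fundamental = (4 / pi) * C."""
    return (math.pi / 4.0) * discrimination_fundamental_contrast(f_cpd)

for f in (1.0, 2.0, 4.0):
    c = discrimination_fundamental_contrast(f)
    # Shift the discrimination point (f, c) to (3f, c / 3): it lands on the detection curve.
    print(f, round(c, 5), math.isclose(c / 3.0, sine_threshold(3.0 * f)))
```

The alignment in the sketch is exact by construction; the empirical content of Campbell and Robson's result is that the measured discrimination data behave this way.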

Campbell and Robson’s discrimination results are consistent with a multiresolution representation of the pattern. It is as if the fundamental and third harmonics are encoded by different component-images. Because the amplitude of the fundamental component is the same in the squarewave and sinusoid, the observer cannot use that information to discriminate between them. When the contrast of the third harmonic exceeds its own independent threshold, the observer can use that information and discriminate the two patterns.

Although multiresolution representations are consistent with this result, we should ask whether the evidence is powerful. Specifically, we should ask whether the data might be explained by simpler theories. One more general hypothesis we should consider is this: observers discriminate two spatial patterns, S and S + \Delta S, whenever \Delta S is at its own threshold. This is what Campbell and Robson report when S and \Delta S are low contrast stimuli, widely separated in spatial frequency. Does this hold generally? Can observers always discriminate S from S + \Delta S when \Delta S is at its own threshold?

No. In fact, the case reported by Campbell and Robson is very rare. In many cases the two patterns S and S + \Delta S cannot be discriminated even though \Delta S, seen alone, is plainly visible. In this case, we say the stimulus S {\em masks} the stimulus \Delta S. There are also cases when S and S + \Delta S can be discriminated even though \Delta S, seen alone, cannot be detected. In this case, we say the stimulus S {\em facilitates} the detection of \Delta S. Masking and facilitation are quite common; the absence of masking and facilitation, as in the data reported by Campbell and Robson, is fairly unusual.

The images in Figure 7.16a demonstrate the phenomenon of visual masking. The pattern shown on the left is the target contrast pattern, \Delta S. This contrast pattern is added to each of the masking patterns shown in the middle column. The masking pattern on the top is similar to the target in orientation, but different by a factor of three in spatial frequency. If you look carefully, you will see a difference between the mask alone and the mask plus the target: Specifically, near the center of the pattern several of the bars on the left appear darkened and several bars on the right appear lightened. The second mask is similar to the target in both orientation and spatial frequency. In this case, it is harder to see the added contrast, \Delta S. The third mask is similar in spatial frequency but different in orientation. In this case, it is easy to detect the added target.

Figure 7.16b shows measurements of masking and facilitation between patterns with similar spatial frequency and the same orientation (Legge and Foley, 1980). Observers discriminated a mask, S, from a mask plus a two cycle per degree sinusoidal target, S + \Delta S. The vertical axis measures the threshold contrast of the target needed to make the discrimination and the horizontal axis measures the contrast of the masking stimulus S. The different curves show results for maskers of various spatial frequencies. In general, the presence of S facilitates detection at low contrasts and masks detection at high contrasts. When the spatial frequencies of \Delta S and S differ by a factor of two, the amount of facilitation is small, though there is still considerable masking. Other experimental measurements show that when the spatial frequencies of the test and mask differ by a factor of three, the effect of masking is reduced (Wilson et al., 1983; DeValois, 1977). We will discuss the effect of orientation on masking later in this chapter.
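The combination of facilitation at low mask contrasts and masking at high mask contrasts (often called a "dipper") can be reproduced by a single nonlinear transducer. The sketch below assumes a transducer of the form r(c) = c^p / (z + c^q) with p > q, a form widely used in the masking literature following Legge and Foley's analysis. The parameter values, the criterion k, and the arbitrary contrast units are illustrative assumptions. The target is taken to be detectable when it raises the transducer output by k.

```python
def transducer(c, p=2.4, q=2.0, z=1.0):
    """Hypothetical accelerating-then-compressive transducer r(c) = c**p / (z + c**q), p > q."""
    return c ** p / (z + c ** q)

def increment_threshold(mask, k=0.1):
    """Smallest increment dc such that transducer(mask + dc) - transducer(mask) reaches k.

    The transducer is monotonic for c >= 0, so bisection finds the unique crossing.
    """
    base = transducer(mask)
    lo, hi = 0.0, 1.0
    while transducer(mask + hi) - base < k:   # grow the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if transducer(mask + mid) - base < k:
            lo = mid
        else:
            hi = mid
    return hi

detection = increment_threshold(0.0)        # threshold with no mask
for mask in (0.0, 0.5 * detection, detection, 3.0, 10.0):
    print(round(mask, 3), round(increment_threshold(mask), 3))
```

At low mask contrasts the accelerating part of the transducer is steepest near the mask, so a smaller increment suffices (facilitation); at high contrasts the compressive part flattens the curve and the increment must grow (masking).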



Figure 7.16: Masking and facilitation. (a) These images illustrate visual masking. The test contrast pattern is shown on the left, and three different masking contrast patterns are shown in the middle column. The sums of the test and mask contrasts are shown in the right column. When the spatial frequency of the test and mask differ by a factor of three (top), it is possible to see the effect of the test pattern. When the spatial frequency of the test and mask are similar (middle), it is difficult to perceive the added test. When the orientations of the test and mask are very different (bottom), it is very easy to see the added test. (b) The contrast needed to detect a 2 cpd target (\Delta S, vertical axis) depends on the contrast of the masking pattern (S, horizontal axis). Each curve measures the effect of a different spatial frequency pattern, S. When S is of low contrast and similar spatial frequency and orientation, it facilitates detection of the target; when it is of high contrast, it masks detection of the target. The curves have been displaced along the vertical axis so that each can be seen clearly (Source: Legge and Foley, 1980).


The implications of these experiments for multiresolution models can be summarized in two parts. First, Campbell and Robson’s data show no facilitation or masking when S and \Delta S are low contrast and widely separated in spatial frequency. Second, the Legge and Foley data show that for many stimulus pairs more similar in spatial frequency, S influences the visibility of an increment \Delta S. Taken together, these results are consistent with the idea that stimuli with widely different spatial frequencies are encoded by different component-images.

The Conceptual Advantage of Multiresolution Theories

Today, many different disciplines represent images using the multiresolution format; that is, by separating the original data into a collection of component-images that differ mainly in their peak spatial frequency selectivity. The multiresolution representation has opened up a large set of research issues, and I will discuss several of these in Chapter 8. While the behavioral evidence for multiresolution is interesting, it is hardly enough to explain why multiresolution hypotheses have led to something of a revolution in vision science. Rather, I think that it is the conceptual advantages of multiresolution representations, described below, that have made them an important part of vision science.

When theorists abandon the simple shift-invariance hypothesis for the initial linear encoding, two problems arise. First, the set of possible encoding functions, even just linear encoding functions, becomes enormous. How can one choose among all of the possible linear transformations? Second, without shift-invariance, the theorist loses considerable predictive power, since many important derivations we have made depend on shift-invariance. For example, without shift-invariance we cannot derive the same quantitative prediction for the test-mixture threshold of a pair of sinusoids (Equation~6).

Simply abandoning shift-invariance opens up the set of possible encodings too far; theorists need some method of organizing their choices amongst the set of possible linear encodings. The multiresolution structure helps to organize the theorist’s choices. To specify a multiresolution model we must specify the properties of the collection of shift-invariant calculations that make up the multiresolution theory. This organization helps the theorist reason about and describe the properties of the linear encoding.

The multiresolution hypothesis also permits theorists to introduce organizational properties into the component-images that make these images seem more like the cortical response of nerve cells. A model with only a single shift-invariant transformation cannot have an orientation-selective convolution kernel or a narrowly frequency-selective kernel. If the convolution kernel (i.e., neural receptive field) encodes one orientation more effectively than others, or one spatial resolution more strongly, then the observer also must be more sensitive to stimuli with this orientation or resolution. Since observers show no strong orientation or resolution bias, a shift-invariant model must use a circularly symmetric pointspread function with fairly broad spatial resolution.

Multiresolution theories, however, can incorporate receptive fields with a variety of orientations and resolutions. As long as all orientations are represented, the model as a whole will retain equal sensitivity to all orientations. The use of oriented convolution kernels with restricted spatial resolution makes the analogy between the convolution kernels and cortical receptive fields much closer (see Chapter 6).
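A short NumPy sketch shows how a set of oriented channels can be isotropic as a whole. It assumes a hypothetical orientation tuning of cos^2 of the angle between the stimulus and the channel's preferred orientation. Each single channel is strongly orientation selective, yet with four evenly spaced preferred orientations the summed gain is exactly 2 for every stimulus orientation, because the cos(2\theta) terms cancel.

```python
import numpy as np

def channel_gain(theta, preferred):
    """Hypothetical orientation tuning: cos^2 of the angle difference (period pi)."""
    return np.cos(theta - preferred) ** 2

n_orientations = 4
preferred = np.arange(n_orientations) * np.pi / n_orientations   # 0, 45, 90, 135 deg
stimulus_theta = np.linspace(0.0, np.pi, 181)                     # 1 deg steps
gains = channel_gain(stimulus_theta[:, None], preferred[None, :])  # shape (181, 4)
pooled = gains.sum(axis=1)

print(gains[:, 0].min(), gains[:, 0].max())   # one channel: from about 0 to 1
print(pooled.min(), pooled.max())             # all channels: 2.0 everywhere
```

Real cortical tuning curves are not exactly cos^2, so in a realistic model the summed sensitivity would be only approximately constant; the sketch illustrates the principle rather than a particular model.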



Figure 7.17: A multiresolution model of spatial pattern sensitivity. The stimulus is convolved with a collection of spatial filters with different peak spatial frequency sensitivity. The filter outputs are modified by a nonlinear compression, noise is added, and the result is combined into a neural image (Source: Spillmann and Werner, 1990).


The complexity of the calculations is an important challenge in developing multiresolution models of human pattern sensitivity. Figure 7.17 is an overview of a fairly simple multiresolution model described by Wilson and Regan (1984; see also Wilson and Gelb, 1984; A. B. Watson, 1983; Foley and Legge, 1980; Watt and Morgan, 1985). The initial image is transformed linearly into a neural image comprising a set of component-images. In a recent implementation of this model, Wilson and Regan (1984) suggest that the neural image consists of forty-eight component-images, organized by six spatial scales and eight orientations (i.e., all scales at all orientations). Each component-image is followed by static nonlinearities that are modifications of the vector-length measure. To see the development of an even more extensive model, the reader should consult the work by Watson and his colleagues (e.g., Watson, 1983; Watson and Ahumada, 1989). They have developed a substantially larger multiresolution model, using very sophisticated assumptions concerning the observer’s internal noise and decision-making capabilities.
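To give a sense of the computation involved, here is a minimal NumPy sketch with the same bookkeeping as the model described above: six scales by eight orientations, giving forty-eight component-images. Everything else is a hypothetical stand-in, not Wilson and Regan's actual model: the filter shapes (Gaussian in log2 radial frequency and in orientation), the peak frequencies, the bandwidths, the compressive exponent, and the Minkowski pooling exponent. The internal noise shown in Figure 7.17 is omitted to keep the sketch deterministic.

```python
import numpy as np

N_SCALES, N_ORIENT = 6, 8
PEAKS_CPD = 0.5 * 2.0 ** np.arange(N_SCALES)      # hypothetical: 0.5, 1, 2, 4, 8, 16 cpd
ORIENTS = np.arange(N_ORIENT) * np.pi / N_ORIENT  # 0 to 157.5 deg in 22.5 deg steps

def filter_bank(size, extent_deg, sigma_oct=0.5, sigma_theta=np.pi / 16):
    """Frequency-domain gains for 6 scales x 8 orientations (48 component-images).

    Radial tuning is Gaussian in log2 frequency; orientation tuning is Gaussian in
    the angle difference taken modulo pi, so each filter treats f and -f alike.
    """
    f = np.fft.fftfreq(size, d=extent_deg / size)  # cycles per degree
    fx, fy = np.meshgrid(f, f)                     # fx varies along columns (x)
    radius = np.hypot(fx, fy)
    angle = np.arctan2(fy, fx)
    with np.errstate(divide="ignore"):
        log_r = np.log2(radius)                    # -inf at DC; gain set to 0 there
    bank = []
    for peak in PEAKS_CPD:
        radial = np.where(radius > 0,
                          np.exp(-0.5 * ((log_r - np.log2(peak)) / sigma_oct) ** 2), 0.0)
        for theta in ORIENTS:
            d = (angle - theta + np.pi / 2) % np.pi - np.pi / 2
            bank.append(radial * np.exp(-0.5 * (d / sigma_theta) ** 2))
    return np.array(bank)

def component_images(image, extent_deg):
    """Linear stage: filter the image with every kernel (circular convolution via FFT)."""
    bank = filter_bank(image.shape[0], extent_deg)
    return np.fft.ifft2(np.fft.fft2(image)[None, :, :] * bank).real

def decision_variable(image, extent_deg, compress=0.5, beta=4.0):
    """Static compressive nonlinearity, then Minkowski pooling over space and channels."""
    r = np.abs(component_images(image, extent_deg)) ** compress
    return np.sum(r ** beta) ** (1.0 / beta)

size, extent = 128, 2.0
x = np.arange(size) * extent / size
X, _ = np.meshgrid(x, x)
grating = 0.01 * np.cos(2 * np.pi * 4.0 * X)       # vertical 4 cpd grating, 1% contrast
energy = np.sum(component_images(grating, extent) ** 2, axis=(1, 2))
best = int(np.argmax(energy))                      # index = scale * 8 + orientation
print(best // N_ORIENT, best % N_ORIENT)           # 3 0: the 4 cpd, 0 deg channel
```

Predicting a threshold from such a model would still require a noise model and a decision rule, which is where simulation becomes the practical tool.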

It is difficult to reason about the performance of these multiresolution models from first principles (though see Nielsen and Wandell, 1988; Bowne, 1990). Consequently, most of the predictions from these models are derived using computer simulation. Analyzing the model properties closely could easily fill up a book; and, in fact, Norma Graham (1989) has completed an authoritative account of the present status of work in this area. I am pleased to refer the reader to her account.

Challenges to Multiresolution Theory

Multiresolution theories are the main tool that theorists use to reason about pattern sensitivity. As we reviewed in the preceding section, multiresolution representations have many useful features and they can be used to explain several important experimental results. There are, however, a number of empirical challenges to the multiresolution theories. In this section, I will describe a few of the measurements that represent a challenge to multiresolution theories of human pattern sensitivity. As you will see, many of these challenges derive from the same source as challenges to a shift-invariant theory: mixture experiments.

Pattern Adaptation to Mixtures

If we are to use pattern adaptation to justify multiresolution theories, then we should spend a little more time studying the general properties of pattern adaptation measurements. Perhaps the first step we should take is to extend the pattern adaptation measurements from simple sinusoids to more general patterns consisting of the mixture of two patterns.


Figure 7.18: Pattern adaptation mixture experiments. These curves measure log threshold contrast elevation at various test frequencies following adaptation. The curves show the results following adaptation to a 3 cpd sinusoid (solid), a 9 cpd sinusoid (dash), and their sum (dot-dash). Threshold elevation following adaptation to the sum is smaller than threshold elevation following adaptation to the individual components (Source: Nachmias et al., 1973).

Nachmias et al. (1973) performed a pattern adaptation experiment using individual sinusoidal stimuli and their mixtures as adapting stimuli. The question they pose, as in all mixture experiments, is whether we can use the individual measurements to predict the behavioral performance to the sum.

The results of their measurements are shown in Figure 7.18. Each curve in Figure 7.18 represents threshold elevation of sinusoidal test gratings at different spatial frequencies. The solid curve measures threshold elevation when the adapting stimulus was a 3 cpd sinusoidal grating. Confirming Blakemore and Campbell (1969), there is considerable threshold elevation at 3 cpd and less adaptation at both higher and lower spatial frequencies. The dashed curve measures threshold elevation when the adapting field was a 9 cpd grating. For historical reasons, the contrast of this grating was one third the contrast of the 3 cpd grating, the same ratio as that of the third harmonic to the fundamental in a square wave. Even at this reduced contrast, the 9 cpd grating causes significant threshold elevation for 9 cpd test stimuli.
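One plausible reading of "for historical reasons" is the earlier square-wave literature: in a square wave the third harmonic has one third the amplitude of the fundamental, so 3 cpd and 9 cpd gratings at a 3:1 contrast ratio are the first two Fourier terms of a 3 cpd square wave (up to their relative phase, which is not stated here). The sketch below checks the 1/3 ratio numerically with the midpoint rule; the function name and sample count are illustrative choices.

```python
import math

def square_wave_coefficient(n, samples=30000):
    """Cosine-series coefficient a_n of the unit square wave sign(cos(2*pi*x)),
    estimated with the midpoint rule over one period (0 <= x < 1)."""
    total = 0.0
    for k in range(samples):
        x = (k + 0.5) / samples
        square = 1.0 if math.cos(2 * math.pi * x) >= 0 else -1.0
        total += square * math.cos(2 * math.pi * n * x)
    return 2.0 * total / samples

a1 = square_wave_coefficient(1)   # close to 4/pi
a3 = square_wave_coefficient(3)   # close to -4/(3*pi): opposite sign in cosine phase
print(round(abs(a3 / a1), 4))     # 0.3333
```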

The dot-dash curve shows the threshold elevation following adaptation to the mixture of the two adapting stimuli. For this observer, adaptation to the mixture produces no threshold elevation for 9 cpd test gratings. For all of the observers in this study, the threshold elevation at 9 cpd following adaptation to the mixture is smaller than the threshold elevation following adaptation to the 9 cpd adapting stimulus alone. The mixture of 3 and 9 cpd is {\em less} potent than adapting to 9 cpd alone.

This result is difficult to reconcile with the simple interpretation of adaptation and spatial frequency channels in Figure 7.13. If adaptation to 3 cpd affects a different set of neurons from adaptation to 9 cpd, then why should adding a 3 cpd component to the 9 cpd adapting grating improve sensitivity at 9 cpd? I am unaware of any explanation of this phenomenon that also preserves the basic logical structure of multiresolution representations.
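The logic of the independent-channel interpretation can be made explicit with a toy model. Every element of it is an assumption for illustration: each channel has Gaussian tuning in log spatial frequency (one-octave bandwidth), a channel fatigues in proportion to its RMS response to the adapter, and threshold elevation at a test frequency depends only on the fatigue of the channel tuned to that frequency. Under these assumptions, adding a 3 cpd component can only add to the 9 cpd channel's response, so the model predicts that the mixture elevates 9 cpd thresholds at least as much as the 9 cpd grating alone. Nachmias et al. found the opposite.

```python
import math

def channel_gain(f_cpd, peak_cpd, bw_octaves=1.0):
    """Hypothetical channel: Gaussian tuning in log2 spatial frequency,
    with bw_octaves as the full width at half maximum."""
    sigma = bw_octaves / (2 * math.sqrt(2 * math.log(2)))
    return math.exp(-math.log2(f_cpd / peak_cpd) ** 2 / (2 * sigma ** 2))

def predicted_elevation(test_cpd, adapter, k=1.0):
    """Threshold elevation at test_cpd under an independent-channel fatigue model.

    adapter maps spatial frequency (cpd) -> contrast. For sinusoidal components
    at distinct frequencies the cross terms average out over space, so the
    channel's RMS response is (up to a constant) the root-sum-square of the
    gain-weighted contrasts.
    """
    energy = sum((channel_gain(f, test_cpd) * c) ** 2 for f, c in adapter.items())
    return 1.0 + k * math.sqrt(energy)

c3 = 0.3                                               # hypothetical 3 cpd contrast
alone_9 = predicted_elevation(9.0, {9.0: c3 / 3})
mixture = predicted_elevation(9.0, {3.0: c3, 9.0: c3 / 3})
print(mixture >= alone_9)                              # True: the opposite of the data
```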


Figure 7.19: The neurons that limit detection and those that cause pattern adaptation may not be the same. For example, one group of neurons (Group T) may be noisy and have low contrast gain. Because of their noise properties, this group would limit detection threshold. Neurons in a second population (Group A) may integrate the responses of the first group and have high contrast gain. Because of their high gain, this group of neurons may fatigue easily and be the neural basis of pattern adaptation. If the neural units that limit these two types of behavioral responses are different, then the spatial receptive fields inferred from detection and pattern adaptation experiments may well be different.

The results of these mixture experiments should motivate us to rethink the basic mechanisms of pattern adaptation. If we plan on using this experimental method to provide support for a notion as significant as the multiresolution representation of pattern, then we should understand the adaptation phenomenon. Figure 7.19 illustrates one of the difficulties we face when we try to integrate results from detection and adaptation experiments. When we combine results from detection and adaptation experiments, we implicitly assume that the visual mechanisms that limit detection are the same as those that alter visual sensitivity following pattern adaptation. But behavioral measurements provide no direct evidence that the neurons that limit sensitivity are the same as those that underlie adaptation.

The diagram in Figure 7.19 illustrates one way in which this assumption may fail. Suppose that neurons indicated as {\em Group T} are located early in the visual pathways, and that these neurons are noisy and have low contrast gain. If these are the least reliable neurons in the pathway, then the sensitivity may be limited by their properties. In that case, we can improve the observer’s detection performance by testing with stimuli that are well-matched to response properties of the Group T neurons.

We have assumed that the effects of pattern adaptation are due to neural fatigue caused by strong stimulus excitation. Because the Group T neurons have relatively low gain, they will not respond very strongly to most stimuli, and consequently they may not be susceptible to pattern adaptation. Instead, another group of neurons, the {\em Group A} neurons, may be the ones most influenced by the adapting pattern. I have shown these neurons in Figure 7.19 at a later stage in the visual pathways. The spatial properties of the pattern adaptation experiment, for example the way test sensitivity varies with the spatial properties of the adapting pattern, may be due to the spatial receptive field properties of the Group A neurons. Group T and Group A neurons may have quite different spatial receptive fields\footnote{ You should also consider the possibility that neural fatigue is not the main source of pattern adaptation. Recently, Barlow and Foldiak (1989) have put forward an entirely different explanation of pattern adaptation that is based on learning principles, not on neural fatigue. While the work on this topic is too preliminary for me to include in this volume, I think this line of research has great potential for clarifying many visual phenomena.}.

From this analysis, it should be clear that the spatial properties of the neural encoding derived from pattern adaptation may differ from the spatial properties of the neural encoding derived from detection tasks. To argue that the mechanisms limiting detection and mediating pattern adaptation are the same, we must find behavioral measurements that establish this point. Only then can we piece together the results from detection and pattern adaptation to infer the organization within multiresolution models.

Masking with Mixtures

In Campbell and Robson’s (1968) discrimination experiment, the observer was asked to distinguish between two stimuli S and S + \Delta S, where S and \Delta S, effectively, were sinusoidal patterns. Campbell and Robson found that when S and \Delta S were sinusoidal stimuli at well-separated spatial frequencies the two patterns were discriminable when \Delta S was at its own threshold. In reviewing their experiments we considered how masking depends on the relative spatial frequency of the test and masking patterns (Figure 7.16).




Figure 7.20: Orientation tuning in the masking experiment. Threshold elevation to a 2 cpd test as a function of the orientation of a 2 cpd masking stimulus. The data include only maskers with positive orientations since masking is symmetric with respect to the orientation of the masking stimulus. The two curves show data from two observers and the error bars are one standard error of the mean (Source: Phillips and Wilson, 1984).





The data in Figure 7.20 show how masking depends on the relative orientation of the target and masker (Phillips and Wilson, 1984). In this study, the masker S and test \Delta S were at the same spatial frequency. Phillips and Wilson measured the contrast needed in the test to discriminate S + \Delta S from S for various orientations of the masker. The horizontal axis in Figure 7.20 measures the orientation of the masking stimulus, S. The vertical axis measures the threshold elevation of the test. As the difference in orientation between the test stimulus \Delta S and masking stimulus S increases, the masking effect decreases. In this data set, when the orientation difference exceeds 40 degrees, S + \Delta S can be discriminated from S when \Delta S is at its own threshold level.

Based on these measurements, one might suspect that one-dimensional contrast patterns separated in orientation by more than 40 degrees are encoded by separate neurons. Experimental results like these might be used to determine the orientation selectivity of the convolution kernels used in multiresolution models.


Figure 7.21: Masking mixture experiments. When a test and masking grating are separated in orientation by 67.5 deg, the masker has no influence on the visibility of the test. But the combination of two masking gratings at plus and minus 67.5 deg, neither of which alone has any effect, acts as a powerful masker (Source: Derrington and Henning, 1989).

Mixture experiments based on visual masking challenge the validity of this conclusion. Derrington and Henning (1989) measured threshold elevation using two separate masking patterns and their mixture. The individual masking patterns were 3 cpd sinusoidal gratings; one grating was oriented at plus 67.5 degrees and the other at minus 67.5 degrees relative to vertical. They measured the effect of these masking patterns on a variety of vertically oriented sinusoidal gratings.

If the two masking stimuli are represented by neurons that are different from those that represent the vertical test patterns, then the superposition of these two masking stimuli should not influence the target visibility. The data in Figure 7.21 show that this is not the case. The mixture of the two masking patterns is a potent mask even though neither alone has any effect\footnote{ Similar difficulties in interpreting masking by mixtures have been reported by Nachmias and Rogowitz (1983), using spatially modulated gratings, and by Perkins and Landy (1991), using narrowband noises.}.
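Why this result is troubling for independent linear channels can be checked numerically. The sketch below builds the two 3 cpd maskers at plus and minus 67.5 deg and their sum, and measures the fraction of Fourier energy near the orientation of a vertical test; the sampling density, field size, Hann window and the plus-or-minus 20 deg orientation band are all assumptions. The mixture has essentially no energy there, just like each masker alone, even though the sum can be rewritten exactly as the product of a vertical grating at 3 cos(67.5 deg), about 1.15 cpd, and a horizontal grating at 3 sin(67.5 deg), about 2.77 cpd. A linear channel tuned to vertical therefore receives no more input from the mixture than from its components.

```python
import numpy as np

def grating(n, deg_per_pixel, cpd, orient_deg):
    """Cosine grating whose wave vector is rotated orient_deg from the x axis
    (orient_deg = 0 gives a vertical grating)."""
    y, x = np.mgrid[0:n, 0:n] * deg_per_pixel
    th = np.radians(orient_deg)
    return np.cos(2 * np.pi * cpd * (x * np.cos(th) + y * np.sin(th)))

def energy_near_orientation(image, deg_per_pixel, orient_deg, half_width_deg=20.0):
    """Fraction of Hann-windowed spectral energy (excluding DC) whose
    wave-vector orientation is within half_width_deg of orient_deg (mod 180)."""
    n = image.shape[0]
    w = np.hanning(n)
    power = np.abs(np.fft.fft2(image * np.outer(w, w))) ** 2
    f = np.fft.fftfreq(n, d=deg_per_pixel)
    FX, FY = np.meshgrid(f, f)
    angle = np.degrees(np.arctan2(FY, FX))
    d = np.abs((angle - orient_deg + 90.0) % 180.0 - 90.0)
    nonzero = np.hypot(FX, FY) > 0
    return power[(d <= half_width_deg) & nonzero].sum() / power[nonzero].sum()

n, dpp = 512, 1.0 / 64                     # 8 x 8 deg field at 64 pixels/deg
plus = grating(n, dpp, 3.0, 67.5)
minus = grating(n, dpp, 3.0, -67.5)
mixture = plus + minus
for name, img in (("+67.5", plus), ("-67.5", minus), ("mixture", mixture)):
    print(name, energy_near_orientation(img, dpp, 0.0))   # all tiny

# cos(A) + cos(B) = 2 cos((A+B)/2) cos((A-B)/2): a vertical times a horizontal grating.
u, v = 3.0 * np.cos(np.radians(67.5)), 3.0 * np.sin(np.radians(67.5))
product = 2 * grating(n, dpp, u, 0.0) * grating(n, dpp, v, 90.0)
print(np.allclose(mixture, product))       # True
```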


Intermediate Summary

Multiresolution representations are a very important theoretical tool. They help us think about the general problem of pattern sensitivity, and they provide a framework for organizing computational models of pattern sensitivity and other pattern-related tasks. There is some evidence that these representations are an important part of the human visual pathways. But results from a bewildering array of experimental methods, ranging from detection to pattern adaptation to masking, are inconsistent with the central notions of multiresolution representations. As we have seen, mixture experiments using pattern adaptation and masking are difficult to understand if we believe that the components of the image in different spatial frequency and orientation bands are encoded by independent sets of neurons.

The conflicting pattern of experimental results shows us that we have not yet achieved a complete understanding of the basic neural processes that cause adaptation and masking. Nor do we understand how these neural processes are related to the neural processes that limit pattern sensitivity. Achieving this understanding is important because these experiments provide the key results that support multiresolution representations. Perhaps, once we understand the properties of these separate experimental methods more fully, we will understand the role of multiresolution representations and find a way to make sense of the complete set of experimental findings. At this point, I think you should see that we are well underway in understanding these issues, but many questions remain unanswered.




Pattern Sensitivity Depends on Other Viewing Parameters

Next, we will review how pattern sensitivity depends on other aspects of the viewing conditions, such as the mean illumination level, the temporal parameters of the stimulus, and the wavelength properties of the pattern. In each of these cases, we will use some form of the contrast sensitivity function as a summary of the observer’s behavior.

In the remainder of this chapter, the contrast sensitivity function plays a different role from the way we have used it up to now. To this point, I have emphasized the special role of the contrast sensitivity function in linear systems theories. If we understand the structure of the data well enough, then the contrast sensitivity function can be used to predict sensitivity to many other patterns. A clear example of this is Schade’s use of the contrast sensitivity function: if visual sensitivity is limited by a shift-invariant neural image, then we can use the contrast sensitivity function to predict sensitivity to any other pattern.

We do not yet have a complete theory that permits us to use the contrast sensitivity function to characterize behavior generally. I continue to describe pattern sensitivity in terms of the contrast sensitivity function because it serves as a useful summary measure of visual pattern sensitivity. Hence, in the remainder of this chapter, we will not treat the contrast sensitivity function as a complete description of the observer’s pattern sensitivity. Rather, we will use it as a descriptive tool to help us learn something about the general pattern sensitivity of the visual system.

Part of the reason for standardizing on the contrast sensitivity function is that the measure is used widely in both physiology and psychophysics. Hence, behavioral measurements of the contrast sensitivity function can be compared with neural responses at different points in the visual pathway. If a particular class of neurons, say retinal ganglion cells, limits visual sensitivity, we should expect behavioral and neural contrast sensitivity curves to covary as we change the experimental conditions.

Light Adaptation

Figure 7.22 shows that the contrast sensitivity function changes when it is measured at different mean background intensities. The curve in the lower left shows a contrast sensitivity function measured at a low mean luminance level (9 \times 10^{-4} trolands) when rods dominate vision. Under these conditions the contrast sensitivity function peaks at 1-3 cpd and the curve is lowpass rather than bandpass. The curve on the upper right shows a contrast sensitivity function measured on a bright photopic background, one million times more intense. Under these conditions the peak of the contrast sensitivity function is near 6-8 cpd and the shape of the curve is bandpass. At mean background intensities higher than 1000 trolands, the contrast sensitivity function remains unchanged (Westheimer, 1960; van Ness and Bouman, 1967).



Figure 7.22: Human contrast sensitivity varies with mean field luminance. Each curve shows a contrast sensitivity function at a different mean field luminance level ranging from 9 \times 10^{-4} trolands to 9 \times 10^{2} trolands, increasing by a factor of ten from curve to curve. The stimulus consisted of monochromatic light at 525 nm. At the lowest level, under scotopic conditions, the contrast sensitivity function is lowpass and peaks near 1 cpd. On intense photopic backgrounds the curve is bandpass and peaks near 8 cpd. Above these mean background levels, the contrast sensitivity function remains constant (Source: van Ness and Bouman, 1967).





The change in the shape of the contrast sensitivity function is consistent with a few simple imaging principles. The first principle concerns the importance of achieving adequate signal under the ambient viewing conditions. At very low light levels, the observer needs to integrate light across the retina in order to achieve a reliable signal. If the observer must spatially average the light signal to obtain a reliable signal, then the observer cannot also resolve high spatial frequencies. Consequently, under dim, scotopic conditions the observer should have poor sensitivity to high spatial frequencies, as they do. On more intense backgrounds, when quanta are plentiful, the observer can integrate information over smaller spatial regions and spatial frequency resolution improves.
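The tradeoff between a reliable signal and spatial resolution can be sketched with two textbook formulas, under loudly stated assumptions: photon (Poisson) noise is the only noise source, and pooling is uniform over a square window (real receptive fields are not boxcars). Pooling over a wider window raises the signal-to-noise ratio as the square root of the quantum count, while the window's transfer function, |sin(pi f w) / (pi f w)|, removes high spatial frequencies. The quantum density below is hypothetical.

```python
import math

def pooled_snr(quanta_per_deg2, pool_width_deg):
    """Signal-to-noise ratio of the quantum catch pooled over a square region
    pool_width_deg on a side, assuming Poisson (photon) noise only."""
    mean_count = quanta_per_deg2 * pool_width_deg ** 2
    return math.sqrt(mean_count)              # mean / sqrt(mean) for Poisson counts

def boxcar_gain(f_cpd, pool_width_deg):
    """Amplitude transfer of uniform 1-D averaging over pool_width_deg at
    spatial frequency f_cpd: |sin(pi f w) / (pi f w)|."""
    z = math.pi * f_cpd * pool_width_deg
    return 1.0 if z == 0 else abs(math.sin(z) / z)

# Dim background: hypothetical 100 absorbed quanta per deg^2 per integration time.
for width in (0.05, 0.2, 1.0):
    print(width, pooled_snr(100.0, width), boxcar_gain(2.0, width))
```

Widening the pool from 0.05 to 1 deg raises the signal-to-noise ratio from 0.5 to 10, but the 1 deg pool passes essentially nothing at 2 cpd.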

The second principle concerns the importance of contrast, rather than absolute intensity, for visual processing. Figure 7.22 shows that contrast sensitivity to low spatial frequency patterns (below 1 cpd) rises with mean luminance and then becomes constant. The range in which contrast sensitivity is constant is called the {\em Weber’s law} regime. For low spatial frequency patterns, Weber’s law is a good description of the results. At higher spatial frequencies, contrast sensitivity continues to rise with the mean luminance. For these patterns Weber’s law is not a precise description of sensitivity.

Even though Weber’s law is imprecise, it does contain a kernel of truth. Consider the overall dynamic ranges we are measuring. The background intensities used in these experiments vary by a factor of one million, i.e., six orders of magnitude. Yet contrast sensitivity generally varies by only a factor of 20 or so, roughly one order of magnitude, while sensitivity to absolute light level varies by 4 or 5 orders of magnitude. The pattern of results suggests that the visual system preserves contrast sensitivity, as Weber’s law predicts, rather than sensitivity to absolute intensity. The visual system follows Weber’s law quite well at low spatial frequencies, and it comes close at high spatial frequencies. The significance of contrast rather than absolute intensity for vision confirms the general view we have adopted, beginning with the measurements of contrast sensitivity in retinal ganglion cells and cat behavior described in Chapter 5.
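The arithmetic behind the "4 or 5 orders of magnitude" follows from the definition of contrast: the threshold amplitude is the threshold contrast times the mean level. A minimal sketch (the function name is illustrative):

```python
import math

def threshold_amplitude_log_range(background_ratio, sensitivity_ratio):
    """Orders of magnitude spanned by the threshold amplitude when the mean
    level changes by background_ratio and contrast sensitivity rises by
    sensitivity_ratio (threshold amplitude = threshold contrast * mean)."""
    return math.log10(background_ratio / sensitivity_ratio)

print(round(threshold_amplitude_log_range(1e6, 20), 2))   # 4.7 orders of magnitude
print(threshold_amplitude_log_range(1e6, 1))              # about 6: the exact Weber's-law limit
```

A contrast sensitivity that rises twentyfold across six log units of background leaves about 4.7 log units of change in absolute sensitivity; perfect Weber behavior would leave all six.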

Spatio-temporal contrast sensitivity

Figure 7.5 showed several contrast sensitivity functions measured using contrast-reversing sinusoids. Those data illustrate how the contrast sensitivity function varies when we measure at a few different temporal frequencies. Figure 7.23 contains a surface plot that represents how the spatial contrast sensitivity function varies when we measure at many different temporal frequencies. One axis of the graph shows the spatial frequency of the test pattern, and a second axis shows its temporal frequency. The height of the surface represents the observer’s contrast sensitivity. The surface represents the observer’s {\em spatiotemporal contrast sensitivity function}, a single surface that summarizes a large range of spatial and temporal contrast sensitivity functions. Paths through the surface running parallel to the spatial frequency axis represent spatial contrast sensitivity functions; paths running parallel to the temporal frequency axis represent temporal contrast sensitivity functions. Kelly (1979) derived an analytic formula for the surface shape from an extensive set of psychophysical measurements.

If the spatial contrast sensitivity functions had the same shape up to a scale factor, and similarly for the temporal contrast sensitivity functions, we would say that human spatio-temporal contrast sensitivity is space-time {\em separable}\footnote{ See Chapter 5 for a discussion of space-time separability of receptive fields.}. From the shape of the contrast sensitivity surface, it is apparent that the spatial contrast sensitivity curves have different shapes when measured at different temporal frequencies (cf. Figure 7.5). Hence, human contrast sensitivity is not space-time separable (Kelly and Burbeck, 1984).


Figure 7.23: Human spatiotemporal contrast sensitivity function. The two lower axes represent the spatial and temporal frequencies of a contrast-reversing pattern. The vertical axis represents the observer’s contrast sensitivity to each of the contrast-reversing patterns. The data used to estimate this surface were measured on a mean background luminance of 1000 trolands. Curves running parallel to the spatial frequency axis define a set of spatial contrast sensitivity functions measured at different temporal frequencies (cf. Figure 7.5). Curves running parallel to the temporal frequency axis represent the temporal contrast sensitivity measured at different spatial frequencies. Human spatiotemporal contrast sensitivity is not space-time separable (Source: Kelly, 1966, 1979).

There are several considerations that make space-time separability an important property. First, in Chapter 5 I explained that only space-time separable systems have unique spatial and temporal sensitivity functions. When a system is not separable it does not have a unique contrast sensitivity function; rather it has a different function for each temporal measurement condition.

Second, space-time separability is significant because it simplifies computations and representations. For example, suppose we want to represent the spatiotemporal contrast sensitivity function at N = 60 spatial and N = 60 temporal frequencies. If the contrast sensitivity function is not separable, we may need to store as many as N^2 = 3600 values of the sensitivity function. But if the function is space-time separable, we need to represent only the spatial contrast sensitivity function and the temporal contrast sensitivity function (2N = 120 values). Sensitivity at any combination of spatial and temporal frequency can then be calculated as the product of these two functions.
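The storage argument can be checked directly. In the sketch below the two one-dimensional curves are arbitrary hypothetical shapes, not fits to human data; the point is only that a separable surface is their outer product, so 2N = 120 stored numbers regenerate all N^2 = 3600 entries, and the resulting matrix has rank one.

```python
import numpy as np

N = 60
spatial = np.linspace(0.5, 30.0, N)          # cpd (hypothetical sample points)
temporal = np.linspace(1.0, 50.0, N)         # Hz
# Hypothetical separable sensitivity: a spatial curve times a temporal curve.
s_csf = spatial * np.exp(-spatial / 4.0)
t_csf = temporal * np.exp(-temporal / 8.0)

full = np.outer(s_csf, t_csf)                # N x N table: 3600 numbers
stored = s_csf.size + t_csf.size             # 2N = 120 numbers
print(full.size, stored)                     # 3600 120

# Any entry of the table is the product of the two stored curves.
i, j = 10, 25
print(np.isclose(full[i, j], s_csf[i] * t_csf[j]))   # True
```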

While the observer’s behavior as a whole is not space-time separable, we need not forgo all of the advantages of space-time separability. We may be able to describe the observer’s performance as if it depends on the combination of a few space-time separable mechanisms\footnote{Indeed, it is possible to show that this is always a theoretical possibility. The result follows from an important representation in linear algebra called the {\em singular value decomposition}.}. We first saw this approach in Chapter 5 when we studied the receptive fields of retinal ganglion cells. Although their receptive fields are not space-time separable, we could model them as composed of two space-time separable components, namely the center and surround.
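The footnote's claim can be illustrated with NumPy's singular value decomposition: truncating the decomposition after k terms gives the best least-squares approximation of a sampled surface as a sum of k separable terms. The test surface here is a hypothetical difference of two separable terms, so two singular values carry all of its energy and two separable components reconstruct it exactly.

```python
import numpy as np

def separable_components(surface, k):
    """Best rank-k approximation of a sampled space-time surface as a sum of
    k separable terms s_m * u_m(spatial) * v_m(temporal)."""
    U, s, Vt = np.linalg.svd(surface, full_matrices=False)
    terms = [s[m] * np.outer(U[:, m], Vt[m, :]) for m in range(k)]
    return terms, s

fs = np.linspace(0.5, 30.0, 60)       # cpd (hypothetical)
ft = np.linspace(1.0, 50.0, 60)       # Hz (hypothetical)
# Hypothetical non-separable surface: difference of two separable terms.
surface = (np.outer(np.exp(-fs / 10), np.exp(-ft / 20))
           - 0.5 * np.outer(np.exp(-fs / 2), np.exp(-ft / 5)))

terms, s = separable_components(surface, 2)
print(np.allclose(sum(terms), surface))   # True: two separable terms suffice
print(s[2] / s[0] < 1e-10)                # True: the third singular value is negligible
```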

Kelly (1971, 1979; Kelly and Burbeck, 1984) has modeled the human spatiotemporal contrast sensitivity function as if visual sensitivity is limited by contributions from two space-time separable components. This description of contrast sensitivity is a single-resolution description, much like Schade’s. The convolution kernel of the system is composed of a center and a surround region, much like a difference of Gaussians, in which the two components are each space-time separable. When the two components are summed, as for retinal ganglion cells, the resulting kernel is not separable. Using suitable spatial and temporal parameters, it is possible to approximate the contrast sensitivity surface by computing the output of the convolution kernel. This single-resolution kernel provides a convenient method for computing the surface, but as we have seen in other parts of this chapter, the single-resolution system does not generalize well to predict sensitivity to other space-time patterns formed by mixtures of harmonic functions.
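A center-surround description of this kind can be sketched as follows. This is not Kelly's published formula, and every parameter is hypothetical: Gaussian space constants of 0.02 deg (center) and 0.15 deg (surround), a surround weight of 0.9, and first-order temporal low-pass filters with corner frequencies of 20 Hz (center) and 4 Hz (surround). The center and surround are each separable, but their difference is not: at 1 Hz the slow surround cancels the center at low spatial frequencies and the spatial curve is bandpass, while at 30 Hz the surround is attenuated and most of the low-frequency loss disappears.

```python
import numpy as np

def center_surround_sensitivity(fs, ft, sigma_c=0.02, sigma_s=0.15, a=0.9,
                                fc_c=20.0, fc_s=4.0):
    """|H(fs, ft)| for a hypothetical center-minus-surround kernel whose center
    and surround are each space-time separable.

    Spatial parts: Fourier transforms of Gaussians with std sigma (deg).
    Temporal parts: complex first-order low-pass responses, corner fc (Hz).
    """
    FS, FT = np.meshgrid(fs, ft, indexing="ij")      # rows: spatial, cols: temporal
    g_c = np.exp(-2 * np.pi ** 2 * sigma_c ** 2 * FS ** 2)
    g_s = np.exp(-2 * np.pi ** 2 * sigma_s ** 2 * FS ** 2)
    t_c = 1.0 / (1.0 + 1j * FT / fc_c)
    t_s = 1.0 / (1.0 + 1j * FT / fc_s)
    # Subtract the complex responses (phases matter), then take the magnitude.
    return np.abs(g_c * t_c - a * g_s * t_s)

fs = np.linspace(0.25, 20.0, 80)       # cpd
ft = np.array([1.0, 30.0])             # Hz
H = center_surround_sensitivity(fs, ft)
# Sensitivity at 0.25 cpd relative to the peak of each spatial curve.
low_freq_ratio = H[0, :] / H.max(axis=0)
print(low_freq_ratio)
```

The ratio is well below one half at 1 Hz (bandpass) and above 0.8 at 30 Hz, so the two spatial curves have different shapes and the surface is not separable.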

Temporal Sensitivity and Mean luminance

The {\em temporal contrast sensitivity function} measures sensitivity to temporal sinusoidal variations in the stimulus contrast. Figure 7.24a shows the temporal contrast sensitivity function measured at a variety of mean background intensities.


Figure 7.24: Human temporal sensitivity measured at various mean background illuminance levels. (a) Temporal contrast sensitivity. The spatial pattern was a two degree disk presented on a large background. Each curve measures contrast sensitivity (vertical axis) as a function of temporal frequency (horizontal axis). The curves show measurements on a variety of backgrounds. In sequence from lowest curve to highest, the mean luminance was 0.375, 1, 3.75, 10, 37.5, 100, 1000, 10,000 tds. Once the background illumination reaches roughly 5 trolands, contrast sensitivity to low temporal frequencies remains constant, consistent with Weber’s law (Source: de Lange, 1958). (b) Temporal amplitude sensitivity. The spatial pattern was a 60 degree disk. Each curve measures the threshold amplitude, not contrast, as a function of temporal frequency. The mean background levels are 0.85, 7.1, 8.5 and 850 trolands. Notice that at high temporal frequencies the threshold amplitudes appear to fall along a single curve, independent of the mean background level. This convergence is consistent with a purely linear response, with no light adaptation, for high temporal frequency stimuli (Source: Kelly, 1961).

First, consider how contrast sensitivity to the lowest temporal frequencies varies with background intensity. At the very lowest background levels, contrast sensitivity increases with mean luminance. Once the mean background luminance reaches 5 trolands, contrast sensitivity to low frequencies changes by less than a factor of two while the background intensity changes over a factor of 100. For low temporal frequencies, contrast sensitivity remains relatively constant across changes in the mean background intensity. This is the form of light adaptation called Weber’s Law.

Second, consider contrast sensitivity at high temporal frequencies. For these tests, contrast sensitivity continues to increase with background level across the whole range, a deviation from Weber’s Law. The nature of the deviation can be clarified by plotting flicker thresholds as in Figure 7.24b, in terms of the {\em amplitude} of the high frequency flicker rather than its contrast (contrast is the amplitude divided by the mean level). When plotted as amplitude thresholds, the temporal flicker sensitivity curves converge at high temporal frequencies. The convergence of the functions measured at many different mean luminance levels implies that sensitivity to high temporal frequency signals is predicted by the amplitude of the signal, not its contrast. This is the behavior one expects from a purely linear system without light adaptation. In this temporal frequency range, then, Weber’s Law does not describe the data at all well. These data show that light adaptation does not play a significant role in determining the visibility of high temporal frequency flicker.
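The two regimes differ only in how the threshold amplitude scales with the mean, because contrast sensitivity is the mean divided by the threshold amplitude. The sketch below uses hypothetical numbers (a fixed 0.5 td amplitude threshold for the linear regime, a 1% Weber fraction for the Weber regime) to show that a fixed amplitude threshold makes contrast sensitivity rise tenfold per decade of mean luminance, while a Weber-law threshold leaves it constant.

```python
def contrast_sensitivity(threshold_amplitude_td, mean_td):
    """Contrast sensitivity = 1 / threshold contrast = mean / threshold amplitude."""
    return mean_td / threshold_amplitude_td

means = [1.0, 10.0, 100.0, 1000.0]            # trolands (hypothetical)

# Linear regime: the same amplitude threshold (hypothetical 0.5 td) at every mean.
linear = [contrast_sensitivity(0.5, m) for m in means]
# Weber regime: threshold amplitude proportional to the mean (hypothetical 1%).
weber = [contrast_sensitivity(0.01 * m, m) for m in means]

print(linear)   # [2.0, 20.0, 200.0, 2000.0]: rises tenfold per decade of mean
print(weber)    # about 100 at every mean
```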

Pattern-Color Sensitivity

There is a very powerful relationship between the wavelength composition of a target and our sensitivity to pattern. In Chapter 2 we reviewed one of the most important factors that relates wavelength and pattern sensitivity: the chromatic aberration of the optics. The consequences of chromatic aberration are quite significant for the organization of the entire visual pathways. For example, based on the measurements we reviewed in earlier chapters, the chromatic aberration of the lens, coupled with the wide spacing of the \Blue cones, implies that a signal beginning in the \Blue cones can only represent spatial frequencies below 3-4 cycles per degree (cf. Figure~??). This compares with the basic optical and sampling limit of nearly 50 cpd for signals initiated by a mixture of \Red and \Green cones. The consequences of these and other limitations can be measured easily in people’s ability to detect, discriminate and perceive colored patterns: people’s ability to resolve short-wavelength patterns is very poor (Williams, 1986).

While it is easy to understand some of the relationship between pattern and color in terms of the optics and the cone mosaic, the limitations that relate color and pattern are best understood by thinking about the neural pathways that encode color, rather than the cones. A great deal of physiological and behavioral evidence (see Chapters 6 and 9) demonstrates that we perceive color via neural pathways that combine the signals from the three cone classes. One pathway carries the sum of the cone signals, while other pathways, called {\em color-opponent pathways}, carry signals representing differences between cone signals. These pathways represent signals at very different spatial resolutions (Mullen, 1988; Noorlander and Koenderink, 1983; Poirson and Wandell, 1993; Sekiguchi, et al. 1993).

High spatial frequency signals (20-60 cpd) appear to excite only the pathway formed by summing the cone signals. We experience these patterns as light-dark modulations around the mean luminance. Patterns with spatial frequencies below 12 cpd can excite a pathway that encodes the difference between \Red and \Green signals. Only the lowest spatial frequencies excite the third pathway, which includes the \Blue cones.

These effects have been roughly understood for many years. For example, the color television broadcast signal received in many homes is organized into three color signals: a light-dark signal and two color-difference signals. Only the light-dark signal includes high spatial frequency information about the image; the two color signals represent only low spatial frequency information. This representation is very efficient for transmission, since leaving out high spatial frequencies in two of the signals permits a large compression in the bandwidth of the signal. Despite the missing spatial frequency information, the broadcast images do not appear spatially blurred. The reason is that the high spatial frequency color information omitted in the transmission is not ordinarily perceived.
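A light-dark plus color-difference encoding can be sketched with a concrete transform. Analog broadcast systems used their own encodings; as a stand-in, the sketch below uses the analog form of the ITU-R BT.601 luma and color-difference equations familiar from digital video. Keeping the light-dark signal at full resolution and subsampling each color-difference signal by two in both directions (as in 4:2:0 digital formats) halves the number of samples per frame. The frame size is an illustrative assumption.

```python
def rgb_to_ycbcr(r, g, b):
    """Analog BT.601 luma and color-difference signals (R, G, B in [0, 1])."""
    y = 0.299 * r + 0.587 * g + 0.114 * b     # light-dark signal
    cb = (b - y) / 1.772                        # blue-difference, range -0.5..0.5
    cr = (r - y) / 1.402                        # red-difference, range -0.5..0.5
    return y, cb, cr

def samples_per_frame(width, height, chroma_factor):
    """Luma at full resolution plus two color-difference signals subsampled by
    chroma_factor horizontally and vertically."""
    chroma = (width // chroma_factor) * (height // chroma_factor)
    return width * height + 2 * chroma

full = samples_per_frame(720, 480, 1)
subsampled = samples_per_frame(720, 480, 2)
print(subsampled / full)   # 0.5
```

For a neutral gray input the two color-difference signals are zero, so all of the spatial detail in achromatic scenes travels in the full-resolution light-dark signal.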

The color pathways also differ in their temporal sensitivity. Perhaps the most important observation is based on the {\em flicker photometry} experiment. In this experimental procedure a pair of test lights alternate with one another. When the lights are alternated slowly the pattern appears to change between the colors of the two lights. When the lights alternate rapidly, observers fail to see the color modulation, and all differences appear as a light-dark modulation upon a steady colored background. Our temporal resolution for distinguishing blue-yellow flicker is poorest, red-green in the middle, and light-dark is best.

The relationship between spatial resolution and temporal resolution suggests a hypothesis that we considered in Chapter 3: namely, that spatial and temporal resolution covary because they are related through the rigid motion of objects. If the most important source of temporal variation in the image is the motion of the eye or of objects, temporal frequency and spatial frequency resolution should covary. At a given velocity, the motion of a low spatial frequency image produces slower temporal variation than the motion of a high spatial frequency image. Hence, in those wavelength bands where only low spatial frequencies are imaged, the visual system may not require high temporal frequency resolution.
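The covariation can be made concrete: a grating translating rigidly at v deg/s produces a temporal frequency of v times its spatial frequency at each retinal point. At an assumed image velocity of 2 deg/s, a 4 cpd pattern, near the blue-yellow limit, produces 8 Hz, while a 30 cpd light-dark pattern produces 60 Hz.

```python
def temporal_frequency_hz(velocity_deg_per_s, spatial_cpd):
    """Temporal frequency at a fixed retinal location when a grating of
    spatial_cpd translates rigidly at velocity_deg_per_s (f_t = v * f_s)."""
    return velocity_deg_per_s * spatial_cpd

v = 2.0                                         # hypothetical image velocity, deg/s
for fs in (1.0, 4.0, 30.0):
    print(fs, temporal_frequency_hz(v, fs))     # 2.0, 8.0 and 60.0 Hz
```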


Figure 7.25: Color sensitivity and appearance depend on the spatiotemporal pattern. We perceive blue-yellow, red-green and light-dark variations at the lowest spatiotemporal frequencies. When the spatial frequency of the pattern exceeds 3 or 4 cpd, we fail to see blue-yellow variation. For spatial (temporal) frequencies greater than 16 cpd (Hz), we see the world only as light-dark modulations about the mean color. In this spatiotemporal region our perception is monochromatic.

I have summarized the covariation of color, space and time in the color image shown in Figure 7.25. The image represents how color appearance varies across different spatiotemporal frequency ranges. We are trichromatic only in a relatively small range of low spatial and temporal frequencies represented near the origin of the figure. As the spatial or temporal frequency increases we fail to see blue-yellow variation and vision becomes dichromatic. At the higher spatial and temporal frequencies we are monochromatic, and we see only light-dark variation.

Retinal Eccentricity

The contrast sensitivity measurements we have reviewed were all made using small patches of sinusoidal grating presented within the central few degrees of the visual field. As one measures contrast sensitivity at increasingly peripheral locations in the visual field, sensitivity decreases. There are a number of neural factors\footnote{The quality of the optics does not appear to decline significantly over the first 20 degrees of visual angle (Jennings and Charman, 1981).} that conspire to reduce both absolute sensitivity and spatial resolution. The density of the cone mosaic falls off rapidly as a function of visual eccentricity, so that there are fewer sensors available to encode the signal. The retinal ganglion cell density falls as well, as does the amount of cortical area devoted to representing the periphery. Approximately one half of primary visual cortex represents only the central ten degrees of the visual field (314 square degrees), while the remaining half of visual cortex must represent the rest of the visual field, which extends to a radius of 80 degrees (20,000 square degrees in all; see Chapters 5 and 6).
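The areas quoted above follow from treating the central field and the whole field as disks of radius 10 and 80 degrees. The short calculation below uses only those two radii and the statement that each region receives about half of primary visual cortex. It averages over each region, so it hides the continuous fall-off of cortical magnification with eccentricity.

```python
import math

CENTRAL_RADIUS_DEG = 10.0  # central field, from the text
FIELD_RADIUS_DEG = 80.0    # extent of the visual field, from the text

central_area = math.pi * CENTRAL_RADIUS_DEG ** 2  # about 314 deg^2
field_area = math.pi * FIELD_RADIUS_DEG ** 2      # about 20,100 deg^2
peripheral_area = field_area - central_area

# Each region is represented by about half of primary visual cortex.
cortex_per_deg2_central = 0.5 / central_area
cortex_per_deg2_peripheral = 0.5 / peripheral_area

ratio = cortex_per_deg2_central / cortex_per_deg2_peripheral
print(f"central area {central_area:.0f} deg^2, whole field {field_area:.0f} deg^2")
print(f"cortex per square degree is about {ratio:.0f} times larger in the central 10 deg")
```

On average, each square degree inside the central 10 degrees receives roughly 63 times as much cortical area as a square degree outside it.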


Figure 7.26: The contrast sensitivity function varies with retinal eccentricity. (a) Contrast sensitivity functions measured using a 1 deg x 2 deg grating patch at retinal eccentricities of 0, 1.5, 4, 7.5, 14, and 30 degrees. Contrast sensitivity measured using this stimulus is highest in the fovea and falls dramatically with retinal eccentricity. (b) Contrast sensitivity functions measured with test stimuli scaled in size and spatial frequency to compensate roughly for the reduced cortical area devoted to different retinal eccentricities (Source: Rovamo et al., 1978).

Figure 7.26a shows a set of contrast sensitivity functions measured using a small grating patch at several different visual eccentricities. The top curve shows the observer’s contrast sensitivity in the fovea. The observer’s peak contrast sensitivity is 100 for gratings near 5-8 cpd, meaning that the observer can detect these gratings at one percent contrast. In the fovea, the observer can resolve gratings as fine as 40-60 cpd. When the same stimulus is used to make measurements in the visual periphery, observers become less sensitive in all regards: at 30 degrees eccentricity, peak sensitivity is only 3 and the highest resolvable spatial frequency is 2 cpd.
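Contrast sensitivity is the reciprocal of the threshold contrast, so the numbers above translate directly into luminance differences. The sketch assumes that contrast is defined as the grating's amplitude divided by its mean luminance, which equals the Michelson contrast for a sinusoid. The 50 cd/m^2 mean luminance is a hypothetical value.

```python
def threshold_contrast(sensitivity: float) -> float:
    """Contrast sensitivity is the reciprocal of the contrast at threshold."""
    return 1.0 / sensitivity


def grating_luminance_extremes(mean_luminance: float, contrast: float):
    """Peak and trough luminance of a sinusoidal grating whose contrast is
    amplitude / mean (equal to Michelson contrast for a sinusoid)."""
    amplitude = contrast * mean_luminance
    return mean_luminance + amplitude, mean_luminance - amplitude


def michelson_contrast(l_max: float, l_min: float) -> float:
    return (l_max - l_min) / (l_max + l_min)


MEAN_LUMINANCE = 50.0  # cd/m^2, hypothetical
for label, sens in (("fovea, peak", 100.0), ("30 deg, peak", 3.0)):
    c = threshold_contrast(sens)
    peak, trough = grating_luminance_extremes(MEAN_LUMINANCE, c)
    print(f"{label}: threshold contrast {c:.3f}, "
          f"luminance {trough:.2f} to {peak:.2f} cd/m^2")
```

A foveal sensitivity of 100 corresponds to a threshold of 1 percent contrast, while a peripheral sensitivity of 3 corresponds to about 33 percent contrast.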

We do not ordinarily notice this decrease in contrast sensitivity. When asked, most people believe that their spatial resolution is fairly uniform over a much wider extent of the image than just 2 degrees (their thumbnail at arm’s length). Yet, from the curves in Figure 7.26a, it is plain that our visual resolution is very poor by 7-10 degrees (a fist at arm’s length). Hence, our impression of seeing sharply over a large spatial extent must be due in part to our ability to integrate spatial information using eye movements.

Rovamo et al. (1978; Rovamo and Virsu, 1979; Virsu and Rovamo, 1979) suggested that the decrease in contrast sensitivity with eccentric viewing can be explained quantitatively by the reduced representation of the visual field in the cortex. Qualitatively, the decrease in contrast sensitivity and the coarse neural representation of the periphery do parallel one another. The rough agreement between these factors is demonstrated by the results in Figure 7.26b. These contrast sensitivity functions, like those in Figure 7.26a, were made at different retinal eccentricities. For these measurements, however, the size and spatial frequency of the grating patch were scaled to compensate for the reduced cortical representation at that retinal eccentricity. When the size and spatial frequency of the stimulus are adjusted in this way, the contrast sensitivity functions become fairly similar.
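The scaling procedure can be sketched as follows. We assume a hypothetical inverse-linear cortical magnification M(E) = M0 / (1 + E / E2), where E is eccentricity in degrees. Both that functional form and the value of the hypothetical parameter `E2_DEG` are illustrative assumptions and are not the magnification estimates used by Rovamo and colleagues. A peripheral test is enlarged, and its spatial frequency lowered, by the factor M0 / M(E).

```python
E2_DEG = 2.0  # hypothetical: eccentricity (deg) at which magnification falls to half


def magnification_ratio(ecc_deg: float, e2: float = E2_DEG) -> float:
    """Foveal magnification divided by magnification at ecc_deg, under the
    assumed inverse-linear form M(E) = M0 / (1 + E / e2)."""
    return 1.0 + ecc_deg / e2


def scaled_grating(size_deg: float, freq_cpd: float, ecc_deg: float,
                   e2: float = E2_DEG):
    """Enlarge the patch and lower its spatial frequency by the same factor,
    so the stimulus spans a similar cortical extent and number of cycles."""
    f = magnification_ratio(ecc_deg, e2)
    return size_deg * f, freq_cpd / f


for ecc in (0.0, 1.5, 4.0, 7.5, 14.0, 30.0):
    size, freq = scaled_grating(1.0, 8.0, ecc)
    print(f"{ecc:5.1f} deg: patch {size:5.2f} deg, grating {freq:5.2f} cpd")
```

Because size and spatial frequency are scaled by the same factor, the number of grating cycles in the patch stays constant across eccentricity.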

Visual performance deteriorates with eccentricity for all known spatial-acuity tasks and for the spatial localization tasks that we will review later in this chapter; but the size of the decrease varies considerably across observers and across tasks. The reduced representation of the periphery is present in all of the neural representations, beginning with the photoreceptors and continuing into the central nervous system. The variability in observers’ performance, coupled with the large number of neural representations that share a similar peripheral decline, makes it difficult to attribute the decline in performance to any single anatomical structure. The decline of acuity with eccentric viewing is an important and widespread feature of the visual system; it may not be possible to localize its cause to a single site in the visual pathways (e.g., Farrell and Desmarais, 1990; Ludvigh, 1941; Legge and Kersten, 1987; Levi et al., 1985; Westheimer, 1979; Yap et al., 1989).

Linking Hypotheses

We have now reviewed several instances in which the variation of behavioral contrast sensitivity functions with stimulus conditions is similar to the variation of retinal ganglion cell responses. These correlations suggest that there is a causal relationship between the receptive fields of retinal ganglion cells and the behavioral measurements. But such a relationship is quite difficult to prove with the certainty that we would like to have. At this point, I think it is worth reviewing what we have learned about making such inferences.

Behavioral and neural theorizing supplement one another. Psychophysicists measure behavioral responses and then build theories about neural mechanisms. The properties of the theoretical neural mechanisms summarize the data and lead to new behavioral predictions. Neurophysiological measurements tell us about the neural activity directly. But, we must theorize about how the neural activity influences behavior. Each field contributes part of the information about visual function.

In an influential chapter in his book, Brindley (1970) called hypotheses that connect measurements in the two fields {\em linking hypotheses}. He took a very conservative view concerning the type of experiments that could be used to reason about physiology from performance. His comments initiated a discussion that continues to this day (Westheimer, 1990; Teller, 1990). Brindley felt that the only truly secure argument connecting physiology and perception is this:

\ldots whenever two stimuli cause physically indistinguishable signals to be sent from the sense organs to the brain, the sensations produced by these stimuli, as reported by the subject in words, symbols, or actions, must also be indistinguishable (Brindley, 1970, p. 133.)

By stating his hypothesis clearly and forcefully, Brindley has drawn a great deal of attention to the problem of linking results between the separate disciplines. My purpose in writing this section is to question whether he may have succeeded too well; the emphasis on linking results from behavioral and physiological studies sometimes distracts us from assuring that the experimental logic within each discipline is complete.

We establish the most secure links between behavior and physiology when we first understand the separate measurements very well. For example, the relationship between the color-matching functions and the photopigment sensitivities is strong because we have extensive quantitative studies, ranging over many measurement conditions, that establish each set of measurements on its own. The color-matching experiment stands no matter what the photochemist observes, and the cone photopigment measurements stand no matter what the psychophysicist observes. Because each set of results stands powerfully on its own, their agreement makes a strong case for a connection between the two fields. If we require that the analysis within each discipline stand on its own, then when it comes time to join the two sets of observations we can have greater confidence in the link.

I mean to contrast the view stated here with an alternative approach in which the behaviorist uses the discovery of a particular neural response as the logical basis for a purely behavioral experiment, or, conversely, in which a physiologist explains a set of recordings in terms of some potentially related behavioral measure. Such ideas may be useful in the background to help formulate specific experimental measurements. But theories and experiments whose logic rests on a web of interconnections between behavior and physiology often serve to entangle our thinking.

Given this standard, what should we think about the connection between behavioral contrast sensitivity and neural receptive fields? In this chapter we have found that there is a powerful theory underlying behavioral contrast sensitivity functions. This theory is a good match to the logic of receptive field organization we reviewed in earlier chapters. The psychophysical results based on the contrast sensitivity function, however, do not fully support the basic theory. We cannot yet generalize from contrast sensitivity functions to sensitivity to other stimuli. Hence, the association between receptive field properties and contrast sensitivity functions is far more tentative than the connection between the color-matching functions and the photopigment spectral sensitivities.

Having stated this limitation in our current understanding, I don’t think we should be discouraged. The similarities between the properties of the contrast sensitivity functions and neural receptive fields are too striking to ignore. By continuing to improve on the models for behavior and receptive fields separately, the links we forge and quantitative comparisons we make could well turn out to form a complete model, linking behavioral pattern sensitivity and neural receptive fields.

Spatial Localization

In this section we will review how well human observers can localize the position of a target. Wulfing (1892) showed that human observers can make surprisingly fine discriminations between the positions of two objects. Observers can reliably distinguish spatial offsets between a pair of lines as small as one fifth the width of a single cone photoreceptor. Moreover, people can distinguish this spatial offset even when the objects are moving (Westheimer, 1979).


Figure 7.27: A comparison of localization and spatial resolution experiments. In a two-line spatial acuity experiment, the observer distinguishes a stimulus consisting of a single line from a stimulus consisting of a pair of lines separated by a small amount. The images on the left side show the estimated retinal light distribution of a reference line and of three pairs of lines separated by increasing amounts. In a localization experiment, the observer distinguishes the position of a single line from the position of a displaced line. The images on the right side of the figure show a reference line and the estimated retinal light distribution of three offset lines.

The ability to discriminate between targets at different spatial positions is an aspect of human spatial resolution. It is important to recognize that the ability to {\em localize} a target is a different kind of resolution from the spatial resolution we measure when we ask observers to discriminate a pattern from a uniform background\footnote{ The terminology associated with these two types of spatial tasks can be confusing. The word {\em hyperacuity} refers to the fact that people localize spatial position with very high precision. Unfortunately, acuity is also used to refer to the spatial frequency sensitivity of the observer, which is a different matter. Here, I will use the term {\em localization} to refer to spatial resolution for position.}. The differences between the tasks are illustrated in Figure 7.27. The left side of Figure 7.27 shows the estimated retinal light distributions of several stimuli a subject might be shown in a spatial resolution task. In this experiment, the subject must discriminate the light distribution of a single fine line (top left) from the light distribution of a line pair in which the two lines are separated by a small amount (bottom left). In this task, the stimuli are all centered at the same point, so there is no difference in where they are located. The right side of Figure 7.27 shows the retinal light distributions of stimuli a subject might be shown in a spatial localization task. In this experiment, the subject must discriminate the position of the retinal light distribution created by the reference line (top right) from the positions of the light distributions of a line that is offset (bottom right).


Figure 7.28: Localization sensitivity. Subjects detected whether a line was offset to the right or left of the tip of a chevron for a variety of angles of the opening of the chevron. Data from two observers are shown. Both observers could reliably report offsets as small as six seconds of arc (left vertical scale) which is one-fifth the width of a single photoreceptor (right vertical scale) (Source: McKee and Westheimer, 1977).

McKee and Westheimer (1977) measured observers’ ability to localize a line (see Figure 7.28). Subjects judged whether a line was located to the right or left of the tip of a chevron (see inset in the figure). The vertical axis of the graph shows the smallest offset of the line from the tip of the chevron that observers could reliably discriminate. This task was repeated for chevrons with various angles; for all angles, the offset thresholds are on the order of 5 seconds of arc, roughly one-fifth the width of a single cone. Performance does not vary much as we change the stimulus, which suggests that localization performance is robust with respect to spatial manipulations of the target. This very fine localization applies to many different kinds of stimuli, including the relative positions of a pair of vertical lines, moving lines, and many other targets (Westheimer, 1979).

At first, it seems surprising to learn that we can localize targets at a finer resolution than the spacing of the cone mosaic. We know that the sampling grid determined by the cone mosaic imposes a fundamental limit on spatial pattern resolution through the phenomenon of aliasing (see Chapter 3). Shouldn’t the cone mosaic also impose a limitation on our ability to localize position?


Figure 7.29: A physical basis for fine spatial localization. The points in the main graph show the estimated rate of light absorption by foveal cones in response to a fine line. The x’s show absorptions to a reference line and the open circles show the absorptions to a line offset by 12 seconds of arc. The tick marks on the horizontal axis are separated by the width of a single cone. The solid and dashed lines linearly connect the estimated absorption rates. The inset shows the ratio of cone absorptions from the reference line and the displaced line at each cone position. The graph shows that a 12 sec shift, roughly 1/3 the width of a photoreceptor, changes the cone absorption rate by as much as 50 percent. This information can be used to localize the position of the line at a resolution that is substantially finer than the separation between cones.

In fact, a coarse sampling grid does not eliminate the possibility of localizing a target precisely. The physical principles we can use to achieve fine spatial localization on a coarse sampling grid are illustrated in Figure 7.29. The main portion of the figure shows the pattern of cone absorptions we expect in response to a reference line centered over a cone and a line that is displaced to the right by 12 sec of arc. The separation between the tick marks on the horizontal axis is 30 sec, the size of an individual cone. The values were calculated using Westheimer’s optical linespread function (Chapter 2).

Because the offset is very small compared to the sampling interval, the same cones respond to the reference line and the offset line. It follows that the identity of the responding cones cannot be used to estimate the locations of the two lines. Although the same cones respond to the two lines, the spatial pattern of cone absorptions when the lines are in these two positions is quite different. The inset to Figure 7.29 shows the ratio of the cone absorptions to the two different lines. A small spatial shift of 12 sec of arc causes a fifty percent change in the absorption rate at an individual cone. Hence, the {\em spatial pattern of absorption rates} is a reliable signal from which the line position can be inferred at a resolution finer than the spacing of the cone mosaic.
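The calculation behind Figure 7.29 can be approximated in a few lines. This sketch is not the figure's computation. A Gaussian blur of hypothetical width stands in for Westheimer's linespread function, and each cone is treated as a point sample at its center. The cone spacing (30 sec) and the line offset (12 sec) come from the text. The exact ratios therefore differ from the figure, but the qualitative result is the same: the same cones respond, and their absorption rates change substantially.

```python
import math

CONE_SPACING = 30.0  # arcsec, from the text
OFFSET = 12.0        # arcsec, from the text
SIGMA = 30.0         # arcsec, hypothetical Gaussian stand-in for the linespread


def linespread(x: float, sigma: float = SIGMA) -> float:
    """Relative light intensity at distance x (arcsec) from the line."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


def absorptions(line_pos: float, cone_positions):
    # Point sampling at each cone center: a simplification that ignores
    # the width of the cone aperture.
    return [linespread(c - line_pos) for c in cone_positions]


cones = [k * CONE_SPACING for k in range(-3, 4)]
ref = absorptions(0.0, cones)
shifted = absorptions(OFFSET, cones)
ratios = [s / r for s, r in zip(shifted, ref)]

for c, r in zip(cones, ratios):
    print(f"cone at {c:+6.1f} arcsec: shifted/reference = {r:.3f}")
```

With this stand-in blur, the cones on either side of the line change their absorption rates by roughly 40 percent in opposite directions, even though the shift is less than half the cone spacing.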

Notice that the optical blurring of the light distribution is essential if we wish to localize the line at positions finer than the sampling grid. Were there no optical blurring, the image of a line would fall within the width of a photoreceptor and spatial displacements less than a photoreceptor width would not be detectable. Optical blur, which seems like a nuisance when we consider spatial resolution of contrast patterns, is a help when we consider spatial resolution to localize targets.
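The role of blur can be checked directly. The sketch below assumes that cones are contiguous box apertures, each 30 sec wide, and that blur is a Gaussian of hypothetical width. With no blur, a line shifted by 12 sec stays inside the same aperture, so every cone's absorption is unchanged. With blur, light spreads across neighboring apertures and the shift alters the pattern.

```python
import math

CONE_WIDTH = 30.0  # arcsec; apertures assumed to tile the line without gaps


def cone_absorptions(line_pos: float, sigma: float, n_cones: int = 7):
    """Fraction of the line's light captured by each cone aperture."""
    half = n_cones // 2
    out = []
    for k in range(-half, half + 1):
        lo = (k - 0.5) * CONE_WIDTH
        hi = (k + 0.5) * CONE_WIDTH
        if sigma == 0.0:
            # Unblurred line: all light falls in the one aperture containing it.
            out.append(1.0 if lo <= line_pos < hi else 0.0)
        else:
            # Integral of a unit-area Gaussian centered on the line over [lo, hi].
            s = sigma * math.sqrt(2.0)
            out.append(0.5 * (math.erf((hi - line_pos) / s)
                              - math.erf((lo - line_pos) / s)))
    return out


sharp_ref = cone_absorptions(0.0, sigma=0.0)
sharp_shift = cone_absorptions(12.0, sigma=0.0)
blur_ref = cone_absorptions(0.0, sigma=30.0)    # hypothetical blur width
blur_shift = cone_absorptions(12.0, sigma=30.0)

print(sharp_ref == sharp_shift)  # True: without blur the shift leaves no trace
print(max(abs(a - b) for a, b in zip(blur_ref, blur_shift)))
```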

Our ability to localize the position of edges and lines is very robust with respect to various stimulus manipulations. If we vary the target contrast, set the display into motion, or flash the display briefly, performance remains excellent. Because performance survives these manipulations, it is clear that the simple calculation shown in Figure 7.29 is only a demonstration of how localization is possible. The visual system must use a much more sophisticated and robust method to calculate position than the simple calculation described in the figure. Eye movements during examination of a static display, or tracking errors during examination of a moving display, make it impossible to compare the outputs of a single small set of cones. Rather, people must be capable of estimating position at fine precision even though the precise identity of the cones mediating the signal varies. Although we have some basic principles to work from, how we estimate the relative position of moving targets using active eyes remains an important challenge to study.
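One way to see how a position estimate can survive changes in which cones carry the signal is to compute a relative position. The sketch below is an illustration under stated assumptions, not a model of how the visual system does it. Two line segments (as in a vernier target) fall on two separate rows of cones; the blur is a hypothetical Gaussian; there is no noise; and each segment's position is estimated by the centroid of its cone absorptions. A common eye movement shifts both segments on the retina, and that shift cancels in the difference.

```python
import math

CONE_SPACING = 30.0  # arcsec
SIGMA = 30.0         # arcsec, hypothetical Gaussian blur
CONES = [k * CONE_SPACING for k in range(-10, 11)]


def absorptions(line_pos: float):
    """Noiseless point-sampled absorptions of a blurred line."""
    return [math.exp(-(c - line_pos) ** 2 / (2.0 * SIGMA ** 2)) for c in CONES]


def centroid(responses):
    """Absorption-weighted mean cone position (arcsec)."""
    total = sum(responses)
    return sum(c * r for c, r in zip(CONES, responses)) / total


def vernier_offset(upper_pos: float, lower_pos: float, eye_shift: float) -> float:
    # The same eye movement displaces both segments on the retina,
    # so different cones respond for each value of eye_shift.
    upper = absorptions(upper_pos + eye_shift)
    lower = absorptions(lower_pos + eye_shift)
    return centroid(upper) - centroid(lower)


for shift in (0.0, 7.0, 19.0, 47.0):
    print(f"eye shift {shift:5.1f} arcsec -> estimated offset "
          f"{vernier_offset(12.0, 0.0, shift):.3f} arcsec")
```

In this noiseless setting the 12 sec offset is recovered for every eye position, although the set of responding cones changes. Real estimates would also have to contend with photon noise, motion during the exposure, and the other factors discussed above.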


Theories of human pattern sensitivity are organized around a few basic principles. In the earliest and simplest theories, the visibility of different types of test patterns was explained by the properties of a single shift-invariant linear system. This type of theory is simple for computation and also parallels nicely our understanding of the initial encoding of light by retinal nerve cells in certain visual streams. The convolution kernel of the shift-invariant linear system and the neural receptive field play analogous roles and provide a natural basis for comparison of behavioral and physiological data. By using common experimental measures, such as contrast sensitivity functions, the properties of neural mechanisms and behavioral theories can be compared directly.
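The computational convenience of a single shift-invariant linear system comes from one property: a sinusoidal input produces a sinusoidal output of the same frequency, scaled by a gain set by the convolution kernel. The sketch checks this on a circular one-dimensional grid, using a hypothetical difference-of-Gaussians kernel as a stand-in for a receptive field. Its parameters are illustrative, not fitted to any data. Plotting the gain against frequency would trace out the system's predicted sensitivity function.

```python
import math

N = 64  # samples in one period of the circular spatial domain


def dog_kernel(n: int, sigma_c: float = 1.0, sigma_s: float = 3.0,
               w_s: float = 0.9):
    """Hypothetical difference-of-Gaussians kernel, symmetric about index 0."""
    k = []
    for i in range(n):
        d = min(i, n - i)  # circular distance from position 0
        center = math.exp(-d * d / (2.0 * sigma_c ** 2)) / sigma_c
        surround = math.exp(-d * d / (2.0 * sigma_s ** 2)) / sigma_s
        k.append(center - w_s * surround)
    return k


def circular_convolve(signal, kernel):
    n = len(signal)
    return [sum(signal[(i - j) % n] * kernel[j] for j in range(n))
            for i in range(n)]


def gain(kernel, cycles: int) -> float:
    # For a symmetric kernel, the response to cos(2*pi*cycles*x/N) is the
    # same cosine scaled by this real Fourier coefficient.
    n = len(kernel)
    return sum(kernel[j] * math.cos(2.0 * math.pi * cycles * j / n)
               for j in range(n))


kernel = dog_kernel(N)
for cycles in (1, 4, 16):
    stim = [math.cos(2.0 * math.pi * cycles * i / N) for i in range(N)]
    resp = circular_convolve(stim, kernel)
    g = gain(kernel, cycles)
    err = max(abs(r - g * s) for r, s in zip(resp, stim))
    print(f"{cycles:2d} cycles per period: gain {g:+.3f}, max deviation {err:.1e}")
```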

While certain aspects of single-resolution theories provide a reasonable description of human pattern sensitivity, they fail a number of direct empirical tests. Consequently, theorists have tried to assemble new theories in which the pattern representation is based on a collection of shift-invariant representations, not just a single one. This idea parallels the physiological notion that the visual system contains a set of visual streams. The more complex modern theories must specify a larger number of convolution kernels (receptive fields). To keep these organized, and to parallel some of the properties of cortical receptive fields, theorists generally choose convolution kernels that respond best to restricted bands of spatial frequency and to restricted stimulus orientations. These theories can predict more experimental results, but there remain many computational and experimental challenges before we will have a complete and satisfactory theory of pattern sensitivity.
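A multiple-channel theory replaces the single kernel with a bank of kernels, each tuned to a band of spatial frequencies. The sketch below makes several hypothetical choices: the channel peak frequencies and sensitivities, Gaussian tuning on a log-frequency axis with a one-octave bandwidth, and a detection rule in which a grating is seen by whichever channel is most sensitive to it. Many published models pool channel responses differently (for example, by probability summation). Orientation tuning is omitted.

```python
import math

# Hypothetical channel bank: (peak frequency in cpd, peak sensitivity).
CHANNELS = [(1.0, 40.0), (2.0, 80.0), (4.0, 100.0), (8.0, 60.0), (16.0, 15.0)]
BANDWIDTH_OCT = 1.0  # hypothetical full bandwidth at half height, in octaves


def channel_sensitivity(freq: float, peak_freq: float, peak_sens: float,
                        bw: float = BANDWIDTH_OCT) -> float:
    """Gaussian tuning on a log2 frequency axis."""
    octaves = math.log2(freq / peak_freq)
    sigma = bw / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # convert FWHM to sigma
    return peak_sens * math.exp(-octaves ** 2 / (2.0 * sigma ** 2))


def predicted_sensitivity(freq: float) -> float:
    # Assumption: detection is governed by the single most sensitive channel.
    return max(channel_sensitivity(freq, f, s) for f, s in CHANNELS)


for freq in (0.5, 1.0, 2.0, 3.0, 4.0, 8.0, 16.0, 32.0):
    print(f"{freq:5.1f} cpd: predicted sensitivity {predicted_sensitivity(freq):7.2f}")
```

The predicted sensitivity function is the upper envelope of the channel tuning curves; changing the pooling rule or the channel parameters changes its shape.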

Because human vision constantly adapts to new viewing conditions, human pattern sensitivity cannot be described by a single pattern sensitivity function. Pattern sensitivity covaries with the temporal properties of the test stimulus, the mean background level, and with the wavelength composition of the stimulus. Thus, a general specification of human pattern sensitivity must take all of these factors into account.

Behavioral experiments show that people are also exquisitely sensitive to the spatial location of targets. Observers can localize test stimuli to a resolution that is considerably finer than the spacing of the cone mosaic. The ability to localize is quite robust, surviving many different stimulus manipulations. The principles of how one might localize targets to a very fine resolution are clear, but the methods that the visual pathways use to acquire the necessary information remain to be determined.