My research challenges the notion that scenes are merely collections of objects. Instead, I believe that global aspects of scene geometry and affordance may make better primitives than objects. I have termed such properties global properties because they cannot be computed from a local segment of the image. These papers demonstrate that human observers are more sensitive to global properties describing a scene's layout and functions than to the objects within it (Greene & Oliva, 2009, Cognitive Psychology; Greene & Oliva, 2009, Psychological Science; Greene & Oliva, 2010, JEPHPP).
What we see is a function not only of the visual features entering our eyes, but also of our knowledge and expectations about what we will see. These expectations govern how well we see an image. When comparing unusual images with matched controls in a demanding detection task, observers were significantly more accurate at detecting the control pictures than the unusual ones, even though no machine vision system could distinguish between them (Greene et al., AP&P, 2015).
Top-down knowledge is also reflected in patterns of eye movements. Idiosyncrasies in individual eye movement patterns can be accurately decoded from summary statistics, while an observer's task cannot (Greene, Liu & Wolfe, Vision Research 2012).
Scenes are complex, but not random. We know that keyboards tend to be found below monitors and that chimneys are not found on lawns. While many vision scientists believe that this contextual knowledge aids recognition, we cannot understand the extent to which it helps without first measuring how much redundancy there is. Through mining a large, fully labeled scene database, I have provided a first set of contextual statistics, including object frequency and conditional frequencies based on scene category (Greene, Frontiers in Perception Science, 2013).
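The statistics in question are conceptually simple to mine. A minimal sketch, using a made-up miniature database (the scene categories, objects, and numbers below are illustrative, not drawn from the published dataset):

```python
from collections import Counter, defaultdict

# Hypothetical labeled database: each entry pairs a scene category
# with the objects annotated in that image.
labeled_scenes = [
    ("office", ["monitor", "keyboard", "chair"]),
    ("office", ["monitor", "chair", "lamp"]),
    ("kitchen", ["sink", "stove", "chair"]),
]

def object_statistics(db):
    """Return overall object frequency and P(object present | scene category)."""
    overall = Counter()
    by_scene = defaultdict(Counter)
    scene_counts = Counter()
    for scene, objects in db:
        scene_counts[scene] += 1
        for obj in set(objects):  # count presence per image, not tokens
            overall[obj] += 1
            by_scene[scene][obj] += 1
    n = len(db)
    freq = {obj: c / n for obj, c in overall.items()}
    cond = {scene: {obj: c / scene_counts[scene] for obj, c in counts.items()}
            for scene, counts in by_scene.items()}
    return freq, cond

freq, cond = object_statistics(labeled_scenes)
# In this toy database, chairs appear in every image (freq["chair"] == 1.0),
# and monitors appear in every office (cond["office"]["monitor"] == 1.0)
# but in no kitchen.
```

The conditional frequencies are the interesting part: they quantify the redundancy that context could, in principle, supply to a recognizer.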
The next logical step is to understand how human observers internalize these contextual statistics. I asked observers to rate the frequency with which various objects could be found in various scenes. Across six experiments, I found that object frequency was systematically over-estimated by an average of 32% (Greene, Cognition, 2016).
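The over-estimation measure itself is straightforward. A minimal sketch with hypothetical numbers (the frequencies below are invented for illustration and do not reproduce the 32% figure from the paper):

```python
# Hypothetical database-derived vs. rated frequencies for three
# object-scene pairs (e.g. how often a toaster appears in a kitchen).
actual = [0.20, 0.35, 0.10]
rated = [0.28, 0.45, 0.14]

# Percent over-estimation for each pair, averaged across pairs.
over = [(r - a) / a * 100 for r, a in zip(rated, actual)]
mean_over = sum(over) / len(over)
```

A positive mean indicates that observers report objects as more common in scenes than the labeled database says they are.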
Visual Search in Scenes
How do I find my lost keys? Visual search for real objects in real scenes does not always follow the same laws as the simplified search displays frequently used in laboratory experiments. In particular, the number of objects in a scene has very little influence on the difficulty of search. In seeking to understand why, my colleagues and I have put forth a two-pathway theory in which a non-selective, global analysis of the scene activates possible target locations while a selective process evaluates each location serially (Wolfe et al., TiCS, 2011).
What sort of information makes up the global image analysis? My global scene properties seemed like a good place to start. We examined visual search slopes for scenes with a particular global property, such as highly navigable scenes among non-navigable distractors. Global properties do not appear to be available without selective attention: search times increased substantially with the number of distractors (Greene & Wolfe, JoV, 2011).
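A search slope is simply the fitted slope of mean reaction time against set size, in ms per item; near-zero slopes suggest a property is available pre-attentively, while steep slopes imply item-by-item inspection. A minimal sketch with invented reaction times (the numbers are illustrative, not from the study):

```python
# Hypothetical mean reaction times (ms) at four set sizes.
set_sizes = [4, 8, 12, 16]
mean_rts = [620.0, 780.0, 935.0, 1100.0]

def search_slope(x, y):
    """Ordinary least-squares slope of RT against set size (ms per item)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

slope = search_slope(set_sizes, mean_rts)
# A slope of roughly 40 ms/item, as here, is the signature of an
# effortful, attention-demanding search.
```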
Creating Category Representations
Categories help us generalize across experience. However, not all categories are created equal. Consider the feline images on the right: each can validly be called an animal, a mammal, a cat, or a domestic orange tabby, yet people tend to use a mid-level of specificity ("cat"). Furthermore, even though both animals are cats, why do observers agree that the cat on the left is a better example of a cat than the one on the right?
Along with my colleagues, I have been investigating the neural correlates of these category structures. We have found evidence for entry-level object grouping in the object-selective lateral occipital complex (LOC, Iordan et al, JoCN 2015). Furthermore, the LOC is also sensitive to the typicality of an object within its entry-level category (Iordan et al, NeuroImage 2016).
Does categorization occur automatically when one recognizes an object? To test this, I examined both object and scene categorization using a modified Stroop paradigm: observers classified printed words superimposed on photographs of objects and scenes. Although the images were irrelevant to the task, observers could not inhibit categorizing them: they were faster and more accurate at classifying words paired with congruent pictures (Greene & Fei-Fei, JoV, 2014).
Functions / Affordances
What are the "building blocks" of visual experience? Although it is intuitive to think of objects as the building blocks of scenes, my work on object context and global properties questions this view. In my view, a fundamental aspect of human visual perception is perceiving what actions we can and cannot perform in a given environment. Using a massive crowdsourced dataset of over 5 million images, my colleagues and I have demonstrated that patterns of scene categorization are better explained by these scene functions than by objects or perceptual features (Greene et al., JEP:G, 2016).