| Pilar Bachiller |
| Tecnología de los Computadores y las Comunicaciones, Extremadura |
| July, 2008 |
Abstract |
During the last few years, attention has become an important issue in machine vision. Studies of attentional mechanisms in biological vision have inspired many computational models. Most of them adopt the limited-capacity assumption that psychological theories associate with the role of attention. These theories hypothesize that the visual system has limited processing capacity and that attention acts as a filter that selects the information to be processed at each moment. This assumption has been criticized by many authors, who affirm that the processing capacity of human perceptual systems is enormous. From this view, there is no need for a stage that selects the information to be processed. Instead, they explain the role of attention from the perspective of selection for action. According to this conception, the function of attention is to avoid behavioral disorganization by selecting the appropriate information to drive task execution. Such a notion of attention is of particular interest in robotics, where the aim is to build autonomous robots that interact with complex environments while maintaining multiple behavioral objectives. Attentional selection for action can guide robot behaviors by focusing on relevant visual targets while avoiding distractors. Moreover, it can be conceived as a coordination mechanism, since it allows serializing the actions of potentially multiple active behaviors. To exploit these ideas, we propose a visual attention system based on the selection-for-action theory. It has been designed and tested on a mobile robot endowed with a stereo vision head. The proposed system has been modeled as a collection of components that collaborate to select, fixate, and track visual targets according to different task requirements. The low-level components are related to image acquisition, motor control, and the computation and maintenance of regions of interest (ROI).
Intermediate-level components are in charge of extracting sets of ROI features related to "what" (appearance information) and "how" (spatial information) matters. These features are used by high-level components, called target selectors (TS), to drive attention according to certain top-down behavioral specifications. Attention control is not centralized, but distributed among several target selectors. Each of them drives attention according to different top-down specifications in order to focus on a different type of visual target. At any given time, overt attention is driven by one TS, while the rest attend covertly to their corresponding targets. The frequency with which each TS takes overt control of attention is modulated by high-level behavioral units according to their information requirements. The fixation of a selected target is accomplished by two independent camera movements: a saccadic and tracking movement in one of the cameras and a vergence movement in the other. This allows attention to be controlled from monocular information while keeping binocular fixation stable. Once this perceptual-motor process is completed, the foveated target is sent to the behavioral units. Only actions compatible with the focus of attention are then executed, which solves the behavior coordination problem. The whole system works as a control architecture that is attracted towards different visual targets in order to maintain several behavioral goals. The specific interleaving between actions is given by an implicit time relation that links internal parameters and external world features. |
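The distributed arbitration described in the abstract can be illustrated with a minimal sketch. It is not the system's actual implementation. The names `TargetSelector`, `pick_overt`, and `run`, the `frequency` parameter, and the "most overdue selector wins" rule are all assumptions introduced for illustration. The sketch shows only how per-selector frequencies could interleave overt control, and how only the action tied to the current focus would execute.

```python
from dataclasses import dataclass


@dataclass
class TargetSelector:
    """Hypothetical stand-in for a target selector (TS)."""
    name: str
    frequency: float       # assumed: requested overt fixations per unit time
    next_due: float = 0.0  # time at which this TS next wants overt control

    def period(self) -> float:
        return 1.0 / self.frequency


def pick_overt(selectors):
    """Return the selector whose overt turn is most overdue.

    Ties are resolved in list order, because min() returns the first minimum.
    """
    return min(selectors, key=lambda ts: ts.next_due)


def run(selectors, behaviors, steps):
    """Simulate `steps` fixations.

    `behaviors` maps a selector name to the action label that is compatible
    with that selector's target. Only that action is recorded at each step.
    """
    trace = []
    for _ in range(steps):
        ts = pick_overt(selectors)   # this TS attends overtly
        ts.next_due += ts.period()   # the others keep attending covertly
        trace.append((ts.name, behaviors[ts.name]))
    return trace


if __name__ == "__main__":
    selectors = [
        TargetSelector("landmark", frequency=2.0),
        TargetSelector("obstacle", frequency=1.0),
    ]
    behaviors = {"landmark": "navigate", "obstacle": "avoid"}
    for focus, action in run(selectors, behaviors, steps=6):
        print(focus, "->", action)
```

In this sketch, a behavioral unit that needs more information about its target would raise the `frequency` of the corresponding selector, which shifts the interleaving of actions without any central controller.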
ISSN: 1888-0258
