Similar Documents
A total of 20 similar documents were found (search time: 15 ms).
1.
A video sequence is more than a sequence of still images: it contains strong spatial–temporal correlation between the regions of consecutive frames. The most important characteristic of videos is the perceived motion of foreground objects across frames. The motion of foreground objects dramatically changes the importance of the objects in a scene and leads to a different saliency map of the frame representing the scene. This makes the saliency analysis of videos much more complicated than that of still images. In this paper, we investigate saliency in video sequences and propose a novel spatiotemporal saliency model devoted to video surveillance applications. Compared to classical saliency models based on still images, such as Itti's model, and space–time saliency models, the proposed model is more correlated with visual saliency perception of surveillance videos. Both bottom-up and top-down attention mechanisms are involved in this model. Stationary saliency and motion saliency are analyzed separately. First, a new method for background subtraction and foreground extraction is developed based on content analysis of the scene in the domain of video surveillance. Then, a stationary saliency model is set up based on multiple features computed from the foreground. Every feature is analyzed with a multi-scale Gaussian pyramid, and all the feature conspicuity maps are combined using different weights. The stationary model integrates faces as a supplementary feature to other low-level features such as color, intensity and orientation. Second, a motion saliency map is calculated using the statistics of the motion vector field. Third, the motion saliency map and the stationary saliency map are merged within a center-surround framework defined by an approximated Gaussian function. The video saliency maps computed from our model have been compared to gaze maps obtained from subjective experiments with an SMI eye tracker for surveillance video sequences. The results show strong correlation between the output of the proposed spatiotemporal saliency model and the experimental gaze maps.
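To make the fusion step concrete, here is a minimal sketch of how a stationary map and a motion map might be blended under an approximated Gaussian center weight. The Gaussian spread and the mixing weight below are invented placeholders, not parameters from the model described above.

```python
import numpy as np

def gaussian_center_weight(h, w, sigma=0.3):
    """Approximated 2-D Gaussian centered on the frame (sigma in relative units)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    d2 = ((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fuse_saliency(stationary, motion, sigma=0.3, w_motion=0.6):
    """Merge stationary and motion saliency maps under a center Gaussian weight."""
    stationary = stationary / (stationary.max() + 1e-8)  # normalize each map to [0, 1]
    motion = motion / (motion.max() + 1e-8)
    g = gaussian_center_weight(*stationary.shape, sigma=sigma)
    return g * (w_motion * motion + (1.0 - w_motion) * stationary)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s_map = rng.random((120, 160))   # stand-ins for real conspicuity maps
    m_map = rng.random((120, 160))
    print(fuse_saliency(s_map, m_map).shape)   # (120, 160)
```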

2.
Covert and overt spatial selection behaviors are guided both by visual saliency maps derived from early visual features and by priority maps reflecting high-level cognitive factors. However, whether mid-level perceptual processes associated with visual form recognition contribute to covert and overt spatial selection behaviors remains unclear. We hypothesized that if peripheral visual forms contribute to spatial selection behaviors, then they should do so even when the visual forms are task-irrelevant. We tested this hypothesis in male and female human subjects as well as in male macaque monkeys performing a visual detection task. In this task, subjects reported the detection of a suprathreshold target spot presented on top of one of two peripheral images, and they did so with either a speeded manual button press (humans) or a speeded saccadic eye movement response (humans and monkeys). Crucially, the two images, one with a visual form and the other with a partially phase-scrambled visual form, were completely irrelevant to the task. In both manual (covert) and oculomotor (overt) response modalities, and in both humans and monkeys, response times were faster when the target was congruent with a visual form than when it was incongruent. Importantly, incongruent targets were associated with almost all errors, suggesting that forms automatically captured selection behaviors. These findings demonstrate that mid-level perceptual processes associated with visual form recognition contribute to covert and overt spatial selection. This indicates that neural circuits associated with target selection, such as the superior colliculus, may have privileged access to visual form information.

SIGNIFICANCE STATEMENT: Spatial selection of visual information either with (overt) or without (covert) foveating eye movements is critical to primate behavior. However, it is still not clear whether spatial maps in sensorimotor regions known to guide overt and covert spatial selection are influenced by peripheral visual forms. We probed the ability of humans and monkeys to perform overt and covert target selection in the presence of spatially congruent or incongruent visual forms. Even when completely task-irrelevant, images of visual objects had a dramatic effect on target selection, acting much like spatial cues used in spatial attention tasks. Our results demonstrate that traditional brain circuits for orienting behaviors, such as the superior colliculus, likely have privileged access to visual object representations.

3.
Zhang Tielin, Yang Yang, Zeng Yi, Zhao Yuxuan. Cognitive Computation, 2020, 12(4): 834-843

Various types of theoretical algorithms have been proposed for 6D pose estimation, e.g., the point pair method, template matching method, Hough forest method, and deep learning method. However, they are still far from the performance of natural biological systems, which can undertake 6D pose estimation of multiple objects efficiently, especially under severe occlusion. Inspired by the Müller-Lyer illusion in the biological visual system, in this paper we propose a cognitive template-clustering improved LineMod (CT-LineMod) model. The model uses a 7D cognitive feature vector to replace standard 3D spatial points in the clustering procedure of Patch-LineMod, in which the cognitive distance between different 3D spatial points is further influenced by additional 4D information related to the direction and magnitude of features in the Müller-Lyer illusion. The 7D vectors are dimensionally reduced to 3D vectors by gradient descent and then clustered by K-means to aggregately match templates and automatically eliminate superfluous clusters, which makes template matching possible on both holistic and part-based scales. The model has been verified on the standard Doumanoglou dataset and demonstrates state-of-the-art performance, which shows the accuracy and efficiency of the proposed cognitive feature distance measurement and template selection for multi-object pose estimation under severe occlusion. The powerful feature representation in the biological visual system also includes characteristics of the Müller-Lyer illusion, which, to some extent, provides guidance towards a biologically plausible algorithm for efficient 6D pose estimation under severe occlusion.
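As a toy illustration of the clustering stage, the sketch below projects hypothetical 7D feature vectors down to 3D and groups them with K-means. The random projection matrix, the feature values, and the cluster count are assumptions made only for illustration; the paper instead learns its reduction by gradient descent and prunes superfluous clusters automatically.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Hypothetical 7D "cognitive" feature vectors: 3 spatial dims plus 4 extra dims
# standing in for the Müller-Lyer-inspired direction/magnitude information.
features_7d = rng.normal(size=(500, 7))

# Placeholder linear projection to 3D (the paper learns its reduction by gradient descent).
projection = rng.normal(size=(7, 3))
features_3d = features_7d @ projection

# Cluster the reduced features; each cluster would then be matched against templates.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(features_3d)
print(kmeans.labels_[:10], kmeans.cluster_centers_.shape)  # first labels, (8, 3) centers
```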


4.
The human visual cortex extracts both spatial and temporal visual features to support perception and guide behavior. Deep convolutional neural networks (CNNs) provide a computational framework to model cortical representation and organization for spatial visual processing, but they are unable to explain how the brain processes temporal information. To overcome this limitation, we extended a CNN by adding recurrent connections to its different layers, allowing spatial representations to be remembered and accumulated over time. The extended model, or recurrent neural network (RNN), embodied a hierarchical and distributed model of process memory as an integral part of visual processing. Unlike the CNN, the RNN learned spatiotemporal features from videos to enable action recognition. The RNN better predicted cortical responses to natural movie stimuli than the CNN at all visual areas, especially those along the dorsal stream. As a fully observable model of visual processing, the RNN also revealed a cortical hierarchy of temporal receptive windows, the dynamics of process memory, and spatiotemporal representations. These results support the hypothesis of process memory and demonstrate the potential of using the RNN for in-depth computational understanding of dynamic natural vision.
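The general idea, a convolutional front end whose frame-wise features are accumulated by a recurrent layer, can be sketched in PyTorch as below. The layer sizes, the choice of a GRU, and the number of action classes are placeholder assumptions rather than the architecture used in the study.

```python
import torch
import torch.nn as nn

class RecurrentCNN(nn.Module):
    """Toy CNN + GRU: spatial features per frame, accumulated over time."""
    def __init__(self, n_classes=10, feat_dim=128, hidden_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(                       # frame-wise spatial features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)  # temporal memory
        self.head = nn.Linear(hidden_dim, n_classes)               # action classes

    def forward(self, video):                  # video: (batch, time, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.reshape(b * t, *video.shape[2:])).reshape(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out[:, -1])           # classify from the last time step

if __name__ == "__main__":
    clip = torch.randn(2, 16, 3, 64, 64)       # 2 clips, 16 frames each
    print(RecurrentCNN()(clip).shape)          # torch.Size([2, 10])
```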

5.
Gerlach C, Law I, Paulson OB. Neuropsychologia, 2004, 42(11): 1543-1553
It has been suggested that category-specific recognition disorders for natural objects may reflect the fact that natural objects are more structurally (visually) similar than artefacts and are therefore more difficult to recognize following brain damage. On this account one might expect a positive relationship between blood flow and structural similarity in areas involved in visual object recognition. Contrary to this expectation, we report a negative relationship: identification of articles of clothing causes more extensive activation than identification of vegetables/fruit and animals, even though items from the categories of animals and vegetables/fruit are rated as more structurally similar than items from the category of articles of clothing. Given that this pattern cannot be explained in terms of a trade-off between activation and accuracy, we interpret these findings within a model where the matching of visual forms to memory incorporates two operations: (i) the integration of stored object features into whole object representations (integral units), and (ii) the competition between activated integral units for selection (i.e., identification). In addition, we suggest that these operations are differentially affected by structural similarity: high structural similarity may be beneficial for the integration of stored features into integral units, thus explaining the greater activation found with articles of clothing, whereas it may be harmful for the selection process proper because a greater range of candidate integral units will be activated and compete for selection, thus explaining the higher error rate associated with animals. We evaluate the model based on previous evidence from both normal subjects and patients with category-specific disorders and argue that this model can help reconcile otherwise conflicting data.

6.
The ability to group items and events into functional categories is a fundamental function of visual recognition. Experimental studies have shown that the inferior temporal (IT) cortex and the prefrontal cortex (PFC) play different roles in information representation during categorization tasks. However, it remains elusive how category information is generated in PFC and maintained over a delay period, and how the interaction between IT and PFC influences category performance. To address these issues, we develop a network model of the visual system that performs a delayed match-to-category task. The model consists of networks of V4, IT, and PFC. We show that, in IT, the visual information required for categorization is represented by a combination of prototype features. We also show that category information in PFC is represented by two weakly linked dynamical attractors, resulting from the difference in firing thresholds of PFC neurons. Lower and higher firing thresholds contribute to working memory maintenance and decision-making, respectively. Furthermore, we show that the top-down signal from PFC to IT improves the ability of PFC neurons to categorize mixed images that are closer to a category boundary. Our model may provide a clue for understanding the neural mechanism underlying categorization tasks.

7.
Traditionally, concepts are assumed to be situationally invariant mental knowledge entities (conceptual stability), which are represented in a unitary brain system distinct from sensory and motor areas (amodality). However, accumulating evidence suggests that concepts are embodied in perception and action in that their conceptual features are stored within modality-specific semantic maps in the sensory and motor cortex. Nonetheless, the first traditional assumption of conceptual stability largely remains unquestioned. Here, we tested the notion of flexible concepts using functional magnetic resonance imaging and event-related potentials (ERPs) during the verification of two attribute types (visual, action-related) for words denoting artifactual and natural objects. Functional imaging predominantly revealed crossover interactions between category and attribute type in visual, motor, and motion-related brain areas, indicating that access to conceptual knowledge is strongly modulated by attribute type: activity in these areas was highest when nondominant conceptual attributes had to be verified. ERPs indicated that these category-attribute interactions emerged as early as 116 msec after stimulus onset, suggesting that they reflect rapid access to conceptual features rather than postconceptual processing. Our results suggest that concepts are situation-dependent mental entities. They are composed of semantic features that are flexibly recruited from distributed, yet localized, semantic maps in modality-specific brain regions depending on contextual constraints.

8.
Two recent papers (Foulsham et al., 2009; Mannan et al., 2009) report that neuropsychological patients with a profound object recognition problem (visual agnosic subjects) show differences from healthy observers in the way their eye movements are controlled when looking at images. The interpretation of these papers is that eye movements can be modeled as the selection of points on a saliency map, and that agnosic subjects show an increased reliance on visual saliency, i.e., brightness and contrast in low-level stimulus features. Here we review this approach and present new data from our own experiments with an agnosic patient that quantify the relationship between saliency and fixation location. In addition, we consider whether the perceptual difficulties of individual patients might be modeled by selectively weighting the different features involved in a saliency map. Our data indicate that saliency is not always a good predictor of fixation in agnosia: even for our agnosic subject, as for normal observers, the saliency-fixation relationship varied as a function of the task. This means that top-down processes still have a significant effect on the earliest stages of scanning in the setting of visual agnosia, indicating severe limitations for the saliency map model. Top-down, active strategies, which are the hallmark of the human visual system, play a vital role in eye movement control, whether we know what we are looking at or not.

9.
The role of binocular disparity in the deployment of visual attention is examined in this paper. To address this point, we compared eye-tracking data recorded while observers viewed natural images in 2D and 3D conditions. The influence of disparity on saliency, center, and depth biases is studied first. Results show that visual exploration is affected by the introduction of binocular disparity. In particular, participants tend to look first at closer areas in the 3D condition and then direct their gaze to more widespread locations. Besides this behavioral analysis, we assess the extent to which state-of-the-art models of bottom-up visual attention predict where observers looked in both viewing conditions. To improve their ability to predict salient regions, low-level features as well as higher-level foreground/background cues are examined. Results indicate that, following the initial centering response, the foreground feature plays an active role in the early but also middle instants of attention deployment. Importantly, this influence is more pronounced in stereoscopic conditions. It supports the notion of a quasi-instantaneous bottom-up saliency modulated by higher figure/ground processing. Beyond depth information itself, the foreground cue might constitute an early process of "selection for action". Finally, we propose a time-dependent computational model to predict saliency on still pictures. The proposed approach combines low-level visual features, center bias, and depth bias, and it outperforms state-of-the-art models of bottom-up attention.
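A rough sketch of such a time-dependent combination is given below: three cue maps (low-level saliency, center bias, foreground/depth) are blended with weights that shift over viewing time. The weight schedules and map contents are invented for illustration and are not the fitted parameters of the proposed model.

```python
import numpy as np

def time_dependent_saliency(low_level, center_bias, foreground, t):
    """Blend low-level saliency, a center bias and a foreground/depth cue with
    weights that shift over viewing time t (seconds); schedules are illustrative."""
    w_center = np.exp(-t / 0.5)                 # center bias dominates the initial response
    w_fg = (1.0 - w_center) * np.exp(-t / 1.5)  # foreground matters in early/middle instants
    w_low = max(1.0 - w_center - w_fg, 0.0)     # low-level saliency takes over later
    s = w_low * low_level + w_center * center_bias + w_fg * foreground
    return s / (s.max() + 1e-8)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    low, center, fg = (rng.random((90, 120)) for _ in range(3))
    for t in (0.2, 0.8, 2.0):                   # predicted map evolves with viewing time
        print(t, time_dependent_saliency(low, center, fg, t).mean().round(3))
```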

10.
According to the research results reported in past decades, it is well acknowledged that face recognition is not a trivial task. With the development of electronic devices, we are gradually revealing the secret of object recognition in the primate's visual cortex. Therefore, it is time to reconsider face recognition using biologically inspired features. In this paper, we represent face images by utilizing the C1 units, which correspond to complex cells in the visual cortex, and pool over S1 units using a maximum operation to retain only the maximum response of each local area of S1 units. The new representation is termed C1Face. Because C1Face is naturally a third-order tensor (or a three-dimensional array), we propose three-way discriminative locality alignment (TWDLA), an extension of discriminative locality alignment, which is a top-level discriminative manifold-learning-based subspace learning algorithm. TWDLA has the following advantages: (1) it takes third-order tensors as input directly, so the structure information can be well preserved; (2) it models the local geometry over every modality of the input tensors, so the spatial relations of input tensors within a class can be preserved; (3) it maximizes the margin between a tensor and tensors from other classes over each modality, so it performs well for recognition tasks; and (4) it has no undersampling problem. Extensive experiments on the YALE and FERET datasets show that (1) the proposed C1Face representation can represent face images better than raw pixels and (2) TWDLA duly preserves both the local geometry and the discriminative information over every modality for recognition.
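The S1-to-C1 stage can be pictured with the sketch below: Gabor-like filtering (S1) followed by a local maximum over position (C1), which yields a third-order tensor per image. The filter parameters, pooling size, and random input are illustrative assumptions, not the exact settings behind the C1Face representation.

```python
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import convolve2d

def gabor_kernel(size=11, wavelength=5.0, theta=0.0, sigma=3.0):
    """Simple real-valued Gabor kernel (illustrative parameters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / wavelength)

def c1_features(image, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4), pool=8):
    """S1: Gabor responses per orientation; C1: max pooling over local neighborhoods."""
    maps = []
    for theta in thetas:
        s1 = np.abs(convolve2d(image, gabor_kernel(theta=theta), mode="same"))
        maps.append(maximum_filter(s1, size=pool)[::pool, ::pool])  # keep local maxima only
    return np.stack(maps, axis=-1)   # third-order tensor: (H', W', orientations)

if __name__ == "__main__":
    face = np.random.default_rng(0).random((64, 64))  # stand-in for a grayscale face image
    print(c1_features(face).shape)                    # (8, 8, 4)
```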

11.
Monkeys with selective damage to the hippocampus are often unimpaired in matching-to-sample tests but are reportedly impaired in visual paired comparison. While both tests assess recognition of previously seen images, delayed matching-to-sample may engage active memory maintenance whereas visual paired comparison may not. Passive memory tests that are not rewarded with food and that do not require extensive training may provide more sensitive measures of hippocampal function. To test this hypothesis, we assessed memory in monkeys with hippocampal damage and matched controls by providing them the opportunity to repeatedly view small sets of videos. Monkeys pressed a button to play each video. The same 10 videos were used for six consecutive days, after which 10 new videos were introduced in each of seven cycles of testing. Our measure of memory was the extent to which monkeys habituated with repeated presentations, watching fewer videos per session over time. Monkeys with hippocampal lesions habituated more slowly than did control monkeys, indicating poorer memory for previous viewings. Both groups dishabituated each time new videos were introduced. These results, like those from preferential viewing, suggest that the hippocampus may be especially important for memory of incidentally encoded events.

12.
Gesture recognition suffers from long-term dependencies and complex variations in both the spatial and temporal dimensions. Many traditional methods use hand cropping and sliding-window schemes in the spatial and temporal domains, respectively. In this paper, we propose a sequentially supervised long short-term memory architecture, which uses pose information to guide the learning process of gesture recognition with variable-length inputs. Technically, we add supervision at each frame using human joint positions. The proposed method can solve the gesture recognition and pose estimation problems simultaneously using only RGB videos, without hand cropping. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed framework compared with state-of-the-art methods.
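A minimal PyTorch-style sketch of the idea, an LSTM whose hidden states are supervised with joint positions at every frame alongside a sequence-level gesture label, is shown below. The feature dimensions, joint count, and loss weighting are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class SeqSupervisedLSTM(nn.Module):
    """LSTM over per-frame features with frame-wise pose supervision and a gesture head."""
    def __init__(self, feat_dim=512, hidden=256, n_joints=21, n_gestures=20):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.pose_head = nn.Linear(hidden, n_joints * 2)   # (x, y) per joint, every frame
        self.gesture_head = nn.Linear(hidden, n_gestures)  # gesture label from last state

    def forward(self, frame_feats):                # frame_feats: (batch, time, feat_dim)
        h, _ = self.lstm(frame_feats)
        return self.pose_head(h), self.gesture_head(h[:, -1])

def loss_fn(pose_pred, gesture_logits, pose_gt, gesture_gt, lam=0.5):
    """Gesture cross-entropy plus per-frame pose regression (lam is an assumed weight)."""
    return nn.functional.cross_entropy(gesture_logits, gesture_gt) \
        + lam * nn.functional.mse_loss(pose_pred, pose_gt)

if __name__ == "__main__":
    feats = torch.randn(4, 30, 512)                # 4 clips, 30 frames of RGB features each
    pose_gt = torch.randn(4, 30, 42)               # 21 joints * (x, y) per frame
    gesture_gt = torch.randint(0, 20, (4,))
    model = SeqSupervisedLSTM()
    pose_pred, logits = model(feats)
    print(loss_fn(pose_pred, logits, pose_gt, gesture_gt).item())
```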

13.
Background: Using images in multiple‐choice formats for comprehension testing in aphasia is common. It is generally assumed that persons being assessed perceive the content of the images represented in such tasks. However, specific visual characteristics of individual images may influence visual attention, which may influence accuracy in the selection of a correct target image corresponding to a verbal stimulus. The validity of test responses may be confounded by (1) physical stimulus features, such as size, and (2) semantic content conveyed by images, such as image familiarity.

Aims: The first aim was to develop a rating instrument to assess visual stimulus properties and semantic content conveyance in multiple‐choice images, based on an extensive review of empirical literature and validated by experts in graphic design. The second aim was to study the degree of relationship between viewers' subjective ratings of images selected from published aphasia batteries, using the rating instrument, and eye movement measures recorded as independent viewers looked at the same images. The third aim was to compare the viewers' actual eye movement indices of disproportionate visual attention to an ideal value of evenly proportionate visual attention for each image set.

Methods & Procedures: A rating instrument, based on an extensive review of literature and assessed and revised by graphic design and eye‐tracking experts, was developed to identify such influences within multiple‐choice images and was assessed through empirical testing of viewers' eye movement patterns as they looked at images from published aphasia tests. A total of 20 adults rated 20 image sets from five aphasia batteries. Eye movements were recorded for a separate group of 40 adults viewing the same images.

Outcomes & Results: Ratings were not statistically correlated with eye movement responses. All multiple‐choice image sets prompted significantly disproportionate visual attention.

Conclusions: Results highlight the importance of (1) considering the possible influence of visual stimulus confounds on any given patient's test performance, and (2) better-controlled image design for multiple-choice test images to improve the validity of assessment. Further research is needed to improve subjective and objective means of assessing images and to develop guidelines for improved design of multiple-choice image displays.

14.
Bistable perception emerges when a stimulus under continuous view is perceived as the alternation of two mutually exclusive states. Such a stimulus provides a unique opportunity for understanding the neural basis of visual perception because it dissociates the perception from the visual input. In this paper we analyze the dynamic activity of local field potentials (LFP), simultaneously collected from multiple channels in the middle temporal (MT) visual cortex of a macaque monkey, for decoding its bistable structure-from-motion (SFM) perception. Based on the observation that the discriminative information of neuronal population activity evolves and accumulates over time, we propose to select features from the integrated time-frequency representation of the LFP using a relaxation (RELAX) algorithm and a sequential forward selection (SFS) algorithm with maximization of the Mahalanobis distance as the criterion function. The integrated-spectrogram-based feature selection is much more robust and achieves significantly better features than instantaneous-spectrogram-based feature selection. We exploit the support vector machine (SVM) classifier and the linear discriminant analysis (LDA) classifier based on the selected features to decode the reported perception on a single-trial basis. Our results demonstrate the excellent performance of the integrated-spectrogram-based feature selection and suggest that features in the gamma frequency band (30-100 Hz) of the LFP within specific temporal windows carry the most discriminative information for decoding bistable perception. The proposed integrated-spectrogram-based feature selection approach may have potential for a myriad of applications involving multivariate time series, such as brain-computer interfaces (BCI).
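The selection criterion can be illustrated with the toy sequential forward selection sketch below, which greedily adds features that maximize the Mahalanobis distance between two classes and then fits an SVM on the chosen subset. The synthetic data and the pooled-covariance formulation are assumptions, not the paper's LFP features or its exact algorithm.

```python
import numpy as np
from sklearn.svm import SVC

def mahalanobis_between_classes(X, y, idx):
    """Mahalanobis distance between the two class means over the feature subset idx."""
    a, b = X[y == 0][:, idx], X[y == 1][:, idx]
    diff = a.mean(axis=0) - b.mean(axis=0)
    pooled = (np.atleast_2d(np.cov(a, rowvar=False)) + np.atleast_2d(np.cov(b, rowvar=False))) / 2.0
    pooled += 1e-6 * np.eye(len(idx))              # regularize for numerical stability
    return float(diff @ np.linalg.solve(pooled, diff))

def sequential_forward_selection(X, y, k=5):
    """Greedily add the feature that most increases the Mahalanobis criterion."""
    selected = []
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        best = max(rest, key=lambda j: mahalanobis_between_classes(X, y, selected + [j]))
        selected.append(best)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 40))        # stand-in for integrated-spectrogram features
    y = rng.integers(0, 2, size=200)      # reported percept on each trial (0 or 1)
    X[y == 1, :3] += 1.5                  # make a few features informative
    idx = sequential_forward_selection(X, y, k=5)
    clf = SVC(kernel="linear").fit(X[:, idx], y)
    print(idx, clf.score(X[:, idx], y))
```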

15.
The accuracy of visual diagnosis of seizures based on semiologic features among different health care professionals is largely unknown. We evaluated the ability of health care professionals to correctly diagnose epileptic seizures (ES) and psychogenic nonepileptic seizures (PNES) from a random selection of 10 ES and 10 PNES videos. The 20 videos (without accompanying electroencephalography) were shown only once, in a random mix, to different groups of health care professionals. These individuals, blinded to the diagnosis, were asked to classify each seizure as ES or PNES. We used summary receiver operating characteristic (SROC) curves to determine the accuracy for each group. Next, we calculated the difference in the area under the SROC curve (AUC) between neurologists (as the reference) and the other groups of health care professionals. Neurologists achieved significantly higher AUC results than other health care professionals. These results indicate a wide range of diagnostic accuracy among different health care professionals and have practical implications for the evaluation of patients with seizure disorders in acute settings.

16.
The current debate on mechanisms of action understanding and recognition has re-opened the question of how perceptual and motor systems are linked. It has been proposed that the human motor system has a role in action perception; however, there is still no direct evidence that actions can modulate early neural processes associated with perception of meaningful actions. Here we show that plans for action modulate the perceptual processing of observed actions within 200 ms of stimulus onset. We examined event-related potentials to images of hand gestures presented while participants planned either a matching (congruent) or non-matching (incongruent) gesture. The N170/VPP, representing visual processing of hand gestures, was reliably altered when participants concurrently planned congruent versus incongruent actions. In a second experiment, we showed that this congruency effect was specific to action planning and not to more general semantic aspects of action representation. Our findings demonstrate that actions encoded via the motor system have a direct effect on visual processing, and thus imply a bi-directional link between action and perception in the human brain. We suggest that, through forward modelling, intended actions can facilitate the encoding of sensory inputs that would be expected as a consequence of the action.

17.
Gu Jin, Liu Baolin, Sun Xiaolin, Ma Fangyuan, Li Xianglin. Brain Imaging and Behavior, 2021, 15(1): 231-243

Action recognition is an essential component of our daily life. The occipitotemporal cortex (OTC) is an important area in human movement perception. Previous studies have revealed that three vital regions in OTC, the extrastriate body area (EBA), the human middle temporal complex (hMT+), and the posterior superior temporal sulcus (pSTS), play an important role in motion perception. The aim of the current study is to explore the neural interactions between these three regions during basic human movement perception. Functional magnetic resonance imaging data were acquired while participants viewed dynamic videos depicting basic human movements. Through dynamic causal modeling analysis, a model space consisting of 576 models was constructed and evaluated to select the optimal model given the data. The information of the visual movement was found to enter the system through hMT+. We speculated that hMT+ would be the region to show sensitivity to the presence of motion and that it would subsequently influence and be influenced by the other two regions. Our results also revealed the manner in which the three regions interact during action recognition. Furthermore, we found significantly enhanced modulated connectivity from hMT+ to both EBA and pSTS, as well as from EBA to both hMT+ and pSTS. We inferred that there may be multiple routes for human action perception. One route, responsible for processing motion signals, runs through hMT+ to pSTS, while the other may project information to pSTS via the form-processing route. In addition, pSTS may integrate and mediate visual signals and possibly convey them to distributed areas to support higher-order cognitive tasks.


18.
Spike directivity, a new measure that quantifies the transient charge density dynamics within action potentials, provides better results in discriminating different categories during visual object recognition. Specifically, intracranial recordings from the medial temporal lobe (MTL) of epileptic patients have been analyzed using firing rate, interspike intervals, and spike directivity. A comparative statistical analysis of the same spikes from a local ensemble of four selected neurons shows that electrical patterns in these neurons display higher separability with respect to input images than spike timing features. If the observation vector includes data from all four neurons, then the comparative analysis shows a highly significant separation between categories for spike directivity (p=0.0023) and does not display separability for interspike interval (p=0.3768) or firing rate (p=0.5492). Since electrical patterns in neuronal spikes provide information regarding different presented objects, this result shows that related information is intracellularly processed in neurons and carried within a millisecond-level time window of action potential occurrence. This significant statistical outcome, obtained from a local ensemble of only four neurons, suggests that meaningful information can be electrically inferred at the network level to generate a better discrimination of presented images.

19.
Visual processing of global and local features differentially engages the right and left hemispheres and requires different allocations of spatial attention. To further understand the decline in visual cognition and visual attention with age, we studied the performance of healthy young subjects and healthy elders on a global-local figures task. The results showed that elders processed global images more quickly when presented in the left visual field and local images in the right visual field, similarly to the young controls. However, we did observe a significant impairment in the elders' ability to process global figures compared with local figures, despite there being no overall difference between global and local processing speed among the young. It is thought that this age-related decline in global processing is related to the narrowed attentional field that can be demonstrated in other age-related visual processing declines, such as visual search and useful field of view.

20.
We investigated the effects of categorization on the representation of stimulus features in combined psychophysical-electrophysiological experiments. We used parameterized line drawings of faces and fish as stimuli, and we varied the relevance of the different features for the categorization task. The psychophysical and electrophysiological data support an exemplar-based framework for visual object recognition. We recorded from visual neurons in the anterior inferior temporal (IT) cortex of macaque monkeys while they were performing a categorization task. The visual neurons did not respond selectively to one stimulus set or to one category. The majority of the anterior IT feature-selective neurons were tuned for features that were diagnostic for the categorization task. We argue that this fine-tuning of the neurons reflects perceptual sensitization to the diagnostic features.

