Content related to "feature-extraction"
-
(1)
Subspace Extension to Phase Correlation Approach for Fast Image Registration
-
A novel extension of phase correlation to subspace correlation is proposed, in which a 2-D translation is decomposed into two 1-D motions, so that only 1-D Fourier transforms are needed to estimate each motion. In each subspace, the two highest peaks of the 1-D correlation are linearly interpolated for subpixel accuracy. Experimental results show both the robustness and the accuracy of our method.
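As an illustration of the idea in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes a purely circular translation, uses a naive DFT to stay dependency-free, and interpolates between the peak and its larger neighbour as one possible reading of the subpixel step. The function names `shift_1d` and `subspace_shift` are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def shift_1d(ref, moved, eps=1e-12):
    """Estimate the circular shift mapping `ref` onto `moved` by 1-D phase correlation."""
    n = len(ref)
    fa, fb = dft(ref), dft(moved)
    # Normalised cross-power spectrum: keep only the phase difference.
    cross = [b * a.conjugate() for a, b in zip(fa, fb)]
    cross = [c / (abs(c) + eps) for c in cross]
    # Inverse DFT: the correlation peaks at the shift.
    corr = [sum(cross[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]
    peak = max(range(n), key=corr.__getitem__)
    # Assumed subpixel step: linear interpolation between the peak
    # and the larger of its two neighbours.
    left, right = corr[(peak - 1) % n], corr[(peak + 1) % n]
    side, step = (right, 1) if right >= left else (left, -1)
    side = max(side, 0.0)
    shift = peak + step * side / (corr[peak] + side)
    # Report shifts larger than half the length as negative shifts.
    return shift - n if shift > n / 2 else shift

def subspace_shift(img_a, img_b):
    """Return (dy, dx) from the row and column projections of two images."""
    rows_a, rows_b = [sum(r) for r in img_a], [sum(r) for r in img_b]
    cols_a, cols_b = [sum(c) for c in zip(*img_a)], [sum(c) for c in zip(*img_b)]
    return shift_1d(rows_a, rows_b), shift_1d(cols_a, cols_b)
```

Each image is reduced to two 1-D projections, so the cost is two 1-D correlations rather than one 2-D correlation.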
-
(1)
Shot Boundary Detection in MPEG Videos using Local and Global Indicators
-
Shot boundary detection (SBD) plays an important role in many video applications. In this paper, we describe a novel SBD method that operates directly in the compressed domain. Firstly, several local indicators are extracted from MPEG macroblocks, and AdaBoost is employed for feature selection and fusion. The selected features are then used to classify candidate cuts into five sub-spaces via pre-filtering and rule-based decision making. Following that, global indicators of frame similarity between the boundary frames of cut candidates are examined using phase correlation of DC-images. Gradual transitions such as fades, dissolves and combined shot cuts are also identified. Experimental results on the test data from TRECVID’07 have demonstrated the effectiveness and robustness of our proposed methodology.
-
(1)
Extracting Objects and Events from MPEG Sequences for Video Highlights Indexing and Retrieval
-
Automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this paper, we propose techniques to solve this problem using knowledge-supported extraction of semantics, and employing compressed-domain processing for efficiency. Firstly, knowledge-supported rules are utilized for shot detection on the extracted DC-images, and statistical skin detection is applied for human object detection. Secondly, by filtering outliers in motion vectors, improved detection of camera motions like zooming, panning and tilting is achieved. High-level semantics like video highlights are then automatically extracted via low-level analysis in the detection of human objects and camera motion events, and finally these highlights are used for shot-level annotation, indexing and retrieval. Results on a large set of test videos have demonstrated the accuracy and robustness of the proposed techniques.
-
(1)
Statistical Classification of Skin Color Pixels from MPEG Videos
-
Detection and classification of skin regions play an important role in
many image processing and vision applications. In this paper, we present a statistical
approach for fast skin detection in MPEG-compressed videos. Firstly,
conditional probabilities of skin and non-skin are extracted from manually
marked training images. Then, candidate skin pixels are identified using the
Bayesian maximum a posteriori decision rule. An optimal threshold is then obtained
by analysis of probability error on the basis of the likelihood ratio histogram
of skin and non-skin pixels. Experiments on sequences with varying
illumination have demonstrated the effectiveness of our approach.
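The decision rule described above can be sketched as follows. This is a hedged illustration rather than the paper's code: the use of quantised (Cb, Cr) pairs, the bin count, and the helper names `train`, `likelihood_ratio` and `is_skin` are all assumptions, and the threshold is passed in by hand instead of being derived from the likelihood ratio histogram.

```python
from collections import Counter

def quantise(cb, cr, bins=32):
    """Map an 8-bit (Cb, Cr) pair to a coarse histogram bin (assumed colour space)."""
    step = 256 // bins
    return (cb // step, cr // step)

def train(skin_pixels, nonskin_pixels, bins=32):
    """Estimate P(colour | skin) and P(colour | non-skin) from labelled samples."""
    skin = Counter(quantise(cb, cr, bins) for cb, cr in skin_pixels)
    nonskin = Counter(quantise(cb, cr, bins) for cb, cr in nonskin_pixels)
    return skin, len(skin_pixels), nonskin, len(nonskin_pixels), bins

def likelihood_ratio(model, cb, cr, floor=1e-6):
    """Return P(colour | skin) / P(colour | non-skin), floored to avoid division by zero."""
    skin, n_skin, nonskin, n_non, bins = model
    key = quantise(cb, cr, bins)
    p_skin = skin[key] / n_skin
    p_non = nonskin[key] / n_non
    return max(p_skin, floor) / max(p_non, floor)

def is_skin(model, cb, cr, threshold=1.0):
    """Label a pixel as skin when the likelihood ratio exceeds the threshold.

    Under the MAP rule the threshold equals P(non-skin) / P(skin).
    """
    return likelihood_ratio(model, cb, cr) > threshold
```

Raising the threshold trades missed skin pixels for fewer false detections, which is the trade-off an error analysis of the ratio histogram would tune.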
-
(1)
COMPRESSED-DOMAIN SHOT BOUNDARY DETECTION USING FINITE STATE MACHINE AND CONTENT-BASED RULES
-
We propose a fast and systematic method for shot boundary detection in the compressed domain using content-based rules and a finite state machine (FSM). Firstly, several feature indicators are acquired from DC images in MPEG videos, including luminance, color, edge, prediction error and inter-frame difference as well as motion. Then, several content-based rules are utilized to detect abrupt cuts. Thirdly, boundaries of gradual transitions are determined by a coarse-to-fine procedure with a pre-processing module and an FSM. Experiments using publicly available sequences from TRECVID show that the proposed algorithm outperforms representative existing algorithms in both precision and recall rates.
-
(1)
Recognition of JPEG Compressed Face Images Based on AdaBoost
-
This paper presents an advanced face recognition system
based on the AdaBoost algorithm in the JPEG compressed domain. First,
the dimensionality is reduced by truncating some of the block-based
DCT coefficients, and nonuniform illumination variations are alleviated
by discarding the DC coefficient of each block. Next, an improved
AdaBoost.M2 algorithm, which uses Euclidean distance (ED) to eliminate
non-effective weak classifiers, is proposed to select the most discriminative
DCT features from the truncated DCT coefficient vectors. Finally,
LDA is used as the final classifier. Experiments on the Yale face databases
show that the proposed approach is superior to other methods in terms
of recognition accuracy, efficiency, and illumination robustness.
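To show why discarding the DC coefficient helps with illumination, here is a small sketch under stated assumptions: an orthonormal 8x8 DCT-II, and a truncation that keeps the top-left `keep` x `keep` coefficients (the abstract does not say which coefficients are kept). The name `block_features` is hypothetical.

```python
import math

N = 8

def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return [[alpha(u) * alpha(v) * sum(
                block[y][x]
                * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]

def block_features(block, keep=3):
    """Keep the low-frequency keep x keep coefficients and drop the DC term at (0, 0)."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep) if (u, v) != (0, 0)]
```

A uniform brightness offset changes only the DC term, so the remaining AC features are identical for a block and a brightened copy of it. Non-uniform lighting is only partly compensated this way.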
-
(1)
A New Robust Watermarking Scheme for Color Image in Spatial Domain
-
This paper presents a new robust watermarking scheme for color image based on a block probability in spatial domain. A binary watermark image is permutated using sequence numbers generated by a secret key and Gray code, and then embedded four times in different positions by a secret key. Each bit of the binary encoded watermark is embedded by modifying the intensities of a non-overlapping block of 8*8 of the blue component of the host image. The extraction of the watermark is by comparing the intensities of a block of 8*8 of the watermarked and the original images and calculating the probability of detecting '0' or '1'. Tested by benchmark Stirmark 4.0, the experimental results show that the proposed scheme is robust and secure against a wide range of image processing operations.
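The block-wise embedding and comparison-based extraction can be sketched roughly as below. The abstract does not give the actual modification rule, so the additive `delta`, the majority-vote decision, and the names `embed` and `extract` are assumptions. The key-driven permutation, Gray code and four-fold redundancy are left out.

```python
BLOCK = 8

def embed(blue, bits, delta=4):
    """Return a copy of the blue channel with one bit written into each 8x8 block."""
    out = [row[:] for row in blue]
    blocks_per_row = len(blue[0]) // BLOCK
    for i, bit in enumerate(bits):
        by, bx = divmod(i, blocks_per_row)
        change = delta if bit else -delta   # assumed rule: raise for 1, lower for 0
        for y in range(by * BLOCK, (by + 1) * BLOCK):
            for x in range(bx * BLOCK, (bx + 1) * BLOCK):
                out[y][x] = min(255, max(0, out[y][x] + change))
    return out

def extract(marked, original, n_bits):
    """Recover bits by comparing each marked block with the original block."""
    blocks_per_row = len(original[0]) // BLOCK
    bits = []
    for i in range(n_bits):
        by, bx = divmod(i, blocks_per_row)
        votes = [marked[y][x] > original[y][x]
                 for y in range(by * BLOCK, (by + 1) * BLOCK)
                 for x in range(bx * BLOCK, (bx + 1) * BLOCK)]
        p_one = sum(votes) / len(votes)     # empirical probability of a '1'
        bits.append(1 if p_one > 0.5 else 0)
    return bits
```

Because each bit is spread over 64 pixels, the vote survives changes to a minority of pixels in a block. Saturated pixels (0 or 255) cannot carry the change in this simplified rule.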
-
(1)
Camera Motion Analysis towards Semantic-based Video Retrieval in Compressed Domain
-
To reduce the semantic gap between low-level visual features and the richness of human semantics, this paper proposes new algorithms that use combined camera motion descriptors with multiple thresholds to automatically retrieve semantic concepts, i.e., close-up and panorama, directly in the MPEG compressed domain based on camera motion analysis. Extensive experiments illustrate that the proposed algorithms provide promising retrieval results in real-time application scenarios and without human intervention.
-
(1)
Face Detection based on Skin Color in Image by Neural Networks
-
Face detection is one of the challenging problems in image processing. A novel face detection system is presented in this paper. The approach relies on skin-based color features extracted from the two-dimensional Discrete Cosine Transform (DCT) and on neural networks, which detect faces using skin color from the DCT coefficients of the Cb and Cr feature vectors. The system first uses skin color, the main feature of faces for detection, and the skin face candidate is then examined by the neural networks, which learn from the features of faces to classify whether the original image includes a face or not. The processing is based on normalization and the Discrete Cosine Transform, and the final classification is based on a neural network approach. Experimental results on upright frontal color face images from the internet show an excellent detection rate.
-
(1)
Real-time and Automatic Close-up Retrieval from Compressed Videos
-
In this paper, we propose a thorough scheme, by virtue of camera zooming descriptor with two-level threshold, to automatically retrieve close-ups directly from MPEG compressed videos based on camera motion analysis. In the retrieval process, we build camera-motion-based semantic retrieval. To improve the coverage of the proposed scheme, we investigate close-up retrieval in all kinds of videos. Extensive experiments illustrate that the proposed scheme provides promising retrieval results under real-time and automatic application scenario.
-
(1)
Robustness Analysis on Facial Image Description in DCT Domain
-
In this letter, we report a DCT domain analysis of facial images to reveal that, when
a certain number of DCT coefficients are removed, the corresponding facial image description
by the remaining DCT coefficients becomes robust to lighting changes and scale variations.
Such properties would be very useful for applications of face recognition, video object
tracking, object segmentation and visual content processing.
-
(1)
A Block-Edge-Pattern based Content Descriptor in DCT Domain
-
In this correspondence, we describe a robust and effective content descriptor based on block edge patterns extracted directly in the DCT domain, which is suitable for applications in JPEG or MPEG compressed images and videos. This content descriptor is constructed from a run-length edge-block histogram with three patterns: horizontal edge, vertical edge and no-edge. In comparison with existing descriptors, the proposed descriptor offers: (i) low-cost computation suitable for real-time implementation and high-speed processing of compressed images or videos; (ii) robustness to orientation changes such as rotation, noise, reverse etc.; (iii) direct operation in the compressed domain. Extensive experiments support that the proposed content descriptor is effective in describing visual content. In comparison with existing techniques, the proposed descriptor achieves superior performance in terms of retrieval precision and recall rates.
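A rough sketch of how such a descriptor could be built is given below. The classification rule (comparing the two lowest AC coefficients against a threshold) is an assumption, since the abstract does not state the rule used. The names `ac_pair`, `edge_pattern` and `run_length_histogram`, the threshold and the run-length cap are likewise hypothetical. In a real decoder the two coefficients would be read from the bitstream rather than recomputed from pixels.

```python
import math

N = 8

def ac_pair(block):
    """Return the two lowest AC terms of the orthonormal 8x8 DCT: (F[1][0], F[0][1])."""
    scale = math.sqrt(1 / N) * math.sqrt(2 / N)
    f10 = scale * sum(block[y][x] * math.cos((2 * y + 1) * math.pi / (2 * N))
                      for y in range(N) for x in range(N))
    f01 = scale * sum(block[y][x] * math.cos((2 * x + 1) * math.pi / (2 * N))
                      for y in range(N) for x in range(N))
    return f10, f01

def edge_pattern(block, threshold=20.0):
    """Label a block 'horizontal', 'vertical' or 'none' (assumed rule)."""
    f10, f01 = ac_pair(block)
    if max(abs(f10), abs(f01)) < threshold:
        return "none"
    # F[1][0] measures change from top to bottom, i.e. a horizontal edge.
    return "horizontal" if abs(f10) >= abs(f01) else "vertical"

def run_length_histogram(patterns, max_run=4):
    """Count runs of consecutive identical edge labels, capping the run length."""
    hist = {}
    i = 0
    while i < len(patterns):
        j = i
        while j < len(patterns) and patterns[j] == patterns[i]:
            j += 1
        if patterns[i] != "none":
            key = (patterns[i], min(j - i, max_run))
            hist[key] = hist.get(key, 0) + 1
        i = j
    return hist
```

The histogram keys pair an edge type with a run length, so two images with the same edge counts but different spatial grouping get different descriptors.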
-
(1)
MPEG-2 Compressed-Domain Algorithms for Video Analysis
-
This paper presents new algorithms for extracting metadata from video sequences in the MPEG-2 compressed domain. Three algorithms for efficient low-level metadata extraction in preprocessing stages are described. The first algorithm detects camera motion using the motion vector field of an MPEG-2 video. The second method extends the idea of motion detection to a limited region of interest, yielding an efficient algorithm to track objects inside video sequences. The third algorithm performs a cut detection using macroblock types and motion vectors.
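For the first algorithm, one common way to turn a motion vector field into camera motion parameters is a least-squares fit of a simple pan/tilt/zoom model. The sketch below illustrates that generic idea and is not the paper's method. The three-parameter model and the name `fit_camera_motion` are assumptions.

```python
def fit_camera_motion(vectors):
    """Least-squares fit of u = pan + zoom * (x - cx), v = tilt + zoom * (y - cy).

    `vectors` is a list of (x, y, u, v): a macroblock centre and its motion vector.
    (cx, cy) is the mean macroblock position. Returns (pan, tilt, zoom).
    """
    n = len(vectors)
    cx = sum(x for x, _, _, _ in vectors) / n
    cy = sum(y for _, y, _, _ in vectors) / n
    pan = sum(u for _, _, u, _ in vectors) / n
    tilt = sum(v for _, _, _, v in vectors) / n
    # With coordinates centred on their mean, the three parameters decouple.
    num = sum((x - cx) * (u - pan) + (y - cy) * (v - tilt) for x, y, u, v in vectors)
    den = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y, _, _ in vectors)
    zoom = num / den if den else 0.0
    return pan, tilt, zoom
```

A practical detector would usually discard intra-coded macroblocks and outlier vectors before the fit, since moving objects do not follow the camera model.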
-
(1)
Analysis of cluttered scenes using an elastic matching approach for stereo images.
-
We present a system for the automatic interpretation of cluttered scenes containing multiple partly occluded objects in front of unknown, complex backgrounds. The system is based on an extended elastic graph matching algorithm that allows the explicit modeling of partial occlusions. Our approach extends an earlier system in two ways. First, we use elastic graph matching in stereo image pairs to increase matching robustness and disambiguate occlusion relations. Second, we use richer feature descriptions in the object models by integrating shape and texture with color features. We demonstrate that the combination of both extensions substantially increases recognition performance. The system learns about new objects in a simple one-shot learning approach. Despite the lack of statistical information in the object models and the lack of an explicit background model, our system performs surprisingly well for this very difficult task. Our results underscore the advantages of view-based feature constellation representations for difficult object recognition problems.
-
(1)
DCT-Domain Image Retrieval Via Block-Edge-Patterns
-
A new algorithm for compressed image retrieval is proposed in this
paper based on DCT block edge patterns. This algorithm directly extracts three
edge patterns from compressed image data to construct an edge pattern histogram
as an indexing key to retrieve images based on their content features.
Three feature-based indexing keys are described: (i) the first two
features are represented by 3-D and 4-D histograms respectively; and (ii) the
third feature is constructed by following the spirit of run-length coding, which
is performed on consecutive horizontal and vertical edges. To test and evaluate
the proposed algorithms, we carried out two-stage experiments. The results
show that our proposed methods are robust to color changes and varied noise.
In comparison with existing representative techniques, the proposed algorithms
achieve superior performance in terms of retrieval precision and processing
speed.
-
(1)
Description of Online and Offline Metadata Extraction out of Sports Videos
-
We focus on online and offline metadata extraction and annotation out of sports videos. The main benefit of our method is immediate and automatic extraction and annotation of metadata by giving semantics to combinations of heterogeneous low-level visual features. It brings new opportunities for efficient utilisation of sports video in improved ways, and is easily customized to address the characteristics. Firstly, semantic scene classification is described, including key-frame extraction, determination of similarities between shots, and rule-based estimation of scene boundaries. Secondly, fuzzy-logic-based categorizing is presented, including the paradigm, fuzzy membership functions, fuzzy feature generation and the similarity measure. Thirdly, automatic sports video annotation is proposed, including robust dominant colour region detection and combined motion feature analysis. This work has been evaluated in the TRECVID 2007 competition.
-
(1)
Extracting Semantics and Content Adaptive Summarisation for Effective Video Retrieval
-
In this paper, we provide a system for semantic video retrieval in which extracted semantic contents are used to generate summarised videos for effective delivery of retrieved results. Firstly, several useful features are extracted from compressed video on the basis of the DC-images and motion vectors. Secondly, shot changes are detected to enable shot-level content indexing and retrieval. Thirdly, several semantic concepts are automatically detected, including outdoor/indoor scenes, building, sky and human objects. The results of detected shots and extracted semantic concepts are then used for semantic indexing of video contents. Furthermore, a combined measurement is produced from these semantics for content adaptive video summarisation. According to the network performance, the retrieved video can be delivered at various sizes using our summarisation techniques for efficiency.
-
(1)
An Effective and Fast Scene Change Detection Algorithm for MPEG Compressed Videos
-
In this paper, we propose an effective and fast scene change detection
algorithm that works directly in the MPEG compressed domain. The proposed scene change
detection exploits the MPEG motion estimation and compensation scheme by
examining the prediction status of each macro-block inside B frames and P
frames. As a result, both abrupt and dissolved scene changes are located
by a sequence of comparison tests, and no feature extraction or histogram
differentiation is needed. Therefore, the proposed algorithm can operate in
the compressed domain and is suitable for real-time implementations. Extensive experiments
illustrate that the proposed algorithm achieves up to 94% precision
for abrupt scene change detection and 100% for gradual scene change detection.
In comparison with similar existing techniques, the proposed algorithm
achieves superior recall and precision rates.
-
(1)
Video Indexing and Retrieval in Compressed Domain Using Fuzzy-Categorization
-
There has been an increased interest in video indexing and retrieval
in recent years. In this work, the indexing and retrieval system for visual
contents is based on features extracted from the compressed domain. Direct processing
of the compressed domain spares the decoding time, which is extremely
important when indexing a large number of multimedia archives. A fuzzy-categorizing
structure is designed in this paper to improve the retrieval performance.
In our experiment, a database that consists of basketball videos has
been constructed for our study. This database includes three categories: full-court
match, penalty and close-up. First, spatial and temporal feature extraction
is applied to train the fuzzy membership functions using the minimum entropy
optimal algorithm. Then, the max composition operation is used to generate a
new fuzzy feature to represent the content of the shots. Finally, the fuzzy-based
representation becomes the indexing feature for the content-based video retrieval
system. The experimental results show that the proposed algorithm is
quite promising for semantic-based video retrieval.
-
(1)
A fuzzy logic approach for detection of video shot boundaries
-
Video temporal segmentation is normally the first and important step for content-based video applications. Many features including
the pixel difference, colour histogram, motion, and edge information etc. have been widely used and reported in the literature to detect
shot cuts inside videos. Although existing research on shot cut detection is active and extensive, it still remains a challenge to achieve
accurate detection of all types of shot boundaries with one single algorithm. In this paper, we propose a fuzzy logic approach to integrate
hybrid features for detecting shot boundaries inside general videos. The fuzzy logic approach contains two processing modes, where one
is dedicated to detection of abrupt shot cuts including those short dissolved shots, and the other for detection of gradual shot cuts. These
two modes are unified by a mode-selector to decide which mode the scheme should work on in order to achieve the best possible detection
performances. By using the publicly available test data set from Carleton University, extensive experiments were carried out and the test
results illustrate that the proposed algorithm outperforms the representative existing algorithms in terms of the precision and recall rates.
© 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
-
(1)
Subsampling-based image watermarking in compressed DCT domain
-
In this paper, a new embedding strategy for watermarking is presented based on DC components of subimages in
compressed discrete cosine transform (DCT) domain. These subimages are obtained through subsampling the
host image. More robustness has been achieved when watermarks are embedded in perceptually significant DC
components. Furthermore, the original image is not required in the extraction process. Experimental results
show that the proposed scheme successfully makes the watermark perceptually invisible and robust to a wide
range of attacks, including lossy JPEG compression, filtering, scaling, and cropping attacks.
-
(1)
Progressive content access to databases of JPEG compressed images
-
Progressive content access provides a mode that allows a coarse version of an image to be
viewed at a lower computing cost and then gradually refined by subsequent resolution enhancement if
required. This proves extremely useful when millions of compressed images and video sequences need to
be browsed manually or processed in pixel domain, saving the cost and removing the necessity of full
decompression. In this paper, we propose such a progressive content access algorithm suitable for all
DCT-based JPEG and MPEG compressed files. We first develop a theoretical model in approximation of
cosine function used in IDCT with various orders. Following that, we then propose a progressive content
access algorithm, which comprehends both the successive approximation and the spectral selection.
Further analysis and experiments are reported to show that our proposed algorithm saves computational
cost in comparison with JPEG full decompression. Extensive experiments also support that the proposed
algorithm achieves encouraging PSNR values for reconstructed images even with lower order
approximation.
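The trade-off between reconstruction cost and PSNR can be illustrated with a simplified sketch. It only shows the spectral-selection side, reconstructing a block from its lowest-frequency coefficients with exact cosines. The paper's approximation of the cosine function itself is not reproduced here. The parameter `bands` and the function names are hypothetical.

```python
import math

N = 8

def _basis(k, n):
    """Value of the k-th orthonormal DCT-II basis function at sample n."""
    alpha = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return alpha * math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block."""
    return [[sum(block[y][x] * _basis(u, y) * _basis(v, x)
                 for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]

def partial_idct(coeffs, bands):
    """Reconstruct a block using only the coefficients with u + v < bands."""
    return [[sum(coeffs[u][v] * _basis(u, y) * _basis(v, x)
                 for u in range(N) for v in range(N) if u + v < bands)
             for x in range(N)] for y in range(N)]

def psnr(a, b):
    """Peak signal-to-noise ratio in dB between two 8-bit blocks."""
    mse = sum((a[y][x] - b[y][x]) ** 2 for y in range(N) for x in range(N)) / N ** 2
    return float("inf") if mse == 0 else 10 * math.log10(255 ** 2 / mse)
```

With `bands=1` only the DC term is used and every pixel becomes the block mean, which is the cheapest preview. Larger values add detail at higher cost.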
-
(1)
University of Bradford at TRECVID 2008 Content Based Copy Detection Task
-
We present a novel method for spatial-temporal video copy detection based on adaptive masking. Firstly, a dedicated video analysis is implemented for input videos, which ensures the accurate detection of complicated distortions query videos may undergo. Secondly, simple signatures are extracted for the benefit of time and space efficiency, and the frame mask is generated adaptively to reduce video temporal redundancy. Thirdly, a matching process is implemented to find video copies. The proposed video copy detection framework is effective, and robust against spatial and temporal variations.
-
(1)
D5.4 Report on evaluation of methods
-
This document reports on the first evaluation of tools developed in the LIVE project for manual,
semiautomatic and automatic annotation and extraction of knowledge in work package 5.
We start this report with findings on the international TRECVID 2007 evaluation of LIVE
tools for automatic shot boundary classification. The compressed domain shot boundary detector
developed in the LIVE project showed the third best recognition performance of all 15
participating research groups in this competition. Despite the excellent results, the generalization
of the performance from news and documentary data used in TRECVID 2007 to more
difficult sports data produced by the LIVE streams of Olympia 2008 remains difficult. Only
further evaluations on labelled data stemming from Olympia 2004 and the upcoming Olympia
2008 event will show how suitable the developed technology is for extracting information
automatically from sports broadcasts – a domain, for which neither standard international
benchmarks nor any international competition exist. The detection of gradual transitions in
sports video must still be considered unsolved and needs further research. However, the
evaluation results of TRECVID show the potential of the developed technology and its maturity.
The next section of this document deals with the performance of different face recognition
methods which are developed in the LIVE project to identify athletes and other important persons
in the video stream automatically. We measure the performance in rather controlled optimal
situations, benchmarked on the Bochum gallery, but also on a “worst-case” gallery with
rather mixed content. The results are promising, but uncontrolled environments and incorrect feature
correspondence lead to poor results, especially if the more advanced P2D-HMM face recognition
technology is applied. Hence, component face detectors have been developed in the
project in order to improve the correspondence search in pose estimation before any identification
can be performed. We report in this document on the performance of several face component
detectors for eyes, nose and mouth locations developed in the course of the project to
improve face pose estimation and recognition. Despite the fact that the performance of individual
face component detectors is quite high when evaluated on a test set stemming from the
same database, generalization of the facial recognition algorithms to other more uncontrolled
galleries remains a challenge. However, as the integration of the face component detectors in
the face recognition framework is still lacking, no sound evaluation can be performed. We
will report in an upcoming report D 5.7 on the results of our research and how the different
algorithms perform on Olympia 2008 sports data during the field trial.
-
(1)
Skin Detection from Different Color Spaces for Model-based Face Detection
-
Skin and face detection has many important applications in intelligent human-machine interfaces, reliable video surveillance and visual understanding of human activities. In this paper, we propose an efficient and effective method for frontal-view face detection based on skin detection and knowledge-based modeling. Firstly, skin pixels are modeled by using supervised training, and boundary conditions are then extracted for skin segmentation. Faces are further detected by shape filtering and knowledge-based modeling. Skin results from different color spaces are compared. In addition, experimental results have demonstrated that our method is robust in detecting skin and face regions even with varying lighting conditions and poses.
-
(1)
An efficient face image retrieval through DCT features
-
This paper proposes a simple new method of DCT feature extraction that speeds up image retrieval and reduces the storage it needs, by accessing and extracting content directly in the JPEG compressed domain. Our method extracts the average of some DCT block coefficients and needs only a vector of six coefficients per block over all image blocks. In our retrieval system, for simplicity, both query and database images are normalized and resized from the original database based on the centered position of the eyes. The normalized image is divided equally into non-overlapping 8×8 pixel blocks, each of which is associated with a feature vector derived directly from the discrete cosine transform (DCT). Users can select any query as the main theme of the query image. Retrieved images are ranked by their relevance to the query image, measured by the Euclidean distance between feature vectors. The experimental results show that our approach easily identifies main objects and reduces the influence of the background in the image, and thus improves the performance of image retrieval.
-
(1)
A Block-Edge-Pattern-Based Content Descriptor in DCT Domain
-
In this correspondence, we describe a robust and effective content descriptor based on block-edge patterns extracted in the discrete cosine transform domain, which is suitable for applications in JPEG or MPEG compressed images and videos. This content descriptor is constructed from a run-length edge-block histogram with three patterns: horizontal edge, vertical edge and no edge. In comparison with existing descriptors, the proposed descriptor offers: 1) low-cost computation suitable for real-time implementation and high-speed processing of compressed videos; 2) robustness to orientation changes such as rotation, noise, reverse, etc.; 3) operation in the compressed domain. Extensive experiments support that the proposed content descriptor is effective in describing visual content, and achieves superior performance in terms of retrieval precision and recall rates.
-
(1)
Fusion of intensity and channel difference for improved colour edge detection
-
Edge detection, especially from colour images, plays a very
important role in many applications for image analysis,
segmentation and recognition. In this paper, a new colour-gray
mapping method for effective colour edge detection is
proposed. From any given colour image C, a gray image D is
defined as the accumulated differences between each pair of its
colour channels, and another gray image R is then
obtained by weighting D and the gray intensity image G.
Fusion of the edges extracted from R and G forms the final
results. Compared with edges detected in traditional
colour spaces like RGB, YCbCr and HSV, all using the same
Canny operator, the proposed method appears to achieve
more effective results on different test images.
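The colour-to-gray mapping can be sketched as follows. This is one reading of the abstract, under assumptions: D is taken as the sum of absolute pairwise channel differences, G as the plain channel average, and R as a convex combination with a hypothetical `weight`. Edge extraction and fusion are not shown.

```python
def channel_difference(pixel):
    """D: accumulated absolute differences between each pair of colour channels."""
    r, g, b = pixel
    return abs(r - g) + abs(g - b) + abs(b - r)

def intensity(pixel):
    """G: average of the three channels (an assumed gray conversion)."""
    return sum(pixel) / 3

def fused_gray(image, weight=0.5):
    """R = weight * D + (1 - weight) * G for every pixel of an RGB image."""
    return [[weight * channel_difference(p) + (1 - weight) * intensity(p) for p in row]
            for row in image]
```

Two neighbouring pixels with equal intensity but different hue are indistinguishable in G, yet they differ in D and therefore in R, which is where an intensity-only edge detector would miss the boundary. Note that D can reach 510 for 8-bit input, so R may need rescaling before an 8-bit edge detector is applied.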
-
(1)
Face Detection based Neural Networks using Robust Skin Color Segmentation
-
This paper proposes a robust scheme for a face detection system that uses a Gaussian mixture model to segment images based on skin color. After the selection of skin and non-skin face candidates, features are extracted directly from discrete cosine transform (DCT) coefficients computed from these candidates. Moreover, back-propagation neural networks are used to train and classify faces based on DCT feature coefficients in the Cb and Cr color spaces. This scheme utilizes the skin color information, which is the main feature for face detection. DCT feature values of faces, representing the data set of skin/non-skin face candidates obtained from the Gaussian mixture model, are fed into the back-propagation neural networks to classify whether the original image includes a face or not. Experimental results show that the proposed scheme is reliable for face detection, and that pattern features are detected and classified accurately by the back-propagation neural networks.
-
(1)
Hierarchical Modeling and Adaptive Clustering for Real-time Summarization of Rush Videos
-
In this paper, we provide a detailed description of a proposed new algorithm for video summarization, which is also included in our submission to TRECVID’08 on BBC rush summarization. Firstly, rush videos are hierarchically modeled using the formal language technique. Secondly, shot detection is applied to introduce a new concept of V-unit for structuring videos in line with the hierarchical model, and thus junk frames within the model are effectively removed. Thirdly, adaptive clustering is employed to group shots into clusters to determine retakes for redundancy removal. Finally, the most representative shot selected from each cluster is ranked according to its length and the sum of its activity level for summarization. Competitive results have been achieved, demonstrating the effectiveness and efficiency of our techniques, which are fully implemented in the compressed domain. Our work does not require high-level semantics such as object detection and speech/audio analysis, which provides a more flexible and general solution for this topic.