Content related to "high-level-metadata-extraction"
-
Description of Online and Offline Metadata Extraction out of Sports Videos
-
We focus on online and offline metadata extraction and annotation from sports videos. The main benefit of our method is the immediate and automatic extraction and annotation of metadata, achieved by assigning semantics to combinations of heterogeneous low-level visual features. It opens new opportunities for using sports video more efficiently and can easily be customised to the characteristics of different sports. First, semantic scene classification is described, including key-frame extraction, determination of similarities between shots, and rule-based estimation of scene boundaries. Second, fuzzy-logic-based categorisation is presented, including the paradigm, the fuzzy membership function, fuzzy feature generation and the similarity measure. Third, automatic sports video annotation is proposed, including robust dominant colour region detection and combined motion feature analysis. This work has been evaluated in the TRECVID 2007 competition.
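The abstract names a fuzzy membership function, fuzzy feature generation and a similarity measure without giving their form. The sketch below is one minimal, hypothetical instantiation, not the method of the work itself: each low-level feature (for example a dominant-colour ratio or a motion intensity, both assumed normalised to [0, 1]) is mapped through three triangular membership functions labelled low, medium and high, and two fuzzy feature vectors are compared with a min/max (fuzzy Jaccard) similarity. The label set, the breakpoints in `LABELS` and the similarity measure are all assumptions.

```python
# Hypothetical fuzzy labels: (a, b, c) breakpoints of a triangular membership function.
LABELS = {"low": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}


def triangular(x, a, b, c):
    """Triangular membership: 1 at the peak b, falling linearly to 0 at a and c."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(features):
    """Fuzzy feature generation: one membership degree per (feature, label) pair."""
    return [
        triangular(x, *LABELS[name])
        for x in features
        for name in ("low", "medium", "high")
    ]


def fuzzy_similarity(u, v):
    """Fuzzy Jaccard similarity: sum of minima over sum of maxima, in [0, 1]."""
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    return 1.0 if den == 0 else num / den


# Usage: two shots described by hypothetical (dominant-colour ratio, motion intensity).
shot_a = fuzzify([0.8, 0.25])
shot_b = fuzzify([0.75, 0.3])
score = fuzzy_similarity(shot_a, shot_b)
```

The min/max ratio is chosen because it stays in [0, 1] and equals 1 only for identical membership vectors, which makes a single threshold easy to set when grouping shots.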
-
Extracting Semantics and Content Adaptive Summarisation for Effective Video Retrieval
-
In this paper, we present a system for semantic video retrieval in which extracted semantic content is used to generate summarised videos for effective delivery of retrieved results. First, several useful features are extracted from compressed video on the basis of DC-images and motion vectors. Second, shot changes are detected to enable shot-level content indexing and retrieval. Third, several semantic concepts are detected automatically, including outdoor/indoor scenes, buildings, sky and human objects. The detected shots and extracted semantic concepts are then used for semantic indexing of the video content. Furthermore, a combined measure is derived from these semantics for content-adaptive video summarisation. Depending on network performance, the retrieved video can then be delivered at various sizes using our summarisation techniques.
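The shot-change step on DC-images is not specified further. A common approach, used here only as an assumed stand-in, compares the luminance histograms of consecutive DC-images (the reduced images formed from the DC coefficient of each 8×8 block) and declares a cut where the difference exceeds a threshold. The sketch assumes the DC-images have already been extracted as 2-D lists of integer luminance values in 0-255; the bin count and the `threshold` value are hypothetical and would need tuning on real data.

```python
def histogram(dc_image, bins=16):
    """Normalised luminance histogram of one DC-image (integer values 0-255)."""
    counts = [0] * bins
    total = 0
    for row in dc_image:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def hist_distance(h1, h2):
    """Half the L1 distance between two normalised histograms, in [0, 1]."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2


def detect_cuts(dc_frames, threshold=0.5, bins=16):
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    hists = [histogram(frame, bins) for frame in dc_frames]
    return [
        i
        for i in range(1, len(hists))
        if hist_distance(hists[i - 1], hists[i]) > threshold
    ]
```

Working on DC-images keeps the cost low because each frame is only 1/64 of the original pixel count and no full decoding is required. Gradual transitions would need a different rule, since their frame-to-frame differences stay small.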
-
Constrained Region-Growing and Edge Enhancement Towards Automated Semantic Video Object Segmentation
-
Most existing object segmentation algorithms suffer from a so-called
under-segmentation problem, where parts of the segmented object are missing
and holes often occur inside the object region. This problem becomes even
more serious when object pixels have intensity values similar to those of the
background. To resolve it, we propose constrained region-growing combined
with contrast enhancement to recover the missing parts and fill in the holes
inside the segmented objects. Our proposed scheme consists of three elements:
(i) a simple linear transform for contrast enhancement to enable stronger edge
detection; (ii) an 8-connected linking regional filter for noise removal; and
(iii) constrained region-growing to eliminate internal holes. Our experiments,
which use a representative existing edge-map-based segmentation algorithm as
the benchmark, show that the proposed scheme is effective in resolving the
under-segmentation problem.
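The three elements can be sketched on a grey-level image and a binary object mask. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear transform is taken to be a min/max contrast stretch onto [0, 255]; the 8-connected regional filter is modelled as removing 8-connected foreground components smaller than a hypothetical `min_size`; and constrained region-growing is replaced by a border-seeded flood fill that may grow only through non-object pixels, so any background pixel it cannot reach is treated as an internal hole and filled. The helper `neighbours` is likewise hypothetical.

```python
from collections import deque


def linear_stretch(img):
    """(i) Linear contrast stretch: map the image's [min, max] onto [0, 255]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [row[:] for row in img]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in img]


def neighbours(y, x, h, w, eight):
    """Yield in-bounds 4- or 8-connected neighbours of (y, x)."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if eight:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    for dy, dx in steps:
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx


def remove_small_components(mask, min_size):
    """(ii) Remove 8-connected foreground components with fewer than min_size pixels."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in neighbours(cy, cx, h, w, eight=True):
                    if mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


def fill_holes(mask):
    """(iii) Grow the background from the image border, constrained to
    non-object pixels (4-connected); unreached background pixels are holes."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and not mask[y][x]:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in neighbours(y, x, h, w, eight=False):
            if not mask[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]
```

Pairing an 8-connected object with a 4-connected background is the usual choice in digital topology: it stops the background growth from leaking diagonally through a closed object boundary.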
-
Robust Multiple Watermarking in Color Images with Correlation Coefficient Detector
-
In this paper, we propose a robust multiple digital watermarking technique for the copyright protection of digital color images. The watermark is a binary image that is divided into four parts; each part is encrypted using a secret key and embedded in the spatial domain into four different 128×128 regions of the blue component of the color image. Watermark extraction is based on comparing the original pixel intensity values with the corresponding watermarked pixel intensity values in 8×8 blocks. The extracted watermark bits are determined from the probabilities of detecting bit '1' or bit '0'. The watermark can be extracted in sixteen parts, but only four of these are selected by a correlation coefficient detector and used to reconstruct the extracted watermark. Experimental results show that the proposed scheme makes the watermark perceptually invisible and robust against a wide range of attacks, including lossy JPEG compression, median filtering, low-pass filtering, rotation, rotation-scaling, rotation-cropping, image cropping, image scaling and self-similarity attacks.
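The selection of four out of sixteen extracted parts can be illustrated with a Pearson correlation coefficient. The sketch rests on two assumptions not stated in the abstract: that the sixteen parts are four extracted copies of each of the four watermark parts, and that the reference bits of each part are available to the detector. The bit decision rule `decide_bit` (majority over repeated detections) is also a hypothetical simplification of the probability-based decision described above; decryption is omitted.

```python
from math import sqrt


def decide_bit(detections):
    """Hypothetical rule: 1 if bit '1' was detected in at least half the comparisons."""
    return 1 if sum(detections) / len(detections) >= 0.5 else 0


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length bit sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant sequence: correlation undefined, treat as no match
    return cov / sqrt(var_a * var_b)


def select_parts(candidates, reference_parts):
    """candidates[k] holds the extracted copies of watermark part k (4 x 4 = 16 in total);
    keep the copy that correlates best with reference part k."""
    return [
        max(copies, key=lambda copy: correlation(copy, ref))
        for copies, ref in zip(candidates, reference_parts)
    ]


def reconstruct(parts):
    """Concatenate the selected parts back into one flat watermark bit sequence."""
    return [bit for part in parts for bit in part]
```

Using the correlation coefficient rather than a raw bit-error count makes the choice insensitive to a uniform bias in the extracted bits, which attacks such as filtering or scaling can introduce.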
-
A Block-Edge-Pattern-Based Content Descriptor in DCT Domain
-
In this correspondence, we describe a robust and effective content descriptor based on block-edge patterns extracted in the discrete cosine transform (DCT) domain, which is suitable for applications involving JPEG- or MPEG-compressed images and videos. The descriptor is a run-length edge-block histogram over three patterns: horizontal edge, vertical edge and no edge. Compared with existing descriptors, the proposed descriptor 1) has a low computational cost, making it suitable for real-time implementation and high-speed processing of compressed videos; 2) is robust to orientation changes such as rotation and reversal, and to noise; and 3) operates directly in the compressed domain. Extensive experiments show that the proposed content descriptor is effective in describing visual content and achieves superior retrieval precision and recall rates.
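Block-edge classification in the DCT domain and the run-length histogram can be sketched as follows. The classification rule is an assumption, not necessarily the one used in the correspondence: a horizontal edge concentrates AC energy in the first column of the 8×8 coefficient block (vertical frequencies), a vertical edge in the first row (horizontal frequencies), and a block whose AC energy in both stays below a hypothetical `threshold` is labelled "no edge". Patterns are scanned in raster order and each run is counted under the key (pattern, run length), with run lengths capped at a hypothetical `max_run`. The naive `dct2` helper exists only to make the example self-contained; in a JPEG or MPEG decoder the coefficients would be taken from the entropy-decoded bitstream without any inverse transform.

```python
from math import cos, pi, sqrt


def dct2(block):
    """Naive orthonormal 2-D DCT-II of an n x n block; coeffs[u][v], u = vertical frequency."""
    n = len(block)

    def alpha(k):
        return sqrt(1 / n) if k == 0 else sqrt(2 / n)

    return [
        [
            alpha(u) * alpha(v) * sum(
                block[x][y]
                * cos((2 * x + 1) * u * pi / (2 * n))
                * cos((2 * y + 1) * v * pi / (2 * n))
                for x in range(n)
                for y in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


def classify_block(coeffs, threshold=50.0):
    """Return 'H' (horizontal edge), 'V' (vertical edge) or 'N' (no edge)."""
    n = len(coeffs)
    horizontal = sum(abs(coeffs[u][0]) for u in range(1, n))  # first column
    vertical = sum(abs(coeffs[0][v]) for v in range(1, n))    # first row
    if max(horizontal, vertical) < threshold:
        return "N"
    return "H" if horizontal >= vertical else "V"


def run_length_histogram(patterns, max_run=4):
    """Count runs of identical patterns, keyed by (pattern, capped run length)."""
    hist = {}
    i = 0
    while i < len(patterns):
        j = i
        while j < len(patterns) and patterns[j] == patterns[i]:
            j += 1
        key = (patterns[i], min(j - i, max_run))
        hist[key] = hist.get(key, 0) + 1
        i = j
    return hist
```

Only the first row and column of each block are inspected, so the per-block cost is 14 absolute values and two comparisons, which is consistent with the low-cost, compressed-domain goal stated above.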
-
D5.4 Report on evaluation of methods
-
This document reports on the first evaluation of tools developed in the LIVE project for manual,
semi-automatic and automatic annotation and extraction of knowledge in work package 5.
We start this report with findings from the international TRECVID 2007 evaluation of LIVE
tools for automatic shot boundary classification. The compressed-domain shot boundary detector
developed in the LIVE project achieved the third-best recognition performance among all 15
participating research groups in this competition. Despite these excellent results, generalizing
this performance from the news and documentary data used in TRECVID 2007 to the more
difficult sports data produced by the LIVE streams of Olympia 2008 remains a challenge. Only
further evaluations on labelled data from Olympia 2004 and the upcoming Olympia
2008 event will show how suitable the developed technology is for extracting information
automatically from sports broadcasts, a domain for which neither standard international
benchmarks nor any international competition exist. The detection of gradual transitions in
sports video must still be considered unsolved and needs further research. However, the
TRECVID evaluation results show the potential and maturity of the developed technology.
The next section of this document deals with the performance of different face recognition
methods developed in the LIVE project to identify athletes and other important persons
in the video stream automatically. We measure the performance in rather controlled, optimal
situations, benchmarked on the Bochum gallery, but also on a “worst-case” gallery with
rather mixed content. The results are promising, but uncontrolled environments and incorrect feature
correspondences lead to poor results, especially when the more advanced P2D-HMM face recognition
technology is applied. Hence, face component detectors have been developed in the
project to improve the correspondence search in pose estimation before any identification
can be performed. We report in this document on the performance of several face component
detectors for eye, nose and mouth locations, developed in the course of the project to
improve face pose estimation and recognition. Although the individual face component
detectors perform quite well when evaluated on a test set drawn from the
same database, generalizing the facial recognition algorithms to other, more uncontrolled
galleries remains a challenge. Since the face component detectors have not yet been integrated
into the face recognition framework, no sound evaluation can be performed yet. We
will report on the results of our research, and on how the different algorithms perform
on Olympia 2008 sports data during the field trial, in the upcoming report D5.7.
-
An Intelligent Media Framework for Multimedia Content
-
Search, retrieval and navigation in multimedia repositories are tasks common to all multimedia management systems: users are supported by a wide range of features that are traditionally based on full-text search and metadata queries. However, generating metadata is an error-prone and labour-intensive task that, for multimedia content, cannot yet be fully automated. In this position paper, we describe our vision of an Intelligent Media Framework that is capable of combining metadata and knowledge about media items in order to support user orientation, search and retrieval in media-rich information spaces. We aim to integrate heterogeneous sources to create an Intelligent Media Framework containing Intelligent Media Objects that carry behavioural knowledge and are capable of fully describing themselves. Among other things, the properties of these objects account for the fact that users are more likely to search by the “meaning” of audiovisual objects, and by what they represent, than by their purely low-level features.
-
Skin Detection from Different Color Spaces for Model-based Face Detection
-
Skin and face detection have many important applications in intelligent human-machine interfaces, reliable video surveillance and visual understanding of human activities. In this paper, we propose an efficient and effective method for frontal-view face detection based on skin detection and knowledge-based modeling. First, skin pixels are modeled using supervised training, and boundary conditions are then extracted for skin segmentation. Faces are then detected by shape filtering and knowledge-based modeling. Skin detection results from different color spaces are compared. Experimental results demonstrate that our method robustly detects skin and face regions even under varying lighting conditions and poses.
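A minimal sketch of skin segmentation from trained boundary conditions, assuming the boundaries are an axis-aligned box on the Cb and Cr channels of YCbCr (using the JPEG/JFIF RGB-to-YCbCr conversion), learned as the minimum and maximum over labelled skin pixels. Luminance Y is ignored to reduce sensitivity to lighting. The box model, the choice of YCbCr and the function names are assumptions; comparing color spaces, as the paper does, would amount to swapping the conversion function. Shape filtering and knowledge-based face modeling are not shown.

```python
def rgb_to_ycbcr(r, g, b):
    """JPEG/JFIF full-range RGB -> YCbCr conversion."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def train_bounds(skin_pixels):
    """Supervised training: learn (min, max) bounds on Cb and Cr from labelled skin pixels."""
    chroma = [rgb_to_ycbcr(*p)[1:] for p in skin_pixels]
    cbs = [cb for cb, _ in chroma]
    crs = [cr for _, cr in chroma]
    return (min(cbs), max(cbs)), (min(crs), max(crs))


def is_skin(pixel, bounds):
    """A pixel is skin if its chroma lies inside the learned box."""
    (cb_lo, cb_hi), (cr_lo, cr_hi) = bounds
    _, cb, cr = rgb_to_ycbcr(*pixel)
    return cb_lo <= cb <= cb_hi and cr_lo <= cr <= cr_hi


def skin_mask(image, bounds):
    """Binary skin mask for an image given as rows of (R, G, B) tuples."""
    return [[1 if is_skin(p, bounds) else 0 for p in row] for row in image]
```

In practice the bounds would be trained on many labelled pixels, and possibly trimmed to exclude outliers, rather than on the handful used in a quick test.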
-
Knowledge-supported segmentation and semantic contents extraction from MPEG videos for highlight-based annotation, indexing and retrieval
-
Automatic recognition of highlights in videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this paper, we propose techniques to solve this problem using knowledge-supported extraction of semantic content, with compressed-domain processing employed for efficiency. First, video shots are detected using knowledge-supported rules. Then, human objects are detected via statistical skin detection. Meanwhile, camera motion such as zooming in is identified. Finally, highlights in which the camera zooms in on human objects are extracted and used for annotation, indexing and retrieval of the whole video. Results on a large set of test videos demonstrate the accuracy and robustness of the proposed techniques.
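Zoom-in identification from compressed-domain motion vectors can be sketched by projecting each macroblock's motion onto the direction pointing away from the frame centre: during a zoom-in, image content expands outwards, so most projections are positive. This is an assumed rule, not the paper's. The sketch also assumes the vectors describe apparent content motion from the previous frame to the current one; with the usual MPEG convention, where a forward vector points from the current macroblock to its match in the reference frame, raw vectors would be negated first. The thresholds `min_mean` and `min_fraction` are hypothetical.

```python
from math import hypot


def radial_score(mvs):
    """mvs[row][col] = (dx, dy) apparent motion of each macroblock.
    Returns (mean radial component, fraction of blocks moving outwards)."""
    rows, cols = len(mvs), len(mvs[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    total, outward, counted = 0.0, 0, 0
    for r in range(rows):
        for c in range(cols):
            ry, rx = r - cy, c - cx
            dist = hypot(rx, ry)
            if dist == 0:
                continue  # the centre block carries no radial information
            dx, dy = mvs[r][c]
            radial = (dx * rx + dy * ry) / dist  # projection onto the outward direction
            total += radial
            counted += 1
            if radial > 0:
                outward += 1
    if counted == 0:
        return 0.0, 0.0
    return total / counted, outward / counted


def is_zoom_in(mvs, min_mean=0.5, min_fraction=0.7):
    """Zoom-in: vectors point consistently away from the frame centre."""
    mean, fraction = radial_score(mvs)
    return mean > min_mean and fraction > min_fraction
```

Requiring both a positive mean and a high fraction of outward vectors helps reject pans, whose radial projections cancel across the frame, and local object motion, which affects only a few blocks.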