Scientific Papers
Predicting future User Behaviour in interactive live TV
Authors: Martin Gude, Stefan M. Gruenvogel, and Andreas Puetz
Abstract: Recommender systems are a means of personalisation, providing their users with personalised recommendations of items that might suit the users' needs. They are used in a broad range of contexts where items are somehow linked to users. The creation of recommendations for interactive live TV suffers from several inherent problems, e.g. the impossibility of foreseeing the contents of the next items or the reactions of the user to the changing programme. This paper proposes an algorithm for building personalised streams within interactive live TV. The development of the algorithm comprises a basic model for users and media items. A first preliminary evaluation of the algorithm is carried out and the results are discussed.
In M. Tscheligi, M. Obrist, and A. Lugmayr (Eds.): EuroITV 2008, LNCS 4066, pp. 117-121, 2008, Springer-Verlag, Berlin
Extracting Semantics and Content Adaptive Summarisation for Effective Video Retrieval
Authors: Jianmin Jiang, Jinchang Ren
Abstract: In this paper, we provide a system for semantic video retrieval in which extracted semantic contents are used to generate summarised videos for effective delivery of retrieved results. Firstly, several useful features are extracted from compressed video on the basis of the DC-images and motion vectors. Secondly, shot changes are detected to enable shot-level content indexing and retrieval. Thirdly, several semantic concepts are automatically detected, including outdoor/indoor scenes, buildings, sky and human objects. The detected shots and extracted semantic concepts are then used for semantic indexing of video contents. Furthermore, a combined measurement is produced from these semantics for content-adaptive video summarisation. Depending on network performance, the retrieved video can be delivered at various sizes using our summarisation techniques for efficiency.
Published in EU NEM Summit 2009
Extracting Objects and Events from MPEG Sequences for Video Highlights Indexing and Retrieval
Authors: Jianmin Jiang, Jinchang Ren
Abstract: Automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this paper, we propose techniques to solve this problem using knowledge-supported extraction of semantics, and employing compressed-domain processing for efficiency. Firstly, knowledge-supported rules are utilized for shot detection on the extracted DC-images, and statistical skin detection is applied for human object detection. Secondly, by filtering outliers in motion vectors, improved detection of camera motions such as zooming, panning and tilting is achieved. High-level semantics like video highlights are then automatically extracted via low-level analysis in the detection of human objects and camera motion events, and finally these highlights are used for shot-level annotation, indexing and retrieval. Results from a large set of test videos have demonstrated the accuracy and robustness of the proposed techniques.
To appear in Journal of Multimedia
Shot Boundary Detection in MPEG Videos using Local and Global Indicators
Authors: Jianmin Jiang, Jinchang Ren
Abstract: Shot boundary detection (SBD) plays an important role in many video applications. In this paper, we describe a novel method for SBD operating directly in the compressed domain. Firstly, several local indicators are extracted from MPEG macroblocks, and AdaBoost is employed for feature selection and fusion. The selected features are then used in classifying candidate cuts into five sub-spaces via pre-filtering and rule-based decision making. Following that, global indicators of frame similarity between boundary frames of cut candidates are examined using phase correlation of DC-images. Gradual transitions like fade, dissolve and combined shot cuts are also identified. Experimental results on the test data from TRECVID’07 have demonstrated the effectiveness and robustness of our proposed methodology.
To appear in IEEE Trans. CSVT
Hierarchical Modeling and Adaptive Clustering for Real-time Summarization of Rush Videos
Authors: Jianmin Jiang, Jinchang Ren
Abstract: In this paper, we provide a detailed description of a proposed new algorithm for video summarization, which is also included in our submission to TRECVID’08 on BBC rush summarization. Firstly, rush videos are hierarchically modeled using the formal language technique. Secondly, shot detection is applied to introduce a new concept of V-unit for structuring videos in line with the hierarchical model, and thus junk frames within the model are effectively removed. Thirdly, adaptive clustering is employed to group shots into clusters to determine retakes for redundancy removal. Finally, the most representative shot selected from each cluster is ranked according to its length and sum of activity level for summarization. Competitive results have been achieved, demonstrating the effectiveness and efficiency of our techniques, which are fully implemented in the compressed domain. Our work does not require high-level semantics such as object detection and speech/audio analysis, which provides a more flexible and general solution for this topic.
To appear in IEEE Trans. Multimedia
Recommender System for the Multi-Channel TV Production
Authors: Janez Zaletelj
Abstract: This paper presents the concept of content recommendations for the production of multi-channel TV shows. Within the IST FP6 project "LIVE: Live Staging of Media Events" we are developing a production support system which will provide content recommendations and will support the production of multi-channel programmes. The paper outlines a concept of a recommender system for multi-channel TV production and presents the basic architecture and workflows within the system. The recommendation of archive content for a given channel is personalized by taking into account the profile of the target audience.
LIVE - a system for consumer-personalised production of TV programmes
Authors: Janez Zaletelj
Abstract: This paper presents the concept of viewer feedback in the production of multi-channel TV shows. Within the IST project "LIVE: Live Staging of Media Events" we are developing a production support system which will provide content recommendations and will support the production of multi-channel programmes. The paper outlines a concept of a recommender and feedback system for multi-channel TV production and presents examples of the feedback tools within the TV production office. The LIVE system enables the Director to track the preferences of the TV viewers in real time, during the live production of the show, and in turn gives the viewers the possibility to actively influence the TV content.
Scientific Papers from 2008
University of Bradford at TRECVID 2008 Content Based Copy Detection Task
Authors: Jianmin Jiang
Abstract: We present a novel method for spatial-temporal video copy detection based on adaptive masking. Firstly, a dedicated video analysis is implemented for input videos, which ensures the accurate detection of complicated distortions query videos may undergo. Secondly, simple signatures are extracted for the benefit of time and space efficiency, and the frame mask is generated adaptively to reduce video temporal redundancy. Thirdly, a matching process is implemented to find video copies. The proposed video copy detection framework is effective, and robust against spatial and temporal variations.
TRECVID 2008 workshop on video copy detection, http://www-nlpir.nist.gov/projects/tvpubs/tv8.papers/bradford.pdf
Face Detection based Neural Networks using Robust Skin Color Segmentation
Authors: Ying Weng
Abstract: This paper proposes a robust scheme for a face detection system that uses a Gaussian mixture model to segment images based on skin color. After the selection of skin and non-skin face candidates, features are extracted directly from discrete cosine transform (DCT) coefficients computed from these candidates. Moreover, back-propagation neural networks are used to train on and classify faces based on DCT feature coefficients in the Cb and Cr color spaces. This scheme utilizes the skin color information, which is the main feature for face detection. DCT feature values of faces, representing the data set of skin/non-skin face candidates obtained from the Gaussian mixture model, are fed into the back-propagation neural networks to classify whether the original image includes a face or not. Experimental results show that the proposed scheme is reliable for face detection, and that pattern features are detected and classified accurately by the back-propagation neural networks.
Published by IEEE Fifth International Multi-Conference on Systems, Signals and Devices (IEEE SSD'08), 20-23 July 2008, Amman-Jordan.
Demo of Video Conducting the Olympic Games 2008: The iTV Field Trial of the EU-IST Project LIVE
Authors: Carmen Mac Williams, Roland Westermaier, Torsten Kliemand
Abstract: The EU IP project LIVE investigates new professional roles, production methods, iTV formats and the staging of live media events in a professional broadcast environment. The vision of LIVE is to change the role of the traditional live TV director into that of a future Video Conductor communicating in real time with his audience.
In: Proceedings of the EuroITV 2008: July 3 - 4, 2008, Salzburg, Austria.
Video Conducting the Olympic Games 2008: The iTV Field Trial of the EU-IST Project LIVE
Authors: Carmen Mac Williams, Richard Wages
Abstract: In the upcoming field trial of LIVE a Video Conductor and his team at the public Austrian TV station ORF will stage a live non-linear multi-perspective show around the Olympic Games 2008 with the instant feedback of 500 invited Austrian Telecom IPTV test end users. The aim of the field trial is to improve the public TV service to the Austrian public by dynamically linking multi-stream videos as the live event of the Olympic Games 2008 unfolds and as the viewers demand it. The Video Conductor ensures a quality of drama by linking the multi-stream videos in response to the unfolding sport action and the audience’s mood.
In: Proceedings of the 3rd ACM International Conference on Digital Interactive Media in Entertainment and Arts, DIMEA 2008, September 10 - 12, 2008, Athens, Greece
Robust Multiple Watermarking in Color Images with Correlation Coefficient Detector
Authors: Ibrahim Nasir, Ying Weng, Jianmin Jiang and Stanley Ipson
Abstract: In this paper, we propose a robust multiple digital watermarking technique for the copyright protection of digital color images. The watermark is a binary image, which is divided into four parts, each encrypted using a secret key and embedded in the spatial domain into four different regions of size 128×128 of the blue component of the color image. Watermark extraction is based on comparisons between the original intensity pixel values and the corresponding watermarked intensity pixel values in blocks of size 8×8. The extracted watermark bits are determined using the probabilities of detecting bit '1' or bit '0'. The watermark can be extracted in sixteen parts, but only four of these are selected by a correlation coefficient detector and used to reconstruct the extracted watermark. Experimental results show that the proposed scheme successfully makes the watermark perceptually invisible and robust against a wide range of attacks, including JPEG lossy compression, median filtering, low-pass filtering, rotation, rotation-scaling, rotation-crop, image cropping, image scaling and self-similarity attacks.
Proceedings of the 8th IASTED International Conference on Visualization, Imaging and Image Processing (VIIP 2008), Sep 1-3, 2008, Palma de Mallorca, Spain, pp. 280-285.
Ein Ansatz zur Unterstützung traditioneller Klassifikation durch Social Tagging
Authors: Georg Güntner, Rolf Sint, Rupert Westenthaler
Abstract: This contribution presents an approach for combining traditional, closed classification methods with open classification methods based on social tagging. The discussion starts from the basic requirements for search and navigation in document archives, examines the advantages and disadvantages of closed and open classification approaches, and finally presents a combined solution that was implemented as a prototype. In this solution, documents can in principle be classified with free tags; the classification is, however, supported by a controlled vocabulary. Free tags are adopted into the controlled vocabulary in a subsequent, moderated process. The vocabulary, which grows in this way and is continuously maintained, supports search and navigation in the document space.
In: Birgit Gaiser, Thorsten Hampel, Stefanie Panke (Eds.) Good Tags – Bad Tags. Social Tagging in der Wissensorganisation.
Bringing "Intelligence" to iTV: The Intelligent Media Framework
Authors: Georg Güntner, Tobias Bürger, Dietmar Glachs, Rupert Westenthaler
Abstract: This paper gives an overview of a software framework designed for the creation of interactive multi-channel television shows. The “Intelligent Media Framework” forms the middleware of an iTV production support system developed in the context of the European integrated project “LIVE”. The framework is designed according to service oriented architecture (SOA) principles for easy integration into existing iTV and TV production environments. Moreover, the Intelligent Media Framework is based on a knowledge model formalising the main aspects identified to make up the domain of real-time staging of media events: the content (media clips and media streams), the events, the staging and the users (professional users and consumers). The framework offers services for the development of tools assisting the production team of multi-channel iTV shows in an intelligent way. The envisaged “intelligence” is based on formal, machine-understandable descriptions of the content and the events. This document introduces the knowledge model and provides an overview of the architecture of a media framework designed to support the iTV production process in an intelligent way.
Presented at EuroITV 2008, Salzburg, Austria
Skin detection from different colour spaces for model-based face detection
Authors: Jianmin Jiang, Jinchang Ren
Abstract: Skin and face detection has many important applications in intelligent human-machine interfaces, reliable video surveillance and visual understanding of human activities. In this paper, we propose an efficient and effective method for frontal-view face detection based on skin detection and knowledge-based modeling. Firstly, skin pixels are modeled by using supervised training, and boundary conditions are then extracted for skin segmentation. Faces are further detected by shape filtering and knowledge-based modeling. Skin results from different color spaces are compared. In addition, experimental results have demonstrated that our method is robust in detecting skin and face regions even under varying lighting conditions and poses.
Springer CCIS, Vol. 15, ISBN 978-3-540-85929-1, 2008
Knowledge-supported segmentation and semantic contents extraction from MPEG videos for highlight-based annotation, indexing and retrieval
Authors: Jianmin Jiang, Jinchang Ren
Abstract: Automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this paper, we propose techniques to solve this problem by using knowledge-supported extraction of semantic contents, with compressed-domain processing employed for efficiency. Firstly, video shots are detected by using knowledge-supported rules. Then, human objects are detected via statistical skin detection. Meanwhile, camera motion such as zooming in is identified. Finally, highlights of zooming in on human objects are extracted and used for annotation, indexing and retrieval of the whole videos. Results from a large set of test videos have demonstrated the accuracy and robustness of the proposed techniques.
Springer LNCS 5226, pp. 258-265, ISBN 978-3-540-87440-9, 2008. http://www.springerlink.com/content/x280vukw44g54552/
Fusion of intensity and channel difference for improved colour edge detection
Authors: Jianmin Jiang, Jinchang Ren
Abstract: Edge detection, especially from colour images, plays a very important role in many applications for image analysis, segmentation and recognition. In this paper, a new colour-to-gray mapping method for effective colour edge detection is proposed. From any given colour image C, a gray image D is defined as the accumulated differences between each pair of its colour channels, and another gray image R is then obtained by weighting D and the gray intensity image G. Fusion of the edges extracted from R and G forms the final results. Compared with edges detected from traditional colour spaces like RGB, YCbCr and HSV, all using the same Canny operator, the proposed method appears to achieve more effective results on different test images.
In Proc. VIE'08, Xi'an, China, pp. 18-22, July 29-Aug 1, 2008
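The colour-to-gray mapping described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything concrete here is an assumption: the channel difference D is taken as the sum of absolute pairwise differences, the intensity G as the plain channel mean, and the fused image R as a convex combination with a hypothetical `weight` parameter. The function names are illustrative and do not come from the paper.

```python
def channel_difference_gray(pixel):
    """D: accumulated absolute differences between each pair of channels (assumed form)."""
    r, g, b = pixel
    return abs(r - g) + abs(g - b) + abs(b - r)


def intensity_gray(pixel):
    """G: plain mean of the three channels (assumed intensity definition)."""
    r, g, b = pixel
    return (r + g + b) / 3.0


def fused_gray(image, weight=0.5):
    """R: weighted combination of D and G; `weight` is a hypothetical parameter."""
    return [
        [weight * channel_difference_gray(p) + (1.0 - weight) * intensity_gray(p)
         for p in row]
        for row in image
    ]


if __name__ == "__main__":
    # Two neighbouring pixels with the same intensity but different colour.
    row = [(30, 30, 30), (90, 0, 0)]
    print([intensity_gray(p) for p in row])
    print([channel_difference_gray(p) for p in row])
    print(fused_gray([row]))
```

Under these assumptions, a neutral gray pixel (30, 30, 30) and a red pixel (90, 0, 0) share the same intensity of 30, so an intensity-only edge detector sees no boundary between them, while their channel-difference values of 0 and 180 differ clearly. An edge operator such as Canny would then be applied to the resulting gray images; that step is omitted here.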
A Block-Edge-Pattern-Based Content Descriptor in DCT Domain
Authors: Jianmin Jiang, Kaijin Qiu, and Guoqiang Xiao
Abstract: In this correspondence, we describe a robust and effective content descriptor based on block-edge patterns extracted in the discrete cosine transform domain, which is suitable for applications in JPEG or MPEG compressed images and videos. This content descriptor is constructed by a run-length edge-block histogram with three patterns including horizontal edge, vertical edge and no edge. In comparison with existing descriptors, the proposed descriptor features: 1) low-cost computing suitable for real-time implementation and high-speed processing of compressed videos; 2) robustness to orientation changes such as rotation, noise, reverse, etc.; 3) operation in the compressed domain. Extensive experiments support that the proposed content descriptor is effective in describing visual content, and achieves superior performance in terms of retrieval precision and recall rates.
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 18, No. 7, pp. 994-998, July 2008
An Efficient Face Image Retrieval through DCT Features
Authors: Aamer S. S. Mohamed, Ying Weng, Jianmin Jiang and Stan Ipson
Abstract: This paper proposes a new, simple method of DCT feature extraction that speeds up image retrieval and reduces the storage it requires by accessing and extracting content directly in the JPEG compressed domain. Our method extracts the average of some DCT block coefficients and needs only a vector of six coefficients per block over all image blocks. In our retrieval system, for simplicity, both query and database images are normalized and resized from the original database based on the centred position of the eyes, and each normalized image is divided equally into non-overlapping blocks of 8×8 pixels. Each block is therefore associated with a feature vector derived directly from the discrete cosine transform (DCT). Users can select any query as the main theme of the query image. Retrieval is based on the relevance between a query image and each database image, and results are ranked by similarity, measured by the Euclidean distance. The experimental results show that our approach makes it easy to identify the main objects and reduces the influence of the background in the image, and thus improves the performance of image retrieval.
Accepted for The 10th IASTED International Conference on Signal and Image Processing, August 18 – 20, 2008, USA
Subsampling-based image watermarking in compressed DCT domain
Authors: Ibrahim Nasir, Ying Weng, Jianmin Jiang and Stan Ipson
Abstract: In this paper, a new embedding strategy for watermarking is presented based on DC components of subimages in the compressed discrete cosine transform (DCT) domain. These subimages are obtained by subsampling the host image. Greater robustness is achieved when watermarks are embedded in perceptually significant DC components. Furthermore, the original image is not required in the extraction process. Experimental results show that the proposed scheme successfully makes the watermark perceptually invisible and robust against a wide range of attacks, including JPEG lossy compression, filtering, scaling, and cropping attacks.
Accepted for The 10th IASTED International Conference on Signal and Image Processing, August 18 – 20, 2008, USA
Scientific Papers from 2007
D5.2 Report On Live Human Annotation
Authors: Carsten Rosche, Christian Eckes, Felix Zielke, Matthias Aust, Richard Wages, Stefan Gruenvogel, Sven Hoffmann
Abstract:
This document reports on human annotation within the LIVE project. First it gives an overview of the different annotation types that are useful for the LIVE staging of media events. It then summarizes the requirements for manual annotation by collecting results from potential users, e.g. from discussions held with broadcasters, reporters, editors and video jockeys (VJs). It defines the content metadata needed within the LIVE system, gives an overview of existing tools and describes the tools developed for the LIVE project. Finally, it describes user evaluations of the developed tools that were performed with professional users from the ORF.
LIVE Public Annual Report 2007
Authors: John Pereira, Marion Borowski
Abstract:
This public annual report takes a look at some of the major activities and achievements of LIVE in 2007.
Specifications of Concepts and Professional User Interfaces for Live Staging with Consumer Feedback
Authors: Richard Wages, Stefan M. Gruenvogel, Tilen Mlakar, Torsten Kliemand
Abstract:
In this document we specify concepts for Live Staging and the planning for Live Staging. In addition – where possible – procedures are defined which facilitate the development of a concrete Live Staging Concept by bringing the primarily artistic and intuitive approaches for the creation of live stories to a more formal level.
Basic Specification of the Intelligent Media Framework (D7.4)
Authors: Dietmar Glachs, Georg Guentner, Rupert Westenthaler, Tobias Buerger
Abstract:
The objective of this report is to provide a synopsis of the basic specification of the Intelligent Media Framework as developed during the first iteration cycle of the LIVE project (from January 2006 to June 2007). The implementation of this specification formed the middleware of the first prototype of the LIVE production support system. The Intelligent Media Framework is based on a combination of a classical three-tier architecture with the principles of service oriented architectures (SOA). This report is made available for, and addressed to, the interested public. It presupposes some basic knowledge of software and knowledge engineering as well as some understanding of broadcasting issues. Topics covered in this report are:
- An overview of the architecture of the LIVE production support system
- An overview of the LIVE staging domain and the requirements of different agents in the LIVE staging process
- The knowledge and the framework requirements of the Intelligent Media Framework
- The knowledge model of the Intelligent Media Framework comprising the knowledge structure, the term model, the event domain model and the basic IMA model
- The architecture of the Intelligent Media Framework (system design)
- Initial conclusions and an assessment of the requirements
Description of Online and Offline Metadata Extraction out of Sports Videos
Authors: Ying Weng
Abstract:
We focus on online and offline metadata extraction and annotation from sports videos. The main benefit of our method is immediate and automatic extraction and annotation of metadata by giving semantics to combinations of heterogeneous low-level visual features. It brings new opportunities for efficient utilisation of sports video in improved ways, and is easily customized to address the characteristics. Firstly, semantic scene classification is described, including key-frame extraction, determination of similarities between shots, and rule-based estimation of scene boundaries. Secondly, fuzzy-logic-based categorizing is presented, including the paradigm, the fuzzy membership function, fuzzy feature generation and the similarity measure. Thirdly, automatic sports video annotation is proposed, including robust dominant colour region detection and combined motion feature analysis. This work has been evaluated in the TRECVID 2007 competition.
Public Synopsis Market Report on Video and iTV Technologies
Authors: David Salama, Marta Sedano
Abstract:
This report tries to identify the companies, markets and environments surrounding the iTV industry. The report classifies the main actors depending on the nature of their product or service technology. In addition to the short description of each technology area, this document also includes chapter subdivisions analysing the possible implications of technology for the LIVE project.
Public Synopsis Market Report on Digital Interactive TV
Authors: David Salama, Marta Sedano
Abstract:
This Public Synopsis on Interactive Digital Television provides an overview of the iDTV market including a short background introduction and outline of interactive features. The different sections in which the document is divided cover the different types of existing services and solutions as well as market trends.
Semantic-based Realization of Novel iTV Formats for the Broadcasting of Media Events
Authors: Christian Eckes, Felix Zielke, Georg Guentner, Janez Zaletelj, Richard Wages, Rupert Westenthaler, Tobias Buerger
Abstract: Broadcasting of media events is a real-time action demanding reliable just-in-time decisions based on the current content of incoming video streams and the availability of background material. Novel iTV formats for broadcasting this type of event thus demand monitoring of multiple streams and background material. Due to the potentially large number of streams and the amount of other available material, manual monitoring is likely to fail in the long term. We therefore developed an indexing pipeline based on semantic technologies that enables real-time analysis of broadcast streams and reliable content recommendations of streams and background material based on formal, machine-understandable descriptions of content. Our approach enables real-time interpretation of broadcast streams and thus establishes a bridge over the “Semantic Gap” in video analysis.
Presented at The Second International Conference on Semantic and Digital Media Technologies, 5-7 December 2007, Genova, Italy. See http://samt2007.ge.imati.cnr.it/
Progressive content access to databases of JPEG compressed images
Authors: J. Jiang, G. Feng, and Y. Yin
Abstract: Progressive content access provides a mode that allows a coarse version of an image to be viewed at a lower computing cost and then gradually refined by subsequent resolution enhancement if required. This proves extremely useful when millions of compressed images and video sequences need to be browsed manually or processed in the pixel domain, saving the cost and removing the necessity of full decompression. In this paper, we propose such a progressive content access algorithm suitable for all DCT-based JPEG and MPEG compressed files. We first develop a theoretical model approximating the cosine function used in the IDCT with various orders. Following that, we propose a progressive content access algorithm, which covers both successive approximation and spectral selection. Further analysis and experiments are reported to show that our proposed algorithm saves computational cost in comparison with full JPEG decompression. Extensive experiments also support that the proposed algorithm achieves encouraging PSNR values for reconstructed images even with lower-order approximation.
IET Image Processing, Vol 1, No 2, 2007, pp 207-214, ISSN: 1751-9667
Robustness Analysis on Facial Image Description in DCT Domain
Authors: Jianmin Jiang, Guocan Feng
Abstract: In this letter, we report a DCT domain analysis of facial images to reveal that, when a certain number of DCT coefficients are removed, the corresponding facial image description by the remaining DCT coefficients becomes robust to lighting changes and scale variations. Such properties would be very useful for applications of face recognition, video object tracking, object segmentation and visual content processing.
Electronics Letters Volume 43, Issue 24, Nov. 22, 2007
Real-time shot cut detection in compressed domain
Authors: J. Jiang, Z. Li, G. Xiao and J. Chen
Abstract: In this short paper, we propose a fast and simple shot cut detection algorithm, which operates directly in the compressed domain and is suitable for real-time implementation. The proposed algorithm exploits the existing MPEG techniques by examining the prediction status of each macroblock inside B frames and P frames. As a result, locating both abrupt and dissolved shot cuts is carried out by a sequence of comparison tests, and thus no feature extraction or histogram differentiation is needed. Although the description of the algorithm is primarily based on MPEG-1 and MPEG-2 streams, the scheme can be readily extended to other video compression standards such as MPEG-4 and H.264 by following the principle of monitoring: (i) the balance between forward prediction and backward prediction; and (ii) the boundaries among P, B and I frames. Extensive experiments illustrate that the proposed algorithm outperforms similar existing algorithms, providing a useful technique for fast and on-line video content processing.
Accepted for publication in: Journal of Electronic Imaging, SPIE
A Block-Edge-Pattern based Content Descriptor in DCT Domain
Authors: Jianmin Jiang, Kaijin Qiu and Guoqiang Xiao
Abstract: In this correspondence, we describe a robust and effective content descriptor based on block edge patterns extracted directly in the DCT domain, which is suitable for applications in JPEG or MPEG compressed images and videos. This content descriptor is constructed by a run-length edge-block histogram with three patterns including horizontal edge, vertical edge and no-edge. In comparison with existing descriptors, the proposed descriptor features: (i) low-cost computing suitable for real-time implementation and high-speed processing of compressed images or videos; (ii) robustness to orientation changes such as rotation, noise, reverse etc.; (iii) direct operation in the compressed domain. Extensive experiments support that the proposed content descriptor is effective in describing visual content. In comparison with existing techniques, the proposed descriptor achieves superior performance in terms of retrieval precision and recall rates.
Accepted for publication in: IEEE Transactions on Circuits and Systems for Video Technology
Subspace Extension to Phase Correlation Approach for Fast Image Registration
Authors: Jinchang Ren, Theodore Vlachos, Jianmin Jiang
Abstract: A novel extension of phase correlation to subspace correlation is proposed, in which 2-D translation is decomposed into two 1-D motions thus only 1-D Fourier transform is used to estimate the corresponding motion. In each subspace, the first two highest peaks from 1-D correlation are linearly interpolated for subpixel accuracy. Experimental results have shown both the robustness and accuracy of our method.
ICIP2007, vol. I, 481-484
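The idea of replacing one 2-D phase correlation with two 1-D correlations can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the two subspaces are assumed to be the row-sum and column-sum projections of the image, the shift is assumed to be circular, and only integer shifts are recovered (the sub-pixel interpolation of the two highest peaks mentioned in the abstract is left out). The names `dft`, `shift_1d` and `estimate_translation` are hypothetical, and the naive O(n²) transform is used only to keep the example dependency-free.

```python
import cmath


def dft(signal, inverse=False):
    """Naive 1-D discrete Fourier transform (O(n^2)), enough for a small demo."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t, x in enumerate(signal))
        out.append(acc / n if inverse else acc)
    return out


def shift_1d(reference, moved):
    """Integer circular shift d such that moved[t] == reference[(t - d) % n]."""
    cross = []
    for a, b in zip(dft(reference), dft(moved)):
        c = b * a.conjugate()
        mag = abs(c)
        # Keep only the phase; bins with no energy carry no shift information.
        cross.append(c / mag if mag > 1e-9 else 0j)
    corr = [v.real for v in dft(cross, inverse=True)]
    n = len(corr)
    peak = max(range(n), key=corr.__getitem__)
    # Map peaks in the upper half of the range to negative shifts.
    return peak if peak <= n // 2 else peak - n


def estimate_translation(reference, moved):
    """Return (dx, dy) from two 1-D correlations on row and column projections."""
    rows_ref = [sum(row) for row in reference]
    rows_mov = [sum(row) for row in moved]
    cols_ref = [sum(col) for col in zip(*reference)]
    cols_mov = [sum(col) for col in zip(*moved)]
    return shift_1d(cols_ref, cols_mov), shift_1d(rows_ref, rows_mov)


if __name__ == "__main__":
    def impulse(y, x, size=8):
        img = [[0.0] * size for _ in range(size)]
        img[y][x] = 1.0
        return img

    # A bright pixel moved from (row 2, col 3) to (row 5, col 4).
    print(estimate_translation(impulse(2, 3), impulse(5, 4)))
```

Each 1-D correlation works on a signal whose length is one image dimension, which is where the speed advantage over a full 2-D transform would come from when a fast transform is substituted for the naive one used here.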
Statistical Classification of Skin Color Pixels from MPEG Videos
Authors: Jinchang Ren, Jianmin Jiang
Abstract: Detection and classification of skin regions play an important role in many image processing and vision applications. In this paper, we present a statistical approach for fast skin detection in MPEG-compressed videos. Firstly, conditional probabilities of skin and non-skin are extracted from manually marked training images. Then, candidate skin pixels are identified using the Bayesian maximum a posteriori decision rule. An optimal threshold is then obtained by analysis of the probability of error on the basis of the likelihood ratio histogram of skin and non-skin pixels. Experiments on sequences with varying illumination have demonstrated the effectiveness of our approach.
ACIVS 2007: 395-405
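The decision rule named in this abstract can be sketched as a likelihood-ratio test on chrominance histograms. This is a hedged illustration only: the (Cb, Cr) feature, the bin count, the Laplace smoothing and the default threshold of 1.0 are assumptions, whereas the paper derives its threshold from an error analysis that is not reproduced here. All function names are hypothetical.

```python
from collections import Counter

BINS = 8  # assumed number of histogram bins per chrominance channel

def _bin(pixel):
    """Quantise an 8-bit (cb, cr) pair into a histogram cell."""
    cb, cr = pixel
    return (cb * BINS // 256, cr * BINS // 256)

def train_histograms(samples):
    """samples: iterable of ((cb, cr), is_skin) from labelled training pixels."""
    skin, non_skin = Counter(), Counter()
    for pixel, label in samples:
        (skin if label else non_skin)[_bin(pixel)] += 1
    return skin, non_skin

def likelihood_ratio(pixel, skin, non_skin):
    """Estimate p(pixel | skin) / p(pixel | non-skin) with Laplace smoothing."""
    cells = BINS * BINS
    key = _bin(pixel)
    p_skin = (skin[key] + 1) / (sum(skin.values()) + cells)
    p_non = (non_skin[key] + 1) / (sum(non_skin.values()) + cells)
    return p_skin / p_non

def is_skin(pixel, skin, non_skin, threshold=1.0):
    """MAP rule: choose skin when the ratio exceeds P(non-skin) / P(skin).

    threshold=1.0 corresponds to equal priors (an assumption).
    """
    return likelihood_ratio(pixel, skin, non_skin) > threshold
```

With equal priors the maximum a posteriori rule reduces to comparing the two class-conditional likelihoods, which is why a single threshold on their ratio is enough.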
Compressed-domain Shot Boundary Detection using Finite State Machine and Content-based Rules
Authors: Juan Chen, Jinchang Ren, Jianmin Jiang
Abstract: We propose a fast and systematic method for shot boundary detection in the compressed domain using content-based rules and an FSM (finite state machine). Firstly, several feature indicators are acquired from DC images in MPEG videos, including luminance, color, edge, prediction error and inter-frame difference as well as motion. Secondly, several content-based rules are utilized to detect abrupt cuts. Thirdly, boundaries of gradual transitions are determined by a coarse-to-fine procedure with a pre-processing module and an FSM. In experiments using publicly available sequences from TRECVID, the proposed algorithm outperforms representative existing algorithms in both precision and recall rates.
Asia-Pacific Workshop on Visual Information Processing, 2007, pp. 137-142
Recognition of JPEG Compressed Face Images Based on AdaBoost
Authors: Chunmei Qing and Jianmin Jiang
Abstract: This paper presents an advanced face recognition system based on the AdaBoost algorithm in the JPEG compressed domain. First, the dimensionality is reduced by truncating some of the block-based DCT coefficients, and nonuniform illumination variations are alleviated by discarding the DC coefficient of each block. Next, an improved AdaBoost.M2 algorithm, which uses Euclidean distance (ED) to eliminate non-effective weak classifiers, is proposed to select the most discriminative DCT features from the truncated DCT coefficient vectors. Finally, LDA is used as the final classifier. Experiments on the Yale face databases show that the proposed approach is superior to other methods in terms of recognition accuracy, efficiency, and illumination robustness.
SAMT 2007, LNCS 4816, pp. 272–275, 2007.
A New Robust Watermarking Scheme for Color Image in Spatial Domain
Authors: Ibrahim Nasir, Ying Weng, and Jianmin Jiang
Abstract: This paper presents a new robust watermarking scheme for color images based on block probability in the spatial domain. A binary watermark image is permuted using sequence numbers generated by a secret key and Gray code, and then embedded four times in different positions determined by a secret key. Each bit of the binary encoded watermark is embedded by modifying the intensities of a non-overlapping 8×8 block of the blue component of the host image. The watermark is extracted by comparing the intensities of an 8×8 block of the watermarked and the original images and calculating the probability of detecting '0' or '1'. Tested with the benchmark Stirmark 4.0, the experimental results show that the proposed scheme is robust and secure against a wide range of image processing operations.
Accepted by the Third IEEE International Conference on Signal-Image Technology & Internet-Based Systems (SITIS 2007)
Camera Motion Analysis towards Semantic-based Video Retrieval in Compressed Domain
Authors: Ying Weng and Jianmin Jiang
Abstract: To reduce the semantic gap between low-level visual features and the richness of human semantics, this paper proposes new algorithms which use combined camera motion descriptors with multiple thresholds to automatically retrieve semantic concepts, i.e., close-up and panorama, directly in the MPEG compressed domain based on camera motion analysis. Extensive experiments illustrate that the proposed algorithms provide promising retrieval results in real-time application scenarios and without human intervention.
International Conference on Semantics and Digital Media Technologies (SAMT) 2007, LNCS 4816, pp. 276–279, 2007.
Face Detection based on Skin Color in Image by Neural Networks
Authors: Aamer S.S. Mohamed, Ying Weng, Stan S Ipson, and Jianmin Jiang
Abstract: Face detection is one of the challenging problems in image processing. A novel face detection system is presented in this paper. The approach relies on skin-based color features extracted with the two-dimensional Discrete Cosine Transform (DCT) and on neural networks, which detect faces using skin color from the DCT coefficients of the Cb and Cr feature vectors. The system first locates skin color, which is the main feature of faces for detection. Each skin face candidate is then examined by the neural networks, which learn from the features of faces to classify whether the original image includes a face or not. The processing is based on normalization and the Discrete Cosine Transform, and the final classification is based on the neural network approach. The experimental results on upright frontal color face images from the internet show an excellent detection rate.
Accepted by IEEE International Conference on Intelligent and Advanced Systems (ICIAS2007)
Real-time and Automatic Close-up Retrieval from Compressed Videos
Authors: Ying Weng and Jianmin Jiang
Abstract: In this paper, we propose a thorough scheme which uses a camera zooming descriptor with a two-level threshold to automatically retrieve close-ups directly from MPEG compressed videos based on camera motion analysis. In the retrieval process, we build camera-motion-based semantic retrieval. To improve the coverage of the proposed scheme, we investigate close-up retrieval in all kinds of videos. Extensive experiments illustrate that the proposed scheme provides promising retrieval results in real-time and automatic application scenarios.
13th International Conference on Automation and Computing, 15 September 2007, Staffordshire University, Stafford, UK
A Novel System for Interactive Live TV
Authors: Stefan Gruenvogel, Richard Wages, Tobias Buerger, and Janez Zaletelj
Abstract: For the production of interactive live TV formats, new content and new production workflows are necessary. We explain how future content of a parallel multi-stream production of live events may be created from a design and a technical perspective. It is argued that the content should be arranged by dramaturgical principles taking into account the meaning of the base material. To support the production process, a new approach for content recommendation is described which uses semantic annotation from audio-visual material and user feedback.
Published in Lizhuang Ma, Matthias Rauterberg, Ryohei Nakatsu (Eds.): Entertainment Computing - ICEC 2007, 6th International Conference, Shanghai, China, September 15-17, 2007, Proceedings. Lecture Notes in Computer Science 4740 Springer 2007
Next Generation Live iTV Formats and Aesthetics: A Joint Scientific and Artistic Approach
Authors: Richard Wages, Stefan Gruenvogel, Georg Guentner, and Janez Zaletelj
Abstract: In this work we describe our vision of next generation live iTV formats, which consist of a broadcast bouquet of meaningfully interwoven and cross-referencing parallel sub-channels covering a single live event in multi-view or even several concurrent live events. At the same time these sub-channels have to serve a diversity of consumer interests and moods. To achieve this, future live broadcasting production teams will need both artistic and conceptual patterns to create such live formats, as well as maximum technological support to realise such shows in real-time. We describe our developed foundational components of the envisaged system, which according to our approach consist of a set of live staging concepts, a framework for the management of intelligent media assets and a recommender system. Furthermore we argue for the integration of knowledge and sophistication achieved by live audio-visual artists, namely Video Jockeys, who hence participated in our first live staging tests. We conclude with a short description of our next step, the integration of instant consumer feedback, to complete the system for the upcoming field trial during the 2008 Olympics.
Proceedings of the eChallenges e-2007 <http://echallenges.org/e2007/>: Conference & Exhibition, seventeenth in a series of technology research conferences supported by the European Commission, 24-26 October 2007, The Hague, The Netherlands.
Journal Paper JVRB 2007 Vol.4: Video Composer and Live Video Conductor: Future Professions for the Interactive Digital Broadcasting Industry
Authors: Richard Wages, Carmen Mac Williams, Stefan M. Gruenvogel, and Georg Trogemann
Abstract: Innovations in hardware and network technologies lead to an exploding number of non-interrelated parallel media streams. Per se this does not mean any additional value for consumers. Broadcasting and advertisement industries have not yet found new formats to reach the individual user with their content. In this work we propose and describe a novel digital broadcasting framework, which allows for the live staging of (mass) media events and improved consumer personalisation. In addition new professions for future TV production workflows which will emerge are described, namely the 'video composer' and the 'live video conductor'.
Online publication: http://www.jvrb.org/4.2007/1076/
Content Recommendation System in the Production of Multi-Channel TV Programs
Authors: Janez Zaletelj, Richard Wages, Tobias Buerger, Stefan M. Gruenvogel
Abstract: This paper presents the concept of content recommendations for the production of multi-channel TV shows. Within the 6th Framework project “LIVE – Live Staging of Media Events” we are developing a production support system which will provide content recommendations and support the production of multi-channel programs. The paper outlines the concept of a multi-channel show and presents a possible workflow scenario on how to use content recommendations in the production. The details of the semantic content annotations are given and an example of the computation of personalized recommendations of archive content is presented.
The paper was submitted to the 3rd International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AxMedis 2007), www.axmedis.org/axmedis2007.
The evaluation of a hybrid recommender system for recommendation of movies
Authors: Tomaz Pozrl and Matevz Kunaver and Matevz Pogacnik and Jurij F. Tasic
Abstract: In this paper we present our approach to the generation of movie recommendations. The idea of our hybrid approach is to first separately generate predicted ratings for movies using the content-based and collaborative recommender modules. Predicted ratings from both recommender engines are then combined into a final classification by the hybrid recommender using a weighted voting scheme. The calculations are based on Pearson's correlation coefficient, True Bayesian prediction and M5Rules decision rules. The evaluation of the system performance was based on the EachMovie data corpus, for around 7000 users. Preliminary results show that this approach performs well, while there is still some room for improvement.
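Two of the ingredients named in this abstract can be sketched briefly: Pearson's correlation coefficient between two users' ratings, and a weighted combination of the two modules' predictions. The sketch is an illustration under stated assumptions, not the authors' system: the weighted voting scheme is approximated by a weighted average, the weight is a hypothetical parameter, and the True Bayesian prediction and M5Rules components are not shown.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long rating lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # undefined for constant ratings; treated as no correlation
    return cov / (dev_x * dev_y)

def hybrid_rating(content_pred, collab_pred, content_weight=0.5):
    """Combine the two predicted ratings with a weighted average.

    content_weight is an assumed tuning parameter in [0, 1].
    """
    return content_weight * content_pred + (1 - content_weight) * collab_pred
```

For example, `hybrid_rating(4.0, 2.0, content_weight=0.75)` weights the content-based prediction three times as heavily as the collaborative one.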
Scientific Papers from 2006
Annual LIVE Public Report 2006
Authors: Joachim Koehler
Abstract:
This document gives a report on the work and the results of the first year (2006) of the project.
State of the Art Report Intelligent Media Framework
Authors: Albert J.Cruz, Christian Eckes, Felix Zielke, Georg Guentner, Gerhard Stanz, Janez Zaletelj, Rupert Westenthaler, Tobias Buerger, Wernher Behrendt
Abstract:
The integrated project “LIVE Staging of Media Events” (LIVE; FP6-27312) aims at the creation of novel intelligent content production methods and tools for interactive digital broadcasters to stage live media events in the area of sports, such as the 2008 Olympic Games. This report presents the state of the art of the concepts, technologies and standards related to one of the core subsystems developed in the LIVE project: the “Intelligent Media Framework”, which provides a robust framework for the creation, management and delivery of so-called “Intelligent Media Assets” under real-time conditions.
Methods, Design Guidelines, Workflows for Online Staging
Authors: Jaanis Garancs, Richard Wages, Stefan Gruenvogel
Abstract:
After describing the conceptual background which is necessary for the development of future live staging TV formats, this document proposes both visionary as well as first concrete methods and design guidelines for online staging. In addition considerations on the respective future workflows and the results of a first survey on suitable live video performance tools are presented.
Public Synopsis on the Basic LIVE System Architecture
Authors: Carmen Mac Williams, Dietmar Glachs, Felix Zielke, Heike Fischer, Janez Zaletelj, Maria Schubert, Matevz Pogacnik, Oliver Lucht, Richard Wages, Rupert Westenthaler
Abstract:
The goal of this public deliverable is to provide a high-level overview of the idea of the LIVE project and its basic system architecture. The description goes to the level of detail that is needed to understand the basic architecture. For more detailed descriptions, particularly of the subsystems, the reader is referred to the respective subsystem deliverables. The first basic system architecture described in this document was developed using the results of the first six months of research within the LIVE project. Derived from the basic idea of a system whereby an interactive digital broadcaster should be able to create a non-linear multi-stream video show in real-time, which changes according to the consumers’ interests, first user tests were made and analyzed at the public Austrian broadcaster ORF (Österreichischer Rundfunk). These tests resulted in a set of initial requirements (compare deliverable D9.1 “Results from the initial requirement analysis”). Based on these requirements, actors of the LIVE System and their basic use cases were identified. This finally results in the basic system architecture which is briefly described in this deliverable. The target audience for this document is any person inside or outside of the LIVE project interested in learning about the proposed functionality and architecture of LIVE.
Public Synopsis on Initial LIVE Exploitation Plan
Authors: David Salama
Abstract:
This document, Public Synopsis on Initial LIVE Exploitation Plan, is the first step to prepare the LIVE consortium for the exploitation of the project’s results. Following this aim, this document provides a first overview on exploitable results and some related market information.
Future Live iTV Production: Challenges and Opportunities
Authors: Richard Wages, Stefan M. Gruenvogel, Janez Zaletelj, Carmen Mac Williams, Georg Trogemann
Abstract: Today's TV broadcasting companies are highly professionalized in the production of linear TV formats. Workflows and technologies for these linear formats are reliable, the production personnel is highly skilled and we can trust in well-known viewing habits of the consumers. The key issue of this paper is: how do we enable such a broadcasting working environment to produce by far more variable, multiperspective or interactive TV formats? We are especially interested in formats entailing a multitude of live audiovisual material like for example sport events or elections, which shall be transformed into an interactive TV event for the consumer at home. This paper is not concerned with the variety of technical problems the interactive TV paradigm leads to, but with questions of future tools and practices on the producers' side, levels of consumer personalization and the respective consumer interfaces to make digital content accessible.
Proceedings of the AXMEDIS 2006: Second International Conference on Automated Production of Cross Media Content for Multi-Channel Distribution, December 13 - 15, 2006, Leeds, UK.
EU-IST Project Live: Live Staging of Media Events
Authors: Joachim Koehler, Richard Wages, Carmen Mac Williams, and Heike Fischer
Abstract: This paper gives an overview of the main objectives and the first results of the EU IST project Live. This integrated project will investigate new methods and formats for authoring and production of live sports events in a professional broadcast environment. To achieve these new interactive TV formats, the processing of different types of audio-visual (A/V) material must be enhanced to allow for a more sophisticated selection and advanced authoring of sports content. This live staging process will generate several output streams which are interlinked with each other. The consumers have the benefit that they can select and receive video material in a more personalized manner.
Paper presented at the SAMT 2006: First International Conference on Semantics and Digital Media Technology, December 6 - 8, 2006, Athens, Greece.
Video Composer and Live Video Conductor: Future Professions for the Interactive Digital Broadcasting Industry
Authors: Richard Wages, Carmen Mac Williams, Stefan M. Gruenvogel, Georg Trogemann
Abstract: Innovations in hardware and network technologies lead to an exploding number of non-interrelated parallel media streams. Per se this does not mean any additional value for consumers. Broadcasting and advertisement industries have not yet found new formats to reach the individual user with their content. In this work we propose and describe a novel digital broadcasting framework, which allows for the live staging of (mass) media events and improved consumer personalisation. In addition new professions for future TV production workflows which will emerge are described, namely the 'video composer' and the 'live video conductor'.
EuroITV 2006 – Beyond Usability, Broadcast, and TV: Fourth European Conference on Interactive Television, Athens, Greece, 25-26 May, 2006, Proceedings, pp. 32-38
Mind the gap - Requirements for the combination of content and knowledge
Authors: Tobias Buerger and Rupert Westenthaler
Abstract: Semantic enrichment of content can be done manually, which is expensive, or automatically, which is error-prone. In particular, automatic semantic enrichment must be aware of the gap between the semantics that are directly retrievable from the content and those which can be inferred within a given interpretative context. We report on a model for content and knowledge which distinguishes between three descriptive levels: information relating directly to the resource, to the metadata of the resource and to the subject matter addressed by the content. This model addresses five fundamental requirements for automation: formality, interoperability, multiple interpretations, contextualization, and independence of knowledge items from the resource's content.
SAMT Conference 2006
An Intelligent Media Framework for Multimedia Content
Authors: Tobias Buerger
Abstract: Search, retrieval and navigation in multimedia repositories is a task common to all multimedia management systems: users are supported by a wide range of features which are traditionally based on full text search and metadata queries. However, generating metadata is an error-prone and work-intensive task that cannot yet be fully automated for multimedia content. In this position paper we describe our vision of an Intelligent Media Framework that is capable of combining metadata and knowledge about media items in order to support user orientation, search and retrieval in media-rich information spaces: we try to integrate heterogeneous sources to create an Intelligent Media Framework containing Intelligent Media Objects carrying behavioural knowledge and capable of fully describing themselves. Among other things, the properties of these objects reflect the fact that users are more likely to search by the “meaning” of audiovisual objects and by what they represent than by their pure low-level features.
A Management System for Distributed Knowledge and Content Objects
Authors: Wernher Behrendt, Nitin Arora, Tobias Buerger, Rupert Westenthaler
Abstract: We present the results of a European research project which developed specifications for so-called Knowledge Content Objects (KCO) and for an attendant infrastructure, the Knowledge Content Carrier Architecture (KCCA). The work addresses the problem that while there are many standards for content and for metadata, there is at present no suitable framework that enables organizations to manage knowledge alongside content in a coherent manner. Our approach postulates the KCO as a common structural entity which can be recognised and manipulated by a KCCA-enabled system.
AXMEDIS Conference 2006
Smart Content – Scenarios and Technologies for a Knowledge-based Audiovisual Archive
Authors: Georg Guentner, Tobias Buerger, Erich Gams
Abstract: In our paper we present the intermediate results of a project aiming at the creation of a knowledge-based infrastructure for search and navigation in audiovisual repositories. The approach is based on highly automated media processing and is therefore specifically targeted to historically grown archives (broadcasters, universities, public and corporate media archives, etc.) lacking the time and/or the financial means to manually annotate their digital media assets. In the project a conceptual architecture was developed to meet the requirements of a set of knowledge-intensive user scenarios for the utilization of rich media content in the B2B and B2C areas. Pluggable RDF knowledge components act as a link between semantic indexing and knowledge-based navigation.
eChallenges Conference 2006
The Role of MPEG-7 in Semantic Annotation and the Cross-Media Publishing Process
Authors: Tobias Buerger, Georg Guentner, Erich Gams
Abstract: During the development of a knowledge-based audio-visual information system the authors of this article defined a conceptual system architecture based on MPEG-7 as the general description scheme for the media assets in the middleware. This concept was not only used to achieve a high abstraction and independence of the underlying media asset management system, it was also and primarily used as the basis of a semantic indexing process. Based on lightweight ontologies the descriptions of the media assets were associated with semantic concepts. Semantically annotated MPEG-7 assets were then propagated to the presentation layer, thus allowing the implementation of a variety of publication scenarios, including cross-media scenarios for the creation of concise video summaries.
AXMEDIS Conference 2006
Radio Relief: Radio Archives Departments Benefit from Digital Audio Processing
Authors: Martha Larson, Thomas Beckers and Volker Schlögell
Abstract: The archives departments of radio broadcasters are currently facing two significant challenges, namely, how to store rapidly increasing amounts of radio content, and how to satisfy the rising demand for easy retrieval of audio clips that can be recycled into new programs. A pilot project demonstrates that digital audio processing techniques have the potential to provide much-needed support.
Appeared in the ERCIM News No. 66, July 2006
Smart Content Factory – Approaching the Vision
Authors: Georg Guentner, Siegfried Reich
Abstract: In this paper we describe the objectives and achievements in developing the vision of a “Smart Content Factory”. The “Smart Content Factory” aims at the creation of a knowledge-aware system infrastructure to improve the utilization (re-use and adaptation) of audiovisual content. We will provide an overview of the project objectives and introduce “digital content engineering” as a scientific discipline dealing with concepts, methodologies, techniques and tools for a quantifiable approach towards the vision of smart content, thereby addressing future scenarios of electronic publishing, especially for embedded publishers. We will further take a look at the user and system requirements of the “Smart Content Factory” and their impact on the architecture of the system prototype.
A fuzzy logic approach for detection of video shot boundaries
Authors: Hui Fang, Jianmin Jiang, Yue Feng
Abstract: Video temporal segmentation is normally the first and important step for content-based video applications. Many features including the pixel difference, colour histogram, motion, and edge information etc. have been widely used and reported in the literature to detect shot cuts inside videos. Although existing research on shot cut detection is active and extensive, it still remains a challenge to achieve accurate detection of all types of shot boundaries with one single algorithm. In this paper, we propose a fuzzy logic approach to integrate hybrid features for detecting shot boundaries inside general videos. The fuzzy logic approach contains two processing modes, where one is dedicated to detection of abrupt shot cuts including those short dissolved shots, and the other for detection of gradual shot cuts. These two modes are unified by a mode-selector to decide which mode the scheme should work on in order to achieve the best possible detection performances. By using the publicly available test data set from Carleton University, extensive experiments were carried out and the test results illustrate that the proposed algorithm outperforms the representative existing algorithms in terms of the precision and recall rates.
Appeared in the Journal of the Pattern Recognition Society www.elsevier.com/locate/patcog
Video Indexing and Retrieval in Compressed Domain Using Fuzzy-Categorization
Authors: Hui Fang, Rami Qahwaji, and Jianmin Jiang
Abstract: There has been an increased interest in video indexing and retrieval in recent years. In this work, the indexing and retrieval system for visual content is based on features extracted from the compressed domain. Direct processing of the compressed domain spares the decoding time, which is extremely important when indexing large numbers of multimedia archives. A fuzzy categorizing structure is designed in this paper to improve the retrieval performance. In our experiment, a database that consists of basketball videos has been constructed for our study. This database includes three categories: full court match, penalty and close-up. First, spatial and temporal feature extraction is applied to train the fuzzy membership functions using the minimum entropy optimal algorithm. Then, the max composition operation is used to generate a new fuzzy feature to represent the content of the shots. Finally, the fuzzy-based representation becomes the indexing feature for the content-based video retrieval system. The experimental results show that the proposed algorithm is quite promising for semantic-based video retrieval.
G. Bebis et al. (Eds.): ISVC 2006, LNCS 4292, pp. 1143 – 1150, 2006. Copyright Springer-Verlag Berlin Heidelberg 2006
An Effective and Fast Scene Change Detection Algorithm for MPEG Compressed Videos
Authors: Z. Li, J. Jiang, G. Xiao, and H. Fang
Abstract: In this paper, we propose an effective and fast scene change detection algorithm that works directly in the MPEG compressed domain. The proposed scene change detection exploits the MPEG motion estimation and compensation scheme by examining the prediction status for each macro-block inside B frames and P frames. As a result, both abrupt and dissolved scene changes are located by a sequence of comparison tests, and no feature extraction or histogram differentiation is needed. Therefore, the proposed algorithm can operate in the compressed domain and is suitable for real-time implementations. Extensive experiments illustrate that the proposed algorithm achieves up to 94% precision for abrupt scene change detection and 100% for gradual scene change detection. In comparison with similar existing techniques, the proposed algorithm achieves superior recall and precision rates.
A. Campilho and M. Kamel (Eds.): ICIAR 2006, LNCS 4141, pp. 206 – 214, 2006. Copyright Springer-Verlag Berlin Heidelberg 2006
DCT-Domain Image Retrieval Via Block-Edge-Patterns
Authors: K.J. Qiu, J. Jiang, G. Xiao, and S.Y. Irianto
Abstract: A new algorithm for compressed image retrieval is proposed in this paper based on DCT block edge patterns. This algorithm directly extracts three edge patterns from compressed image data to construct an edge pattern histogram as an indexing key to retrieve images based on their content features. Three feature-based indexing keys are described: (i) the first two features are represented by 3-D and 4-D histograms respectively; and (ii) the third feature is constructed by following the spirit of run-length coding, which is performed on consecutive horizontal and vertical edges. To test and evaluate the proposed algorithms, we carried out two-stage experiments. The results show that our proposed methods are robust to color changes and varied noise. In comparison with existing representative techniques, the proposed algorithms achieve superior performance in terms of retrieval precision and processing speed.
A. Campilho and M. Kamel (Eds.): ICIAR 2006, LNCS 4141, pp. 673 – 684, 2006. Copyright Springer-Verlag Berlin Heidelberg 2006
Constrained Region-Growing and Edge Enhancement Towards Automated Semantic Video Object Segmentation
Authors: L. Gao, J. Jiang, and S.Y. Yang
Abstract: Most existing object segmentation algorithms suffer from a so-called under-segmentation problem, where parts of the segmented object are missing and holes often occur inside the object region. This problem becomes even more serious when the object pixels have intensity values similar to those of the background. To resolve the problem, we propose a constrained region-growing and contrast enhancement to recover those missing parts and fill in the holes inside the segmented objects. Our proposed scheme consists of three elements: (i) a simple linear transform for contrast enhancement to enable stronger edge detection; (ii) an 8-connected linking regional filter for noise removal; and (iii) a constrained region-growing for elimination of those internal holes. Our experiments show that the proposed scheme is effective towards resolving the under-segmentation problem, in which a representative existing algorithm with an edge-map based segmentation technique is used as our benchmark.
J. Blanc-Talon et al. (Eds.): ACIVS 2006, LNCS 4179, pp. 323 – 331, 2006. Copyright Springer-Verlag Berlin Heidelberg 2006
Adding Lossless Video Compression to MPEGs
Authors: Jianmin Jiang and Guoqiang Xiao
Abstract: In this correspondence, we propose to add a lossless compression functionality into existing MPEGs by developing a new context tree to drive arithmetic coding for lossless video compression. In comparison with the existing work on context tree design, the proposed algorithm features 1) prefix sequence matching to locate the statistics model at the internal node nearest to the stopping point, where the successful match of the context sequence is broken; 2) traversing the context tree along a fixed order of context structure with a maximum number of four motion compensated errors; and 3) context thresholding to quantize the higher end of error values into a single statistics cluster. As a result, the proposed algorithm is able to achieve competitive processing speed, low computational complexity and high compression performance, which bridges the gap between universal statistics modeling and practical compression techniques. Extensive experiments show that the proposed algorithm outperforms JPEG-LS by up to 24% and CALIC by up to 22%, yet the processing time ranges from less than 2 seconds per frame to 6 seconds per frame on a typical PC computing platform.
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 8, NO. 3, JUNE 2006
Analysis of Cluttered Scenes Using an Elastic Matching Approach for Stereo Images
Authors: Christian Eckes, Jochen Triesch,and Christoph von der Malsburg
Abstract: We present a system for the automatic interpretation of cluttered scenes containing multiple partly occluded objects in front of unknown, complex backgrounds. The system is based on an extended Elastic Graph Matching algorithm that allows to explicitly model partial occlusions. Our approach extends an earlier system in two ways. First, we use Elastic Graph Matching in stereo image pairs to increase matching robustness and disambiguate occlusion relations. Second, we use richer feature descriptions in the object models by integrating shape/texture with color features. We demonstrate that the combination of both extensions substantially increases recognition performance. The system learns about new objects in a simple one-shot learning approach. Despite the lack of statistical information in the object models and the lack of an explicit background model, our system performs surprisingly well for this very difficult task. Our results underscore the advantages of view-based feature constellation representations for difficult object recognition problems.
Unsupervised Speaker Clustering using Global Similarity and F0 Features
Authors: Konstantin Biatov, Martha Larson
Abstract: This paper investigates an unsupervised speaker clustering approach that exploits global similarity and also proposes extending the standard cepstral feature set used for speaker clustering with prosodic features, extracted from F0. The global-similarity based speaker clustering algorithm, initially proposed by the authors in [6], leverages the insight that audio segments within a single cluster are not only similar to one another, but also display the same patterns of similarities and differences with audio segments belonging to all other clusters. First, speaker clustering performance using the standard Bayesian Information Criterion (BIC) is compared to the performance achieved using a BIC-based algorithm incorporating global similarity. Then, both clustering techniques are tested using an extended feature set including F0-derived features in addition to the standard cepstral features. The evaluation, which is performed on data recorded from German language radio, shows the clear benefits of using global information when performing clustering. It also demonstrates that in most cases F0-features outperform the cepstral-only feature set both in standard BIC clustering and in the BIC global-similarity-based approach.
MPEG-2 Compressed-Domain Algorithms for Video Analysis
Authors: Wolfgang Hesseler and Stefan Eickeler
Abstract: This paper presents new algorithms for extracting metadata from video sequences in the MPEG-2 compressed domain. Three algorithms for efficient low-level metadata extraction in preprocessing stages are described. The first algorithm detects camera motion using the motion vector field of an MPEG-2 video. The second method extends the idea of motion detection to a limited region of interest, yielding an efficient algorithm to track objects inside video sequences. The third algorithm performs cut detection using macroblock types and motion vectors.
Hindawi Publishing Corporation EURASIP Journal on Applied Signal Processing Volume 2006, Article ID 56940, Pages 1–11 DOI 10.1155/ASP/2006/56940
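As a rough illustration of how a motion vector field can indicate camera motion, the following sketch takes the median of the macroblock motion vectors as an estimate of the global displacement. This is a generic technique chosen for illustration, not the algorithm from the paper; the function names and the `min_magnitude` parameter are hypothetical.

```python
from statistics import median


def estimate_global_motion(motion_vectors):
    """Return the median (dx, dy) of a list of macroblock motion vectors.

    The median is used so that a minority of vectors belonging to moving
    foreground objects does not dominate the estimate.
    """
    if not motion_vectors:
        return (0.0, 0.0)
    dx = median(v[0] for v in motion_vectors)
    dy = median(v[1] for v in motion_vectors)
    return (dx, dy)


def is_camera_moving(motion_vectors, min_magnitude=1.0):
    """Flag a frame as camera motion if the global displacement is large.

    `min_magnitude` (in pixels) is a hypothetical tuning parameter.
    """
    dx, dy = estimate_global_motion(motion_vectors)
    return (dx * dx + dy * dy) ** 0.5 >= min_magnitude
```

Restricting the input list to the vectors inside a region of interest would give a comparable displacement estimate for a single object, in the spirit of the second method described in the abstract.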
Scientific Papers from 2005
Speaker Clustering via Bayesian Information Criterion using a Global Similarity Constraint
Authors: Konstantin Biatov, Martha Larson
Abstract: In this paper we propose a global similarity constraint that improves speaker clustering as standardly performed using the Bayesian Information Criterion (BIC). The novelty of our approach lies in the fact that it exploits the hypothesis that audio segments belonging to the same speaker cluster should demonstrate similar global behavior, i.e. exhibit approximately the same pattern of similarity and dissimilarity with all other segments. Every segment is represented by a global similarity vector whose components encode the BIC-based local similarity between that segment and each of the other segments to be clustered. Speaker clustering is performed bottom up using the BIC to compare each pair of segments and determine if their similarity is high enough to merge them. We use the global similarity vectors to constrain merging to segment pairs that have approximately the same patterns of global similarity. The evaluation, performed on audio data from 4 different German-language radio programs, shows that the proposed approach represents an improvement on the standard BIC clustering.
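The merge test described in the abstract above can be sketched as follows. The sketch makes several assumptions that are not stated in the abstract: each segment is a list of feature vectors modelled by a single Gaussian with diagonal covariance, a negative delta-BIC favours merging, and two global similarity vectors count as "approximately the same" when their Pearson correlation reaches a hypothetical threshold `min_corr`. All function names are illustrative.

```python
import math


def _log_det_diag(frames):
    """Log-determinant of a diagonal covariance fitted to `frames`."""
    n = len(frames)
    total = 0.0
    for d in range(len(frames[0])):
        column = [f[d] for f in frames]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        total += math.log(max(var, 1e-6))  # floor avoids log(0)
    return total


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC for two segments; negative values favour a merge."""
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    dims = len(seg_a[0])
    gain = 0.5 * (n * _log_det_diag(seg_a + seg_b)
                  - n_a * _log_det_diag(seg_a)
                  - n_b * _log_det_diag(seg_b))
    # A diagonal Gaussian has 2 * dims free parameters (mean and variance).
    penalty = 0.5 * penalty_weight * (2 * dims) * math.log(n)
    return gain - penalty


def pearson(u, v):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def may_merge(segments, i, j, min_corr=0.5):
    """Allow a merge only if the local BIC test and the global constraint agree."""
    others = [k for k in range(len(segments)) if k not in (i, j)]
    # Global similarity vectors: delta-BIC of i (and of j) against all other segments.
    g_i = [delta_bic(segments[i], segments[k]) for k in others]
    g_j = [delta_bic(segments[j], segments[k]) for k in others]
    locally_similar = delta_bic(segments[i], segments[j]) < 0
    return locally_similar and pearson(g_i, g_j) >= min_corr
```

A bottom-up clustering loop would repeatedly pick the pair with the lowest delta-BIC among those for which `may_merge` returns true, merge it, and recompute the affected similarities.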
Smart Content Factory - Assisting Search for Digital Objects by Generic Linking Concepts to Multimedia Content
Authors: Tobias Buerger, Erich Gams, Georg Guentner
Abstract: Search, retrieval and navigation in audiovisual repositories is a task common to all media asset management systems: Users are supported by a wide range of features which are traditionally based on full text search and metadata queries. In this paper we describe an approach to superimpose a semantic indexing infrastructure over the media assets and the metadata associated with them. The infrastructure is based on formal knowledge models and facilitates the use of further navigation dimensions: By identifying semantic concepts we are able to create a dynamic navigation structure which is based on the underlying knowledge model and the conceptual relations defined therein.
VIRTUAL PERSONALISED CHANNELS: VIDEO CONDUCTING OF FUTURE TV BROADCASTING
Authors: Carmen Mac Williams, Richard Wages, Stefan M. Gruenvogel, Georg Trogemann
Abstract: Television is undergoing a historical change. Interactive Digital Broadcasting will be a reality in 2010+. Heaps of video material will be produced by TV broadcasters, which will overwhelm both the producers and the consumers. Current TV formats and forms of broadcasting do not satisfy the personal moods and interests of the consumer. We hence propose the development of a TV environment which allows for the establishment of 'virtual personalised channels'. To do so, (live) semantic annotation of video material as well as methods for live staging of media events have to be designed. The resulting drastically different process of content production and consumption will lead to the satisfaction of individual human needs. The approaches outlined in this extended abstract are the basis for our upcoming IST research project LIVE.
EWIMT 2005: Second European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies, 30 November – 1 December, 2005, IEE Savoy Place, London
Personal content recommender based on a hierarchical user model for the selection of TV programmes
Authors: Matevz Pogacnik, Jurij Tasic, Marko Meza, Andrej Kosir
Abstract: In this paper we present our approach to user modeling for a personalized selection of multimedia content tested on a corpus of TV programmes. The idea of this approach is to classify content (TV programmes) based on the calculation of similarities between the description of content and the user model for each description attribute. Calculated similarities are then combined into a classification decision using Support Vector Machines. The basis for the calculation of similarities is a hierarchical structure of the user model, overlaid upon a taxonomy of TV programme genres. Preliminary results show that it works well with a varying quality of content descriptions including incomplete genre classification and an arbitrary number of description attributes. The evaluation of the system performance was based on content described using the TV-Anytime standard, but the approach can be adapted for search of other types of content with multi-attribute descriptions.
Copyright 2005 Kluwer Academic Publishers. Printed in the Netherlands.
Scientific Papers from 2004
AUTOMATIC EXTRACTION OF MPEG-7 AUDIO METADATA USING THE MEDIA ASSET MANAGEMENT SYSTEM IFINDER
Authors: JOBST LÖFFLER, KONSTANTIN BIATOV, JOACHIM KÖHLER
Abstract: This paper describes the MPEG-7 compliant media asset management system iFinder, which provides a set of automatic methods and software tools for media analysis, archiving and retrieval. The iFinder was developed for use in the media industry and consists of the iFinderSDK and the iFinder retrieval engine. The iFinderSDK is composed of a bundle of modules that realize individual technologies for audio and video metadata extraction. In this paper we present the audio content processing workflow and the pattern recognition methods implemented in iFinder. In particular, a technique for precise audio/text alignment and a browser that displays the synchronized media channels of the retrieval results are discussed. This paper also provides practical insight into how to use MPEG-7 as a standardized metadata format for media asset management.
AES 25th International Conference, London, United Kingdom, 2004 June 17–19
An Extraction of Speech Data from Audio Stream Using Unsupervised Pre-Segmentation
Authors: K. Biatov
Abstract: In this paper we investigate the extraction of speech data from an audio stream. Our method includes unsupervised optimal self-segmentation of the audio stream into small, homogeneous segments. Homogeneity is defined on the basis of the average amplitude and the zero-crossings in a frame, and is measured by entropy. In our approach we calculate a relative ratio between the average amplitudes of neighboring homogeneous segments. For a speech signal this ratio is less than a threshold defined on a short pure speech signal. As a discriminative feature we use the percentage of homogeneous segments within a 1 sec interval that have a high relative amplitude ratio. During classification, each 1 sec interval is labeled incrementally as a speech or a nonspeech segment. The discrimination technique shows high performance on more than six hours of data that include different types of audio.
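The low-level quantities named in the abstract above can be sketched as follows. This is a minimal illustration only: the entropy-based segmentation is not shown, the orientation of the amplitude ratio (larger over smaller) is an assumption, and the function names and the `threshold` parameter are hypothetical.

```python
def frame_features(frame):
    """Return (average absolute amplitude, zero-crossing count) for one frame."""
    avg_amplitude = sum(abs(s) for s in frame) / len(frame)
    zero_crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev < 0) != (cur < 0)
    )
    return avg_amplitude, zero_crossings


def amplitude_ratio(segment_a, segment_b):
    """Ratio between the average amplitudes of two neighbouring segments.

    Computed as larger over smaller, so the result is always >= 1
    (this orientation is an assumption).
    """
    a = sum(abs(s) for s in segment_a) / len(segment_a)
    b = sum(abs(s) for s in segment_b) / len(segment_b)
    low, high = min(a, b), max(a, b)
    return float("inf") if low == 0 else high / low


def high_ratio_fraction(segments, threshold):
    """Fraction of neighbouring segment pairs whose ratio exceeds `threshold`."""
    pairs = list(zip(segments, segments[1:]))
    if not pairs:
        return 0.0
    high = sum(1 for a, b in pairs if amplitude_ratio(a, b) > threshold)
    return high / len(pairs)
```

Applied to the homogeneous segments that fall within each 1 sec interval, `high_ratio_fraction` yields one value per interval, which a classifier could then compare against a decision boundary to label the interval as speech or nonspeech.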