Demo of Video Conducting the Olympic Games 2008: The iTV Field Trial of the EU-IST Project LIVE

Authors: Carmen Mac Williams, Roland Westermaier, Torsten Kliemand

Abstract: The EU IP project LIVE investigates new professional roles, production methods and iTV formats, as well as the staging of live media events in a professional broadcast environment. The vision of LIVE is to change the role of the traditional live TV director into that of a future Video Conductor who communicates in real time with his audience.

In: Proceedings of the EuroITV 2008: July 3 - 4, 2008, Salzburg, Austria.


Video Conducting the Olympic Games 2008: The iTV Field Trial of the EU-IST Project LIVE

Authors: Carmen Mac Williams, Richard Wages

Abstract: In the upcoming field trial of LIVE, a Video Conductor and his team at the Austrian public TV station ORF will stage a live, non-linear, multi-perspective show around the Olympic Games 2008 with instant feedback from 500 invited Austrian Telecom IPTV test users. The aim of the field trial is to improve the public TV service for the Austrian public by dynamically linking multi-stream videos as the live events of the Olympic Games 2008 unfold and as the viewers demand it. The Video Conductor ensures dramatic quality by linking the multi-stream videos in response to the unfolding sports action and the audience’s mood.

In: Proceedings of the 3rd ACM International Conference on Digital Interactive Media in Entertainment and Arts, DIMEA 2008, September 10 - 12, 2008, Athens, Greece



Authors: Ibrahim Nasir, Ying Weng, Jianmin Jiang and Stanley Ipson

Abstract: In this paper, we propose a robust multiple digital watermarking technique for the copyright protection of digital color images. The watermark is a binary image, which is divided into four parts, each encrypted using a secret key and embedded in the spatial domain into four different 128×128 regions of the blue component of the color image. Watermark extraction is based on comparing the original pixel intensities with the corresponding watermarked pixel intensities in 8×8 blocks. The extracted watermark bits are determined from the probabilities of detecting bit '1' or bit '0'. The watermark can be extracted in sixteen parts, but only four of these are selected by a correlation coefficient detector and used to reconstruct the extracted watermark. Experimental results show that the proposed scheme makes the watermark perceptually invisible and robust against a wide range of attacks, including lossy JPEG compression, median filtering, low-pass filtering, rotation, rotation-scaling, rotation-crop, image cropping, image scaling and self-similarity attacks.

Proceedings of the 8th IASTED International Conference on Visualization, Imaging and Image Processing (VIIP 2008), Sep 1-3, 2008, Palma de Mallorca, Spain, pp. 280-285.


An Approach to Supporting Traditional Classification through Social Tagging

Authors: Georg Güntner, Rolf Sint, Rupert Westenthaler

Abstract: This contribution presents an approach for combining traditional, closed classification methods with open classification methods based on social tagging. The discussion starts from the fundamental requirements for search and navigation in document archives, examines the advantages and disadvantages of closed and open classification approaches, and finally presents a combined solution that was implemented as a prototype. In this solution, documents can in principle be classified with free tags, but the classification is supported by a controlled vocabulary. Free tags are adopted into the controlled vocabulary in a subsequent, moderated process. The vocabulary, which grows and is continuously maintained in this way, supports search and navigation in the document space.

In: Birgit Gaiser, Thorsten Hampel, Stefanie Panke (Eds.): Good Tags – Bad Tags. Social Tagging in der Wissensorganisation.
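A minimal Python sketch of the combined approach is given below. It is an illustration under stated assumptions, not the prototype described in the chapter: the names TagStore, suggest, tag, approve and search are hypothetical, vocabulary support is reduced to prefix matching, and moderation is a single explicit approve call.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical store combining free tags with a moderated controlled vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)   # curated, controlled terms
        self.pending = defaultdict(int)     # free tag -> uses awaiting moderation
        self.doc_tags = defaultdict(set)    # document id -> assigned tags

    def suggest(self, prefix):
        """Support classification by proposing controlled terms matching a typed prefix."""
        p = prefix.lower()
        return sorted(t for t in self.vocabulary if t.lower().startswith(p))

    def tag(self, doc_id, tag):
        """Any free tag may be assigned; tags outside the vocabulary are queued."""
        self.doc_tags[doc_id].add(tag)
        if tag not in self.vocabulary:
            self.pending[tag] += 1

    def approve(self, tag):
        """Moderation step: move a free tag into the controlled vocabulary."""
        self.pending.pop(tag, None)
        self.vocabulary.add(tag)

    def search(self, tag):
        """Navigate the document space: all documents carrying the tag."""
        return sorted(d for d, tags in self.doc_tags.items() if tag in tags)


store = TagStore(["archive", "broadcast", "sports"])
store.tag("doc1", "sports")
store.tag("doc2", "olympics")       # free tag, queued for moderation
print(store.suggest("s"))           # ['sports']
store.approve("olympics")
print(store.suggest("o"))           # ['olympics']
print(store.search("olympics"))     # ['doc2']
```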


Bringing "Intelligence" to iTV: The Intelligent Media Framework

Authors: Georg Güntner, Tobias Bürger, Dietmar Glachs, Rupert Westenthaler

Abstract: This paper gives an overview of a software framework designed for the creation of interactive multi-channel television shows. The “Intelligent Media Framework” forms the middleware of an iTV production support system developed in the context of the European integrated project “LIVE”. The framework is designed according to service-oriented architecture (SOA) principles for easy integration into existing iTV and TV production environments. Moreover, the Intelligent Media Framework is based on a knowledge model formalising the main aspects identified as making up the domain of real-time staging of media events: the content (media clips and media streams), the events, the staging and the users (professional users and consumers). The framework offers services for the development of tools that assist the production team of multi-channel iTV shows in an intelligent way. The envisaged “intelligence” is based on formal, machine-understandable descriptions of the content and the events. This document introduces the knowledge model and provides an overview of the architecture of a media framework designed to support the iTV production process in an intelligent way.

Presented at EuroITV 2008, Salzburg, Austria


Skin detection from different colour spaces for model-based face detection

Authors: Jianmin Jiang, Jinchang Ren

Abstract: Skin and face detection has many important applications in intelligent human-machine interfaces, reliable video surveillance and visual understanding of human activities. In this paper, we propose an efficient and effective method for frontal-view face detection based on skin detection and knowledge-based modeling. First, skin pixels are modeled using supervised training, and boundary conditions are then extracted for skin segmentation. Faces are further detected by shape filtering and knowledge-based modeling. Skin detection results from different color spaces are compared. Experimental results demonstrate that our method robustly detects skin and face regions even under varying lighting conditions and poses.

Springer CCIS, Vol. 15, ISBN 978-3-540-85929-1, 2008
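The skin-segmentation step can be sketched as boundary conditions on chrominance. The paper derives its boundary conditions by supervised training and compares several colour spaces; the sketch below instead assumes fixed, widely cited YCbCr ranges (77 to 127 for Cb, 133 to 173 for Cr) and covers only the per-pixel test, not the shape filtering or the knowledge-based face model.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range ITU-R BT.601 conversion, as used by JPEG/JFIF."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Assumed boundary conditions: widely cited fixed ranges, not values trained in the paper.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def is_skin(r, g, b, cb_range=CB_RANGE, cr_range=CR_RANGE):
    """Classify one RGB pixel as skin if its chrominance lies inside both ranges."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_mask(image):
    """Boolean mask for an image given as rows of (r, g, b) tuples."""
    return [[is_skin(*px) for px in row] for row in image]


print(skin_mask([[(224, 172, 140), (0, 0, 255), (0, 255, 0)]]))  # [[True, False, False]]
```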


Knowledge-supported segmentation and semantic contents extraction from MPEG videos for highlight-based annotation, indexing and retrieval

Authors: Jianmin Jiang, Jinchang Ren

Abstract: Automatic recognition of highlights in videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this paper, we propose techniques to solve this problem using knowledge-supported extraction of semantic content, with compressed-domain processing employed for efficiency. First, video shots are detected using knowledge-supported rules. Then, human objects are detected via statistical skin detection. Meanwhile, camera motion such as zooming in is identified. Finally, highlights of zooming in on human objects are extracted and used for annotation, indexing and retrieval of the whole video. Results on a large set of test videos demonstrate the accuracy and robustness of the proposed techniques.

Springer LNCS 5226, pp. 258-265, ISBN 978-3-540-87440-9, 2008.


Fusion of intensity and channel difference for improved colour edge detection

Authors: Jianmin Jiang, Jinchang Ren

Abstract: Edge detection, especially in colour images, plays a very important role in many image analysis, segmentation and recognition applications. In this paper, a new colour-to-gray mapping method for effective colour edge detection is proposed. From any given colour image C, a gray image D is defined as the accumulated differences between each pair of its colour channels, and another gray image R is then obtained by weighting D and the gray intensity image G. Fusion of the edges extracted from R and G forms the final result. Compared with edges detected in traditional colour spaces such as RGB, YCbCr and HSV, all using the same Canny operator, the proposed method appears to achieve more effective results on different test images.

In Proc. VIE'08, Xi'an, China, pp. 18-22, July 29-Aug 1, 2008
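The colour-to-gray mapping can be sketched in Python as follows. Several details are assumptions: D sums the absolute differences of the three channel pairs, G uses BT.601 luma weights, the weight w = 0.5 is arbitrary, and a simple gradient threshold stands in for the Canny operator used in the comparison. Fusion is taken as a logical OR of the two edge maps.

```python
def channel_difference(img):
    """D: accumulated absolute differences between each pair of colour channels."""
    return [[abs(r - g) + abs(g - b) + abs(b - r) for (r, g, b) in row] for row in img]


def intensity(img):
    """G: gray intensity image (BT.601 luma weights, an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]


def weighted_map(d_map, g_map, w=0.5):
    """R: weighted combination of D and G; the weight w is a hypothetical choice."""
    return [[w * d + (1 - w) * g for d, g in zip(d_row, g_row)]
            for d_row, g_row in zip(d_map, g_map)]


def edges(gray, thresh):
    """Stand-in for Canny: forward-difference gradient magnitude above a threshold."""
    h, w = len(gray), len(gray[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = gray[y][x + 1] - gray[y][x]
            gy = gray[y + 1][x] - gray[y][x]
            out[y][x] = (gx * gx + gy * gy) ** 0.5 > thresh
    return out


def fused_edges(img, thresh=20.0, w=0.5):
    """Fuse (logical OR) the edge maps extracted from R and from G."""
    g_map = intensity(img)
    r_map = weighted_map(channel_difference(img), g_map, w)
    return [[a or b for a, b in zip(row_r, row_g)]
            for row_r, row_g in zip(edges(r_map, thresh), edges(g_map, thresh))]


# Red next to green of almost equal luminance: invisible in G, visible after fusion.
img = [[(255, 0, 0), (255, 0, 0), (0, 130, 0), (0, 130, 0)] for _ in range(3)]
print(any(any(row) for row in edges(intensity(img), 20.0)))  # False
print(fused_edges(img)[0][1])                                 # True
```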


A Block-Edge-Pattern-Based Content Descriptor in DCT Domain

Authors: Jianmin Jiang, Kaijin Qiu, and Guoqiang Xiao

Abstract: In this correspondence, we describe a robust and effective content descriptor based on block-edge patterns extracted in the discrete cosine transform (DCT) domain, which is suitable for applications in JPEG or MPEG compressed images and videos. This content descriptor is constructed as a run-length edge-block histogram with three patterns: horizontal edge, vertical edge and no edge. In comparison with existing descriptors, the proposed descriptor features: 1) low-cost computation, suitable for real-time implementation and high-speed processing of compressed videos; 2) robustness to changes such as rotation, reversal and noise; 3) operation in the compressed domain. Extensive experiments show that the proposed content descriptor is effective in describing visual content and achieves superior performance in terms of retrieval precision and recall rates.
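A sketch of how a block might be assigned one of the three patterns, and how a run-length histogram might then be built, is shown below. The abstract does not give the classification rule, so the sketch assumes one: the two lowest AC coefficients F(0,1) and F(1,0) are compared, with a hypothetical threshold of 50 for the no-edge case. The DCT is computed explicitly only to build test blocks; in the compressed domain the coefficients would come from the entropy-decoded bitstream.

```python
import math
from collections import Counter

N = 8


def dct2(block):
    """Orthonormal 2-D DCT-II; result[u][v] with u = vertical, v = horizontal frequency."""
    def basis(k, i):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        return scale * math.cos((2 * i + 1) * k * math.pi / (2 * N))
    return [[sum(block[y][x] * basis(u, y) * basis(v, x)
                 for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]


def edge_pattern(coeffs, thresh=50.0):
    """Assumed rule: compare the lowest horizontal- and vertical-frequency AC terms."""
    h_freq = abs(coeffs[0][1])   # intensity varies along x -> vertical edge
    v_freq = abs(coeffs[1][0])   # intensity varies along y -> horizontal edge
    if max(h_freq, v_freq) < thresh:
        return "none"
    return "vertical" if h_freq >= v_freq else "horizontal"


def run_length_histogram(patterns):
    """Count (pattern, run length) pairs over blocks scanned in raster order."""
    hist = Counter()
    i = 0
    while i < len(patterns):
        j = i
        while j < len(patterns) and patterns[j] == patterns[i]:
            j += 1
        hist[(patterns[i], j - i)] += 1
        i = j
    return hist


flat = [[128] * N for _ in range(N)]
vertical = [[0] * 4 + [255] * 4 for _ in range(N)]
horizontal = [[0] * N for _ in range(4)] + [[255] * N for _ in range(4)]
patterns = [edge_pattern(dct2(b)) for b in (flat, flat, vertical, horizontal)]
print(patterns)                        # ['none', 'none', 'vertical', 'horizontal']
print(run_length_histogram(patterns))
```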



An Efficient Face Image Retrieval through DCT Features

Authors: Aamer S. S. Mohamed, Ying Weng, Jianmin Jiang and Stan Ipson

Abstract: This paper proposes a new, simple method of DCT feature extraction that accelerates image retrieval and reduces the storage it requires, with the aim of direct content access and extraction from the JPEG compressed domain. Our method extracts the average of some DCT block coefficients and needs only a vector of six coefficients per block over all image blocks. In our retrieval system, for simplicity, both query and database images are normalized and resized from the original database based on the centred position of the eyes; the normalized image is divided equally into non-overlapping 8×8 pixel blocks, each of which is associated with a feature vector derived directly from the discrete cosine transform (DCT). Users can select any image as the query. Retrieval is based on the relevance between a query image and each database image: database images are ranked by similarity, measured as the Euclidean distance between feature vectors. The experimental results show that our approach readily identifies the main objects and reduces the influence of the image background, and thus improves the performance of image retrieval.

Accepted for The 10th IASTED International Conference on Signal and Image Processing, August 18 – 20, 2008, USA 
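The retrieval step can be sketched as ranking by Euclidean distance between per-block DCT feature vectors. The abstract does not specify which coefficients are averaged, so the sketch assumes the six lowest-frequency coefficients of each 8×8 block in JPEG zigzag order, taken directly from already decoded coefficient blocks; the dc_only helper is a hypothetical test fixture.

```python
import math

# Zigzag positions 0..5 of an 8x8 block: the DC term and the five lowest AC terms.
ZIGZAG6 = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]


def block_features(coeff_blocks):
    """Concatenate six low-frequency coefficients from each 8x8 DCT block."""
    return [block[u][v] for block in coeff_blocks for (u, v) in ZIGZAG6]


def rank_by_distance(query_blocks, database):
    """Database keys ordered by Euclidean distance between feature vectors."""
    q = block_features(query_blocks)
    scored = sorted((math.dist(q, block_features(blocks)), key)
                    for key, blocks in database.items())
    return [key for _, key in scored]


def dc_only(dc):
    """Hypothetical test block whose only non-zero coefficient is the DC term."""
    return [[dc if (u, v) == (0, 0) else 0 for v in range(8)] for u in range(8)]


query = [dc_only(100), dc_only(50)]
database = {"a": [dc_only(90), dc_only(55)], "b": [dc_only(10), dc_only(200)]}
print(rank_by_distance(query, database))  # ['a', 'b']
```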


Subsampling-based image watermarking in compressed DCT domain

Authors: Ibrahim Nasir, Ying Weng, Jianmin Jiang and Stan Ipson

Abstract: In this paper, a new embedding strategy for watermarking is presented, based on the DC components of subimages in the compressed discrete cosine transform (DCT) domain. These subimages are obtained by subsampling the host image. Greater robustness is achieved by embedding watermarks in the perceptually significant DC components. Furthermore, the original image is not required in the extraction process. Experimental results show that the proposed scheme makes the watermark perceptually invisible and robust against a wide range of attacks, including lossy JPEG compression, filtering, scaling and cropping attacks.

Accepted for The 10th IASTED International Conference on Signal and Image Processing, August 18 – 20, 2008, USA  


Scientific Papers from 2007


Progressive content access to databases of JPEG compressed images

Authors: J. Jiang, G. Feng, and Y. Yin

Abstract: Progressive content access provides a mode that allows a coarse version of an image to be viewed at a lower computing cost and then gradually refined by subsequent resolution enhancement if required. This proves extremely useful when millions of compressed images and video sequences need to be browsed manually or processed in the pixel domain, since it saves computation and removes the need for full decompression. In this paper, we propose such a progressive content access algorithm, suitable for all DCT-based JPEG and MPEG compressed files. We first develop a theoretical model that approximates the cosine function used in the IDCT at various orders. We then propose a progressive content access algorithm that incorporates both successive approximation and spectral selection. Further analysis and experiments show that our proposed algorithm saves computational cost in comparison with full JPEG decompression. Extensive experiments also show that the proposed algorithm achieves encouraging PSNR values for reconstructed images even with lower-order approximation.

IET Image Processing, Vol. 1, No. 2, 2007, pp. 207-214, ISSN: 1751-9667.
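The two ingredients of progressive access can be illustrated on a single 8-point row. Spectral selection keeps only the first coefficients; the cosine approximation is modelled here, as an assumption, by a truncated Taylor series after folding the angle into [0, π/2], which is not necessarily the paper's model. Successive approximation (bit-plane refinement) is not shown.

```python
import math


def approx_cos(theta, order):
    """Taylor series of cos up to theta**order after folding theta into [0, pi/2]."""
    t = theta % (2 * math.pi)
    if t > math.pi:
        t = 2 * math.pi - t              # cos(t) == cos(2*pi - t)
    sign = 1.0
    if t > math.pi / 2:
        t, sign = math.pi - t, -1.0      # cos(t) == -cos(pi - t)
    total, term = 0.0, 1.0
    for k in range(order // 2 + 1):
        if k:
            term *= -t * t / ((2 * k - 1) * (2 * k))
        total += term
    return sign * total


def scale(u, n):
    return math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)


def dct_1d(signal):
    """Orthonormal forward DCT-II, used here only to create test coefficients."""
    n = len(signal)
    return [scale(u, n) * sum(s * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              for x, s in enumerate(signal)) for u in range(n)]


def idct_1d(coeffs, keep=None, order=None):
    """Orthonormal inverse DCT. `keep` retains only the first coefficients
    (spectral selection); `order` swaps math.cos for approx_cos of that order."""
    n = len(coeffs)
    keep = n if keep is None else keep
    cos = math.cos if order is None else (lambda a: approx_cos(a, order))
    return [sum(scale(u, n) * coeffs[u] * cos((2 * x + 1) * u * math.pi / (2 * n))
                for u in range(keep)) for x in range(n)]


signal = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct_1d(signal)
print(idct_1d(coeffs, keep=1))   # each value is (up to rounding) the mean, 62.75
for order in (2, 4, 8):
    err = max(abs(a - b) for a, b in zip(idct_1d(coeffs, order=order), signal))
    print(order, round(err, 4))  # reconstruction error shrinks as the order grows
```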


Robustness Analysis on Facial Image Description in DCT Domain

Authors: Jianmin Jiang, Guocan Feng

Abstract: In this letter, we report a DCT-domain analysis of facial images revealing that, when a certain number of DCT coefficients are removed, the facial image description given by the remaining DCT coefficients becomes robust to lighting changes and scale variations. Such properties would be very useful for applications in face recognition, video object tracking, object segmentation and visual content processing.

Electronics Letters Volume 43, Issue 24, Nov. 22, 2007 
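The effect of removing the DC coefficient can be checked numerically for the simplest lighting model, a uniform brightness offset. Under that assumption only the DC coefficient changes, so a descriptor built from the remaining coefficients is unchanged. Real lighting changes are not purely additive, so this illustrates the idea rather than reproducing the letter's analysis.

```python
import math

N = 8


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block; result[u][v], u = vertical frequency."""
    def basis(k, i):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        return scale * math.cos((2 * i + 1) * k * math.pi / (2 * N))
    return [[sum(block[y][x] * basis(u, y) * basis(v, x)
                 for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]


def descriptor_without_dc(block):
    """All DCT coefficients in row-major order except the DC term F[0][0]."""
    flat = [c for row in dct2(block) for c in row]
    return flat[1:]


block = [[(7 * x + 3 * y) % 50 for x in range(N)] for y in range(N)]
brighter = [[p + 40 for p in row] for row in block]   # uniform brightness offset

d1, d2 = descriptor_without_dc(block), descriptor_without_dc(brighter)
print(max(abs(a - b) for a, b in zip(d1, d2)) < 1e-9)   # True: AC terms unchanged
print(dct2(brighter)[0][0] - dct2(block)[0][0])          # about 40 * N = 320
```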


Real-time shot cut detection in compressed domain

Authors: J. Jiang, Z. Li, G. Xiao and J. Chen

Abstract: In this short paper, we propose a fast and simple shot cut detection algorithm, which operates directly in the compressed domain and is suitable for real-time implementation. The proposed algorithm exploits existing MPEG techniques by examining the prediction status of each macroblock inside B frames and P frames. As a result, both abrupt cuts and dissolves are located by a sequence of comparison tests, so no feature extraction or histogram differencing is needed. Although the description of the algorithm is primarily based on MPEG-1 and MPEG-2 streams, the scheme can be readily extended to other video compression standards such as MPEG-4 and H.264 by following the same monitoring principles: (i) the balance between forward and backward prediction; and (ii) the boundaries among P, B and I frames. Extensive experiments illustrate that the proposed algorithm outperforms similar existing algorithms, providing a useful technique for fast, online video content processing.

Accepted for publication in: Journal of Electronic Imaging, SPIE
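The monitoring principle can be sketched for abrupt cuts as follows. The per-frame macroblock counts (FrameStats) are a hypothetical input that an MPEG parser would have to supply, the 0.6 ratio is an arbitrary threshold, and dissolves are not handled. The idea is that a B frame whose macroblocks are mostly backward-predicted resembles the next reference frame rather than the previous one.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    """Hypothetical per-frame macroblock counts parsed from an MPEG-1/2 stream."""
    index: int
    kind: str               # 'I', 'P' or 'B', in display order
    forward: int = 0        # forward-predicted macroblocks
    backward: int = 0       # backward-predicted macroblocks
    bidirectional: int = 0  # interpolated macroblocks
    intra: int = 0          # intra-coded macroblocks


def share(count, frame):
    total = frame.forward + frame.backward + frame.bidirectional + frame.intra
    return count / total if total else 0.0


def detect_abrupt_cuts(frames, ratio=0.6):
    """Report the first frame of each new shot (assumed thresholds, abrupt cuts only)."""
    cuts = []
    reported = False            # a cut was already reported since the last reference frame
    for f in frames:
        if f.kind == "B":
            # Mostly backward prediction: the B frame resembles the next reference frame.
            if not reported and share(f.backward, f) >= ratio:
                cuts.append(f.index)
                reported = True
        else:
            # A mostly intra-coded P frame found no match in the previous reference frame.
            if f.kind == "P" and not reported and share(f.intra, f) >= ratio:
                cuts.append(f.index)
            reported = False    # I and P frames close the current group of B frames
    return cuts


gop = [
    FrameStats(0, "I", intra=99),
    FrameStats(1, "B", forward=90, backward=5, bidirectional=4),
    FrameStats(2, "B", forward=8, backward=85, bidirectional=6),   # new shot begins here
    FrameStats(3, "P", forward=4, intra=95),
    FrameStats(4, "B", forward=70, bidirectional=29),
    FrameStats(5, "B", forward=75, bidirectional=24),
    FrameStats(6, "P", forward=92, intra=7),
]
print(detect_abrupt_cuts(gop))  # [2]
```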


A Block-Edge-Pattern based Content Descriptor in DCT Domain

Authors: Jianmin Jiang, Kaijin Qiu and Guoqiang Xiao

Abstract: In this correspondence, we describe a robust and effective content descriptor based on block-edge patterns extracted directly in the DCT domain, which is suitable for applications in JPEG or MPEG compressed images and videos. This content descriptor is constructed as a run-length edge-block histogram with three patterns: horizontal edge, vertical edge and no edge. In comparison with existing descriptors, the proposed descriptor features: (i) low-cost computation, suitable for real-time implementation and high-speed processing of compressed images or videos; (ii) robustness to changes such as rotation, reversal and noise; (iii) direct operation in the compressed domain. Extensive experiments show that the proposed content descriptor is effective in describing visual content. In comparison with existing techniques, the proposed descriptor achieves superior performance in terms of retrieval precision and recall rates.

Accepted for publication in: IEEE Transactions on Circuits and Systems for Video Technology


Subspace Extension to Phase Correlation Approach for Fast Image Registration

Authors: Jinchang Ren, Theodore Vlachos, Jianmin Jiang

Abstract: A novel extension of phase correlation to subspace correlation is proposed, in which a 2-D translation is decomposed into two 1-D motions, so that only 1-D Fourier transforms are needed to estimate the corresponding motion. In each subspace, the two highest peaks of the 1-D correlation are linearly interpolated for subpixel accuracy. Experimental results show both the robustness and the accuracy of our method.

ICIP 2007, vol. I, pp. 481-484
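A minimal sketch of the subspace idea follows: each image is projected onto its x and y axes, and a 1-D phase correlation is run on each projection. The projection step and the choice of the larger neighbour as the second peak are assumptions; the interpolation weights the two peak positions by their correlation values. A naive DFT keeps the example dependency-free, and the synthetic test image is chosen so that its projections have no zero-magnitude frequency bins.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, enough for a small sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def phase_correlation_1d(a, b):
    """Estimate the circular shift s such that b[i] is approximately a[i - s]."""
    n = len(a)
    cross = []
    for fa, fb in zip(dft(a), dft(b)):
        c = fb * fa.conjugate()
        cross.append(c / abs(c) if abs(c) > 1e-12 else 0j)  # keep the phase only
    corr = [v.real for v in dft(cross, inverse=True)]
    p1 = max(range(n), key=lambda i: corr[i])
    # Second peak: taken as the larger neighbour of the main peak (an assumption).
    left, right = corr[(p1 - 1) % n], corr[(p1 + 1) % n]
    step, c2 = (1, right) if right >= left else (-1, left)
    c1, c2 = corr[p1], max(c2, 0.0)
    s = p1 + step * c2 / (c1 + c2)        # linear interpolation between the two peaks
    return s - n if s > n / 2 else s      # report as a signed shift


def estimate_translation(img_a, img_b):
    """Split a 2-D translation into two 1-D problems via column and row projections."""
    cols_a, cols_b = [sum(c) for c in zip(*img_a)], [sum(c) for c in zip(*img_b)]
    rows_a, rows_b = [sum(r) for r in img_a], [sum(r) for r in img_b]
    return phase_correlation_1d(cols_a, cols_b), phase_correlation_1d(rows_a, rows_b)


n = 16
img_a = [[0.5 ** x * 0.7 ** y for x in range(n)] for y in range(n)]
dx, dy = 3, -2
img_b = [[img_a[(y - dy) % n][(x - dx) % n] for x in range(n)] for y in range(n)]
print(estimate_translation(img_a, img_b))  # approximately (3.0, -2.0)
```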


Statistical Classification of Skin Color Pixels from MPEG Videos

Authors: Jinchang Ren, Jianmin Jiang

Abstract: Detection and classification of skin regions plays an important role in many image processing and vision applications. In this paper, we present a statistical approach for fast skin detection in MPEG-compressed videos. First, the conditional probabilities of skin and non-skin are extracted from manually marked training images. Then, candidate skin pixels are identified using the Bayesian maximum a posteriori (MAP) decision rule. An optimal threshold is then obtained by analysing the probability of error on the basis of the likelihood ratio histogram of skin and non-skin pixels. Experiments on sequences with varying illumination demonstrate the effectiveness of our approach.

ACIVS 2007: 395-405
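The MAP decision with a tuned threshold can be sketched with (Cb, Cr) histograms. The MAP rule labels a pixel as skin when P(c|skin)P(skin) > P(c|non-skin)P(non-skin), which is a threshold on the likelihood ratio; the sketch keeps the ratio but picks the threshold that minimises training errors. The bin count, add-one smoothing and synthetic training data are assumptions, and compressed-domain details are omitted.

```python
from collections import Counter


def quantize(cbcr, bins=32):
    """Map an 8-bit (Cb, Cr) pair to a 2-D histogram cell (bin count assumed)."""
    return tuple(c * bins // 256 for c in cbcr)


class SkinClassifier:
    """Histogram likelihoods plus a likelihood-ratio threshold fitted on training data."""

    def __init__(self, skin, non_skin, bins=32):
        self.bins = bins
        self.n_cells = bins * bins
        self.skin_hist = Counter(quantize(c, bins) for c in skin)
        self.non_hist = Counter(quantize(c, bins) for c in non_skin)
        self.n_skin, self.n_non = len(skin), len(non_skin)
        self.threshold = self._fit_threshold(skin, non_skin)

    def ratio(self, cbcr):
        """Likelihood ratio P(c | skin) / P(c | non-skin), add-one smoothed."""
        cell = quantize(cbcr, self.bins)
        p_skin = (self.skin_hist[cell] + 1) / (self.n_skin + self.n_cells)
        p_non = (self.non_hist[cell] + 1) / (self.n_non + self.n_cells)
        return p_skin / p_non

    def _fit_threshold(self, skin, non_skin):
        """Choose the ratio threshold with the fewest training errors."""
        rs = [self.ratio(c) for c in skin]
        rn = [self.ratio(c) for c in non_skin]
        best, best_err = 1.0, float("inf")
        for t in sorted(set(rs + rn)):
            err = sum(r < t for r in rs) + sum(r >= t for r in rn)  # misses + false alarms
            if err < best_err:
                best, best_err = t, err
        return best

    def is_skin(self, cbcr):
        return self.ratio(cbcr) >= self.threshold


# Synthetic (Cb, Cr) samples standing in for manually marked training pixels.
skin = [(100 + i % 20, 140 + i % 25) for i in range(200)]
non_skin = [(40 + i % 40, 90 + i % 30) for i in range(300)]
clf = SkinClassifier(skin, non_skin)
print(clf.is_skin((112, 152)), clf.is_skin((60, 95)))  # True False
```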


Compressed-domain Shot Boundary Detection using Finite State Machine and Content-based Rules

Authors: Juan Chen, Jinchang Ren, Jianmin Jiang

Abstract: We propose a fast and systematic method for shot boundary detection in the compressed domain using content-based rules and a finite state machine (FSM). First, several feature indicators are acquired from the DC images of MPEG videos, including luminance, color, edges, prediction error, inter-frame difference and motion. Then, several content-based rules are used to detect abrupt cuts. Third, the boundaries of gradual transitions are determined by a coarse-to-fine procedure with a pre-processing module and an FSM. Experiments on publicly available TRECVID sequences show that the proposed algorithm outperforms representative existing algorithms in both precision and recall.

Asia-Pacific Workshop on Visual Information Processing, 2007, pp. 137-142


Recognition of JPEG Compressed Face Images Based on AdaBoost

Authors: Chunmei Qing and Jianmin Jiang

Abstract: This paper presents an advanced face recognition system based on the AdaBoost algorithm in the JPEG compressed domain. First, the dimensionality is reduced by truncating some of the block-based DCT coefficients, and nonuniform illumination variations are alleviated by discarding the DC coefficient of each block. Next, an improved AdaBoost.M2 algorithm, which uses the Euclidean distance (ED) to eliminate non-effective weak classifiers, is proposed to select the most discriminative DCT features from the truncated DCT coefficient vectors. Finally, LDA is used as the final classifier. Experiments on the Yale face databases show that the proposed approach is superior to other methods in terms of recognition accuracy, efficiency and illumination robustness.

SAMT 2007, LNCS 4816, pp. 272–275, 2007.


A New Robust Watermarking Scheme for Color Image in Spatial Domain

Authors: Ibrahim Nasir, Ying Weng, and Jianmin Jiang

Abstract: This paper presents a new robust watermarking scheme for color images based on block probability in the spatial domain. A binary watermark image is permuted using sequence numbers generated by a secret key and Gray code, and then embedded four times at different positions determined by a secret key. Each bit of the binary encoded watermark is embedded by modifying the intensities of a non-overlapping 8×8 block of the blue component of the host image. The watermark is extracted by comparing the intensities of an 8×8 block of the watermarked and original images and calculating the probability of detecting '0' or '1'. Tested with the StirMark 4.0 benchmark, the experimental results show that the proposed scheme is robust and secure against a wide range of image processing operations.
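The block-probability embedding and non-blind extraction can be sketched as follows. The key-to-position mapping (a seeded random choice), the strength delta = 3 and the 32×32 test image are assumptions; the Gray-code permutation and the fourfold redundant embedding described in the abstract are omitted.

```python
import random

BLOCK = 8


def block_positions(height, width, count, key):
    """Key-seeded choice of distinct non-overlapping 8x8 blocks (assumed mapping)."""
    grid = [(y, x) for y in range(0, height - BLOCK + 1, BLOCK)
            for x in range(0, width - BLOCK + 1, BLOCK)]
    return random.Random(key).sample(grid, count)


def embed(blue, bits, positions, delta=3):
    """Raise (bit 1) or lower (bit 0) every pixel of one block by delta, with clipping."""
    out = [row[:] for row in blue]
    for bit, (by, bx) in zip(bits, positions):
        for y in range(by, by + BLOCK):
            for x in range(bx, bx + BLOCK):
                out[y][x] = max(0, min(255, out[y][x] + (delta if bit else -delta)))
    return out


def extract(original, marked, positions):
    """Non-blind extraction: estimate P(bit = 1) from the sign of pixel differences."""
    bits = []
    for by, bx in positions:
        ones = changed = 0
        for y in range(by, by + BLOCK):
            for x in range(bx, bx + BLOCK):
                diff = marked[y][x] - original[y][x]
                if diff:
                    changed += 1
                    ones += diff > 0
        p_one = ones / changed if changed else 0.5
        bits.append(1 if p_one > 0.5 else 0)
    return bits


size = 32
blue = [[100 + (x * y) % 50 for x in range(size)] for y in range(size)]
bits = [1, 0, 1, 1, 0, 0, 1, 0]
pos = block_positions(size, size, len(bits), key=42)
marked = embed(blue, bits, pos)
noise = random.Random(7)
attacked = [[v + noise.choice((-1, 0, 1)) for v in row] for row in marked]
print(extract(blue, attacked, pos) == bits)  # True
```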



Camera Motion Analysis towards Semantic-based Video Retrieval in Compressed Domain

Authors: Ying Weng, and Jianmin Jiang

Abstract: To reduce the semantic gap between low-level visual features and the richness of human semantics, this paper proposes new algorithms that use combined camera motion descriptors with multiple thresholds to automatically retrieve the semantic concepts close-up and panorama directly in the MPEG compressed domain, based on camera motion analysis. Extensive experiments illustrate that the proposed algorithms provide promising retrieval results in a real-time application scenario and without human intervention.

International Conference on Semantics and Digital Media Technologies (SAMT 2007), LNCS 4816, pp. 276–279, 2007.


Face Detection based on Skin Color in Image by Neural Networks

Authors: Aamer S.S. Mohamed, Ying Weng, Stan S Ipson, and Jianmin Jiang

Abstract: Face detection is one of the challenging problems in image processing. A novel face detection system is presented in this paper. The approach relies on skin-based color features extracted from the two-dimensional discrete cosine transform (DCT) and on neural networks: faces are detected using skin color from the DCT coefficients of the Cb and Cr feature vectors. The system first uses skin color, the main facial feature for detection, and then examines each skin face candidate with neural networks, which learn from facial features to classify whether the original image includes a face. The processing is based on normalization and the DCT, and the final classification uses a neural network approach. Experimental results on upright frontal color face images from the internet show an excellent detection rate.

Accepted by IEEE International Conference on Intelligent and Advanced Systems (ICIAS2007)


Real-time and Automatic Close-up Retrieval from Compressed Videos

Authors: Ying Weng and Jianmin Jiang

Abstract: In this paper, we propose a thorough scheme that uses a camera zooming descriptor with a two-level threshold to automatically retrieve close-ups directly from MPEG compressed videos based on camera motion analysis. In the retrieval process, we build camera-motion-based semantic retrieval. To improve the coverage of the proposed scheme, we investigate close-up retrieval in all kinds of videos. Extensive experiments illustrate that the proposed scheme provides promising retrieval results in a real-time, automatic application scenario.

13th International Conference on Automation and Computing, 15 September 2007, Staffordshire University, Stafford, UK
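The zoom test can be sketched as a divergence measure on macroblock motion vectors, with one threshold per frame and a second one per segment. Both the radial-score descriptor and this reading of the two-level threshold are assumptions, as is the sign convention: the sketch expects vectors describing content motion from the previous frame to the current one, whereas MPEG vectors point from the current block to its reference and may need negating.

```python
import math


def radial_score(vectors, width, height):
    """Mean outward component of macroblock motion vectors about the frame centre.

    `vectors` maps a macroblock centre (x, y) to its displacement (dx, dy).
    Positive scores indicate expansion (zoom-in), negative ones contraction.
    """
    cx, cy = width / 2, height / 2
    total, count = 0.0, 0
    for (x, y), (dx, dy) in vectors.items():
        rx, ry = x - cx, y - cy
        r = math.hypot(rx, ry)
        if r:
            total += (dx * rx + dy * ry) / r
            count += 1
    return total / count if count else 0.0


def is_close_up(frames, width, height, frame_thresh=0.5, segment_thresh=0.6):
    """Level 1: flag frames whose radial score exceeds frame_thresh.
    Level 2: accept the segment if the share of flagged frames reaches segment_thresh."""
    if not frames:
        return False
    flagged = sum(radial_score(v, width, height) > frame_thresh for v in frames)
    return flagged / len(frames) >= segment_thresh


W, H = 64, 48
centres = [(x, y) for x in range(8, W, 16) for y in range(8, H, 16)]
zoom = {(x, y): (0.1 * (x - W / 2), 0.1 * (y - H / 2)) for x, y in centres}
pan = {c: (2.0, 0.0) for c in centres}
still = {c: (0.0, 0.0) for c in centres}
print(is_close_up([zoom] * 8 + [still] * 2, W, H))  # True
print(is_close_up([pan] * 10, W, H))                # False
```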


A Novel System for Interactive Live TV

Authors: Stefan Gruenvogel, Richard Wages, Tobias Buerger, and Janez Zaletelj

Abstract: For the production of interactive live TV formats, new content and new production workflows are necessary. We explain how future content of a parallel multi-stream production of live events may be created from a design and a technical perspective. It is argued that the content should be arranged according to dramaturgical principles, taking into account the meaning of the base material. To support the production process, a new approach for content recommendation is described which uses semantic annotation of audio-visual material and user feedback.

Published in Lizhuang Ma, Matthias Rauterberg, Ryohei Nakatsu (Eds.): Entertainment Computing - ICEC 2007, 6th International Conference, Shanghai, China, September 15-17, 2007, Proceedings. Lecture Notes in Computer Science 4740 Springer 2007


Next Generation Live iTV Formats and Aesthetics: A Joint Scientific and Artistic Approach

Authors: Richard Wages, Stefan Gruenvogel, Georg Guentner, and Janez Zaletelj

Abstract: In this work we describe our vision of next-generation live iTV formats, which consist of a broadcast bouquet of meaningfully interwoven and cross-referencing parallel sub-channels covering a single live event in multi-view or even several concurrent live events. At the same time, these sub-channels have to serve a diversity of consumer interests and moods. To achieve this, future live broadcast production teams will need both artistic and conceptual patterns to create such live formats and maximum technological support to realise such shows in real time. We describe the foundational components of the envisaged system that we have developed, which according to our approach consist of a set of live staging concepts, a framework for the management of intelligent media assets and a recommender system. Furthermore, we argue for integrating the knowledge and sophistication achieved by live audio-visual artists, namely Video Jockeys, who therefore participated in our first live staging tests. We conclude with a short description of our next step, the integration of instant consumer feedback, to complete the system for the upcoming field trial during the 2008 Olympics.

Proceedings of the eChallenges e-2007 Conference & Exhibition, seventeenth in a series of technology research conferences supported by the European Commission, 24-26 October 2007, The Hague, The Netherlands.


Journal Paper JVRB 2007 Vol.4: Video Composer and Live Video Conductor: Future Professions for the Interactive Digital Broadcasting Industry

Authors: Richard Wages, Carmen Mac Williams, Stefan M. Gruenvogel, and Georg Trogemann

Abstract: Innovations in hardware and network technologies lead to an exploding number of non-interrelated parallel media streams. Per se, this does not add any value for consumers. The broadcasting and advertising industries have not yet found new formats to reach the individual user with their content. In this work we propose and describe a novel digital broadcasting framework, which allows for the live staging of (mass) media events and improved consumer personalisation. In addition, the new professions that will emerge in future TV production workflows are described, namely the 'video composer' and the 'live video conductor'.

Online publication:


Content Recommendation System in the Production of Multi-Channel TV Programs

Authors: Janez Zaletelj, Richard Wages, Tobias Buerger, Stefan M. Gruenvogel

Abstract: This paper presents the concept of content recommendations for the production of multi-channel TV shows. Within the 6th Framework project “LIVE – Live Staging of Media Events”, we are developing a production support system that will provide content recommendations and support the production of multi-channel programs. The paper outlines the concept of a multi-channel show and presents a possible workflow scenario for using content recommendations in production. The details of the semantic content annotations are given, and an example of computing personalized recommendations of archive content is presented.

The paper was submitted to the 3rd International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AxMedis 2007),


The evaluation of a hybrid recommender system for recommendation of movies

Authors: Tomaz Pozrl, Matevz Kunaver, Matevz Pogacnik and Jurij F. Tasic

Abstract: In this paper we present our approach to the generation of movie recommendations. The idea of our hybrid approach is first to generate predicted ratings for movies separately, using the content-based and collaborative recommender modules. The predicted ratings from both recommender engines are then combined into a final classification by the hybrid recommender using a weighted voting scheme. The calculations are based on Pearson’s correlation coefficient, True Bayesian prediction and M5Rules decision rules. The evaluation of the system performance was based on the EachMovie data corpus, for around 7000 users. Preliminary results show that this approach works well, although there is still some room for improvement.
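The hybrid scheme can be sketched in Python with simpler components than the paper's. Collaborative prediction uses Pearson's correlation as in the abstract, but the content-based module here is a genre average rather than True Bayesian prediction or M5Rules, and the equal voting weight and toy ratings are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation over the items both users rated (dicts item -> rating)."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xa, xb = [a[i] for i in common], [b[i] for i in common]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    num = sum((x - ma) * (y - mb) for x, y in zip(xa, xb))
    den = math.sqrt(sum((x - ma) ** 2 for x in xa) * sum((y - mb) ** 2 for y in xb))
    return num / den if den else 0.0


def mean(ratings):
    return sum(ratings.values()) / len(ratings)


def collaborative(user, item, ratings):
    """User-based prediction from positively correlated neighbours (mean-centred)."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(ratings[user], theirs)
        if w > 0:
            num += w * (theirs[item] - mean(theirs))
            den += w
    return mean(ratings[user]) + (num / den if den else 0.0)


def content_based(user, item, ratings, genres):
    """Stand-in content module: user's mean rating for movies sharing a genre."""
    same = [r for i, r in ratings[user].items() if genres[i] & genres[item]]
    return sum(same) / len(same) if same else mean(ratings[user])


def hybrid(user, item, ratings, genres, w_content=0.5):
    """Weighted vote of the two predicted ratings (the weight is an assumption)."""
    return (w_content * content_based(user, item, ratings, genres)
            + (1 - w_content) * collaborative(user, item, ratings))


ratings = {
    "ann": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 5, "m2": 5, "m3": 2, "m4": 5},
    "cid": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}
genres = {"m1": {"action"}, "m2": {"action"}, "m3": {"drama"}, "m4": {"action"}}
print(round(hybrid("ann", "m4", ratings, genres), 3))  # 4.292
```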


Scientific Papers from 2006


Future Live iTV Production: Challenges and Opportunities

Authors: Richard Wages, Stefan M. Gruenvogel, Janez Zaletelj, Carmen Mac Williams, Georg Trogemann

Abstract: Today's TV broadcasting companies are highly professionalized in the production of linear TV formats. Workflows and technologies for these linear formats are reliable, the production personnel are highly skilled and we can rely on the well-known viewing habits of consumers. The key issue of this paper is: how do we enable such a broadcasting working environment to produce far more variable, multi-perspective or interactive TV formats? We are especially interested in formats entailing a multitude of live audiovisual material, such as sports events or elections, which are to be transformed into an interactive TV event for the consumer at home. This paper is not concerned with the variety of technical problems the interactive TV paradigm leads to, but with questions of future tools and practices on the producers' side, levels of consumer personalization and the respective consumer interfaces to make digital content accessible.

Proceedings of the AXMEDIS 2006: Second International Conference on Automated Production of Cross Media Content for Multi-Channel Distribution, December 13 - 15, 2006, Leeds, UK.

EU-IST Project Live: Live Staging of Media Events

Authors: Joachim Koehler, Richard Wages, Carmen Mac Williams, and Heike Fischer

Abstract: This paper gives an overview of the main objectives and the first results of the EU IST project LIVE. This integrated project will investigate new methods and formats for the authoring and production of live sports events in a professional broadcast environment. To achieve these new interactive TV formats, the processing of different types of audio-visual (A/V) material must be enhanced to allow for more sophisticated selection and advanced authoring of sports content. This live staging process will generate several output streams which are interlinked with each other. Consumers benefit by being able to select and receive video material in a more personalized manner.

Paper presented at the SAMT 2006: First International Conference on Semantics and Digital Media Technology, December 6 - 8, 2006, Athens, Greece.


Video Composer and Live Video Conductor: Future Professions for the Interactive Digital Broadcasting Industry

Authors: Richard Wages, Carmen Mac Williams, Stefan M. Gruenvogel, Georg Trogemann

Abstract: Innovations in hardware and network technologies lead to an exploding number of non-interrelated parallel media streams. Per se, this does not add any value for consumers. The broadcasting and advertising industries have not yet found new formats to reach the individual user with their content. In this work we propose and describe a novel digital broadcasting framework, which allows for the live staging of (mass) media events and improved consumer personalisation. In addition, the new professions that will emerge in future TV production workflows are described, namely the 'video composer' and the 'live video conductor'.

EuroITV 2006 – Beyond Usability, Broadcast, and TV: Fourth European Conference on Interactive Television, Athens, Greece, 25-26 May, 2006, Proceedings, pp. 32-38


Mind the gap - Requirements for the combination of content and knowledge

Authors: Tobias Buerger and Rupert Westenthaler

Abstract: Semantic enrichment of content can be done manually, which is expensive, or automatically, which is error-prone. In particular, automatic semantic enrichment must be aware of the gap between the semantics that are directly retrievable from the content and those which can be inferred within a given interpretative context. We report on a model for content and knowledge which distinguishes between three descriptive levels: information relating directly to the resource, to the metadata of the resource and to the subject matter addressed by the content. This model addresses five fundamental requirements for automation: formality, interoperability, multiple interpretations, contextualization, and independence of knowledge items from the resource's content.

SAMT Conference 2006


An Intelligent Media Framework for Multimedia Content

Authors: Tobias Buerger

Abstract: Search, retrieval and navigation in multimedia repositories is a task common to all multimedia management systems: users are supported by a wide range of features that are traditionally based on full-text search and metadata queries. However, generating metadata is an error-prone and labour-intensive task that cannot yet be fully automated for multimedia content. In this position paper we describe our vision of an Intelligent Media Framework that is capable of combining metadata and knowledge about media items in order to support user orientation, search and retrieval in media-rich information spaces: we try to integrate heterogeneous sources to create an Intelligent Media Framework containing Intelligent Media Objects that carry behavioural knowledge and are capable of fully describing themselves. Among other things, the properties of these objects reflect the fact that users are more likely to search by the “meaning” of audiovisual objects, i.e. by what they represent, than by their pure low-level features.


A Management System for Distributed Knowledge and Content Objects

Authors: Wernher Behrendt, Nitin Arora, Tobias Buerger, Rupert Westenthaler

Abstract: We present the results of a European research project which developed specifications for so-called Knowledge Content Objects (KCO) and for an attendant infrastructure, the Knowledge Content Carrier Architecture (KCCA). The work addresses the problem that, while there are many standards for content and for metadata, there is at present no suitable framework that enables organizations to manage knowledge alongside content in a coherent manner. Our approach postulates the KCO as a common structural entity that can be recognised and manipulated by a KCCA-enabled system.

AXEMDIS Conference 2006


Smart Content – Scenarios and Technologies for a Knowledge-based Audiovisual Archive

Authors: Georg Guentner, Tobias Buerger, Erich Gams

Abstract: In our paper we present the intermediate results of a project aiming at the creation of a knowledge-based infrastructure for search and navigation in audiovisual repositories. The approach is based on highly automated media processing and is therefore specifically targeted at historically grown archives (broadcasters, universities, public and corporate media archives, etc.) that lack the time and/or the financial means to manually annotate their digital media assets. In the project, a conceptual architecture was developed to meet the requirements of a set of knowledge-intensive user scenarios for the utilization of rich media content in the B2B and B2C areas. Pluggable RDF knowledge components act as a link between semantic indexing and knowledge-based navigation.

eChallenges Conference 2006


The Role of MPEG-7 in Semantic Annotation and the Cross-Media Publishing Process

Authors: Tobias Buerger, Georg Guentner, Erich Gams

Abstract: During the development of a knowledge-based audiovisual information system, the authors of this article defined a conceptual system architecture based on MPEG-7 as the general description scheme for the media assets in the middleware. This concept was used not only to achieve a high level of abstraction and independence from the underlying media asset management system but also, and primarily, as the basis of a semantic indexing process. Using lightweight ontologies, the descriptions of the media assets were associated with semantic concepts. Semantically annotated MPEG-7 assets were then propagated to the presentation layer, allowing the implementation of a variety of publication scenarios, including cross-media scenarios for the creation of concise video summaries.

AXEMDIS Conference 2006


Radio Relief: Radio Archives Departments Benefit from Digital Audio Processing

Authors: Martha Larson, Thomas Beckers and Volker Schlögell

Abstract: The archives departments of radio broadcasters are currently facing two significant challenges: how to store rapidly increasing amounts of radio content, and how to satisfy the rising demand for easy retrieval of audio clips that can be recycled into new programs. A pilot project demonstrates that digital audio processing techniques have the potential to provide much-needed support.

Appeared in the ERCIM News No. 66, July 2006

Smart Content Factory – Approaching the Vision

Authors: Georg Guentner, Siegfried Reich

Abstract: In this paper we describe the objectives and achievements in developing the vision of a “Smart Content Factory”. The “Smart Content Factory” aims at the creation of a knowledge-aware system infrastructure to improve the utilization (re-use and adaptation) of audiovisual content. We will provide an overview of the project objectives and introduce “digital content engineering” as a scientific discipline dealing with concepts, methodologies, techniques and tools for a quantifiable approach towards the vision of smart content, thereby addressing future scenarios of electronic publishing, especially for embedded publishers. We will further take a look at the user and system requirements of the “Smart Content Factory” and their impact on the architecture of the system prototype.

ELpub Conference 2006

A fuzzy logic approach for detection of video shot boundaries

Authors: Hui Fang, Jianmin Jiang, Yue Feng

Abstract: Video temporal segmentation is normally the first and an important step for content-based video applications. Many features, including pixel differences, colour histograms, motion and edge information, have been widely used and reported in the literature to detect shot cuts inside videos. Although research on shot cut detection is active and extensive, it remains a challenge to achieve accurate detection of all types of shot boundaries with a single algorithm. In this paper, we propose a fuzzy logic approach that integrates hybrid features for detecting shot boundaries inside general videos. The fuzzy logic approach contains two processing modes: one dedicated to the detection of abrupt shot cuts, including short dissolves, and the other to the detection of gradual shot cuts. These two modes are unified by a mode selector that decides which mode the scheme should operate in to achieve the best possible detection performance. Using the publicly available test data set from Carleton University, extensive experiments were carried out, and the results show that the proposed algorithm outperforms representative existing algorithms in terms of precision and recall.

Appeared in the Journal of the Pattern Recognition Society
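To make the idea of fuzzy membership over frame-difference features concrete, the following is a minimal sketch that uses only one of the features named in the shot-boundary abstract (the colour histogram difference, here on grey levels) and three fuzzy sets. The breakpoints, the bin count and the mapping from fuzzy set to label are illustrative assumptions; this is not the paper's rule base, and it does not implement its two-mode scheme or mode selector.

```python
from typing import Dict, List, Sequence


def histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized grey-level histogram of a frame given as 0..255 pixel values."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1
    return [c / len(frame) for c in counts]


def hist_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Half the L1 distance between normalized histograms, in [0, 1]."""
    return 0.5 * sum(abs(x - y) for x, y in zip(histogram(a), histogram(b)))


def ramp_up(x: float, lo: float, hi: float) -> float:
    """Membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def memberships(d: float) -> Dict[str, float]:
    """Degrees of 'low', 'medium' and 'high' difference (assumed breakpoints)."""
    high = ramp_up(d, 0.4, 0.6)
    low = 1.0 - ramp_up(d, 0.1, 0.2)
    medium = max(0.0, 1.0 - low - high)
    return {"low": low, "medium": medium, "high": high}


def classify(frames: List[Sequence[int]]) -> List[str]:
    """Label each transition by its strongest fuzzy set:
    high -> 'cut', medium -> 'gradual?' (candidate only), low -> 'none'."""
    names = {"high": "cut", "medium": "gradual?", "low": "none"}
    labels = []
    for prev, cur in zip(frames, frames[1:]):
        m = memberships(hist_diff(prev, cur))
        labels.append(names[max(m, key=m.get)])
    return labels
```

With a dark frame, a partly bright frame and a fully bright frame, the sequence dark, dark, mixed, bright is labeled as no boundary, a gradual-transition candidate, and a cut.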

Video Indexing and Retrieval in Compressed Domain Using Fuzzy-Categorization

Authors: Hui Fang, Rami Qahwaji, and Jianmin Jiang 

Abstract: There has been increasing interest in video indexing and retrieval in recent years. In this work, the indexing and retrieval system for visual content is based on features extracted from the compressed domain. Processing the compressed domain directly saves decoding time, which is extremely important when indexing large multimedia archives. A fuzzy categorizing structure is designed in this paper to improve retrieval performance. For our experiments, we constructed a database of basketball videos. This database includes three categories: full court match, penalty and close-up. First, spatial and temporal features are extracted and used to train the fuzzy membership functions with the minimum entropy optimal algorithm. Then, the max composition operation is used to generate a new fuzzy feature that represents the content of each shot. Finally, this fuzzy representation becomes the indexing feature for the content-based video retrieval system. The experimental results show that the proposed algorithm is quite promising for semantic video retrieval.

G. Bebis et al. (Eds.): ISVC 2006, LNCS 4292, pp. 1143 – 1150, 2006. Copyright Springer-Verlag Berlin Heidelberg 2006 
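The fuzzy-categorization abstract mentions a "max composition operation". Assuming this refers to the standard max-min composition of fuzzy relations, a minimal sketch looks as follows. The feature names and the feature-to-category relation in the usage note are hypothetical values chosen for illustration, not data from the paper.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def max_min_compose(r: Matrix, s: Matrix) -> List[List[float]]:
    """Max-min composition of fuzzy relations:
    (R o S)[i][k] = max over j of min(R[i][j], S[j][k])."""
    if not r or not s or len(r[0]) != len(s):
        raise ValueError("inner dimensions must agree")
    inner, cols = len(s), len(s[0])
    return [
        [max(min(row[j], s[j][k]) for j in range(inner)) for k in range(cols)]
        for row in r
    ]
```

For example, a shot with membership degrees `[[0.9, 0.2, 0.1]]` in three hypothetical features (say, "court colour dominant", "high motion", "large foreground object"), composed with an assumed 3×3 relation from those features to the categories full court match, penalty and close-up, yields a single fuzzy vector over the three categories that could serve as the shot's index feature.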


An Effective and Fast Scene Change Detection Algorithm for MPEG Compressed Videos

Authors: Z. Li, J. Jiang, G. Xiao, and H. Fang 

Abstract: In this paper, we propose an effective and fast scene change detection algorithm that works directly in the MPEG compressed domain. The proposed scene change detection exploits the MPEG motion estimation and compensation scheme by examining the prediction status of each macroblock inside B frames and P frames. As a result, both abrupt and dissolve scene changes are located by a sequence of comparison tests, and no feature extraction or histogram differencing is needed. The proposed algorithm can therefore operate in the compressed domain and is suitable for real-time implementation. Extensive experiments show that the proposed algorithm achieves up to 94% precision for abrupt scene change detection and 100% for gradual scene change detection. Compared with similar existing techniques, the proposed algorithm achieves better recall and precision rates.

A. Campilho and M. Kamel (Eds.): ICIAR 2006, LNCS 4141, pp. 206 – 214, 2006. Copyright Springer-Verlag Berlin Heidelberg 2006
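The macroblock prediction-status idea in the MPEG scene change abstract can be illustrated with a common heuristic for B frames: if nearly all macroblocks are backward-predicted, the frame resembles the next reference frame, so a cut is suspected just before it; if nearly all are forward-predicted, a cut is suspected just after it. The sketch below assumes per-frame macroblock counts are already available from a parser, and the 0.8 dominance threshold is a hypothetical value. A practical detector, like the one described, would combine several such comparison tests over neighbouring B and P frames rather than a single rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BFrameStats:
    """Macroblock prediction counts for one B frame (hypothetical input format)."""
    forward: int
    backward: int
    bidirectional: int
    intra: int

    @property
    def total(self) -> int:
        return self.forward + self.backward + self.bidirectional + self.intra


def cut_side(stats: BFrameStats, dominance: float = 0.8) -> Optional[str]:
    """Return 'before' or 'after' if a cut is suspected next to this B frame,
    else None. `dominance` is an assumed threshold, not a value from the paper."""
    if stats.total == 0:
        return None
    if stats.backward / stats.total >= dominance:
        return "before"  # frame predicted from the future: cut precedes it
    if stats.forward / stats.total >= dominance:
        return "after"   # frame predicted from the past: cut follows it
    return None


def detect(frames: List[BFrameStats]) -> List[Optional[str]]:
    """Apply the single-frame test to a sequence of B frames."""
    return [cut_side(f) for f in frames]
```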


DCT-Domain Image Retrieval Via Block-Edge-Patterns

Authors: K.J. Qiu, J. Jiang, G. Xiao, and S.Y. Irianto 

Abstract: A new algorithm for compressed image retrieval based on DCT block edge patterns is proposed in this paper. The algorithm directly extracts three edge patterns from compressed image data to construct an edge pattern histogram, which serves as an indexing key to retrieve images based on their content features. Three feature-based indexing keys are described: (i) the first two are represented by 3-D and 4-D histograms, respectively; and (ii) the third is constructed in the spirit of run-length coding and is computed on consecutive horizontal and vertical edges. To test and evaluate the proposed algorithms, we carried out two-stage experiments. The results show that our proposed methods are robust to color changes and varying noise. Compared with existing representative techniques, the proposed algorithms achieve superior performance in terms of retrieval precision and processing speed.

A. Campilho and M. Kamel (Eds.): ICIAR 2006, LNCS 4141, pp. 673 – 684, 2006. Copyright Springer-Verlag Berlin Heidelberg 2006
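As a rough illustration of reading edge information straight from DCT coefficients, the sketch below classifies each 8×8 block by its two lowest AC coefficients and builds a normalized pattern histogram. The classification rule, the magnitude threshold and the 2:1 dominance ratio are assumptions made for this example; the abstract does not specify how its three edge patterns are derived, and the 3-D/4-D histograms and run-length key are not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Sequence

Block = Sequence[Sequence[float]]  # 8x8 DCT coefficients indexed block[v][u]

PATTERNS = ["smooth", "vertical", "horizontal", "diagonal"]


def edge_pattern(block: Block, thresh: float = 10.0) -> str:
    """Classify a DCT block by its two lowest AC coefficients (assumed rule).

    block[0][1] is the first horizontal-frequency coefficient: a large value
    means intensity changes from left to right, i.e. a vertical edge.
    block[1][0] is the first vertical-frequency coefficient: a large value
    means a horizontal edge. `thresh` is a hypothetical magnitude threshold.
    """
    h, v = abs(block[0][1]), abs(block[1][0])
    if h < thresh and v < thresh:
        return "smooth"
    if h >= 2 * v:
        return "vertical"
    if v >= 2 * h:
        return "horizontal"
    return "diagonal"


def pattern_histogram(blocks: Iterable[Block]) -> List[float]:
    """Normalized histogram over PATTERNS, usable as a simple index key."""
    counts = Counter(edge_pattern(b) for b in blocks)
    total = sum(counts.values()) or 1
    return [counts[p] / total for p in PATTERNS]
```

Because only two coefficients per block are inspected, no inverse DCT is needed, which is the property that makes compressed-domain retrieval fast.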


Constrained Region-Growing and Edge Enhancement Towards Automated Semantic Video Object Segmentation

Authors: L. Gao, J. Jiang, and S.Y. Yang 

Abstract: Most existing object segmentation algorithms suffer from a so-called under-segmentation problem, where parts of the segmented object are missing and holes often occur inside the object region. This problem becomes even more serious when object pixels have intensity values similar to those of the background. To resolve the problem, we propose constrained region-growing and contrast enhancement to recover the missing parts and fill in the holes inside the segmented objects. Our proposed scheme consists of three elements: (i) a simple linear transform for contrast enhancement to enable stronger edge detection; (ii) an 8-connected linking regional filter for noise removal; and (iii) constrained region-growing to eliminate internal holes. Our experiments, which use a representative existing edge-map-based segmentation algorithm as the benchmark, show that the proposed scheme is effective in resolving the under-segmentation problem.

J. Blanc-Talon et al. (Eds.): ACIVS 2006, LNCS 4179, pp. 323 – 331, 2006. Copyright Springer-Verlag Berlin Heidelberg 2006
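The goal of step (iii) in the segmentation abstract, eliminating holes inside a segmented object, can be illustrated with plain hole filling on a binary mask: flood-fill the background from the image border, and treat any background pixel the fill never reaches as an internal hole. This is a generic technique shown for illustration only; it omits the constraints the paper places on its region growing.

```python
from collections import deque
from typing import Deque, List, Tuple

Mask = List[List[int]]  # 1 = object pixel, 0 = background


def fill_holes(mask: Mask) -> Mask:
    """Return a copy of `mask` with background regions that are not
    4-connected to the image border set to 1 (object)."""
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    queue: Deque[Tuple[int, int]] = deque()
    # Seed the flood fill with every background pixel on the border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))
    # Breadth-first flood fill through 4-connected background pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not outside[nr][nc] and mask[nr][nc] == 0):
                outside[nr][nc] = True
                queue.append((nr, nc))
    # Background pixels never reached from the border are internal holes.
    return [[1 if mask[r][c] == 1 or not outside[r][c] else 0
             for c in range(cols)] for r in range(rows)]
```

Pairing 4-connectivity for the background with an implicitly 8-connected object is the usual choice: a diagonal gap in the object outline is then not treated as an opening to the outside.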


Adding Lossless Video Compression to MPEGs

Authors: Jianmin Jiang and Guoqiang Xiao 

Abstract: In this correspondence, we propose adding a lossless compression functionality to existing MPEG standards by developing a new context tree to drive arithmetic coding for lossless video compression. Compared with existing work on context tree design, the proposed algorithm features 1) prefix sequence matching to locate the statistical model at the internal node nearest to the stopping point, where the match of the context sequence is broken; 2) traversal of the context tree along a fixed order of context structure with a maximum of four motion-compensated errors; and 3) context thresholding to quantize the higher end of error values into a single statistical cluster. As a result, the proposed algorithm achieves competitive processing speed, low computational complexity and high compression performance, which bridges the gap between universal statistical modeling and practical compression techniques. Extensive experiments show that the proposed algorithm outperforms JPEG-LS by up to 24% and CALIC by up to 22%, with processing times ranging from less than 2 seconds to 6 seconds per frame on a typical PC.
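Two of the three features listed in the lossless-compression abstract, prefix sequence matching in a context tree and context thresholding, can be sketched with a small dictionary-based tree. In this toy version the context is a list of up to four motion-compensated errors visited in a fixed order, each error is quantized with an assumed cap of 7, and a lookup returns the statistics of the deepest node reached before the match breaks. The node layout and the cap are assumptions, and the arithmetic coder that would consume these statistics is omitted.

```python
from typing import Dict, Sequence, Tuple


def quantize_error(e: int, cap: int = 7) -> int:
    """Context thresholding: |e| >= cap collapses into one cluster (cap is assumed)."""
    return min(abs(e), cap)


class ContextNode:
    def __init__(self) -> None:
        self.children: Dict[int, "ContextNode"] = {}
        self.counts: Dict[int, int] = {}  # symbol -> frequency seen at this node


class ContextTree:
    """Toy context tree over at most `max_depth` quantized errors."""

    def __init__(self, max_depth: int = 4) -> None:
        self.root = ContextNode()
        self.max_depth = max_depth

    def update(self, context: Sequence[int], symbol: int) -> None:
        """Record `symbol` at every node along the quantized context path."""
        node = self.root
        node.counts[symbol] = node.counts.get(symbol, 0) + 1
        for e in context[: self.max_depth]:
            node = node.children.setdefault(quantize_error(e), ContextNode())
            node.counts[symbol] = node.counts.get(symbol, 0) + 1

    def lookup(self, context: Sequence[int]) -> Tuple[int, Dict[int, int]]:
        """Prefix matching: follow the context until the match breaks and
        return (depth reached, statistics of that deepest node)."""
        node, depth = self.root, 0
        for e in context[: self.max_depth]:
            child = node.children.get(quantize_error(e))
            if child is None:
                break
            node, depth = child, depth + 1
        return depth, node.counts
```

Falling back to the deepest matching internal node, rather than to the root, keeps the statistics as specific as the data seen so far allows.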



Analysis of Cluttered Scenes Using an Elastic Matching Approach for Stereo Images

Authors: Christian Eckes, Jochen Triesch, and Christoph von der Malsburg

Abstract: We present a system for the automatic interpretation of cluttered scenes containing multiple partly occluded objects in front of unknown, complex backgrounds. The system is based on an extended Elastic Graph Matching algorithm that allows partial occlusions to be modeled explicitly. Our approach extends an earlier system in two ways. First, we use Elastic Graph Matching in stereo image pairs to increase matching robustness and disambiguate occlusion relations. Second, we use richer feature descriptions in the object models by integrating shape/texture with color features. We demonstrate that the combination of both extensions substantially increases recognition performance. The system learns about new objects in a simple one-shot learning approach. Despite the lack of statistical information in the object models and the lack of an explicit background model, our system performs surprisingly well for this very difficult task. Our results underscore the advantages of view-based feature constellation representations for difficult object recognition problems.


Unsupervised Speaker Clustering using Global Similarity and F0 Features

Authors: Konstantin Biatov, Martha Larson

Abstract: This paper investigates an unsupervised speaker clustering approach that exploits global similarity, and also proposes extending the standard cepstral feature set used for speaker clustering with prosodic features extracted from F0. The global-similarity-based speaker clustering algorithm, initially proposed by the authors in [6], leverages the insight that audio segments within a single cluster are not only similar to one another, but also display the same patterns of similarities and differences with audio segments belonging to all other clusters. First, speaker clustering performance using the standard Bayesian Information Criterion (BIC) is compared to the performance achieved using a BIC-based algorithm incorporating global similarity. Then, both clustering techniques are tested using an extended feature set that includes F0-derived features in addition to the standard cepstral features. The evaluation, performed on data recorded from German-language radio, shows the clear benefits of using global information when performing clustering. It also demonstrates that in most cases F0 features outperform the cepstral-only feature set, both in standard BIC clustering and in the global-similarity-based BIC approach.


MPEG-2 Compressed-Domain Algorithms for Video Analysis

Authors: Wolfgang Hesseler and Stefan Eickeler

Abstract: This paper presents new algorithms for extracting metadata from video sequences in the MPEG-2 compressed domain. Three algorithms for efficient low-level metadata extraction in preprocessing stages are described. The first algorithm detects camera motion using the motion vector field of an MPEG-2 video. The second method extends the idea of motion detection to a limited region of interest, yielding an efficient algorithm to track objects inside video sequences. The third algorithm performs a cut detection using macroblock types and motion vectors.

Hindawi Publishing Corporation EURASIP Journal on Applied Signal Processing Volume 2006, Article ID 56940, Pages 1–11 DOI 10.1155/ASP/2006/56940
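Two of the MPEG-2 compressed-domain ideas in the abstract above, camera motion from the motion vector field and cut detection from macroblock types, can be sketched as follows. The sketch estimates only a global pan (the component-wise median of the motion vectors, which tolerates a few vectors belonging to moving objects) and flags a P frame as a cut when most of its macroblocks are intra-coded. The median estimator, the string labels for macroblock types and the 0.7 ratio are assumptions for illustration, not the paper's algorithms; zoom estimation and region-of-interest tracking are not covered.

```python
from statistics import median
from typing import Sequence, Tuple

Vector = Tuple[float, float]


def estimate_pan(vectors: Sequence[Vector]) -> Vector:
    """Global translation as the component-wise median of the motion vectors."""
    if not vectors:
        return (0.0, 0.0)
    xs = [v[0] for v in vectors]
    ys = [v[1] for v in vectors]
    return (float(median(xs)), float(median(ys)))


def is_cut(mb_types: Sequence[str], intra_ratio: float = 0.7) -> bool:
    """Flag a P frame as a cut when most macroblocks are intra-coded, i.e. the
    encoder found no usable reference in the previous frame.
    `intra_ratio` is an assumed threshold."""
    if not mb_types:
        return False
    intra = sum(1 for t in mb_types if t == "intra")
    return intra / len(mb_types) >= intra_ratio
```

For instance, four macroblocks moving by about (2, 0) and one outlier at (-10, 5) still give a pan estimate of (2.0, 0.0).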


Scientific Papers from 2005


Speaker Clustering via Bayesian Information Criterion using a Global Similarity Constraint

Authors: Konstantin Biatov, Martha Larson

Abstract: In this paper we propose a global similarity constraint that improves speaker clustering as standardly performed using the Bayesian Information Criterion (BIC). The novelty of our approach lies in the fact that it exploits the hypothesis that audio segments belonging to the same speaker cluster should demonstrate similar global behavior, i.e. exhibit approximately the same pattern of similarity and dissimilarity with all other segments. Every segment is represented by a global similarity vector whose components encode the BIC-based local similarity between that segment and each of the other segments to be clustered. Speaker clustering is performed bottom-up, using the BIC to compare each pair of segments and determine whether their similarity is high enough to merge them. We use the global similarity vectors to constrain merging to segment pairs that have approximately the same patterns of global similarity. The evaluation, performed on audio data from four different German-language radio programs, shows that the proposed approach improves on standard BIC clustering.
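The global similarity constraint described above can be sketched as a merge test on a precomputed pairwise similarity matrix. Here `sim[i][j]` is assumed to be a local similarity where larger means more similar (for example a negated BIC difference); computing BIC itself is out of scope. Two segments may merge only if they are locally similar and their similarity profiles over all remaining segments agree, measured with Pearson correlation. The choice of correlation and both thresholds are assumptions, not the paper's exact formulation.

```python
from math import sqrt
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def pattern_agreement(sim: Matrix, i: int, j: int) -> float:
    """Pearson correlation between the similarity profiles of segments i and j,
    taken over all other segments k (k != i, j)."""
    ks = [k for k in range(len(sim)) if k not in (i, j)]
    if len(ks) < 2:
        return 0.0
    a = [sim[i][k] for k in ks]
    b = [sim[j][k] for k in ks]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)


def may_merge(sim: Matrix, i: int, j: int,
              local_thresh: float = 0.0, global_thresh: float = 0.8) -> bool:
    """Allow a bottom-up merge only if the pair is locally similar AND its
    global similarity patterns agree (both thresholds are assumed)."""
    return sim[i][j] > local_thresh and pattern_agreement(sim, i, j) >= global_thresh
```

The constraint matters when a pair looks locally similar by chance: if one segment resembles speaker A's other segments while the other resembles speaker B's, their profiles correlate negatively and the merge is blocked.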

Smart Content Factory - Assisting Search for Digital Objects by Generic Linking Concepts to Multimedia Content

Authors: Tobias Buerger, Erich Gams, Georg Guentner

Abstract: Search, retrieval and navigation in audiovisual repositories is a task common to all media asset management systems: Users are supported by a wide range of features which are traditionally based on full text search and metadata queries. In this paper we describe an approach to superimpose a semantic indexing infrastructure over the media assets and the metadata associated with them. The infrastructure is based on formal knowledge models and facilitates the use of further navigation dimensions: By identifying semantic concepts we are able to create a dynamic navigation structure which is based on the underlying knowledge model and the conceptual relations defined therein.


Authors: Carmen Mac Williams, Richard Wages, Stefan M. Gruenvogel, Georg Trogemann

Abstract: Television is undergoing a historic change. Interactive Digital Broadcasting will become reality from 2010 onwards. Vast amounts of video material will be produced by TV broadcasters, overwhelming producers and consumers alike. Current TV formats and forms of broadcasting do not cater to the personal moods and interests of the consumer. We therefore propose the development of a TV environment that allows for the establishment of 'virtual personalised channels'. To achieve this, methods for (live) semantic annotation of video material and for the live staging of media events have to be designed. The resulting, drastically different process of content production and consumption will make it possible to satisfy individual human needs. The approaches outlined in this extended abstract form the basis of our upcoming IST research project LIVE.

EWIMT 2005: Second European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies, 30 November – 1 December, 2005, IEE Savoy Place, London


Personal content recommender based on a hierarchical user model for the selection of TV programmes

Authors: Matevz Pogacnik, Jurij Tasic, Marko Meza, Andrej Kosir

Abstract: In this paper we present our approach to user modeling for the personalized selection of multimedia content, tested on a corpus of TV programmes. The idea of this approach is to classify content (TV programmes) by calculating, for each description attribute, the similarity between the content description and the user model. The calculated similarities are then combined into a classification decision using Support Vector Machines. The basis for the similarity calculation is a hierarchical user model overlaid on a taxonomy of TV programme genres. Preliminary results show that the approach works well with content descriptions of varying quality, including incomplete genre classification and an arbitrary number of description attributes. The system was evaluated on content described using the TV-Anytime standard, but the approach can be adapted to search for other types of content with multi-attribute descriptions.

Copyright 2005 Kluwer Academic Publishers. Printed in the Netherlands.


Scientific Papers from 2004




Abstract: This paper describes the MPEG-7 compliant media asset management system iFinder, which provides a set of automatic methods and software tools for media analysis, archiving and retrieval. The iFinder was developed for use in the media industry and consists of the iFinderSDK and the iFinder retrieval engine. The iFinderSDK is composed of a bundle of modules that realize individual technologies for audio and video metadata extraction. In this paper we present the audio content processing workflow and the pattern recognition methods implemented in iFinder. In particular, a technique for precise audio/text alignment and a browser that displays the synchronized media channels of the retrieval results are discussed. This paper also provides practical insight into how to use MPEG-7 as a standardized metadata format for media asset management.

AES 25th International Conference, London, United Kingdom, 2004 June 17–19


An Extraction of Speech Data from Audio Stream Using Unsupervised Pre-Segmentation

Authors: K. Biatov

Abstract: In this paper we investigate the extraction of speech data from audio streams. Our method includes unsupervised optimal self-segmentation of the audio stream into small, homogeneous segments. Homogeneity is defined on the basis of the average amplitude and the zero-crossings in a frame, and entropy is used as the homogeneity measure. In our approach we calculate the relative ratio between the average amplitudes of neighboring homogeneous segments. For a speech signal, this ratio is less than a threshold defined on a short, pure speech signal. As the discriminative feature, we use the percentage of homogeneous segments within a 1-second interval that have a high relative amplitude ratio. During classification, each 1-second interval is labeled incrementally as a speech or non-speech segment. The discrimination technique performs well on more than six hours of data covering different types of audio.
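The low-level quantities in the speech extraction abstract can be sketched in a few lines: per-frame average amplitude and zero-crossing count, plus the share of neighbouring segment pairs whose amplitude ratio is high. The ratio is defined here as larger over smaller amplitude, and the threshold of 3.0 is a placeholder; both are assumptions, as is the use of fixed-length frames instead of the paper's entropy-based optimal self-segmentation. In the described method, the decision threshold would be calibrated on a short, pure speech signal.

```python
from typing import List, Sequence, Tuple


def frame_features(samples: Sequence[float], frame_len: int) -> List[Tuple[float, int]]:
    """(average absolute amplitude, zero-crossing count) for each full frame;
    a trailing partial frame is ignored."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        avg_amp = sum(abs(x) for x in frame) / frame_len
        zc = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((avg_amp, zc))
    return feats


def high_ratio_share(seg_amps: Sequence[float], ratio_thresh: float = 3.0) -> float:
    """Share of neighbouring segment pairs whose amplitude ratio (larger over
    smaller) exceeds `ratio_thresh`; ratio definition and threshold are assumed."""
    pairs = list(zip(seg_amps, seg_amps[1:]))
    if not pairs:
        return 0.0
    eps = 1e-9  # avoids division by zero for silent segments
    high = sum(1 for a, b in pairs if max(a, b) / (min(a, b) + eps) > ratio_thresh)
    return high / len(pairs)
```

Computing `high_ratio_share` over the segments inside each 1-second window gives one number per window, which could then be compared with the calibrated threshold to label the window as speech or non-speech.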