Content related to "speech-detection"

Unsupervised Speaker Clustering Using a Global Similarity and F0 Features
This paper investigates an unsupervised speaker clustering approach that exploits global similarity. It also proposes extending the standard cepstral feature set used for speaker clustering with prosodic features extracted from F0. The global-similarity-based speaker clustering algorithm, initially proposed by the authors in [6], rests on the insight that audio segments within a single cluster are not only similar to one another, but also display the same pattern of similarities and differences with the audio segments belonging to all other clusters. First, speaker clustering performance using the standard Bayesian Information Criterion (BIC) is compared with the performance of a BIC-based algorithm that incorporates global similarity. Then both clustering techniques are tested using an extended feature set that includes F0-derived features in addition to the standard cepstral features. The evaluation, performed on German-language radio recordings, shows the clear benefit of using global information during clustering. It also shows that, in most cases, the feature set extended with F0-derived features outperforms the cepstral feature set alone, both in standard BIC clustering and in the global-similarity-based BIC approach.
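The standard BIC merge test compared above can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes each segment is modeled by a single Gaussian with diagonal covariance, which the abstract does not specify. It also assumes a penalty weight `lam` (λ) that would normally be tuned on development data. The function names are hypothetical. To mirror the extended feature set, each frame vector would carry F0-derived values alongside the cepstral coefficients. Which F0 features were used is not stated here.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector (e.g. cepstral coefficients, optionally plus F0 features)


def _log_det_diag(frames: List[Frame]) -> float:
    """Log-determinant of a diagonal ML covariance: the sum of per-dimension log variances."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        values = [f[k] for f in frames]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0) on constant dimensions
    return total


def delta_bic(seg_a: List[Frame], seg_b: List[Frame], lam: float = 1.0) -> float:
    """Delta-BIC between two segments, each modeled by one diagonal-covariance Gaussian.

    Negative values favor merging (a single model explains both segments well enough);
    positive values favor keeping the segments in separate clusters.
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    dim = len(seg_a[0])
    likelihood_gain = 0.5 * (
        n * _log_det_diag(seg_a + seg_b)
        - n_a * _log_det_diag(seg_a)
        - n_b * _log_det_diag(seg_b)
    )
    # Two diagonal Gaussians have 2*dim more free parameters (means + variances) than one.
    penalty = 0.5 * lam * (2 * dim) * math.log(n)
    return likelihood_gain - penalty
```

With identical toy segments, the likelihood term vanishes and only the penalty remains, so `delta_bic` is negative and the segments would be merged. With well-separated segments, the pooled variance grows and the score turns positive.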
Improvement Speaker Clustering Using Global Similarity Features
This paper proposes global similarity features that improve standard bottom-up speaker clustering. The novelty of the approach lies in exploiting the hypothesis that audio segments belonging to the same speaker cluster should demonstrate similar global behavior, i.e., exhibit the same pattern of similarity and dissimilarity with all other segments. Every segment is represented by a global similarity vector whose components encode the distance between that segment and each of the other segments to be clustered. The distance between global similarity vectors is used to pre-select segment pairs with high global similarity as candidates for merging. Two inter-segment distances for building the global similarity vectors are investigated: one based on the Bayesian Information Criterion (BIC) and one based on an adapted cross likelihood ratio (CLR). The evaluation, performed on radio programs, shows that the proposed approach improves on the baseline clustering.
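The global similarity vectors and the pre-selection step can be illustrated with a small sketch. Here row `i` of a precomputed inter-segment distance matrix `dist` serves as segment `i`'s global similarity vector; that matrix could be filled with BIC- or CLR-based distances. The abstract does not name the metric used to compare two global vectors. Using the Euclidean distance, skipping the components for the two segments being compared, and keeping the `top_k` closest pairs are all assumptions. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Matrix = List[List[float]]  # dist[i][k]: local distance between segments i and k (e.g. BIC or CLR based)


def global_vector_distance(dist: Matrix, i: int, j: int) -> float:
    """Euclidean distance between the global similarity vectors (rows) of segments i and j.

    Components k == i and k == j are skipped, so only the two segments'
    relations to all *other* segments are compared.
    """
    n = len(dist)
    return math.sqrt(
        sum((dist[i][k] - dist[j][k]) ** 2 for k in range(n) if k not in (i, j))
    )


def preselect_pairs(dist: Matrix, top_k: int) -> List[Tuple[int, int]]:
    """Return the top_k segment pairs whose global similarity vectors are closest."""
    pairs = list(combinations(range(len(dist)), 2))
    pairs.sort(key=lambda p: global_vector_distance(dist, p[0], p[1]))
    return pairs[:top_k]
```

For example, with two toy speakers (segments 0, 1 and segments 2, 3) whose within-speaker distances are small and cross-speaker distances large, the pairs (0, 1) and (2, 3) have identical relations to the remaining segments and are pre-selected first. A threshold on `global_vector_distance` would be an equally plausible selection rule.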
An Extraction of Speech Data from Audio Stream Using Unsupervised Pre-Segmentation
In this paper we investigate the extraction of speech data from an audio stream. Our method includes unsupervised optimal self-segmentation of the audio stream into small, homogeneous segments. Homogeneity is defined on the basis of the average amplitude and the zero crossings in a frame, and entropy is used as the measure of homogeneity. We then calculate the relative ratio between the average amplitudes of neighboring homogeneous segments. For a speech signal, this ratio is below a threshold determined from a short sample of pure speech. The discriminative feature is the percentage of homogeneous segments within each 1-second interval that have a high relative amplitude ratio. During classification, each 1-second interval is labeled incrementally as speech or non-speech. The technique achieves high discrimination performance on more than six hours of data covering different types of audio.
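The classification stage can be sketched as follows. This sketch skips the entropy-based self-segmentation: it assumes the homogeneous segments are already given, sorted, and contiguous, each with an average amplitude. The abstract leaves the exact ratio definition open. Here it is assumed to be `min/max` of the two neighboring amplitudes, so a segment counts as speech-like when its ratio falls below `ratio_threshold`. In the paper, that threshold is derived from a short pure-speech sample. The `min_fraction` decision threshold, the `Segment` type, and all function names are hypothetical.

```python
import math
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float          # seconds
    end: float            # seconds
    avg_amplitude: float  # average amplitude over the segment


def neighbor_ratios(segments: List[Segment]) -> List[float]:
    """min/max ratio between each segment's average amplitude and its predecessor's.

    The first segment has no predecessor and gets 1.0 (no amplitude change).
    """
    ratios = [1.0]
    for prev, cur in zip(segments, segments[1:]):
        lo, hi = sorted((prev.avg_amplitude, cur.avg_amplitude))
        ratios.append(lo / hi if hi > 0 else 1.0)
    return ratios


def label_seconds(
    segments: List[Segment], ratio_threshold: float, min_fraction: float
) -> List[bool]:
    """Label each 1-second interval in order: True = speech, False = non-speech."""
    ratios = neighbor_ratios(segments)
    labels = []
    for t in range(math.ceil(segments[-1].end)):
        # Ratios of all segments overlapping the interval [t, t + 1).
        inside = [r for seg, r in zip(segments, ratios) if seg.start < t + 1 and seg.end > t]
        if not inside:
            labels.append(False)
            continue
        speech_like = sum(1 for r in inside if r < ratio_threshold)
        labels.append(speech_like / len(inside) >= min_fraction)
    return labels
```

Assigning segments to every second they overlap, rather than only to the second in which they start, keeps long homogeneous segments from leaving later seconds empty.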
Speaker Clustering via Bayesian Information Criterion using a Global Similarity Constraint
In this paper we propose a global similarity constraint that improves speaker clustering as conventionally performed with the Bayesian Information Criterion (BIC). The novelty of our approach lies in exploiting the hypothesis that audio segments belonging to the same speaker cluster should demonstrate similar global behavior, i.e., exhibit approximately the same pattern of similarity and dissimilarity with all other segments. Every segment is represented by a global similarity vector whose components encode the BIC-based local similarity between that segment and each of the other segments to be clustered. Speaker clustering is performed bottom-up, using the BIC to compare each pair of segments and determine whether their similarity is high enough to merge them. We use the global similarity vectors to restrict merging to segment pairs that have approximately the same pattern of global similarity. The evaluation, performed on audio data from four different German-language radio programs, shows that the proposed approach improves on standard BIC clustering.
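The constrained bottom-up procedure described above can be sketched as a greedy loop. Only segment pairs whose global similarity vectors are close enough (`global_dist(i, j) <= max_global_dist`) may ever trigger a merge. Among those pairs, the one with the most negative BIC score on the current clusters is merged first. The abstract does not say how the segment-level constraint carries over once clusters hold several segments, so this carry-over rule is an assumption, as is the greedy order. `delta_bic` stands for any callable returning a ΔBIC-style score for two clusters (negative favors merging). All names are hypothetical.

```python
from typing import Callable, Dict, List, Set


def constrained_merge(
    n_segments: int,
    delta_bic: Callable[[Set[int], Set[int]], float],
    global_dist: Callable[[int, int], float],
    max_global_dist: float,
) -> List[List[int]]:
    """Greedy bottom-up clustering in which only globally similar segment pairs may trigger merges.

    delta_bic(a, b) scores two clusters (sets of segment indices); negative favors merging.
    global_dist(i, j) compares the global similarity vectors of segments i and j.
    """
    # Global similarity constraint: the fixed set of segment pairs allowed to trigger a merge.
    allowed = [
        (i, j)
        for i in range(n_segments)
        for j in range(i + 1, n_segments)
        if global_dist(i, j) <= max_global_dist
    ]
    owner = list(range(n_segments))  # owner[i] = id of the cluster holding segment i
    clusters: Dict[int, Set[int]] = {i: {i} for i in range(n_segments)}
    while True:
        best = None  # (score, cluster_a, cluster_b) of the most favorable merge so far
        for i, j in allowed:
            a, b = owner[i], owner[j]
            if a == b:
                continue  # already in the same cluster
            score = delta_bic(clusters[a], clusters[b])
            if score < 0 and (best is None or score < best[0]):
                best = (score, a, b)
        if best is None:
            break  # no allowed pair passes the BIC test any more
        _, a, b = best
        for seg in clusters[b]:
            owner[seg] = a
        clusters[a] |= clusters.pop(b)
    return sorted(sorted(c) for c in clusters.values())
```

In a toy run where the BIC score alone would merge all three segments, restricting `global_dist` so that only the pair (0, 1) is allowed leaves segment 2 in its own cluster. This is the kind of over-merging the constraint is meant to prevent.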