An Extraction of Speech Data from Audio Stream Using Unsupervised Pre-Segmentation
- Main information
In this paper we investigate the extraction of speech data from an audio stream. Our method includes an unsupervised, optimal self-segmentation of the audio stream into small, homogeneous segments. Homogeneity is defined on the basis of the average amplitude and the zero crossings in a frame, and it is measured by entropy. In our approach we calculate the relative ratio between the average amplitudes of neighboring homogeneous segments. For a speech signal this ratio is less than a threshold defined on a short pure speech signal. As a discriminative feature we use the percentage of homogeneous segments within a 1-second interval that have a high relative amplitude ratio. During classification, each 1-second interval is labeled incrementally as a speech or a non-speech segment. The discrimination technique shows high performance on more than six hours of data that include different types of audio.
WP5: Detection, Extraction and Annotation of Knowledge
IAIS, Konstantin Biatov, 2007-03-15 17:44
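The feature computation described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the entropy-based self-segmentation is not specified in this record and is left out, so the average amplitudes of the homogeneous segments are assumed to be given. The function names, the non-overlapping framing, and the definition of the relative ratio as larger amplitude divided by smaller amplitude are all assumptions made for illustration.

```python
def frame_features(samples, frame_len):
    """Per-frame average absolute amplitude and zero-crossing count.

    Assumption: non-overlapping frames; a trailing partial frame is dropped.
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        avg_amp = sum(abs(x) for x in frame) / frame_len
        # A zero crossing is counted when consecutive samples differ in sign.
        zero_cross = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append((avg_amp, zero_cross))
    return feats


def relative_amplitude_ratios(segment_amps, eps=1e-12):
    """Ratio between average amplitudes of neighboring segments.

    Assumption: the ratio is larger / smaller, so it is always >= 1
    (up to the small eps that guards against division by zero).
    """
    return [max(a, b) / (min(a, b) + eps)
            for a, b in zip(segment_amps, segment_amps[1:])]


def percent_high_ratio(ratios, threshold):
    """Percentage of neighboring-segment ratios above a threshold.

    Intended to be applied to the segments that fall within one
    1-second interval; the threshold value is hypothetical here.
    """
    if not ratios:
        return 0.0
    return 100.0 * sum(1 for r in ratios if r > threshold) / len(ratios)


if __name__ == "__main__":
    # Toy data only; real input would be audio samples and segment
    # boundaries produced by the segmentation step.
    print(frame_features([1, -1, 1, -1, 2, 2, 2, 2], frame_len=4))
    ratios = relative_amplitude_ratios([1.0, 4.0, 2.0])
    print(percent_high_ratio(ratios, threshold=3.0))
```

In a full pipeline the percentage would be computed once per 1-second interval and passed to the speech/non-speech decision, whose exact rule and threshold are not given in this record.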
- Access and Use Rights
Conditions of use are defined in response to a "need to access request". Copyright Fraunhofer Institut Intelligente Analyse- und Informationssysteme. Closed; the attachment is not public.
This item is not available for public download.