Voice activity detection (VAD) is a key technique in numerous speech-related applications such as speech recognition, speech enhancement, and speech coding. In these applications, VAD separates speech from the incoming signal so that subsequent processing steps can focus on speech rather than on silence or noise. VAD must therefore remain accurate across a wide variety of severe noise environments, and it should also have low computational complexity so that it can be used in real-time applications. The most important factor in building a robust VAD is the feature that the system extracts from the speech signal, so VAD design can be framed as a problem of extracting robust features from speech. In this paper, we propose two directions for extracting such features. The first is an unsupervised-learning-based feature that exploits the intrinsic harmonicity of vowel sounds. Here, we propose a new approach for verifying harmonicity and apply it to a VAD system. Our experiments show that the computational cost is greatly reduced compared with a previous harmonicity-based approach, while the accuracy in severe noise environments is slightly improved. The second is a supervised-learning-based feature that uses discriminative pre-training (DPT). In this approach, we assume that different speech-related features are robust to different noise types, so that a well-designed fusion of these features yields a feature that is robust regardless of the noise type. To verify this assumption, well-known speech-related features are fused using DPT. Unlike previous approaches, training was conducted on signals covering various SNRs and noise types. The results show that the resulting accuracy was outstanding compared with other state-of-the-art approaches.
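To make the idea of a harmonicity-based VAD feature concrete, the sketch below uses a generic, well-known stand-in: the peak of the normalized autocorrelation within a plausible pitch-lag range. Voiced (vowel) frames are periodic and score near 1, whereas aperiodic noise scores near 0. This is not the new harmonicity-verification method proposed in the paper. The function names, the 80 to 400 Hz pitch range, the 32 ms / 16 ms framing, and the 0.5 decision threshold are all illustrative assumptions.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (the incomplete tail is dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def harmonicity(frames, fs, f0_min=80.0, f0_max=400.0):
    """Peak normalized autocorrelation over lags in an assumed pitch range.

    Returns one score per frame: near 1 for strongly periodic (harmonic)
    frames, near 0 for aperiodic frames such as white noise.
    """
    lag_min = int(fs / f0_max)  # shortest pitch period in samples
    lag_max = int(fs / f0_min)  # longest pitch period in samples
    frames = frames - frames.mean(axis=1, keepdims=True)  # remove DC offset
    scores = np.zeros(len(frames))
    for i, f in enumerate(frames):
        energy = np.dot(f, f)  # autocorrelation at lag 0
        if energy <= 1e-12:
            continue  # silent frame: leave score at 0
        # Autocorrelation for non-negative lags 0..len(f)-1
        r = np.correlate(f, f, mode="full")[len(f) - 1:]
        scores[i] = np.max(r[lag_min:lag_max + 1]) / energy
    return scores


def harmonicity_vad(x, fs, frame_ms=32.0, hop_ms=16.0, threshold=0.5):
    """Hypothetical frame-level VAD: True where the harmonicity score exceeds threshold."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    return harmonicity(frames, fs) > threshold


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # One second of a synthetic "vowel" (125 Hz fundamental plus two harmonics)
    vowel = sum(np.sin(2 * np.pi * f * t) / k for k, f in enumerate([125, 250, 375], 1))
    # followed by one second of white noise
    noise = 0.5 * np.random.default_rng(0).standard_normal(fs)
    decisions = harmonicity_vad(np.concatenate([vowel, noise]), fs)
    print(decisions.astype(int))
```

Because the score is normalized by the frame energy, it does not depend on signal level, which is one reason periodicity cues are attractive in noise. On the other hand, this naive per-lag loop costs O(N²) per frame; the paper's point about reducing computational cost relative to earlier harmonicity-based approaches concerns exactly this kind of overhead.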
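A voice activity detector is an important technology used in most areas of speech signal processing. It must be robust to diverse noise environments and must have a low computational cost. To obtain a detector that satisfies these requirements, this paper proposes methods in two directions. The first is a non-learning-based voice activity detector: speech features are derived from the harmonicity of vowels, and the detector is built on these features. The second is a learning-based voice activity detector. We assume that previously developed speech features are each robust to particular types of noise, so that combining them well should yield a robust feature; we then combine these features using a deep neural network and design a detector based on the combined feature. Experimental results show that both proposed methods achieve good accuracy, and the first method in particular performs considerably better in terms of computational cost.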