Chinese Spoken Information Summarization: Improvements in Models and Features
Date
2007
Abstract
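Multimedia content containing audio and video continues to grow in volume and pervades the Internet and our daily lives, so processing and organizing it systematically and automatically has become one of today's important problems. Speech is among the most semantically rich components of multimedia content and can usually be used to represent the topics and concepts of a multimedia file. In recent years, many researchers have worked on organizing and understanding multimedia content and have produced substantial results and contributions, such as the transcription, retrieval, and summarization of spoken documents.
Document summarization can be divided into extractive and non-extractive (abstractive) summarization. Extractive summarization selects important sentences, paragraphs, or sections from the original document according to a specified summarization ratio to form the summary; non-extractive summarization generates summary content directly from the topics or concepts of the document. Because non-extractive summarization remains quite difficult, most current research on automatic spoken document summarization focuses on extractive summarization. This thesis investigates extractive summarization methods for Chinese broadcast news spoken documents. We propose a probabilistic generative framework that tightly couples the sentence generative model with the sentence prior probability for selecting important sentences in extractive summarization. Each sentence in the document to be summarized is treated as a probabilistic generative model used to predict the probability of generating the document. We propose two kinds of probabilistic generative models: a combination of the hidden Markov model (HMM) and the relevance model (RM), and the word topical mixture model (wTMM). We also make a preliminary attempt to use recognition confidence measures and several prosodic features of speech to estimate the sentence prior probability. We conduct experiments and analysis on a Chinese broadcast news corpus, and preliminary results show that the proposed methods achieve better summarization results than other common methods.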
Huge quantities of multimedia content, including audio and video, are continuously growing and filling networks and our lives. Speech is one of the most important sources of information in multimedia content and usually represents its concepts and topics. Hence, in the recent past, several attempts have been made to investigate the understanding and organization of multimedia content using speech, and substantial efforts and very encouraging results on spoken document transcription, retrieval, and summarization have been reported. Spoken document summarization can be either extractive or abstractive. Extractive summarization selects indicative sentences, passages, or paragraphs from an original document according to a target summarization ratio and sequences them to form a summary. Abstractive summarization, on the other hand, produces a concise abstract of a certain length that reflects the key concepts of the document. The latter is more difficult to achieve, so recent research has focused on the former. In this thesis, we consider extractive summarization of Chinese broadcast news speech. A unified probabilistic generative framework that seamlessly combines the sentence generative probability and the sentence prior probability for sentence ranking is proposed. Each sentence of the spoken document to be summarized is treated as a probabilistic generative model for predicting the document. To this end, two alternative approaches, namely the hidden Markov model (HMM) integrated with the relevance model (RM), and the word topical mixture model (wTMM), were extensively investigated. In addition, the confidence measure and a set of prosodic features were exploited for modeling the sentence prior probability. The summarization capabilities of the proposed approaches were verified by comparison with other conventional summarization methods. The experiments were performed on Chinese broadcast news collected in Taiwan, and the initial results were very promising.
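The sentence ranking described in the abstract can be read as scoring each sentence S by P(D|S) P(S), the product of the sentence generative probability and the sentence prior. The following is only a minimal sketch of that idea, not the thesis's HMM/RM or wTMM implementation: it assumes a unigram sentence model linearly interpolated with a background model, a uniform prior unless one is supplied, and hypothetical names (`rank_sentences`, `select_summary`, `lam`) chosen for illustration.

```python
import math
from collections import Counter


def rank_sentences(sentences, background, prior=None, lam=0.5):
    """Rank sentence indices by log P(D|S) + log P(S), highest first.

    sentences:  list of non-empty token lists; together they form document D.
    background: dict mapping each word to a background probability (> 0).
    prior:      optional list of sentence priors P(S); uniform when omitted.
    lam:        interpolation weight on the sentence model (assumed value).
    """
    doc_counts = Counter(w for s in sentences for w in s)
    n = len(sentences)
    if prior is None:
        prior = [1.0 / n] * n

    scored = []
    for idx, sent in enumerate(sentences):
        sent_counts = Counter(sent)
        log_like = 0.0
        for word, count in doc_counts.items():
            # Smoothed unigram probability of the word under sentence S.
            p_sent = sent_counts[word] / len(sent)
            p_word = lam * p_sent + (1.0 - lam) * background[word]
            log_like += count * math.log(p_word)
        scored.append((log_like + math.log(prior[idx]), idx))
    return [idx for _, idx in sorted(scored, reverse=True)]


def select_summary(sentences, ranking, ratio):
    """Keep top-ranked sentences up to the target ratio, in document order."""
    keep = max(1, round(ratio * len(sentences)))
    return [sentences[i] for i in sorted(ranking[:keep])]


# Toy document (made-up tokens); the background model is estimated
# from the document itself purely to keep the example self-contained.
doc = [
    ["storm", "hits", "coast"],
    ["weather", "report"],
    ["storm", "damage", "coast", "storm"],
]
total = sum(len(s) for s in doc)
background = {w: c / total
              for w, c in Counter(w for s in doc for w in s).items()}
ranking = rank_sentences(doc, background)
summary = select_summary(doc, ranking, ratio=0.34)
```

A non-uniform `prior`, for example one derived from recognition confidence or prosodic features as the abstract suggests, would simply be passed in as a list of per-sentence probabilities; how those values are estimated is outside this sketch.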
Keywords
spoken documents, extractive summarization, hidden Markov model, relevance model, word topical mixture model