A Study on Contextual Language Model Reranking for Meeting Speech Recognition

Date

2023

Abstract

ASR (automatic speech recognition) N-best reranking is a technique for improving the accuracy of ASR transcription output. For each input audio segment the recognizer generates multiple candidate hypotheses, known as an N-best list, and the reranking task is to select the most likely transcription from that list using additional contextual information. BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model that has shown remarkable performance on a variety of natural language processing (NLP) tasks, including text classification, named entity recognition, and question answering. Because BERT captures contextual information and produces high-quality representations of input text, it has been applied to ASR N-best reranking. To further strengthen BERT's predictions, we explore enriched semantic information and training objectives in four parts: (1) effective methods for incorporating information about a hypothesis's grammatical quality into the model; (2) effective methods for indirectly incorporating information about the entire N-best list into the model; (3) the feasibility of classification, ranking, and multi-task training objectives for model training; and (4) strengthening the textual information the model extracts. Large generative language models (LLMs) have demonstrated excellent generalization ability across a wide range of language-related tasks; in this study we also evaluate the feasibility of applying LLMs such as ChatGPT to ASR N-best reranking. We conduct a series of experiments on the AMI meeting corpus. The results show that the proposed methods are effective at reducing the word error rate (WER), achieving up to a 1.37% absolute WER reduction over the baseline ASR system.
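
The abstract does not spell out how BERT scores each hypothesis. As a minimal sketch of one common approach, the code below computes a masked-LM pseudo-log-likelihood for each candidate and interpolates it with the recognizer's own score. The `bert-base-uncased` checkpoint, the `lm_weight` value, and the `rerank` helper are illustrative assumptions, not details taken from the thesis.

```python
# Minimal sketch of BERT-based N-best rescoring via masked-LM
# pseudo-log-likelihood. The checkpoint and interpolation weight are
# illustrative assumptions, not the thesis's exact configuration.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum log P(token | rest), masking each position in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, ids.size(0) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def rerank(nbest, lm_weight=0.3):
    """nbest: list of (hypothesis, asr_score) pairs; returns best-first."""
    scored = [(hyp, asr + lm_weight * pseudo_log_likelihood(hyp))
              for hyp, asr in nbest]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```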
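
Part (3) mentions classification, ranking, and multi-task training objectives without giving formulas. As a hedged illustration only, one plausible multi-task loss combines a classification term that identifies the oracle (lowest-WER) hypothesis with a listwise term that pushes the predicted score distribution toward the negative-WER distribution; the mixing weight `alpha` and the exact loss terms are assumptions for this sketch, not the thesis's objective.

```python
# Hedged sketch of a multi-task reranking loss for one N-best list:
# a classification term for the oracle hypothesis plus a listwise
# ranking term. alpha is an illustrative assumption.
import torch
import torch.nn.functional as F

def multitask_loss(scores, wers, alpha=0.5):
    """scores: (N,) model scores for one N-best list; wers: (N,) WERs."""
    target = torch.argmin(wers)                    # oracle hypothesis
    cls_loss = F.cross_entropy(scores.unsqueeze(0), target.unsqueeze(0))
    rank_target = F.softmax(-wers, dim=-1)         # lower WER -> more mass
    rank_loss = F.kl_div(F.log_softmax(scores, dim=-1),
                         rank_target, reduction="batchmean")
    return alpha * cls_loss + (1 - alpha) * rank_loss
```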
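
For the LLM experiments the abstract names ChatGPT but gives no prompt. The sketch below shows one straightforward way to ask a chat model to pick the best hypothesis from an N-best list; the prompt wording, the `gpt-3.5-turbo` model name, and the reply parsing are assumptions, not the thesis's actual setup.

```python
# Rough sketch of LLM-based N-best selection via a chat prompt.
# Prompt wording and model name are illustrative assumptions.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_rerank(nbest):
    numbered = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    prompt = (
        "The following are candidate transcriptions of the same utterance "
        "from a speech recognizer. Reply with only the number of the most "
        f"plausible transcription.\n{numbered}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    match = re.search(r"\d+", resp.choices[0].message.content)
    choice = int(match.group()) if match else 1    # fall back to top-1
    return nbest[choice - 1]
```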

Keywords

Automatic Speech Recognition, Language Modeling, Conversational Speech, N-Best Lists, List Information, Reranking, Cross-sentence Information, Large Generative Language Models, ChatGPT
