應用兩階段生成模型於會議摘要之研究
dc.contributor | 陳柏琳 | zh_TW |
dc.contributor | Chen, Berlin | en_US |
dc.contributor.author | 黃怡萍 | zh_TW |
dc.contributor.author | Huang, Yi-Ping | en_US |
dc.date.accessioned | 2023-12-08T08:02:51Z | |
dc.date.available | 2023-08-15 | |
dc.date.available | 2023-12-08T08:02:51Z | |
dc.date.issued | 2023 | |
dc.description.abstract | 近年來,由於疫情的影響和遠端工作的普及,線上會議和視訊交流平台的使用變得更加廣泛。但隨之而來的問題是,會議記錄往往包含許多分散的資訊,要在大量的對話中擷取和理解關鍵資訊是困難的,且隨著會議越來越頻繁,意味著參與者需要在有限的時間內掌握會議的要點,以便在忙碌的日程中做出明智的決策。在這樣的情境下,能夠從會議紀錄中自動辨識和摘要出關鍵資訊的技術變得更為重要。自動文件摘要主要分為擷取式 (Extractive) 和重寫式 (Abstractive) 兩種方法,擷取式摘要透過計算原始文件中每個句子的重要性分數,選擇得分高的句子並將它們組合起來成為摘要。重寫式摘要透過對原始文件的理解重新改寫句子,生成出一個簡潔且包含原始文件中核心內容的摘要。由於對話中的話語經常是不流暢且資訊分散的,使用擷取式摘要容易擷取出不完整的句子,造成可讀性不高。目前在會議摘要任務中,主要的應用是能夠將原始語句改寫的重寫式摘要。雖然已有許多相關的研究被提出,重寫式的方法應用在會議摘要中仍面臨幾個普遍性的限制,包括輸入長度問題、複雜的對話結構,以及缺乏訓練資料與事實不一致,而這些問題也是提高會議摘要模型效能的關鍵。本論文專注在「輸入長度問題」和「對話式結構」的研究,提出了一個先擷取後生成的會議摘要模型架構,在擷取階段設計了三種方法來選擇重要的文本片段,分別是異質圖神經網路模型、對話語篇剖析和文本相似度。在生成階段使用先進的生成式預訓練模型。實驗結果顯示,提出的方法透過微調基線模型,可以達到效果提升。 | zh_TW |
dc.description.abstract | In recent years, the use of online meetings and video communication platforms has become more widespread due to the impact of the pandemic and the popularity of remote work. However, this trend brings along certain challenges. Meeting transcripts often contain scattered information, making it difficult to extract and understand key details from a large volume of conversations. Additionally, as meetings become increasingly frequent, participants need to grasp the main points of the discussions within limited time to make informed decisions amidst their busy schedules. In such a context, the ability to automatically identify and summarize crucial information from meeting transcripts becomes even more important. Automatic document summarization can be categorized into two main approaches: extractive and abstractive. Extractive summarization calculates an importance score for each sentence in the original document and selects high-scoring sentences to form the summary. Abstractive summarization, on the other hand, involves understanding the original document and rewriting sentences to generate a concise summary that captures the core content. Extractive summarization is prone to extracting incomplete sentences due to the often disjointed and scattered nature of dialogues, leading to reduced readability. Currently, the primary approach in meeting summarization tasks is abstractive summarization, which rewrites the original sentences. Despite the numerous related studies, the application of abstractive methods to meeting summarization still faces several common limitations, including input length constraints, complex dialogue structures, the lack of training data, and factual inconsistency. Addressing these issues is crucial for improving the performance of meeting summarization models. This thesis focuses on "input length constraints" and "dialogue-style structures" and proposes a meeting summarization model architecture that follows an extract-then-generate approach. In the extraction phase, three methods are designed to select important text segments: a heterogeneous graph neural network model, dialogue discourse parsing, and cosine similarity. An advanced generative pre-trained model is employed in the generation phase. Experimental results demonstrate that the proposed approach achieves performance improvements over the fine-tuned baseline model. | en_US |
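Note: the abstract above describes an extract-then-generate architecture. The minimal Python sketch below illustrates only the simplest of the three extraction strategies mentioned (text/cosine similarity) chained to an off-the-shelf abstractive model; it is not the thesis's implementation. The TF-IDF centroid scoring, the top_k cutoff, and the facebook/bart-large-cnn checkpoint are illustrative assumptions, and the heterogeneous graph neural network and dialogue discourse parsing stages are not shown.

# Illustrative extract-then-generate sketch (assumptions noted above, not the thesis code).
# Extraction stage: rank utterances by TF-IDF cosine similarity to the whole transcript
# and keep the top-k. Generation stage: rewrite the kept utterances with a pre-trained
# abstractive summarizer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline


def extract_salient_utterances(utterances, top_k=10):
    """Score each utterance against the full-transcript centroid and keep the top_k."""
    vectorizer = TfidfVectorizer()
    utterance_vecs = vectorizer.fit_transform(utterances)
    document_vec = vectorizer.transform([" ".join(utterances)])
    scores = cosine_similarity(utterance_vecs, document_vec).ravel()
    ranked = sorted(range(len(utterances)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:top_k])  # restore original dialogue order
    return [utterances[i] for i in keep]


def generate_summary(selected_utterances, model_name="facebook/bart-large-cnn"):
    """Rewrite the extracted segments with an off-the-shelf abstractive model (assumed checkpoint)."""
    summarizer = pipeline("summarization", model=model_name)
    condensed_input = " ".join(selected_utterances)
    result = summarizer(condensed_input, max_length=60, min_length=15, do_sample=False)
    return result[0]["summary_text"]


if __name__ == "__main__":
    # Toy transcript standing in for a meeting record.
    transcript = [
        "PM: Let's review the schedule for the prototype release.",
        "UI: The new remote control design still needs user testing.",
        "ID: We agreed to drop the LCD screen to stay within budget.",
        "PM: Action item: marketing prepares the trend report by Friday.",
    ]
    selected = extract_salient_utterances(transcript, top_k=3)
    print(generate_summary(selected))

Keeping the selected utterances in their original dialogue order before generation is a deliberate choice in this sketch: it preserves the conversational flow that the abstractive model rewrites into the final summary.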
dc.description.sponsorship | 資訊工程學系 | zh_TW |
dc.identifier | 61047089S-44274 | |
dc.identifier.uri | https://etds.lib.ntnu.edu.tw/thesis/detail/f42cf161bf7f887ed417ee1dee285caf/ | |
dc.identifier.uri | http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/121639 | |
dc.language | 中文 | |
dc.subject | 會議摘要 | zh_TW |
dc.subject | 自動文件摘要 | zh_TW |
dc.subject | 自然語言處理 | zh_TW |
dc.subject | 異質圖神經網路 | zh_TW |
dc.subject | 對話語篇剖析 | zh_TW |
dc.subject | 生成式模型 | zh_TW |
dc.subject | Meeting Summarization | en_US |
dc.subject | Automatic Document Summarization | en_US |
dc.subject | Natural Language Processing | en_US |
dc.subject | Heterogeneous Graph Neural Network | en_US |
dc.subject | Dialogue Discourse Parsing | en_US |
dc.subject | Generative Model | en_US |
dc.title | 應用兩階段生成模型於會議摘要之研究 | zh_TW |
dc.title | A Study of Extract-then-Generate Model for Meeting Summarization | en_US |
dc.type | etd |
Files
Original bundle
- Name: 202300044274-106584.pdf
- Size: 2.32 MB
- Format: Adobe Portable Document Format
- Description: etd