Extraction-based News Summarization Using Sentence Centrality in the Sentence Similarity Network

葉鎮源; 楊維邦; 柯皓仁; 鄭培成

dc.contributor	國立臺灣師範大學圖書資訊學研究所	zh_TW
dc.contributor.author	葉鎮源	zh_TW
dc.contributor.author	楊維邦	zh_TW
dc.contributor.author	柯皓仁	zh_TW
dc.contributor.author	鄭培成	zh_TW
dc.date.accessioned	2015-09-03T01:11:17Z
dc.date.available	2015-09-03T01:11:17Z
dc.date.issued	2014-07-01
dc.description.abstract	摘錄式摘要技術的核心在於評估語句的摘要代表性，藉以排序語句作為摘錄語句時的依據。本研究將語句視為節點，藉由語句相似度來決定節點間是否存在連結，依此建構出語句關係網路模型。接著，衡量節點在網路中的重要性或對於其他相連節點的影響性，提出：(1) Degree Centrality、(2) Normalized Similarity-based Degree Centrality、(3) HITS Centrality、(4) PageRank Centrality，及(5) iSpreadRank Centrality 的節點向心性分析；並以語句向心性作為語句的摘要代表性，藉此達到排序語句的目的。最後，導入CSIS（Cross-Sentence Information Subsumption）過濾重複性資訊，依序擷取語句組成摘要。實驗使用DUC 2004 資料集來驗證上述摘要方法的可行性。在ROUGE-1 的指標下，結合不同語句向心性之摘要效能依序是：iSpreadRank > Normalized Similarity-based Degree > PageRank > Degree > HITS。整體而言，實驗得知應用語句關係網路計算語句向心性之摘要方法確實可行。	zh_TW
dc.description.abstract	Purpose: Sentence extraction, a widely adopted summarization paradigm, extracts important sentences and composes them into a summary. Its foundation is assessing how important each sentence is to the summary, so that sentences can be ranked for extraction. This paper employs graph-based text analysis to model documents and investigates graph-based centrality measures as indicators of sentence salience in summarization. Design/methodology/approach: This paper models documents on the same (or a related) topic as a sentence similarity network, in which each sentence is a node and a link between two sentences exists only if they are semantically related. Several methods for evaluating the importance of a node (i.e., a sentence) in the network are then proposed, namely: (1) Degree Centrality; (2) Normalized Similarity-based Degree Centrality; (3) HITS Centrality; (4) PageRank Centrality; and (5) iSpreadRank Centrality. All are designed on the basis of the idea that the importance of a node is determined not only by the number of nodes to which it connects, but also by the importance of its connected nodes. For summary generation, CSIS (Cross-Sentence Information Subsumption) is employed to reduce redundancy while sentences are extracted in the order given by their centrality-based ranking. Findings: The proposed summarization method was evaluated using the ROUGE evaluation suite on the DUC 2004 news stories collection. Experimental results show that, under the ROUGE-1 metric, the performance ranking is: iSpreadRank > Normalized Similarity-based Degree > PageRank > Degree > HITS. Another experiment, which combines sentence centrality with surface-level features, also yields results competitive with the best participant in the DUC 2004 evaluation. Research limitations/implications: Directions for future research include: (1) going beyond symbolic-level analysis to take semantics into account, such as synonymy, polysemy, and term dependency, when determining whether two sentences are semantically related; (2) investigating graph-based centrality measures developed in social network analysis for evaluating sentence salience in summarization; (3) improving the cohesion and coherence of summaries using natural language processing techniques, such as sentence planning and generation. Practical implications: The proposed summarization method is unsupervised, so no training dataset is required. Since no domain-specific knowledge or deep linguistic analysis is exploited, the method is domain- and language-independent. However, because it performs no deep natural language analysis, considers no discourse structure, and uses no domain-specific knowledge, it may understand the input texts poorly and would probably produce poor summaries. Originality/value: The contributions of this work are threefold. First, this paper offers a sentence similarity network to model topic-related documents. Second, novel graph-based sentence ranking methods are explored to rank the importance of sentences for extraction. Finally, the proposed method proved successful in a case study with the DUC 2004 benchmark dataset.	en_US
dc.identifier	ntnulib_tp_A1204_01_022
dc.identifier.issn	1608-5752
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/74315
dc.language	zh_TW
dc.publisher	國立清華大學科技管理研究所	zh_TW
dc.relation	資訊管理學報，21（3），271-304。	zh_TW
dc.subject.other	多文件摘要	zh_TW
dc.subject.other	摘錄式摘要	zh_TW
dc.subject.other	語句關係網路	zh_TW
dc.subject.other	網路節點向心性	zh_TW
dc.subject.other	語句排序	zh_TW
dc.subject.other	Multidocument summarization	en_US
dc.subject.other	Extraction-based summarization	en_US
dc.subject.other	Sentence similarity network	en_US
dc.subject.other	Network-based sentence centrality	en_US
dc.subject.other	Sentence ranking	en_US
dc.title	應用語句關係網路計算語句向心性之新聞事件摘要方法	zh_TW
dc.title	Extraction-based News Summarization Using Sentence Centrality in the Sentence Similarity Network	en_US
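
The pipeline outlined in the abstract (build a sentence similarity network, score each sentence by centrality, then extract top-ranked sentences while filtering redundancy) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the regex tokenizer, term-frequency cosine similarity, the 0.1 link threshold, the 0.85 damping factor, and the 0.8 redundancy cutoff. The redundancy filter is a plain cosine-overlap stand-in, not the CSIS procedure the authors use, and only Degree and a weighted PageRank-style centrality are shown.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase alphanumeric tokens (an assumed, deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_network(sentences, threshold=0.1):
    """Symmetric weight matrix; a link exists only if similarity >= threshold."""
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vecs[i], vecs[j])
            if sim >= threshold:
                w[i][j] = w[j][i] = sim
    return w


def degree_centrality(w):
    """Number of linked neighbours per sentence."""
    return [sum(1 for x in row if x > 0) for row in w]


def pagerank_centrality(w, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    Isolated nodes pass no score on, so the scores need not sum to 1.
    """
    n = len(w)
    rank = [1.0 / n] * n
    strength = [sum(row) for row in w]
    for _ in range(iterations):
        rank = [
            (1 - damping) / n
            + damping * sum(rank[j] * w[j][i] / strength[j] for j in range(n) if strength[j] > 0)
            for i in range(n)
        ]
    return rank


def extract(sentences, scores, max_sentences=2, redundancy=0.8):
    """Pick sentences in descending score order, skipping near-duplicates.

    The cosine cutoff is a simplified stand-in for CSIS, not the paper's method.
    """
    vecs = [Counter(tokenize(s)) for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        if all(cosine(vecs[i], vecs[j]) < redundancy for j in chosen):
            chosen.append(i)
        if len(chosen) == max_sentences:
            break
    return [sentences[i] for i in chosen]


# Hypothetical input sentences, invented for this example.
docs = [
    "The storm hit the coast on Monday.",
    "The storm hit the coast early Monday morning.",
    "Officials said the storm damaged homes on the coast.",
    "Stock markets rose sharply.",
]
network = build_network(docs)
summary = extract(docs, pagerank_centrality(network))
print(summary)
```

Swapping `pagerank_centrality(network)` for `degree_centrality(network)` changes only the ranking signal, which mirrors how the abstract compares several centrality measures under the same extraction step.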

Collections

Faculty Publications
