Extraction-based News Summarization Using Sentence Centrality in the Sentence Similarity Network

dc.contributor國立臺灣師範大學圖書資訊學研究所zh_TW
dc.contributor.author葉鎮源zh_TW
dc.contributor.author楊維邦zh_TW
dc.contributor.author柯皓仁zh_TW
dc.contributor.author鄭培成zh_TW
dc.date.accessioned2015-09-03T01:11:17Z
dc.date.available2015-09-03T01:11:17Z
dc.date.issued2014-07-01
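dc.description.abstractThe core of extraction-based summarization is to assess how well each sentence represents the summary, and to use that assessment to rank sentences for extraction. This study treats sentences as nodes and uses sentence similarity to decide whether a link exists between two nodes, thereby constructing a sentence similarity network model. It then measures the importance of a node in the network, or its influence on the nodes connected to it, and proposes node centrality analyses based on (1) Degree Centrality, (2) Normalized Similarity-based Degree Centrality, (3) HITS Centrality, (4) PageRank Centrality, and (5) iSpreadRank Centrality; sentence centrality serves as a sentence's representativeness for the summary and thus as the basis for ranking sentences. Finally, CSIS (Cross-Sentence Information Subsumption) is introduced to filter redundant information, and sentences are extracted in rank order to compose the summary. Experiments on the DUC 2004 dataset verify the feasibility of the method. Under the ROUGE-1 metric, the summarization performance of the different sentence centrality measures ranks as follows: iSpreadRank > Normalized Similarity-based Degree > PageRank > Degree > HITS. Overall, the experiments show that summarization based on sentence centrality computed over a sentence similarity network is indeed feasible.zh_TW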
dc.description.abstractPurpose: One widely adopted summarization paradigm, sentence extraction, aims at extracting important sentences and composing them into a summary. The foundation of sentence extraction is to assess the importance of each sentence to the summary so that sentences can be ranked for extraction. This paper employs graph-based text analysis to model documents and investigates measures of graph-based centrality as sentence salience in summarization. Design/methodology/approach: This paper models documents on the same (or related) topic as a sentence similarity network, in which a sentence is regarded as a node and a link between two sentences exists only if they are semantically related. Several methods for evaluating the importance of a node (i.e., a sentence) in the network are then proposed, namely: (1) Degree Centrality; (2) Normalized Similarity-based Degree Centrality; (3) HITS Centrality; (4) PageRank Centrality; and (5) iSpreadRank Centrality. All are designed on the basis of the idea that the importance of a node is determined not only by the number of nodes to which it connects, but also by the importance of its connected nodes. For summary generation, CSIS (Cross-Sentence Information Subsumption) is employed for anti-redundancy while sentences are extracted according to the ranking produced from sentence centrality. Findings: The proposed summarization method was evaluated using the ROUGE evaluation suite on the DUC 2004 news stories collection. Experimental results show that, under the ROUGE-1 metric, the performance ranking is: iSpreadRank > Normalized Similarity-based Degree > PageRank > Degree > HITS. Another experiment, which combined sentence centrality with surface-level features, also produced results competitive with the best participant in the DUC 2004 evaluation.
Research limitations/implications: Directions for future research are: (1) to go beyond symbolic-level analysis and take semantics into account, such as synonymy, polysemy, and term dependency, when determining whether two sentences are semantically related; (2) to investigate graph-based centrality measures developed in social network analysis for evaluating sentence salience in summarization; (3) to improve the cohesion and coherence of summaries using natural language processing techniques, such as sentence planning and generation. Practical implications: The proposed summarization method is unsupervised; thus no training dataset is required. Since no domain-specific knowledge or deep linguistic analysis is exploited, the method is domain- and language-independent. However, because the summarization process involves no deep natural language analysis, no discourse structure, and no domain-specific knowledge, the method may understand the input texts poorly and would probably produce poor summaries in such cases. Originality/value: The contributions of this work are threefold. First, this paper offers a sentence similarity network to model topic-related documents. Second, novel graph-based sentence ranking methods are explored to rank the importance of sentences for extraction. Finally, the proposed method proved successful in a case study with the DUC 2004 benchmark dataset.en_US
dc.identifierntnulib_tp_A1204_01_022
dc.identifier.issn1608-5752
dc.identifier.urihttp://rportal.lib.ntnu.edu.tw/handle/20.500.12235/74315
dc.languagezh_TW
dc.publisher國立清華大學科技管理研究所zh_TW
dc.relation資訊管理學報,21(3),271-304。zh_TW
dc.subject.other多文件摘要zh_TW
dc.subject.other摘錄式摘要zh_TW
dc.subject.other語句關係網路zh_TW
dc.subject.other網路節點向心性zh_TW
dc.subject.other語句排序zh_TW
dc.subject.otherMultidocument summarizationen_US
dc.subject.otherExtraction-based summarizationen_US
dc.subject.otherSentence similarity networken_US
dc.subject.otherNetwork-based sentence centralityen_US
dc.subject.otherSentence rankingen_US
dc.title應用語句關係網路計算語句向心性之新聞事件摘要方法zh_TW
dc.titleExtraction-based News Summarization Using Sentence Centrality in the Sentence Similarity Networken_US
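
The pipeline outlined in the abstract (build a sentence similarity network, score nodes by centrality, then extract top-ranked sentences while filtering redundancy) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the record does not specify the similarity measure, link threshold, or parameter values, so the bag-of-words cosine similarity, the `threshold`, `damping`, and `max_overlap` values, and all function names here are assumptions. The word-overlap check is a simplified stand-in for CSIS, and iSpreadRank is omitted because the record gives no details about it.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens (an assumed, deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_network(sentences, threshold=0.1):
    """Symmetric weight matrix; a link exists only if similarity >= threshold."""
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(vecs)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vecs[i], vecs[j])
            if sim >= threshold:
                w[i][j] = w[j][i] = sim
    return w


def degree_centrality(w):
    """Number of linked neighbours per sentence."""
    return [sum(1 for x in row if x > 0) for row in w]


def pagerank_centrality(w, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration; isolated nodes spread their score evenly."""
    n = len(w)
    if n == 0:
        return []
    out = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        dangling = sum(scores[j] for j in range(n) if out[j] == 0)
        scores = [
            (1 - damping) / n
            + damping * (
                sum(scores[j] * w[j][i] / out[j] for j in range(n) if out[j] > 0)
                + dangling / n
            )
            for i in range(n)
        ]
    return scores


def extract(sentences, scores, max_sentences=2, max_overlap=0.5):
    """Take sentences in rank order, skipping any mostly covered by earlier picks."""
    chosen = []
    for i in sorted(range(len(sentences)), key=lambda k: -scores[k]):
        words = set(tokenize(sentences[i]))
        redundant = any(
            len(words & set(tokenize(sentences[k]))) / max(len(words), 1) > max_overlap
            for k in chosen
        )
        if not redundant:
            chosen.append(i)
        if len(chosen) == max_sentences:
            break
    return [sentences[i] for i in sorted(chosen)]  # restore original order
```

A typical call would be `w = build_network(sentences)` followed by `extract(sentences, pagerank_centrality(w))`. Swapping in `degree_centrality(w)` as the score list gives the simplest of the centrality variants named in the abstract.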
