Biomedical Related Question Answering System Based on Answer Validation Approach

dc.contributor | 侯文娟 | zh_TW
dc.contributor | HOU, Wen-Juan | en_US
dc.contributor.author | 蔡秉翰 | zh_TW
dc.contributor.author | CAI, Bing-Han | en_US
dc.date.accessioned | 2019-09-05T11:17:01Z
dc.date.available | 2015-08-07
dc.date.available | 2019-09-05T11:17:01Z
dc.date.issued | 2013
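dc.description.abstract | This thesis implements a question answering system on the topic of Alzheimer's disease. The aim is to read a test document, correctly understand the meaning of the questions about it, and extract and score the relevant sentences in the document to obtain the correct answer, achieving a high-precision question answering system. The test data comprise four test sets on Alzheimer's disease. Each test set contains one test document and 10 questions about it; each question has five options, exactly one of which is correct. A background knowledge base is also used. Its sources are Alzheimer's-related articles from the Medline medical literature database (Medical Literature Analysis and Retrieval System Online), obtained through PubMed Central, and biomedical articles and abstracts on Alzheimer's disease provided by the Massachusetts Alzheimer's Disease Research Center. We also search the Online Mendelian Inheritance in Man (OMIM) website with Alzheimer's disease as the keyword to extract the names of the genes associated with the disease, and then use the linked content to build gene relations. The study first models the way people most often answer multiple-choice questions: on receiving a question, a reader first reads the document and looks for sentences related to the question, then judges which answer option is most similar and relevant to those sentences, and finally gives the answer they find most credible. Next, we try answer validation: each question is combined in advance with each of its candidate answer options to produce a hypothesis, and each hypothesis is used to search the document for related sentences. Related sentences are found through the words they share with the hypothesis and are scored with TF-IDF; the higher a sentence scores against a hypothesis, the more closely it matches the content of the test document. Each hypothesis is then scored from the scores of these sentences, and the answer option in the highest-scoring hypothesis is taken as the most credible answer. Experiments are carried out with both words and phrases as units. In addition, the background knowledge base and the resources obtained from the OMIM website are used for term expansion. Finally, we ran 23 experiments combining all of these methods. The first few experiments ignored important information in the answer options and reached only about 10% to 20% accuracy. Switching to answer validation raised accuracy considerably. Adding phrase-level matching, selection of important sentences, and term expansion, and analysing how each method should be used and what effect it has, raised accuracy gradually, eventually to about 50%, a good result compared with the many studies that use the same test data. | en_US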
dc.description.abstract | This study implements a question answering system on the topic of Alzheimer's disease. The system reads a document and answers a set of questions about it: it tries to understand the meaning of each question, extracts related sentences from the document, and scores them to identify the correct answer, with the goal of high precision. The test set consists of 4 reading tests. Each reading test contains one document and 10 questions, each with five answer options of which exactly one is correct. Background collections are drawn from Alzheimer's-related Medline (Medical Literature Analysis and Retrieval System Online) articles and from articles provided by the Massachusetts Alzheimer's Disease Research Center. We also search the Online Mendelian Inheritance in Man (OMIM) website with the keyword "Alzheimer" to extract gene names, and use the linked content to build gene-gene relations. Our first approach mimics how a person answers a multiple-choice question: on receiving a question, the reader retrieves sentences from the document that may be related to it, then reads all the options and chooses the one most similar to those sentences. Our second approach uses answer validation: the question and each candidate answer are combined into a hypothesis, and the document is searched for sentences that support it. Relevant sentences are retrieved by the words they share with the hypothesis and scored with TF-IDF; a higher-scoring sentence is taken to match the content of the test document more closely. Each hypothesis is then scored from the weights of its related sentences, and the answer in the highest-scoring hypothesis is selected as the most confident answer. Experiments are run with both words and phrases as the matching unit.
In addition, we use the background collections and OMIM terms for query expansion. In total, we ran 23 experiments combining these methods. The first few experiments reached only about 10% to 20% accuracy because they ignored important information in the answer options. Switching to answer validation raised accuracy considerably. Adding phrases, selection of the top related sentences, and query expansion, and evaluating the effect of each, raised accuracy gradually to about 50%. This is a good result compared with other studies that use the same test set. | en_US
dc.description.sponsorship | Department of Computer Science and Information Engineering | en_US
dc.identifier | GN060047039S
dc.identifier.uri | http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN060047039S%22.&%22.id.&
dc.identifier.uri | http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106553
dc.language | Chinese
dc.subject | 答案驗證 | zh_TW
dc.subject | 機器閱讀問答系統評估 | zh_TW
dc.subject | 跨語言評估會議 | zh_TW
dc.subject | 字詞擴充 | zh_TW
dc.subject | 線上人類孟德爾遺傳學 | zh_TW
dc.subject | 阿茲海默 | zh_TW
dc.subject | Answer validation | en_US
dc.subject | QA4MRE | en_US
dc.subject | CLEF | en_US
dc.subject | Query expansion | en_US
dc.subject | OMIM | en_US
dc.subject | Alzheimer | en_US
dc.title | 以答案驗證方法為基礎之生醫相關問答系統 | zh_TW
dc.title | Biomedical Related Question Answering System Based on Answer Validation Approach | en_US
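
The answer-validation scoring described in the abstracts (question plus candidate answer as a hypothesis, TF-IDF scoring of sentences that share words with it, highest-scoring hypothesis wins) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's code: the stopword list, the sentence splitter, the smoothed IDF formula computed over the sentences of the single test document, and the rule that a hypothesis takes the score of its best-supporting sentence (`top_k=1`) are choices made here, and all function names are hypothetical. The toy document is invented and is not taken from the QA4MRE test set.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the thesis's preprocessing is not specified.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "are", "to", "and",
             "what", "which", "by", "with", "for"}


def tokenize(text):
    """Lowercase alphanumeric tokens, stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def idf_table(sentences):
    """Smoothed IDF, treating each sentence of the test document as one unit."""
    n = len(sentences)
    df = Counter()
    for sent in sentences:
        df.update(set(sent))  # count each term at most once per sentence
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def sentence_score(hyp_terms, sent, idf):
    """Sum of TF-IDF weights of sentence terms that also appear in the hypothesis."""
    tf = Counter(sent)
    return sum(tf[t] * idf.get(t, 0.0) for t in set(hyp_terms) if t in tf)


def answer(question, options, document, top_k=1):
    """Return the index of the option whose hypothesis is best supported."""
    raw = [s for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    sentences = [tokenize(s) for s in raw]
    idf = idf_table(sentences)
    best, best_score = None, float("-inf")
    for i, option in enumerate(options):
        hypothesis = tokenize(question) + tokenize(option)  # question + candidate answer
        scores = sorted((sentence_score(hypothesis, s, idf) for s in sentences),
                        reverse=True)
        total = sum(scores[:top_k])  # hypothesis score from its best-supporting sentences
        if total > best_score:  # strict ">" keeps the earliest option on ties
            best, best_score = i, total
    return best


doc = ("Amyloid beta plaques accumulate in the brain of Alzheimer patients. "
       "The APOE4 allele is the strongest genetic risk factor for late onset Alzheimer disease. "
       "Tau tangles form inside neurons.")
q = "What is the strongest genetic risk factor for late onset Alzheimer disease?"
opts = ["Tau tangles", "APOE4 allele", "Amyloid beta plaques", "Insulin", "Dopamine"]
print(opts[answer(q, opts, doc)])  # APOE4 allele
```

In this toy example, summing the top two sentences (`top_k=2`) lets the distractor "Amyloid beta plaques" win, because it matches an unrelated sentence while still sharing the question's best sentence. That is why the sketch defaults to the single best sentence; the abstracts mention top related sentence selection but do not give the exact rule.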
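
Query expansion, which the abstracts perform with the background collections and OMIM terms, can be illustrated with a similarly small sketch: related terms are added to the hypothesis so that a sentence using a different name for the same entity can still match. The expansion table, the 0.5 weight for added terms, and the function names are hypothetical; the abstracts do not say how expanded terms are weighted.

```python
import re

# Hypothetical expansion table. In the thesis, related terms come from the
# background collections and OMIM gene relations; this toy entry is only an example.
RELATED = {"apoe": ["apolipoprotein"]}


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(terms, related, weight=0.5):
    """Map each query term to weight 1.0 and each added related term to a lower weight."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        for r in related.get(t, []):
            weights.setdefault(r, weight)  # never lower the weight of an original term
    return weights


def overlap_score(weights, sentence):
    """Sum the weights of query terms that occur in the sentence."""
    tokens = set(tokenize(sentence))
    return sum(w for t, w in weights.items() if t in tokens)


sentence = "Apolipoprotein E variants raise the risk of Alzheimer disease."
hypothesis = tokenize("APOE risk")
print(overlap_score(expand(hypothesis, {}), sentence))       # 1.0
print(overlap_score(expand(hypothesis, RELATED), sentence))  # 1.5
```

Giving added terms a lower weight than the original query terms is one common way to keep expansion from overwhelming the original hypothesis; it is an assumption of this sketch, not a documented detail of the thesis.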

Files

Original bundle

Name: n060047039s01.pdf
Size: 1.58 MB
Format: Adobe Portable Document Format
