以字詞類別概念輔助部落格文件分群之研究 (An Effective Approach for Weblog Documents Clustering based on Categorical Concepts of Words)

dc.contributor柯佳伶zh_TW
dc.contributorJia-Ling Kohen_US
dc.contributor.author范喬彬zh_TW
dc.contributor.authorChou-Bin Fanen_US
dc.date.accessioned2019-09-05T11:35:24Z
dc.date.available2012-08-16
dc.date.available2019-09-05T11:35:24Z
dc.date.issued2010
dc.description.abstract本論文研究使用ODP (Open Directory Project)目錄結構做為外部知識來源,透過ODP的查詢功能得到字詞的所屬類別作為特徵,結合文章中所有字詞所屬的類別及比重值來建構出特徵向量,希望改進單純以關鍵字擷取建立特徵向量的缺點,進而達到較好的主題式文章分群效果。此外,每個部落格中文章內容主題的集中度不同,在以K-Means演算法進行分群時,經常遇到的問題是不知道如何設定適當的聚落數目K值,本論文研究亦提出根據文章集合中各文章的特徵向量自動決定K-Means演算法的聚落數目及初始代表點,使部落格文章分群能更自動化。 我們將類別特徵向量法與字詞特徵向量法分別套用在文章分群實驗上,並將分群結果以Accuracy及Purity值進行評估,評估結果顯示類別特徵向量法在測試集中大多數的部落格皆能得到比字詞特徵向量法更好的分群結果。此外,實驗顯示結合文章的標題詞與複合詞類別特徵向量可進一步提升文章分群的效果。zh_TW
dc.description.abstractThis thesis uses the directory structure of the ODP (Open Directory Project) as an external knowledge source. We query ODP to obtain the categories to which each word belongs and use these categories as word features. The category feature vector of a post is then built by combining the categories of all words in the post with their corresponding weights. The goal is to overcome the shortcomings of feature vectors built solely from extracted keywords, and thereby achieve better topic-based post clustering. In addition, the topical focus of posts varies from blog to blog, so a common difficulty when clustering with K-means is choosing an appropriate number of clusters K. We therefore propose a method that automatically determines K and the initial cluster centroids from the feature vectors of the posts in a collection, making blog post clustering more automatic. We applied both the category feature vector approach and the term feature vector approach to post clustering and evaluated the results with Accuracy and Purity. For most blogs in the test set, the category feature vector approach produced better clusterings than the term feature vector approach. The experiments also show that combining category feature vectors built from post titles and compound words (phrases) further improves clustering.en_US
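The abstract describes three computational steps. First, a post vector is built by summing weighted word categories. Second, K-means runs with an automatically chosen K and automatically chosen initial centroids. Third, the result is scored with Purity.

The Python sketch below illustrates these steps under several assumptions:
- The `WORD_CATEGORIES` dictionary is a hypothetical stand-in for ODP category lookups.
- Term frequency is used as the word weight.
- The threshold-based `pick_seeds` heuristic is an illustrative assumption. It is not the K-selection method proposed in the thesis, which the abstract does not specify.

Purity follows its standard definition. Accuracy is omitted because the abstract does not say how clusters are matched to classes.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL stand-in for ODP category lookups: word -> [(category, weight)].
# The category paths and weights are illustrative, not real ODP query results.
WORD_CATEGORIES = {
    "guitar": [("Arts/Music", 1.0)],
    "concert": [("Arts/Music", 1.0)],
    "band": [("Arts/Music", 0.8), ("Business", 0.2)],
    "python": [("Computers/Programming", 1.0)],
    "compiler": [("Computers/Programming", 1.0)],
    "recipe": [("Home/Cooking", 1.0)],
    "soup": [("Home/Cooking", 1.0)],
}


def category_vector(word_weights, word_categories):
    """Merge each word's categories, scaled by the word's weight, into one L2-normalized vector."""
    vec = defaultdict(float)
    for word, w in word_weights.items():
        for cat, cw in word_categories.get(word, []):  # words without categories are skipped
            vec[cat] += w * cw
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {c: x / norm for c, x in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * b.get(k, 0.0) for k, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pick_seeds(vectors, threshold=0.5):
    """ASSUMED heuristic (not the thesis's method): a post becomes a new seed when its
    similarity to every existing seed is below `threshold`. len(seeds) is K, and the
    seeds serve as the initial centroids."""
    seeds = []
    for v in vectors:
        if all(cosine(v, s) < threshold for s in seeds):
            seeds.append(v)
    return seeds


def mean(vectors):
    """Component-wise mean of sparse vectors."""
    acc = defaultdict(float)
    for v in vectors:
        for k, x in v.items():
            acc[k] += x
    return {k: x / len(vectors) for k, x in acc.items()}


def kmeans(vectors, seeds, max_iter=20):
    """K-means that assigns each post to its most cosine-similar centroid."""
    centroids = [dict(s) for s in seeds]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return labels


def purity(labels, truth):
    """Standard Purity: sum each cluster's majority-class count, divided by the number of posts."""
    clusters = defaultdict(Counter)
    for lab, t in zip(labels, truth):
        clusters[lab][t] += 1
    return sum(c.most_common(1)[0][1] for c in clusters.values()) / len(truth)


# Toy posts as term-frequency dicts (term frequency as word weight is an assumption).
posts = [
    {"guitar": 2, "concert": 1},
    {"concert": 1, "band": 1},
    {"python": 1, "compiler": 2},
    {"compiler": 1},
    {"recipe": 2, "soup": 2},
]
truth = ["music", "music", "code", "code", "food"]

vecs = [category_vector(p, WORD_CATEGORIES) for p in posts]
seeds = pick_seeds(vecs)
labels = kmeans(vecs, seeds)
print(len(seeds), labels, purity(labels, truth))
```

On this toy data the script prints `3 [0, 0, 1, 1, 2] 1.0`, meaning K = 3 and every cluster is pure.

Posts that share no words can still end up close together if their words map to the same category. Here, "band" and "guitar" both contribute to `Arts/Music`, and that kind of overlap is what a category vector adds over a plain term vector.

Cosine similarity is used because it is a common choice for sparse text vectors. The thesis may use a different measure.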
dc.description.sponsorship資訊工程學系 (Department of Computer Science and Information Engineering)zh_TW
dc.identifierGN0697470509
dc.identifier.urihttp://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN0697470509%22.&%22.id.&
dc.identifier.urihttp://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106796
dc.language中文 (Chinese)
dc.subject資料探勘zh_TW
dc.subject部落格文章分群zh_TW
dc.subject類別特徵向量zh_TW
dc.subjectData Miningen_US
dc.subjectBlog Post Clusteringen_US
dc.subjectCategory Feature Vectoren_US
dc.title以字詞類別概念輔助部落格文件分群之研究zh_TW
dc.titleAn Effective Approach for Weblog Documents Clustering based on Categorical Concepts of Wordsen_US
