以樹首遞迴投影探勘常見子樹結構之研究

No Thumbnail Available

Date

2004

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

  本論文針對由具項目標示有序樹結構的資料庫中探勘常見子樹的問題,提出以樹首遞迴投影方式的方法,稱為Tree-Projection演算法,可在探勘過程避免產生不必要之候選子樹。此外,我們擴展Tree-Projection演算法,根據使用者所給定的初始最小支持度門檻值及K值,探勘出資料庫內前K個具最大支持度計數值的封閉常見子樹。依選取常見子樹為樹首做投影的探勘順序不同,及探勘過程中最小支持度門檻值提高與否,發展出DFS、DFS-TC、及SS-TC三種Tree-Projection演算法的變型演算法。由實驗結果顯示,在資料庫特定大小的範圍內,Tree-Projection演算法執行效率不易受最小支持度門檻值及資料產生參數值設定的影響,且效率皆優於之前已提出的常見子樹探勘演算法(TreeMiner演算法) 。此外,以支持度計數值由大而小選取常見子樹為樹首做投影,並結合在探勘過程中提高最小支持度門檻值的SS-TC法,在K遠小於所有封閉常見子樹的情況下,可大量減少資料庫投影次數,大幅提昇Tree-Projection演算法探勘前K個具最大支持度計數值的封閉常見子樹之執行效率。
  A new approach, called Tree-Projection algorithm, is proposed in this thesis for mining embedded frequent sub-trees in a forest of rooted, labeled, and ordered trees. This approach greatly reduces the efforts of generating candidate sub-trees. Moreover, Tree-Projection algorithm is extended to solve an additional problem: mining top-K frequent closed sub-trees, where K is the desired number of closed sub-trees to be mined specified by users. According to the various orders of performing tree-projec-tions and whether the dynamic minimum-support raising strategy is applied, three va-rieties of Tree-Projection algorithm are designed: they are named DFS, DFS-TC, and SS-TC, respectively. The experimental results show that, within a limited database size, the increment of runtime for Tree-Projection algorithm keeps stable under the various minimum-supports setting at run-time and arguments setting of data sets. In most cases, Tree-Projection algorithm outperforms the existing TreeMiner algorithm in terms of the execution time. Furthermore, the SS-TC variety of Tree-Projection al-gorithm adopts support descending order to perform tree-projections and raises minimum-support dynamically. When K is small proportional to the number of all closed sub-trees, the SS-TC variety could discover the top-K frequent closed sub-trees by pruning the unnecessary tree-projections dramatically and provide excellent mining efficiency.

Description

Keywords

具項目標示有序樹, 常見子樹, 封閉常見子樹, rooted, labeled, and ordered trees, frequent sub-trees, closed sub-trees

Citation

Collections

Endorsement

Review

Supplemented By

Referenced By