以FP-tree結構為基礎之近似常見項目集探勘法
No Thumbnail Available
Date
2009
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
本論文針對交易資料庫運用FP-tree結構可壓縮儲存交易資料的特性,提出以FP-tree儲存結構為基礎之近似常見項目集探勘法,稱為 FP-AFI演算法( FP-tree Approximate Frequent Itemsets mining algorithm )。透過分析容錯包含項目集之交易資料集合間的遞迴關係,擴展FP-tree執行投影的方法,可分別找出包含某個項目與不包含某個項目的conditional FP-tree。FP-AFI演算法以深先搜尋的方式,從Header Table中取出符合核心樣式門檻值的項目產生候選項目集,系統化地建構出其對應的conditional FP-tree,並透過conditional FP-tree根節點所記錄的計數值及conditional FP-tree的編碼資訊,快速獲得該項目集的容錯支持度及各項目支持度,以確認是否為一近似常見項目集。在探勘的過程中僅需掃瞄資料庫兩次,可省去大量讀取交易資料所需的時間。由實驗結果顯示,當最小支持度門檻值訂為較小或交易資料的筆數較多時,此方法較之前已提出的近似常見項目集探勘演算法FT-Apriori及AFI有顯著的執行效率增進。
In this thesis, an algorithm called FP-AFI( FP-tree Approximate Frequent Itemsets mining ) is proposed for mining approximate frequent itemsets. The FP-AFI applies the characteristics of FP-tree structure to compress transaction data. Through analyzing the recursive relationship of the sets of transactions which fault tolerant contain an itemset, the projection operation on FP-tree is extended to obtain the conditional FP-tree which contains a specific item or not. The FP-AFI algorithm applies a depth-first search strategy to generate candidate itemsets by checking the threshold value of a core pattern from the item supports counted in the Header Table. For each candidate itemset, its corresponding fault-tolerant conditional FP-trees are constructed systematically. Accordingly, the approximate support of the candidate and the item supports of each item in the candidate are obtained easily from the counter stored in the root node and the coding vectors of the fault-tolerant conditional FP-trees. Such that, a candidate itemset is confirmed to be an approximate frequent itemset or not efficiently. The experimental results show that, when there are many transactions or the support is small, the performance efficiency of FP-AFI algorithm is better than the FT-Apriori and AFI algorithms proposed previously.
In this thesis, an algorithm called FP-AFI( FP-tree Approximate Frequent Itemsets mining ) is proposed for mining approximate frequent itemsets. The FP-AFI applies the characteristics of FP-tree structure to compress transaction data. Through analyzing the recursive relationship of the sets of transactions which fault tolerant contain an itemset, the projection operation on FP-tree is extended to obtain the conditional FP-tree which contains a specific item or not. The FP-AFI algorithm applies a depth-first search strategy to generate candidate itemsets by checking the threshold value of a core pattern from the item supports counted in the Header Table. For each candidate itemset, its corresponding fault-tolerant conditional FP-trees are constructed systematically. Accordingly, the approximate support of the candidate and the item supports of each item in the candidate are obtained easily from the counter stored in the root node and the coding vectors of the fault-tolerant conditional FP-trees. Such that, a candidate itemset is confirmed to be an approximate frequent itemset or not efficiently. The experimental results show that, when there are many transactions or the support is small, the performance efficiency of FP-AFI algorithm is better than the FT-Apriori and AFI algorithms proposed previously.
Description
Keywords
FP-tree, 探勘, 常見項目集, 近似常見項目集, 誤差容忍參數, 核心樣式參數, mining, FP-growth, approximate frequent itemset, FP-tree