Approximately Mining Recently Repeating Patterns on Data Streams

周蓓旻

dc.contributor	柯佳伶博士	zh_TW
dc.contributor.author	周蓓旻	zh_TW
dc.date.accessioned	2019-08-29T07:56:13Z
dc.date.available	2007-08-14
dc.date.available	2019-08-29T07:56:13Z
dc.date.issued	2006
dc.description.abstract	重覆樣式可以顯示資料項出現的前後關聯性，應用於資料摘要與資訊預測的依據。愈來愈多的應用之資料輸入方式形成資料流型態，傳統對靜態資料庫探勘重覆樣式的探勘方法已不適用。此外，在資料流的動態環境下，若從整個歷史資料序列中探勘出重覆樣式，則無法反應資料流中的最新趨勢。因此，本論文提出有效率偵測動態資料流中的最近重覆樣式的兩個演算法，分別稱為出現位元序列漸進探勘法及保留樣式估算法。出現位元序列漸進探勘法運用出現位元序列表示法計算出資料樣式的出現次數，並保留最大重覆樣式的出現位元序列資訊。當最近視窗序列內容改變，將運用所記錄之最大重覆樣式的出現位元序列方式資訊進行新重覆樣式之漸進探勘，以減少探勘計算成本。保留樣式估算法則保留重覆樣式、潛在候選樣式、及2-資訊樣式，並運用分段計算方式記錄資料樣式最近出現次數，架構一個可有效率存取樣式的保留樣式資料結構，由最大字首子樣式及最大字尾子樣式估算出未保留樣式的出現次數，達到近似探勘出所有資料流中最近視窗內最近重覆樣式的方式。實驗結果顯示出現位元序列漸進探勘法可有效率的正確探勘出最近視窗序列中的最近重覆樣式，而保留樣式估算法則可以更快速的近似探勘出資料流中的最近重覆樣式。	zh_TW
dc.description.abstract	Repeating patterns represent temporal relations among data items and can be used for data summarization and prediction. More and more applications generate their data as a data stream, so traditional strategies for mining repeating patterns on static databases are not suitable in a data stream environment. Moreover, in the dynamic environment of a data stream, mining repeating patterns from the whole historical data sequence does not capture the newest pattern trends in the stream. For this reason, this thesis proposes two algorithms for efficiently mining recently repeating patterns in a data stream: the appearing-bit-sequence-based incremental mining algorithm and the basic-patterns estimating-based algorithm. The incremental mining approach uses appearing bit sequences to compute the frequencies of data patterns efficiently within the sliding window. It maintains the appearing bit sequences of the maximal repeating patterns, so that when the window slides, newly generated recently repeating patterns are mined from the maintained information, reducing the processing cost. The estimating-based method maintains the repeating patterns, the potential repeating patterns, and the 2-item patterns, and uses a partition-based scheme to count the frequencies of patterns. It constructs a data structure that supports efficient access to the retained patterns, and estimates the frequency of an unretained pattern from the frequencies of its maximum prefix-subpattern and maximum suffix-subpattern. The experimental results show that the incremental mining method mines recently repeating patterns efficiently and correctly, while the estimating-based method discovers recently repeating patterns from a data stream approximately but even faster.	en_US
dc.description.sponsorship	資訊教育研究所	zh_TW
dc.identifier	GN0693080344
dc.identifier.uri	http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN0693080344%22.&%22.id.&
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/92921
dc.language	Chinese
dc.subject	重覆樣式	zh_TW
dc.subject	資料流	zh_TW
dc.subject	資料探勘	zh_TW
dc.subject	Repeating pattern	en_US
dc.subject	Data Stream	en_US
dc.subject	Data Mining	en_US
dc.title	近似探勘資料流中最近重覆樣式方法之研究	zh_TW
dc.title	Approximately Mining Recently Repeating Patterns on Data Streams	en_US
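
The abstract mentions counting pattern frequencies with appearing bit sequences inside a sliding window. The record does not define that encoding, so the following is only a minimal sketch under an assumed reading: each item gets an integer bitmask whose bit i is set when the item occurs at window position i, and a pattern's occurrences are found by shifting and AND-ing those masks. The function names (`item_bit_sequences`, `pattern_bits`, `frequency`, `slide`) are hypothetical and are not taken from the thesis.

```python
def item_bit_sequences(window):
    """Map each item to an int whose bit i is set when window[i] == item."""
    bits = {}
    for i, item in enumerate(window):
        bits[item] = bits.get(item, 0) | (1 << i)
    return bits


def pattern_bits(pattern, bits):
    """Return an int whose bit i is set when `pattern` starts at position i."""
    result = bits.get(pattern[0], 0)
    for offset, item in enumerate(pattern[1:], start=1):
        # The k-th item must appear k positions after the start,
        # so align its mask with the start position by shifting right.
        result &= bits.get(item, 0) >> offset
    return result


def frequency(pattern, bits):
    """Count occurrences of `pattern` (overlaps included) in the window."""
    return bin(pattern_bits(pattern, bits)).count("1")


def slide(bits, size, new_item):
    """Drop the oldest position and append `new_item` as the newest one."""
    shifted = {item: b >> 1 for item, b in bits.items() if b >> 1}
    shifted[new_item] = shifted.get(new_item, 0) | (1 << (size - 1))
    return shifted


window = "ABCABCAB"
bits = item_bit_sequences(window)
print(frequency("AB", bits), frequency("ABC", bits))

# Slide the window by one item: "ABCABCAB" becomes "BCABCABC".
bits = slide(bits, len(window), "C")
print(frequency("BC", bits), frequency("ABC", bits))
```

In this sketch a window slide costs one shift per item mask rather than a rescan of the window, which is the kind of saving the abstract attributes to incremental mining. The thesis's actual maintenance of maximal repeating patterns is not reproduced here.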