Reuse of Structured Data: Semantics, Linkage, and Realization
Date
2017-04-??
Publisher
Graduate Institute of Library and Information Studies, National Taiwan Normal University (NTNU)
Abstract
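Continuously creating semantics and links for data, and disseminating over the World Wide Web structured data that both people and machines can process and understand, increases the "reuse value" of datasets. This is a topic of wide current interest, and it is the motivation and goal that moved this research from theoretical discussion to system implementation. This paper briefly surveys international projects and technical developments related to Linked Open Data (LOD), introduces five cross-domain knowledge bases and seven domain-specific knowledge bases built with a Linked Open Data approach, and analyzes how data quality, metadata, and data provenance relate to one another. The research also implements the website data.odw.tw, which hosts catalogue data for archived collection items, and designs an ontology (voc4odw) for converting semi-structured data into semantically rich, structured linked data. On the one hand, we extend the CKAN (The Comprehensive Knowledge Archive Network) dataset management system to serve as a storage and presentation platform for linked data, emphasize the staged conversion steps from the original catalogue data to semantically linked data, and release the conversion programs for each step, along with the CKAN software code, as open source. On the other hand, because the source data are released under Creative Commons public licenses, the research results are released under the same terms, so that on an open basis the data and code can be preserved and developed, and freely reused and disseminated.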
To increase the reuse value of existing datasets, it is becoming general practice to add semantic links among the records in a dataset and to link these records to external resources. The enriched datasets are published on the web for both humans and machines to consume and repurpose. In this paper, we make use of publicly available structured records from a digital archive catalogue, and we demonstrate a principled approach to converting the records into semantically rich and interlinked resources for all to reuse. While exploring the issues involved in reusing and repurposing existing datasets, we review recent progress in the field of Linked Open Data (LOD) and examine twelve well-known knowledge bases built with a Linked Data approach. We also discuss the general issues of data quality, metadata vocabularies, and data provenance. The concrete outcomes of this research are the following: (1) a website, data.odw.tw, that hosts more than 840,000 semantically enriched catalogue records across multiple subject areas; (2) a lightweight ontology, voc4odw, for describing data reuse and provenance, among other things; and (3) a set of open source software tools, available to all, for performing the kind of data conversion and enrichment we did in this research. We have used and extended CKAN (The Comprehensive Knowledge Archive Network) as a platform to host and publish Linked Data. Our extensions to CKAN are open source as well. As the records we drew from the original catalogue are released under Creative Commons licenses, the semantically enriched resources we now republish on the Web are free for all to reuse as well.