整合全局場景與局部注意的自監督多標籤分類

陳俊彥; Chen, Chun-Yen

dc.contributor	葉梅珍	zh_TW
dc.contributor	Yeh, Mei-Chen	en_US
dc.contributor.author	陳俊彥	zh_TW
dc.contributor.author	Chen, Chun-Yen	en_US
dc.date.accessioned	2023-12-08T08:02:50Z
dc.date.available	2023-08-09
dc.date.available	2023-12-08T08:02:50Z
dc.date.issued	2023
dc.description.abstract	自監督學習在各種計算機視覺任務中取得了顯著的成果，證明了其在廣泛應用中的有效性。然而，儘管取得了這些成功，針對多標籤分類的挑戰的研究工作仍相對有限。該領域尚待深入探討，需要進一步研究以充分利用自監督學習技術進行多標籤分類任務。在這篇論文中，我們提出了一個適用於自監督多標籤分類的多層次表徵學習（GOLANG）框架，同時捕捉圖像的場景和物件資訊。我們的方法結合了全局場景和局部對齊，以捕捉圖像中不同層次的語義信息。框架的全局模組通過對輸出特徵進行平均池化來學習整個圖像，而局部對齊模組通過學習關注來消除與對象無關的干擾。通過整合兩個模組，我們的模型能從影像中有效地學習各種層次的語義信息。為了進一步提高模型提取物件-場景關係的能力，我們引入了全局和局部交換預測技術，有效捕捉圖像中各種物件和場景之間的複雜關係。GOLANG框架在自監督多標籤分類的實驗上展示了優秀的性能，凸顯了其在多標籤影像中捕捉多個物件和場景之間錯綜複雜關係的有效性。	zh_TW
dc.description.abstract	Self-supervised learning has shown promising results in various computer vision tasks, proving its effectiveness in a wide range of applications. However, despite these successes, there has been limited work specifically addressing the challenges of multi-label classification. This area remains relatively underexplored, and further research is needed to fully harness the potential of self-supervised learning techniques for multi-label classification tasks. In this paper, we present a multi-level representation learning (GOLANG) framework for self-supervised multi-label classification, which captures the image context and object information simultaneously. Our approach combines global context learning and local alignment to capture different levels of semantic information in images. The global context learning module learns from the whole image, while the local alignment module eliminates object-irrelevant nuisances by learning where to learn. By integrating both modules, our model effectively learns diverse levels of semantic information to facilitate the multi-label classification task. To further enhance the model's ability to extract object-scene relationships, we introduce cross-level prediction, which effectively captures the intricate interplay between various objects and scenes within images. The GOLANG framework demonstrates state-of-the-art performance on self-supervised multi-label classification tasks, highlighting its effectiveness in capturing the intricate relationships between multiple objects and scenes in images.	en_US
dc.description.sponsorship	資訊工程學系	zh_TW
dc.identifier	61047073S-43903
dc.identifier.uri	https://etds.lib.ntnu.edu.tw/thesis/detail/4872aea7b7f5bb72952ea67492c9b340/
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/121635
dc.language	English
dc.subject	自監督學習	zh_TW
dc.subject	對比學習	zh_TW
dc.subject	多標籤分類	zh_TW
dc.subject	Self-supervised learning	en_US
dc.subject	Contrastive learning	en_US
dc.subject	Multi-label classification	en_US
dc.title	整合全局場景與局部注意的自監督多標籤分類	zh_TW
dc.title	From Whole to Parts: Integrating Global Context and Local Attention for Self-Supervised Multi-Label Classification	en_US
dc.type	etd

Files

Original bundle

Name: 202300043903-106246.pdf
Size: 644 KB
Format: Adobe Portable Document Format
Description: etd

Collections

學位論文