整合全局場景與局部注意的自監督多標籤分類

dc.contributor: 葉梅珍 [zh_TW]
dc.contributor: Yeh, Mei-Chen [en_US]
dc.contributor.author: 陳俊彥 [zh_TW]
dc.contributor.author: Chen, Chun-Yen [en_US]
dc.date.accessioned: 2023-12-08T08:02:50Z
dc.date.available: 2023-08-09
dc.date.available: 2023-12-08T08:02:50Z
dc.date.issued: 2023
dc.description.abstract: 自監督學習在各種計算機視覺任務中取得了顯著的成果,證明了其在廣泛應用中的有效性。然而,儘管取得了這些成功,針對多標籤分類的挑戰的研究工作仍相對有限。該領域尚待深入探討,需要進一步研究以充分利用自監督學習技術進行多標籤分類任務。在這篇論文中,我們提出了一個適用於自監督多標籤分類的多層次表徵學習(GOLANG)框架,同時捕捉圖像的場景和物件資訊。我們的方法結合了全局場景和局部對齊,以捕捉圖像中不同層次的語義信息。框架的全局模組通過對輸出特徵進行平均池化來學習整個圖像,而局部對齊模組通過學習關注來消除與對象無關的干擾。通過整合兩個模組,我們的模型能從影像中有效地學習各種層次的語義信息。為了進一步提高模型提取物件-場景關係的能力,我們引入了全局和局部交換預測技術,有效捕捉圖像中各種物件和場景之間的複雜關係。GOLANG框架在自監督多標籤分類的實驗上展示了優秀的性能,凸顯了其在多標籤影像中捕捉多個物件和場景之間錯綜複雜關係的有效性。 [zh_TW]
dc.description.abstract: Self-supervised learning has shown promising results in various computer vision tasks, proving its effectiveness in a wide range of applications. However, despite these successes, there has been limited work specifically addressing the challenges of multi-label classification. This area remains relatively underexplored, and further research is needed to fully harness the potential of self-supervised learning techniques for multi-label classification tasks. In this paper, we present a multi-level representation learning (GOLANG) framework for self-supervised multi-label classification, which captures the image context and object information simultaneously. Our approach combines global context learning and local alignment to capture different levels of semantic information in images. The global context learning module learns from the whole image, while the local alignment module eliminates object-irrelevant nuisances by learning where to learn. By integrating both modules, our model effectively learns diverse levels of semantic information to facilitate the multi-label classification task. To further enhance the model's ability to extract object-scene relationships, we introduce cross-level prediction, which effectively captures the intricate interplay between various objects and scenes within images. The GOLANG framework demonstrates state-of-the-art performance on self-supervised multi-label classification tasks, highlighting its effectiveness in capturing the intricate relationships between multiple objects and scenes in images. [en_US]
dc.description.sponsorship: 資訊工程學系 [zh_TW]
dc.identifier: 61047073S-43903
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/4872aea7b7f5bb72952ea67492c9b340/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/121635
dc.language: English
dc.subject: 自監督學習 [zh_TW]
dc.subject: 對比學習 [zh_TW]
dc.subject: 多標籤分類 [zh_TW]
dc.subject: Self-supervised learning [en_US]
dc.subject: Contrastive learning [en_US]
dc.subject: Multi-label classification [en_US]
dc.title: 整合全局場景與局部注意的自監督多標籤分類 [zh_TW]
dc.title: From Whole to Parts: Integrating Global Context and Local Attention for Self-Supervised Multi-Label Classification [en_US]
dc.type: etd

Files

Original bundle

Name: 202300043903-106246.pdf
Size: 644 KB
Format: Adobe Portable Document Format
Description: etd
