Self-Supervised Multi-Label Classification Integrating Global Scene and Local Attention (整合全局場景與局部注意的自監督多標籤分類)
Date
2023
Abstract
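Self-supervised learning has achieved remarkable results across a variety of computer vision tasks, demonstrating its effectiveness in a wide range of applications. Despite these successes, research addressing the challenges of multi-label classification remains relatively limited. The area awaits deeper exploration, and further research is needed to make full use of self-supervised learning techniques for multi-label classification. In this thesis, we propose a multi-level representation learning framework (GOLANG) for self-supervised multi-label classification that captures both the scene and the object information of an image. Our method combines global scene learning with local alignment to capture semantic information at different levels of an image. The framework's global module learns from the whole image by average-pooling the output features, while the local alignment module removes object-irrelevant nuisances by learning where to attend. By integrating the two modules, our model effectively learns semantic information at multiple levels from images. To further improve the model's ability to extract object-scene relationships, we introduce a global-local swapped prediction technique that captures the complex relationships among the objects and scenes in an image. In experiments on self-supervised multi-label classification, the GOLANG framework shows excellent performance, highlighting its effectiveness at capturing the intricate relationships among multiple objects and scenes in multi-label images.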
Self-supervised learning has shown promising results in various computer vision tasks, proving its effectiveness in a wide range of applications. However, despite these successes, there has been limited work specifically addressing the challenges of multi-label classification. This area remains relatively underexplored, and further research is needed to fully harness the potential of self-supervised learning techniques for multi-label classification tasks. In this paper, we present a multi-level representation learning (GOLANG) framework for self-supervised multi-label classification, which captures the image context and object information simultaneously. Our approach combines global context learning and local alignment to capture different levels of semantic information in images. The global context learning module learns from the whole image, while the local alignment module eliminates object-irrelevant nuisances by learning where to learn. By integrating both modules, our model effectively learns diverse levels of semantic information to facilitate the multi-label classification task. To further enhance the model's ability to extract object-scene relationships, we introduce cross-level prediction, which effectively captures the intricate interplay between various objects and scenes within images. The GOLANG framework demonstrates state-of-the-art performance on self-supervised multi-label classification tasks, highlighting its effectiveness in capturing the intricate relationships between multiple objects and scenes in images.
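The abstract describes a global branch, a local attention-based branch, and a cross-level prediction between them. The following is a minimal, illustrative sketch of those three ideas in plain Python, not the thesis's implementation. The function names, the `query` vector, the `prototypes`, and the use of a plain softmax as the assignment are all assumptions made for illustration; the actual architecture, loss, and training details are not given in this record.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def global_average_pool(feature_map):
    """Global branch: average the feature vectors of all spatial positions."""
    n = len(feature_map)
    dim = len(feature_map[0])
    return [sum(vec[d] for vec in feature_map) / n for d in range(dim)]

def attention_pool(feature_map, query):
    """Local branch (assumed form): weight positions by softmax(feature . query)."""
    weights = softmax([dot(vec, query) for vec in feature_map])
    dim = len(feature_map[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, feature_map))
              for d in range(dim)]
    return pooled, weights

def cross_entropy(target_probs, predicted_probs):
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs))

def cross_level_loss(global_feat, local_feat, prototypes):
    """Symmetric swapped prediction (assumed form): each level's soft
    assignment over hypothetical prototypes is predicted from the other level."""
    g = softmax([dot(global_feat, p) for p in prototypes])
    l = softmax([dot(local_feat, p) for p in prototypes])
    return 0.5 * (cross_entropy(g, l) + cross_entropy(l, g))

# Toy feature map: 4 spatial positions, 2 channels each.
feature_map = [[1.0, 0.0], [0.0, 1.0], [4.0, 0.0], [0.0, 0.0]]
query = [1.0, 0.0]                       # hypothetical learned attention query
prototypes = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical cluster prototypes

global_feat = global_average_pool(feature_map)
local_feat, weights = attention_pool(feature_map, query)
loss = cross_level_loss(global_feat, local_feat, prototypes)
```

In this toy input the attention weights concentrate on the third position, which has the largest response to the query, while the global feature treats all positions equally. A real training setup would compute these quantities on network outputs and would typically stop gradients through the target assignment.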
Keywords
Self-supervised learning, Contrastive learning, Multi-label classification