Graph Data Augmentation Methods for Semi-Supervised Skeleton-Based Action Recognition Models (基於半監督式骨架動作辨識模型之圖資料增強方法)

Date

2023

Abstract

In recent years, graph convolutional networks have brought significant performance gains to skeleton-based action recognition. Unlike conventional RGB video action recognition, skeleton-based action recognition takes the positions of human joints as input. This input is less susceptible to real-world background noise, which leads to more efficient and accurate recognition. However, producing joint annotations is labor-intensive, so labeled training samples are scarce in real application settings. Moreover, tuning the parameters of a pre-trained model also costs considerable time. Both factors are bottlenecks for deploying skeleton-based action recognition. To address them, we propose several data augmentation methods designed for skeleton action data to cope with the shortage of labeled data, and combine them with a semi-supervised learning strategy that exploits unlabeled samples, improving recognition when few labels are available. The proposed augmentations increase data diversity at low additional computational cost, allowing the model to extract a wider range of features. In the semi-supervised strategy, we apply two augmentations of different strengths to the unlabeled inputs and use the similarity between the resulting predictions as a loss term, which encourages consistent predictions; we expect this to let the model learn more information useful for its recognition decisions. We also adjust the point in training at which unlabeled data is introduced, which significantly reduces training time while maintaining accuracy. In low-data experiments on the large-scale NTU RGB+D dataset, the proposed framework achieves 84.16% accuracy, an improvement of 6.66 percentage points over the 77.5% of the original method.
These results indicate that the proposed method effectively improves recognition accuracy and generalization when only a small amount of labeled data is available, offering a practical solution to data scarcity and to the cost of model tuning in real applications.
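The semi-supervised strategy described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the augmentation operations, the similarity measure, or the schedule, so every name and value below is an assumption chosen for illustration (`weak_augment`, `strong_augment`, the rotation ranges, joint masking, a mean-squared-difference consistency term, and a hard `start_epoch` switch). A skeleton sequence is assumed to be a list of frames, each a list of `(x, y, z)` joint coordinates.

```python
import math
import random


def rotate_y(skeleton, angle_rad):
    """Rotate every joint about the vertical (y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[(c * x + s * z, y, -s * x + c * z) for x, y, z in frame]
            for frame in skeleton]


def weak_augment(skeleton, rng):
    """Mild view change: rotation within +/-10 degrees (assumed range)."""
    return rotate_y(skeleton, math.radians(rng.uniform(-10, 10)))


def strong_augment(skeleton, rng, mask_prob=0.2):
    """Larger rotation plus random joint masking (assumed operations)."""
    rotated = rotate_y(skeleton, math.radians(rng.uniform(-45, 45)))
    num_joints = len(rotated[0])
    # The same joints are zeroed in every frame of the sequence.
    masked = {j for j in range(num_joints) if rng.random() < mask_prob}
    return [[(0.0, 0.0, 0.0) if j in masked else joint
             for j, joint in enumerate(frame)]
            for frame in rotated]


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(logits_weak, logits_strong):
    """Mean squared difference between the two predicted distributions
    (an assumed choice of similarity measure)."""
    p, q = softmax(logits_weak), softmax(logits_strong)
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def unlabeled_weight(epoch, start_epoch=10, max_weight=1.0):
    """Unlabeled data contributes nothing before start_epoch (assumed schedule)."""
    return 0.0 if epoch < start_epoch else max_weight


def total_loss(supervised_loss, logits_weak, logits_strong, epoch):
    """Supervised loss plus the scheduled consistency term."""
    return supervised_loss + unlabeled_weight(epoch) * consistency_loss(
        logits_weak, logits_strong)
```

In a training loop, the two logit vectors would come from running the same model on `weak_augment(sample, rng)` and `strong_augment(sample, rng)` for each unlabeled sample. Delaying the consistency term until `start_epoch` means the earlier epochs only process labeled data, which is one plausible way the reported reduction in training time could arise.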

Keywords

action recognition, graph convolutional network, semi-supervised learning, graph data augmentation