Online Action Detection Based on an Improved Temporal Action Proposal Generation Network
Date
2022
Abstract
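Most action recognition methods produce unstable predictions when applied to continuous action recognition, because they are mostly trained on short video clips that each contain a single action. When the input is a live stream of frames read in continuously, the frames at which an action starts and ends are not known in advance and cannot be targeted during sampling, so the frame sequence fed into the model differs greatly from the training data, which causes recognition errors. To address this problem, this thesis proposes an online action detection method that locates the start and end of actions in streaming video. The method first uses an Inflated 3D ConvNet (I3D) to extract features from RGB and optical-flow frames, and then uses the Temporal Evaluation Module (TEM) of the Boundary Sensitive Network (BSN) to estimate the probabilities of action start and action end. In addition, this work modifies the conventional BSN so that it runs online rather than offline, producing start and end probabilities in real time and thereby identifying the intervals in which the target action is most likely to occur. Once an action has started, a dynamic sampling method is applied to obtain valid samples, which are fed into I3D for action recognition. Experimental results show that the proposed method handles videos containing target actions of various durations more effectively and improves the accuracy of action recognition in streaming video.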
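The abstract does not define the dynamic sampling step. The following is a minimal sketch under one plausible reading, which is an assumption and not the thesis's method: once a start has been detected, the frames observed between that start and the current frame are resampled evenly to the fixed clip length a recognizer such as I3D expects. The function name `dynamic_sample` and the default of 16 samples are hypothetical.

```python
def dynamic_sample(start: int, current: int, num_samples: int = 16) -> list[int]:
    """Hypothetical dynamic sampling: spread `num_samples` frame indices
    evenly over the observed interval [start, current].

    Assumption: the recognizer needs a fixed-length clip, while the interval
    since the detected action start grows as the stream advances.
    """
    if current < start:
        raise ValueError("current frame precedes the detected start")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    length = current - start + 1
    # Integer spacing keeps every index inside [start, current];
    # short intervals repeat frames instead of reading past `current`.
    return [start + (i * length) // num_samples for i in range(num_samples)]


# A 4-frame interval stretched to 8 samples repeats each frame twice.
print(dynamic_sample(10, 13, 8))  # [10, 10, 11, 11, 12, 12, 13, 13]
```

Resampling the whole interval, instead of taking only the most recent frames, keeps the clip covering the action from its detected start. This is closer to the trimmed clips the recognizer was trained on, which is the mismatch the abstract identifies.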
Most learning-based action recognition methods cannot make stable predictions when they are applied to online action recognition tasks, because they are trained on trimmed video clips that each contain a single action. Without prior knowledge of when the target action occurs, the sampled frames can therefore differ significantly from those in the training data. In this work, we propose an online action detection method that detects the starting and ending times of actions in streaming videos before action recognition. First, an Inflated 3D ConvNet (I3D) is applied to extract spatial and temporal features from the input video frames. Second, these features are fed into the Temporal Evaluation Module (TEM) of the Boundary Sensitive Network (BSN) to estimate the probability that an action starts at each time step. In addition, we modified the original BSN to run in real time instead of offline. The starting time of an action is then located by thresholding this probability. Finally, we apply a dynamic-sampling method to obtain valid samples for action recognition by I3D. Experimental results show that the proposed method handles videos that concatenate target actions of various durations more effectively. The proposed method can therefore improve the accuracy of action recognition in streaming videos, which is beneficial to smart surveillance systems.
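The change from offline to online operation, followed by thresholding, can be illustrated with the minimal sketch below. BSN is typically run over the feature sequence of an entire video. The sketch assumes instead that an online variant sees only a causal window of past features and must decide on each new frame as it arrives. The `scorer` callable is a hypothetical stand-in for the TEM, and the window size, the 0.5 thresholds, and the class name `OnlineBoundaryDetector` are all illustrative assumptions, not values from the thesis.

```python
from collections import deque
from typing import Any, Callable, Optional, Sequence, Tuple

# Hypothetical stand-in for the TEM: maps the buffered window of features
# to (p_start, p_end) for the newest frame.
Scorer = Callable[[Sequence[Any]], Tuple[float, float]]


class OnlineBoundaryDetector:
    """Causal, frame-by-frame wrapper around a boundary scorer (a sketch)."""

    def __init__(self, scorer: Scorer, window: int = 16,
                 start_thr: float = 0.5, end_thr: float = 0.5) -> None:
        self.scorer = scorer
        self.buffer: deque = deque(maxlen=window)  # only past frames are kept
        self.start_thr = start_thr
        self.end_thr = end_thr
        self.t = -1                                # index of the newest frame
        self.active_start: Optional[int] = None    # start of the ongoing action

    def step(self, feature: Any) -> Optional[Tuple[int, int]]:
        """Consume one feature; return (start, end) when an action closes."""
        self.t += 1
        self.buffer.append(feature)
        p_start, p_end = self.scorer(list(self.buffer))
        if self.active_start is None:
            if p_start >= self.start_thr:
                self.active_start = self.t         # action assumed to begin here
        elif p_end >= self.end_thr:
            interval = (self.active_start, self.t)
            self.active_start = None
            return interval
        return None


# Toy scorer: each "feature" already is a (p_start, p_end) pair.
detector = OnlineBoundaryDetector(lambda window: window[-1], window=4)
stream = [(0.1, 0.0), (0.9, 0.1), (0.2, 0.1), (0.1, 0.8), (0.0, 0.1)]
intervals = []
for feature in stream:
    interval = detector.step(feature)
    if interval is not None:
        intervals.append(interval)
print(intervals)  # [(1, 3)]
```

Because the buffer is bounded by `maxlen`, memory use and per-frame cost stay constant however long the stream runs. The frame index stored in `active_start` is where a sampler like the one described for the dynamic-sampling step would begin drawing frames for recognition.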
Keywords
deep learning, action recognition, online action detection, temporal action detection, temporal action proposal generation