Quantitation of a new arbitrary view dynamic human action recognition framework
Indonesian Journal of Electrical Engineering and Computer Science

Abstract
Dynamic action recognition has attracted many researchers due to its wide range of applications. Nevertheless, it remains a challenging problem because the camera setups used in the training phase often differ from those in the testing phase, and/or actions are captured from arbitrary camera viewpoints. Some recent dynamic gesture approaches address multi-view action recognition, but they do not generalize to novel viewpoints. In this research, we propose a novel end-to-end framework for dynamic gesture recognition from an unknown viewpoint. It consists of three main components: (i) a synthetic video generator with a generative adversarial network (GAN)-based architecture named ArVi-MoCoGAN; (ii) a feature extractor, evaluated and compared across various three-dimensional (3D) convolutional neural network (CNN) backbones; and (iii) a channel and spatial attention module. ArVi-MoCoGAN generates synthetic videos at multiple fixed viewpoints from a real dynamic gesture captured at an arbitrary viewpoint. Features are then extracted from these synthetic videos by the 3D CNN backbones, and the resulting feature vectors are processed by the final component, which attends to the most informative features of the dynamic actions. The proposed framework is extensively evaluated on four standard dynamic action datasets and compared with state-of-the-art (SOTA) approaches in terms of accuracy. Experimentally, our method outperforms recent solutions by 0.01% to 9.59% on arbitrary view action recognition.
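The abstract does not specify the internal design of the channel and spatial attention module, so the following is only a minimal sketch, assuming a CBAM-style channel-then-spatial attention applied to the 5D feature maps produced by a 3D CNN backbone. The class name ChannelSpatialAttention, the reduction ratio, and the kernel size are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch of a channel + spatial attention module for 3D CNN features.

    Input: a feature tensor of shape (B, C, T, H, W). The exact design
    used in the paper may differ; this follows a common CBAM-style layout.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatio-temporal dims, re-weight channels.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 3D conv over pooled per-location channel statistics.
        self.spatial_conv = nn.Conv3d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t, h, w = x.shape
        # Channel attention from global average- and max-pooled descriptors.
        avg = x.mean(dim=(2, 3, 4))                       # (B, C)
        mx = x.amax(dim=(2, 3, 4))                        # (B, C)
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ca.view(b, c, 1, 1, 1)
        # Spatial attention from channel-wise mean and max maps.
        avg_map = x.mean(dim=1, keepdim=True)             # (B, 1, T, H, W)
        max_map = x.amax(dim=1, keepdim=True)             # (B, 1, T, H, W)
        sa = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return x * sa

# Usage: refine backbone features before the classification head.
features = torch.randn(2, 512, 4, 7, 7)                   # (B, C, T, H, W)
attn = ChannelSpatialAttention(channels=512)
refined = attn(features)                                  # same shape, attention-weighted

In this layout the module is shape-preserving, so it can be dropped between any 3D CNN backbone and the classifier without changing the rest of the pipeline.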
