TY - GEN
T1 - Spatio-Temporal Difference Descriptor for Skeleton-Based Action Recognition
AU - Ding, Chongyang
AU - Liu, Kai
AU - Korhonen, Jari
AU - Belyaev, Evgeny
PY - 2021/5/18
Y1 - 2021/5/18
N2 - In skeletal representation, intra-frame differences between body joints, as well as inter-frame dynamics between body skeletons, contain discriminative information for action recognition. Conventional methods for modeling human skeleton sequences generally depend on motion trajectory and body joint dependency information, and thus lack the ability to identify the inherent differences of human skeletons. In this paper, we propose a spatio-temporal difference descriptor based on a directional convolution architecture that enables us to learn the spatio-temporal differences and contextual dependencies between different body joints simultaneously. The overall model is built on a deep symmetric positive definite (SPD) metric learning architecture designed to learn discriminative manifold features with a well-designed non-linear mapping operation. Experiments on several action datasets show that our proposed method achieves an accuracy improvement of up to 3% over state-of-the-art methods.
AB - In skeletal representation, intra-frame differences between body joints, as well as inter-frame dynamics between body skeletons, contain discriminative information for action recognition. Conventional methods for modeling human skeleton sequences generally depend on motion trajectory and body joint dependency information, and thus lack the ability to identify the inherent differences of human skeletons. In this paper, we propose a spatio-temporal difference descriptor based on a directional convolution architecture that enables us to learn the spatio-temporal differences and contextual dependencies between different body joints simultaneously. The overall model is built on a deep symmetric positive definite (SPD) metric learning architecture designed to learn discriminative manifold features with a well-designed non-linear mapping operation. Experiments on several action datasets show that our proposed method achieves an accuracy improvement of up to 3% over state-of-the-art methods.
KW - Video Understanding & Activity Analysis
UR - https://slideslive.com/38948056/spatiotemporal-difference-descriptor-for-skeletonbased-action-recognition?ref=account-79851-presentations
U2 - 10.1609/aaai.v35i2.16210
DO - 10.1609/aaai.v35i2.16210
M3 - Published conference contribution
SN - 978-1-57735-866-4
VL - 35
T3 - AAAI Conference on Artificial Intelligence
SP - 1227
EP - 1235
BT - The Thirty-Fifth AAAI Conference on Artificial Intelligence, The Thirty-Third Conference on Innovative Applications of Artificial Intelligence and The Eleventh Symposium on Educational Advances in Artificial Intelligence
PB - Association for the Advancement of Artificial Intelligence
CY - Palo Alto, California
T2 - 35th AAAI Conference on Artificial Intelligence / 33rd Conference on Innovative Applications of Artificial Intelligence / 11th Symposium on Educational Advances in Artificial Intelligence
Y2 - 2 February 2021 through 9 February 2021
ER -