Temporal regions for activity recognition

João Paulo Aires*, Juarez Monteiro, Roger Granada, Felipe Meneguzzi, Rodrigo C. Barros

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Abstract

Recognizing activities in videos is an important task for humans, since it helps identify different types of interactions with other agents. To perform this task automatically, an approach must process the frames of a video and extract enough information to determine the activity. Activity recognition must also account for the temporal aspect of videos, since activities unfold across sequences of frames. In this work, we propose an approach to obtain temporal information from a video by dividing its frames into temporal regions. Thus, instead of classifying an activity using only the information from a single frame, we extract and merge information from several regions of the video in order to capture its temporal aspect. To compose different parts of the video, we take one frame from each region and either concatenate their features or take their mean. For example, for a video divided into three regions with ten features per frame, concatenation yields a thirty-feature vector, while the mean yields a ten-feature vector. Our pipeline includes pre-processing, which resizes images to a fixed resolution of 256×256; Convolutional Neural Networks, which extract features from the activity in each frame; region division, which divides each sequence of frames of a video into n regions of the same size; and classification, where we apply a Support Vector Machine (SVM) to the features from the concatenation or mean step in order to predict the activity. Experiments are performed on the DogCentric Activity dataset [1], which contains videos of 10 different activities performed by 4 dogs, showing that our approach can improve the activity recognition task. We test our approach with two networks, AlexNet and GoogLeNet, improving precision by up to 10% when using regions to classify activities.
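The region-based composition described in the abstract can be illustrated with a short sketch. The code below is a minimal illustration, not the authors' implementation: per-frame CNN features (e.g., AlexNet or GoogLeNet activations) are assumed to be precomputed as a (num_frames, feature_dim) array, and both the choice of the middle frame as each region's representative and the linear SVM kernel are assumptions made for this sketch.

```python
import numpy as np
from sklearn.svm import SVC

def split_into_regions(frame_features, n_regions):
    """Divide a video's frame sequence into n contiguous regions
    of (roughly) equal size along the time axis."""
    return np.array_split(frame_features, n_regions)

def merge_region_features(frame_features, n_regions, mode="concat"):
    """Pick one frame per region and merge its features.

    frame_features: (num_frames, feature_dim) array of per-frame
    CNN features. With 3 regions and 10 features per frame,
    'concat' yields a 30-dim vector and 'mean' a 10-dim vector,
    matching the example in the abstract.
    """
    regions = split_into_regions(frame_features, n_regions)
    # Take the middle frame of each region as its representative
    # (this sampling strategy is an assumption for the sketch).
    picked = [region[len(region) // 2] for region in regions]
    if mode == "concat":
        return np.concatenate(picked)   # n_regions * feature_dim
    return np.mean(picked, axis=0)      # feature_dim

# Hypothetical usage: `videos` is a list of (num_frames, feature_dim)
# feature arrays, `labels` the activity class of each video.
def train_classifier(videos, labels, n_regions=3, mode="concat"):
    X = np.stack([merge_region_features(v, n_regions, mode)
                  for v in videos])
    clf = SVC(kernel="linear")  # SVM classifier, as in the pipeline
    clf.fit(X, labels)
    return clf
```

Calling `merge_region_features` with `n_regions=3` and `mode="concat"` on ten-feature frames produces the thirty-feature vector from the abstract's example; `mode="mean"` keeps the ten-feature dimensionality instead.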

Original language: English
Title of host publication: Proceedings: 26th International Conference on Artificial Neural Networks
Editors: Paul F. Verschure, Alessandra Lintas, Alessandro E. Villa, Stefano Rovetta
Publisher: Springer Verlag
Pages: 424
Number of pages: 1
ISBN (Print): 9783319685991
Publication status: Published - 2017
Event: 26th International Conference on Artificial Neural Networks, ICANN 2017 - Alghero, Italy
Duration: 11 Sept 2017 – 14 Sept 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10613 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th International Conference on Artificial Neural Networks, ICANN 2017
Country/Territory: Italy
City: Alghero
Period: 11/09/17 – 14/09/17

Bibliographical note

Publisher Copyright:
© Springer International Publishing AG 2017.

Keywords

  • Activity recognition
  • Convolutional neural networks
  • Neural networks
