Temporal regions for activity recognition

João Paulo Aires*, Juarez Monteiro, Roger Granada, Felipe Meneguzzi, Rodrigo C. Barros

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Published conference contribution


Recognizing activities in videos is an important task for humans, since it helps identify different types of interactions with other agents. To perform such a task, we need an approach that can process the frames of a video and extract enough information to determine the activity. When dealing with activity recognition, we must also consider the temporal aspect of videos, since activities tend to unfold across frames. In this work, we propose an approach to obtain temporal information from a video by dividing its frames into regions. Thus, instead of classifying an activity using only the information from each image frame, we extract and merge the information from several regions of the video in order to capture its temporal aspect. To compose different parts of the video, we take one frame from each region and either concatenate their features or take their mean. For example, for a video divided into three regions with each frame containing ten features, the vector resulting from concatenation will contain thirty features, while the vector resulting from the mean will contain ten features. Our pipeline includes pre-processing, which consists of resizing images to a fixed resolution of 256×256; Convolutional Neural Networks, which extract features from the activity in each frame; region division, which divides each sequence of frames of a video into n regions of the same size; and classification, where we apply a Support Vector Machine (SVM) to the features from the concatenation or mean phase in order to predict the activity. Experiments are performed using the DogCentric Activity dataset [1], which contains videos of 10 different activities performed by 4 dogs, and show that our approach can improve the activity recognition task. We test our approach using two networks, AlexNet and GoogLeNet, and obtain an increase of up to 10% in precision when using regions to classify activities.
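The region composition step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `compose_regions`, the choice to pair the k-th frame of every region, and the decision to drop leftover frames when the video length is not a multiple of n are all assumptions made for illustration. The per-frame features would come from a CNN such as AlexNet or GoogLeNet; here they are stand-in arrays, and the SVM classification step is left out.

```python
import numpy as np


def compose_regions(frame_features, n_regions, mode="concat"):
    """Merge per-frame features across equally sized temporal regions.

    frame_features: array of shape (n_frames, n_features), one row per frame.
    Returns one composed vector per frame offset inside a region.
    Hypothetical helper; the pairing rule below is an assumption.
    """
    frame_features = np.asarray(frame_features, dtype=float)
    n_frames, n_features = frame_features.shape

    # Assumption: regions have equal size, so leftover frames are dropped.
    region_len = n_frames // n_regions
    if region_len == 0:
        raise ValueError("need at least one frame per region")

    # Shape (n_regions, region_len, n_features): consecutive frames per region.
    regions = frame_features[: region_len * n_regions].reshape(
        n_regions, region_len, n_features
    )

    if mode == "concat":
        # Row k concatenates the k-th frame of every region.
        return regions.transpose(1, 0, 2).reshape(
            region_len, n_regions * n_features
        )
    if mode == "mean":
        # Row k averages the k-th frame of every region.
        return regions.mean(axis=0)
    raise ValueError(f"unknown mode: {mode!r}")


# Example from the abstract: three regions, ten features per frame.
features = np.arange(6 * 10).reshape(6, 10)  # stand-in for CNN features
concat = compose_regions(features, n_regions=3, mode="concat")
mean = compose_regions(features, n_regions=3, mode="mean")
print(concat.shape)  # (2, 30)
print(mean.shape)    # (2, 10)
```

Under these assumptions, each row of the result could be passed to a classifier as one training or test sample, matching the thirty-feature and ten-feature vectors mentioned in the abstract.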

Original language: English
Title of host publication: Proceedings: 26th International Conference on Artificial Neural Networks
Editors: Paul F. Verschure, Alessandra Lintas, Alessandro E. Villa, Stefano Rovetta
Number of pages: 1
ISBN (Print): 9783319685991
Publication status: Published - 2017
Event: 26th International Conference on Artificial Neural Networks, ICANN 2017 - Alghero, Italy
Duration: 11 Sept 2017 to 14 Sept 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10613 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 26th International Conference on Artificial Neural Networks, ICANN 2017

Bibliographical note

Publisher Copyright:
© Springer International Publishing AG 2017.


  • Activity recognition
  • Convolutional neural networks
  • Neural networks

