Recognizing activities in videos is an important task for humans, since it helps identify different types of interactions with other agents. To perform this task, we need an approach that can process the frames of a video and extract enough information to determine the activity. Activity recognition must also account for the temporal aspect of videos, since activities unfold across frames. In this work, we propose an approach that obtains temporal information from a video by dividing its frames into regions. Instead of classifying an activity using only the information from each individual frame, we extract and merge information from several regions of the video to capture its temporal aspect. To compose different parts of the video, we take one frame from each region and either concatenate their features or take the mean. For example, for a video divided into three regions in which each frame has ten features, concatenation yields a vector of thirty features, while the mean yields a vector of ten features. Our pipeline comprises pre-processing, which resizes images to a fixed resolution of 256×256; Convolutional Neural Networks, which extract features from the activity in each frame; region division, which divides the sequence of frames of a video into n regions of the same size; and classification, where we apply a Support Vector Machine (SVM) to the features from the concatenation or mean phase to predict the activity. Experiments on the DogCentric Activity dataset, which contains videos of 10 different activities performed by 4 dogs, show that our approach can improve activity recognition. We test our approach with two networks, AlexNet and GoogLeNet, and obtain an increase of up to 10% in precision when using regions to classify activities.
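The region composition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_into_regions` and `compose`, the `offset` parameter, and the toy feature values are hypothetical. The abstract does not say which frame is taken from each region, so the sketch assumes the frame at the same position in every region, and it assumes that trailing frames which do not fill a whole region are dropped.

```python
from statistics import fmean
from typing import List, Sequence

Vector = List[float]


def split_into_regions(frames: Sequence[Vector], n_regions: int) -> List[List[Vector]]:
    """Split per-frame feature vectors into n contiguous regions of equal size.

    Assumption: frames left over after the last full region are dropped.
    """
    if n_regions < 1:
        raise ValueError("n_regions must be at least 1")
    region_size = len(frames) // n_regions
    if region_size == 0:
        raise ValueError("the video has fewer frames than regions")
    return [
        list(frames[i * region_size:(i + 1) * region_size])
        for i in range(n_regions)
    ]


def compose(frames: Sequence[Vector], n_regions: int, offset: int, mode: str = "concat") -> Vector:
    """Merge the frame at position `offset` of every region into one vector.

    Assumption: the same offset is used in every region.
    """
    regions = split_into_regions(frames, n_regions)
    picked = [region[offset] for region in regions]
    if mode == "concat":
        # Features of the picked frames are placed one after another.
        return [value for vector in picked for value in vector]
    if mode == "mean":
        # Element-wise mean across the picked frames.
        return [fmean(values) for values in zip(*picked)]
    raise ValueError("mode must be 'concat' or 'mean'")


# Toy video: 9 frames with 10 features each (stand-ins for CNN features).
frames = [[float(t)] * 10 for t in range(9)]

concat_vec = compose(frames, n_regions=3, offset=0, mode="concat")
mean_vec = compose(frames, n_regions=3, offset=0, mode="mean")

print(len(concat_vec), len(mean_vec))  # 30 10
```

With three regions and ten features per frame, the concatenated vector has thirty features and the mean vector has ten, matching the example in the abstract. In the described pipeline, vectors like these would then be passed to the SVM classifier.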
Title of host publication: Proceedings: 26th International Conference on Artificial Neural Networks
Editors: Paul F. Verschure, Alessandra Lintas, Alessandro E. Villa, Stefano Rovetta
Published: 2017
Event: 26th International Conference on Artificial Neural Networks, ICANN 2017 - Alghero, Italy
Duration: 11 Sept 2017 → 14 Sept 2017
Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Bibliographical note
Publisher Copyright:
© Springer International Publishing AG 2017.
Keywords
- Activity recognition
- Convolutional neural networks
- Neural networks