Introduction
With the rapid development of computer and Internet technology, mobile Internet technology and mobile communication devices are now widely used in many fields. Using mobile phones and other mobile devices to shoot, watch, and share videos has become part of everyday life and work (“Cisco Annual Internet Report (2018–2023) White Paper,”). As a result, video has become an important information carrier, and the volume of video on the network is growing geometrically. In recent years, the demand for automatic analysis of video content has been expanding. Over the past decade, video content understanding and recognition technologies have shown broad promise in surveillance (Angadi & Nandyal, 2020; Chakraborty, Bhattacharyya, & Chakraborty, 2018; Ullah et al., 2020), smart homes (Dai, Minciullo, Garattoni, Francesca, & Bremond, 2019), autonomous driving (Gao, Xu, Davis, Socher, & Xiong, 2019), and sports video analysis (Akçay, Seymen, Er, Çetin, & Karslıgil, 2019; Karlsson, 2017; Rafiq, Rafiq, Agyeman, Choi, & Jin, 2020). Sports video has the largest audience of all video types, and a large number of sports videos are recorded every day. Indexing sports videos by sport category is an important means of supporting post-match analysis, the formulation of coaching tactics, and follow-up processing. It is the basis for sports video summarization, semantic annotation, and retrieval, and it has great commercial potential and application value (Z. Wang, Yu, & He, 2016).
Video classification is an important research direction in computer vision (Wu, Yao, Fu, & Jiang, 2017). Its main purpose is to analyze video content and assign videos to predefined classes according to the objects and scenes they contain, the actions of those objects, and the evolution of those scenes, so that videos can be supervised and categorized (Rafiq et al., 2020). This process involves many fields, including object detection, scene detection, image processing, pattern recognition, and artificial intelligence, and it touches almost every aspect of video processing. Video classification therefore draws on the most advanced video processing technologies. A video can be regarded as a continuous sequence of images. However, the dynamic nature of video sequences, together with variations in lighting conditions, background, camera angle, and shadows, produces large intra-class differences and small inter-class differences, and scene changes become difficult to tell apart. This makes video classification much more complex than single-image classification, and it has long been a challenging task in video analysis.
Video classification is essentially a pattern recognition problem that consists of two main steps: feature extraction and classification. Feature extraction is the core step. Over the past few decades, advances in feature extraction have brought some progress in video classification, but the results are still far from satisfactory. A large semantic gap remains between low-level features and high-level semantics (Guo, 2020). A video is composed of a series of images in a particular order, and the visual information in those images constitutes the visual information of the video. More importantly, the sequential relationships between images constitute the temporal information of the video. This temporal information includes the motion of objects, the evolution of scenes, and other information unique to video as a medium. The complex temporal structure of video sequences makes them difficult to understand and costly to process, and it must be modeled to improve video classification performance. However, existing feature extraction methods cannot fully capture temporal information, or can capture only short-term, low-level motion characteristics, which results in insufficient feature representations. As video content on the Internet becomes more complex and information-rich, this problem becomes more acute. It is therefore of great significance to study video features, and especially methods for extracting temporal features.
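To make the point about temporal information concrete, the following is a minimal sketch, not a method from this article. It assumes each frame has already been reduced to a small feature vector (the `Frame` type, the helper names `mean_pool` and `mean_signed_difference`, and the toy data are all hypothetical). It shows that a purely frame-level feature, obtained by averaging over time, is identical for a clip and its time-reversed copy, whereas a simple order-aware feature distinguishes the two.

```python
from typing import List

# Hypothetical per-frame feature vector (e.g. the output of any image descriptor).
Frame = List[float]


def mean_pool(frames: List[Frame]) -> Frame:
    """Average per-frame features over time. Ignores frame order."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / n for i in range(dim)]


def mean_signed_difference(frames: List[Frame]) -> Frame:
    """Average difference between consecutive frames. Depends on frame order."""
    steps = len(frames) - 1
    dim = len(frames[0])
    return [
        sum(frames[t + 1][i] - frames[t][i] for t in range(steps)) / steps
        for i in range(dim)
    ]


# Toy "video": one feature that increases over time, and the same clip reversed.
forward: List[Frame] = [[0.0], [1.0], [2.0], [3.0]]
backward: List[Frame] = list(reversed(forward))

# The order-agnostic feature cannot tell the two clips apart.
print(mean_pool(forward) == mean_pool(backward))

# The order-aware feature has opposite signs for the two clips.
print(mean_signed_difference(forward), mean_signed_difference(backward))
```

In a two-step pipeline of the kind described above, either feature would be passed to a classifier. The sketch only illustrates why features that discard frame order lose the motion and scene-evolution cues that distinguish video from a set of still images.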