Abdelpakey, Mohamed Hamed (2021) Visual object tracking in dynamic scenes. Doctoral (PhD) thesis, Memorial University of Newfoundland.
PDF (English) - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
Visual object tracking is a fundamental task in the field of computer vision. It is widely used in numerous applications, which include, but are not limited to, video surveillance, image understanding, robotics, and human-computer interaction. In essence, visual object tracking is the problem of estimating the states/trajectory of the object of interest over time. Unlike other tasks such as object detection, where the number of classes/categories is defined beforehand, the only available information about the object of interest is in the first frame. Even though deep learning (DL) has revolutionised most computer vision tasks, visual object tracking still poses several challenges. The nature of the visual object tracking task is stochastic: no prior knowledge about the object of interest is available during training or testing/inference. Moreover, visual object tracking is a class-agnostic task, as opposed to object detection and segmentation tasks. In this thesis, the main objective is to develop and advance visual object trackers using novel designs of deep learning frameworks and mathematical formulations. To take advantage of different trackers, a novel framework is developed to track moving objects based on a composite framework and a reporter mechanism. The composite framework has built-in trackers and user-defined trackers to track the object of interest. The framework contains a module to calculate the robustness of each tracker, and the reporter mechanism serves as a recovery mechanism if trackers fail to locate the object of interest. Because different trackers may still fail to track the object of interest, a more robust framework based on the Siamese network architecture, namely DensSiam, is proposed. It uses the concept of dense layers, connecting each dense layer in the network to all layers in a feed-forward fashion, with a similarity-learning function.
DensSiam also includes a self-attention mechanism to force the network to pay more attention to non-local features during offline training. Generally, Siamese trackers do not fully utilize semantic and objectness information from pre-trained networks that have been trained on an image classification task. To solve this problem, a novel domain-aware architecture, dubbed DomainSiam, is proposed that fully utilizes semantic and objectness information while producing a class-agnostic track using a ridge regression network. Moreover, to reduce the sparsity problem, we solve the ridge regression problem with a differentiable weighted-dynamic loss function. Siamese trackers are fast and run in real time; however, they lack high accuracy. To overcome this challenge, a novel dynamic policy gradient Agent-Environment architecture with a Siamese network (DP-Siam) is proposed to train the tracker to increase the accuracy and the expected average overlap while running in real time. DP-Siam is trained offline with reinforcement learning to produce a continuous action that predicts the optimal object location. One of the common design blocks in most object trackers in the literature is the backbone network, which is trained in the feature space. To design a backbone network that maps from the feature space to another space (i.e., the joint-nullspace) and is more suitable for object tracking and classification, a novel framework is proposed. The new framework, called NullSpaceNet, has a clear interpretation of the feature representation, and the features in this space are more separable. NullSpaceNet is utilized in object tracking by regularizing the discriminative joint-nullspace backbone network. The novel tracker, called NullSpaceRDAR, encourages the network to learn a representation of the target-specific information for the object of interest in the joint-nullspace.
This contrasts with the feature space, where objects from a specific class are grouped into one category and the representation is therefore insensitive to intra-class variations. Furthermore, we use the NullSpaceNet backbone to learn a tracker, dubbed NullSpaceRDAR, with a regularized discriminative joint-nullspace backbone network that is specifically designed for object tracking. In the regularized discriminative joint-nullspace, target-specific features from the same target are collapsed into one point in the joint-nullspace, and target-specific features from different targets are collapsed into different points. Consequently, the joint-nullspace forces the network to be sensitive to the variations of objects from the same class (intra-class variations). Moreover, a dynamic adaptive loss function is proposed to select the suitable loss function from a super-set family of losses based on the training data, making NullSpaceRDAR more robust to different challenges.
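The collapsing behaviour described in the abstract can be illustrated with a small linear sketch. This is not the thesis's NullSpaceNet, which is a learned network. It is a classical linear stand-in: projecting onto the null space of the within-class scatter, under the assumption that the feature dimension exceeds the number of samples. The function name `within_class_nullspace`, the tolerance, and the random toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def within_class_nullspace(features, labels, tol=1e-10):
    """Return an orthonormal basis (as columns) for the null space of the
    within-class scatter of `features` (shape: n_samples x n_dims)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Subtract each target's own mean so only within-target variation remains.
    centered = np.vstack([
        features[labels == c] - features[labels == c].mean(axis=0)
        for c in np.unique(labels)
    ])
    # Right singular vectors beyond the numerical rank span the null space.
    _, s, vt = np.linalg.svd(centered, full_matrices=True)
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    return vt[rank:].T


# Toy data (assumed): 2 targets, 3 samples each, 10-dimensional features.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 10))
labels = np.array([0, 0, 0, 1, 1, 1])

basis = within_class_nullspace(features, labels)
projected = features @ basis  # coordinates in the null space
```

In this toy setting, the rows of `projected` that share a label coincide up to floating-point error, while rows with different labels land on different points. This mirrors, in a simplified linear form, the idea that features of the same target collapse to one point and features of different targets collapse to different points.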
Item Type: | Thesis (Doctoral (PhD)) |
---|---|
URI: | http://research.library.mun.ca/id/eprint/14936 |
Item ID: | 14936 |
Additional Information: | Includes bibliographical references. |
Keywords: | Deep learning, Computer vision, Machine learning, Object tracking, Artificial intelligence |
Department(s): | Engineering and Applied Science, Faculty of |
Date: | February 2021 |
Date Type: | Submission |
Digital Object Identifier (DOI): | https://doi.org/10.48336/gpj0-2n30 |
Library of Congress Subject Heading: | Machine learning; Movement sequences; Artificial intelligence. |