Introduction
In recent years, remote sensing (RS) observation technology has developed rapidly, producing vast amounts of unprocessed RS images worldwide. Consequently, how to retrieve the required images accurately and efficiently from a large-scale RS image database has become an increasingly important research direction (Tang et al., 2018). Although content-based image retrieval (CBIR) methods have made good progress for medical images, traditional images, and video, fewer techniques exist for content-based remote sensing image retrieval (CBRSIR). CBRSIR mainly involves two components: extracting effective features and measuring inter-image similarity (Demir & Bruzzone, 2014; Lu & Man, 2013). Representative features are first extracted from the RS images, and similar images are then retrieved by measuring the similarity between the query image and each target image. Among these approaches, hash-based approximate nearest neighbor search has been widely used because of its high query efficiency and low storage cost (Indyk & Motwani, 1998). In image retrieval, a hashing algorithm represents the useful features of RS images as compact binary hash codes, which reduces the search cost.
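To make the efficiency argument concrete, the following is a minimal sketch of hash-based retrieval, assuming each image has already been turned into a real-valued feature vector. It uses random-hyperplane locality-sensitive hashing, a simple data-independent scheme chosen here purely for illustration (it is not the method proposed in this work), and ranks database items by Hamming distance. The function names `random_hyperplanes`, `encode`, `hamming`, and `search` are hypothetical.

```python
import random

def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits Gaussian hyperplanes (illustrative data-independent LSH)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def encode(feature, planes):
    """Pack the sign of each hyperplane projection into one integer hash code."""
    code = 0
    for i, w in enumerate(planes):
        if sum(wi * xi for wi, xi in zip(w, feature)) >= 0:
            code |= 1 << i  # bit i is 1 if the feature lies on the positive side
    return code

def hamming(a, b):
    """Number of differing bits: XOR followed by a popcount."""
    return bin(a ^ b).count("1")

def search(query_code, db_codes, k):
    """Return indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))
    return order[:k]

# Usage with two hypothetical 4-D feature vectors.
planes = random_hyperplanes(dim=4, n_bits=16, seed=1)
db = [encode(v, planes) for v in ([1.0, 2.0, -0.5, 0.3], [-1.0, -2.0, 0.5, -0.3])]
print(search(db[0], db, k=1))
```

Each image is stored as a single 16-bit integer instead of a float vector, and comparing two codes costs one XOR and a bit count, which is where the low storage and query cost of hashing come from.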
Image hashing methods fall broadly into two types: supervised hashing and unsupervised hashing. Supervised methods learn the hash function by combining the feature vectors of the data with labeled similarity information between data points. Currently, most work still revolves around supervised hash learning; representative methods include supervised discrete hashing (SDH) (Shen et al., 2015), fast supervised hashing (FastH) (Lin et al., 2014), ranking-based supervised hashing (RSH) (Wang et al., 2013), and deep supervised hashing. However, producing high-quality label information requires significant cost and manual labor in practical applications. Unsupervised methods use no supervised information; they learn hash functions from the intrinsic information of the images alone. Representative methods include spectral hashing (SH) (Weiss et al., 2008), shift-invariant kernelized locality-sensitive hashing (SKLSH) (Raginsky & Lazebnik, 2009), iterative quantization hashing (ITQ) (Gong et al., 2012), and unsupervised hashing based on deep learning. Because no label information is used, unsupervised hashing is simple to implement and saves the cost of data annotation. In reality, most RS data have no semantic labels, so unsupervised hashing is better suited to practical applications, and it is the focus of this work.
At present, the strong fitting ability of deep networks has made them an effective means of extracting informative features for unsupervised hashing. Because the data carry no label information, how to accurately construct the semantic similarity structure between data remains an open problem. Some methods use machine learning techniques to construct pseudo-labels from which similarities are learned; however, the semantic information in such pseudo-labels is weak, and the resulting performance is unsatisfactory (Dong et al., 2020; Hu et al., 2017; Song et al., 2015). Shen et al. (2018) were the first to alternately optimize the model and the pseudo-labels in order to update the data similarity graph; however, this construction method is highly localized, which limits retrieval performance. In the traditional image domain, Lin et al. (2021) used a pre-trained CNN model to obtain an initial label matrix, mined the latent neighbor relations underlying the features during training, and then updated the similarity pairs in the labels.
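As an illustration of what "constructing a similarity structure without labels" can mean, the sketch below builds a binary pseudo-similarity matrix from feature vectors, assuming those vectors come from some pre-trained network. It uses a generic symmetrized k-nearest-neighbor rule on cosine similarity; this is an assumed, simplified construction, not the specific procedure of Shen et al. (2018) or Lin et al. (2021), and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_pseudo_similarity(features, k):
    """S[i][j] = 1 if j is among the k most cosine-similar items to i, or vice
    versa (symmetrized); every item is similar to itself; all other pairs are 0."""
    n = len(features)
    sims = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    S = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank the other items by similarity to item i, highest first.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sims[i][j], reverse=True)
        for j in others[:k]:
            S[i][j] = S[j][i] = 1
        S[i][i] = 1
    return S

# Usage with four hypothetical 2-D "pre-trained" features forming two clusters.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
for row in knn_pseudo_similarity(feats, k=1):
    print(row)
```

This matrix is static; the methods discussed above go further by refining such a graph as training proceeds, for example by re-mining neighbor relations from the features learned so far.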