Discriminative local features obtained from activations of convolutional neural networks have proven to be essential for image retrieval. To improve retrieval performance, many recent works aim to obtain more powerful and discriminative features. In this work, we propose a new attention layer to assess the importance of local features and assign higher weights to the more discriminative ones. Furthermore, we present a scale and mask module to filter out the meaningless local features and scale the major components. This module not only reduces the impact of the various scales of the major components in images by scaling them on the feature maps, but also filters out the redundant and confusing features with the MAX-Mask. Finally, the features are aggregated into the image representation. Experimental evaluations demonstrate that the proposed method outperforms the state-of-the-art methods on standard image retrieval datasets.
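The weighting, masking and aggregation steps described above can be illustrated with a minimal sketch. The abstract does not define the MAX-Mask or the attention layer in detail, so the code assumes that a location is kept when it holds the largest activation of at least one channel, and that the attention scores are simply given as non-negative numbers. The names `max_mask`, `aggregate`, `features` and `scores` are hypothetical and are not taken from the paper.

```python
import math

def max_mask(features):
    """Assumed MAX-Mask: keep a location if it holds the largest
    activation of at least one channel (not confirmed by the abstract)."""
    keep = [0] * len(features)
    for c in range(len(features[0])):
        best = max(range(len(features)), key=lambda i: features[i][c])
        keep[best] = 1
    return keep

def aggregate(features, scores):
    """Attention-weighted sum of the unmasked local features, L2-normalised."""
    mask = max_mask(features)
    dim = len(features[0])
    pooled = [0.0] * dim
    for feat, score, m in zip(features, scores, mask):
        for c in range(dim):
            pooled[c] += m * score * feat[c]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled

# Three local features with two channels each; the scores are made-up
# stand-ins for the output of an attention layer.
features = [[1.0, 0.0], [0.2, 0.1], [0.0, 2.0]]
scores = [0.5, 0.9, 0.5]
descriptor = aggregate(features, scores)
```

In this toy input the second location never holds a channel maximum, so it is masked out even though its attention score is the highest; the two remaining features are weighted, summed and normalised to unit length.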
Visual place recognition is challenging in urban environments and is usually viewed as a large-scale image retrieval task. The intrinsic challenges in place recognition are that confusing objects such as cars and trees frequently occur in complex urban scenes, and that buildings with repetitive structures may cause over-counting and the burstiness problem, degrading the image representations. To address these problems, we present an Attention-based Pyramid Aggregation Network (APANet), which is trained in an end-to-end manner for place recognition. One main component of APANet, spatial pyramid pooling, can effectively encode the multi-size buildings containing geo-information. The other, the attention block, is adopted as a region evaluator for suppressing the confusing regional features while highlighting the discriminative ones. At test time, we further propose a simple yet effective PCA power whitening strategy, which significantly improves the widely used PCA whitening by reasonably limiting the impact of over-counting. Experimental evaluations demonstrate that the proposed APANet outperforms the state-of-the-art methods on two place recognition benchmarks and generalizes well on standard image retrieval datasets.
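As a rough illustration of the two components named above, the sketch below pools a small feature map over a spatial pyramid and then combines the regional features with attention weights. The abstract does not give the pooling operator, the pyramid levels or the form of the attention block, so max pooling, the levels `(1, 2)` and a softmax over given region scores are all assumptions. The function names are hypothetical, and PCA power whitening is not covered here.

```python
import math

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def spatial_pyramid_pool(fmap, levels=(1, 2)):
    """Max-pool a C x H x W feature map over an n x n grid for each level n.
    Returns one C-dimensional regional feature per grid cell."""
    height, width = len(fmap[0]), len(fmap[0][0])
    regions = []
    for n in levels:
        for gy in range(n):
            y0, y1 = (gy * height) // n, ceil_div((gy + 1) * height, n)
            for gx in range(n):
                x0, x1 = (gx * width) // n, ceil_div((gx + 1) * width, n)
                regions.append([
                    max(ch[y][x] for y in range(y0, y1) for x in range(x0, x1))
                    for ch in fmap
                ])
    return regions

def attention_aggregate(regions, scores):
    """Softmax the region scores (an assumed form of attention) and
    return the weighted sum of the regional features."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    dim = len(regions[0])
    return [sum(e / total * r[c] for e, r in zip(exps, regions))
            for c in range(dim)]

# One channel, 2 x 2 map: 1 region at level 1 plus 4 regions at level 2.
fmap = [[[1.0, 2.0], [3.0, 4.0]]]
regions = spatial_pyramid_pool(fmap)
pooled = attention_aggregate(regions, [0.0] * len(regions))
```

With equal scores the attention step reduces to a plain average of the regional features; unequal scores would shift weight toward the regions that the evaluator considers discriminative.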
Feature selection can improve classification accuracy and decrease the computational complexity of classification. Data features in intrusion detection systems (IDS) always present the problem of imbalanced classification, in which some classes have only a few instances while others have many. This imbalance can obviously limit classification efficiency, but few efforts have been made to address it. In this paper, a scheme for the many-objective problem is proposed for feature selection in IDS, which uses two strategies, namely, a special domination method and a predefined multiple targeted search, for population evolution. It can differentiate traffic not only between normal and abnormal but also by abnormality type. Based on our scheme, NSGA-III is used to obtain an adequate feature subset with good performance. An improved many-objective optimization algorithm (I-NSGA-III) is further proposed using a novel niche preservation procedure. It consists of a bias-selection process that selects the individual with the fewest selected features and a fit-selection process that selects the individual with the maximum sum weight of its objectives. Experimental results show that I-NSGA-III can alleviate the imbalance problem with higher classification accuracy for classes having fewer instances. Moreover, it can achieve both higher classification accuracy and lower computational complexity.
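The two selection rules of the niche preservation procedure can be sketched as follows. The abstract does not say how an individual is encoded, how the objective weights are chosen, or how the two rules are combined inside I-NSGA-III, so the dictionary layout, the weighted-sum reading of "maximum sum weight of its objectives" and the names `bias_select` and `fit_select` are assumptions made for illustration only.

```python
def bias_select(candidates):
    """Bias-selection: pick the individual with the fewest selected features.
    Each individual carries a 0/1 feature mask (assumed encoding)."""
    return min(candidates, key=lambda ind: sum(ind["mask"]))

def fit_select(candidates, weights):
    """Fit-selection: pick the individual whose weighted objective sum is
    largest (assumed reading of 'maximum sum weight of its objectives')."""
    return max(
        candidates,
        key=lambda ind: sum(w * o for w, o in zip(weights, ind["objectives"])),
    )

# Toy niche of three individuals; the objective values are made up and
# could stand for per-class accuracies, which is also an assumption.
niche = [
    {"mask": [1, 1, 0, 1], "objectives": [0.9, 0.4, 0.7]},
    {"mask": [1, 0, 0, 0], "objectives": [0.6, 0.5, 0.6]},
    {"mask": [1, 1, 1, 0], "objectives": [0.8, 0.9, 0.8]},
]
weights = [1.0, 1.0, 1.0]

smallest = bias_select(niche)
fittest = fit_select(niche, weights)
```

Here the bias rule favours the second individual, which uses a single feature, while the fit rule favours the third, whose objectives sum to the largest value.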
While relevance feedback (RF) provided by users has proven to be an effective method for content-based image retrieval, how to interpret and learn from the user-provided feedback remains an unsolved problem. In this paper, we propose an integrated user-feedback and learning algorithm that screens individual elements of content features and drives a group of swarmed particles inside the feature space to provide a possible solution. In comparison with the existing approaches, the proposed algorithm achieves a number of advantages, which can be highlighted as: (i) interpretation of users' feedback is independent of both the content features and relevance feedback schemes, and hence the proposed algorithm can be applied to any content features and relevance feedback methods; (ii) the RF interpretation is followed by a group of swarmed particles, acting as multiple agents rather than a single query image in searching for the desirable images; (iii) the proposed RF interpretation and learning is exploited not only in reweighting the content similarity measurement, but also in regrouping the database images. Extensive experiments support that our proposed algorithm outperforms the existing representative techniques, providing good potential for further research and development for a wide range of content-based image retrieval applications.
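To make the idea of reweighting the content similarity measurement concrete, here is a minimal sketch. It does not reproduce the paper's screening rule or its swarmed particles, neither of which is specified in the abstract. Instead it swaps in the classic inverse-standard-deviation heuristic: feature elements that vary little across the images marked relevant receive larger weights. The names `element_weights` and `weighted_distance` and the smoothing constant `eps` are hypothetical.

```python
import math
import statistics

def element_weights(relevant, eps=1e-6):
    """Weight each feature element by the inverse of its standard deviation
    over the relevant images, then normalise the weights to sum to 1."""
    stds = [statistics.pstdev(column) for column in zip(*relevant)]
    raw = [1.0 / (s + eps) for s in stds]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

# Two images marked relevant by the user: they agree on element 0
# and disagree on element 1, so element 0 should dominate the distance.
relevant = [[1.0, 0.0], [1.0, 2.0]]
weights = element_weights(relevant)

query = [1.0, 0.0]
near = weighted_distance(query, [1.0, 5.0], weights)  # differs on element 1 only
far = weighted_distance(query, [3.0, 0.0], weights)   # differs on element 0 only
```

After reweighting, a candidate that differs from the query only in the inconsistent element is ranked closer than one that differs in the consistent element.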
Affective recognition is an important and challenging task for video content analysis. Affective information in videos is closely related to the viewer's feelings and emotions. Thus, video affective content analysis has great potential value. However, most previous methods focus on how to effectively extract features from videos for affective analysis. Several issues are worth investigating: for example, what information is used to express emotions in videos, and which information is useful for affecting audiences' emotions. Taking these issues into account, in this paper we propose a new video affective content analysis method based on protagonist information via a Convolutional Neural Network (CNN). The proposed method is evaluated on the largest video emotion dataset and compared with some previous work. The experimental results show that our proposed affective analysis method based on protagonist information achieves the best performance in emotion classification and prediction.
The scale-invariant feature transform (SIFT) feature plays a very important role in multimedia content analysis, such as near-duplicate image and video retrieval. However, the storage and query costs of SIFT become unbearable for large-scale databases. In this paper, SIFT features are robustly encoded with temporal information by tracking the SIFT to generate temporal-concentration SIFT (TCSIFT), which highly compresses the quantity of local features to reduce visual redundancy while keeping the advantages of SIFT as much as possible. On the basis of TCSIFT, a novel framework for large-scale video copy retrieval is proposed in which the processes of retrieval and validation are implemented at the feature and frame levels. Experimental results for two different datasets, i.e., CC_WEB_VIDEO and TRECVID, demonstrate that our method can yield comparable accuracy, compact storage size, and more efficient execution time, as well as adapt to various video transformations.
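The compression idea behind tracking can be sketched with toy data. The abstract does not describe the tracking rule or how a track is turned into a single feature, so the code below assumes greedy nearest-neighbour linking between consecutive frames under a distance threshold, and represents each track by its mean descriptor plus its first and last frame index. Short 2-D tuples stand in for real 128-dimensional SIFT descriptors, and `track_features`, `summarize` and `max_dist` are hypothetical names.

```python
import math

def track_features(frames, max_dist=0.5):
    """Greedily link descriptors across consecutive frames into tracks.
    A track ends when no unused descriptor lies within max_dist."""
    finished, active = [], []
    for t, descriptors in enumerate(frames):
        unused = list(descriptors)
        next_active = []
        for track in active:
            last = track["descs"][-1]
            best = min(unused, key=lambda d: math.dist(last, d), default=None)
            if best is not None and math.dist(last, best) <= max_dist:
                unused.remove(best)
                track["descs"].append(best)
                track["end"] = t
                next_active.append(track)
            else:
                finished.append(track)
        # Descriptors that matched no existing track start new tracks.
        for d in unused:
            next_active.append({"start": t, "end": t, "descs": [d]})
        active = next_active
    return finished + active

def summarize(track):
    """Collapse a track into one feature carrying its temporal span."""
    n = len(track["descs"])
    mean = [sum(column) / n for column in zip(*track["descs"])]
    return {"start": track["start"], "end": track["end"], "descriptor": mean}

# Three frames holding five raw descriptors in total.
frames = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(0.1, 0.0), (5.0, 5.1)],
    [(0.2, 0.0)],
]
tracks = track_features(frames)
compact = [summarize(t) for t in tracks]
```

In this example five per-frame descriptors collapse into two temporal features, one spanning frames 0 to 1 and one spanning frames 0 to 2, which is the kind of redundancy reduction the abstract attributes to TCSIFT.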