CVPR 2018 Paper Sharing Session

Deep Learning

Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization

Abstract
Global covariance pooling in convolutional neural networks has achieved impressive improvement over the classical first-order pooling. Recent works have shown matrix square root normalization plays a central role in achieving state-of-the-art performance. However, existing methods depend heavily on eigendecomposition (EIG) or singular value decomposition (SVD), suffering from inefficient training due to limited support of EIG and SVD on GPU. Towards addressing this problem, we propose an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks. At the core of our method is a meta-layer designed with loop-embedded directed graph structure. The meta-layer consists of three consecutive nonlinear structured layers, which perform pre-normalization, coupled matrix iteration and post-compensation, respectively. Our method is much faster than EIG or SVD based ones, since it involves only matrix multiplications, suitable for parallel implementation on GPU. Moreover, the proposed network with ResNet architecture can converge in far fewer epochs, further accelerating network training. On large-scale ImageNet, we achieve competitive performance superior to existing counterparts. By fine-tuning our models pre-trained on ImageNet, we establish state-of-the-art results on three challenging fine-grained benchmarks. The source code and network models will be available at http://www.peihuali.org/iSQRT-COV.

Introduction
Deep convolutional neural networks (ConvNets) have made significant progress in the past years, achieving recognition accuracy surpassing human beings in large-scale object recognition [7]. The ConvNet models pre-trained on ImageNet [5] have been proven to benefit a multitude of other computer vision tasks, ranging from fine-grained visual categorization (FGVC) [25], object detection [28], semantic segmentation [26] to scene parsing [37], where labeled data are insufficient for training from scratch. The common layers such as convolution, non-linear rectification, pooling and batch normalization [11] have become off-the-shelf commodities, widely supported on devices including workstations, PCs and embedded systems.

Although the architecture of ConvNet has greatly evolved in the past years, its basic layers have largely remained unchanged [19, 18]. Recently, researchers have shown increasing interest in exploring structured layers to enhance the representation capability of networks [12, 25, 1, 22]. One particular kind of structured layer is concerned with global covariance pooling after the last convolution layer, which has shown impressive improvement over the classical first-order pooling, successfully used in FGVC [25], visual question answering [15] and video action recognition [34]. Very recent works have demonstrated that matrix square root normalization of global covariance pooling plays a key role in achieving state-of-the-art performance in both large-scale visual recognition [21] and challenging FGVC [24, 32].

For computing the matrix square root, existing methods depend heavily on eigendecomposition (EIG) or singular value decomposition (SVD) [21, 32, 24]. However, fast implementation of EIG or SVD on GPU remains an open problem; they are only limitedly supported on the NVIDIA CUDA platform and run significantly slower than their CPU counterparts [12, 24]. As such, existing methods opt to perform EIG or SVD on CPU for computing the matrix square root. Nevertheless, current implementations of meta-layers that depend on CPU are far from ideal, particularly in multi-GPU configurations. Since GPUs with powerful parallel computing ability have to be interrupted to wait for CPUs with limited parallel ability, concurrency and throughput are greatly restricted.

In [24], for the purpose of fast forward propagation (FP), Lin and Maji use the Newton-Schulz iteration (called modified Denman-Beavers iteration therein), proposed in [9], to compute the matrix square root. Unfortunately, for backward propagation (BP), they compute the gradient through the solution of a Lyapunov equation, which depends on the GPU-unfriendly Schur decomposition (SCHUR) or EIG. Hence, training in [24] remains expensive, even though FP, which involves only matrix multiplications, runs very fast. Inspired by that work, we propose a fast end-to-end training method, called iterative matrix square root normalization of covariance pooling (iSQRT-COV), which relies on the Newton-Schulz iteration in both forward and backward propagation.

At the core of iSQRT-COV is a meta-layer with a loop-embedded directed graph structure, specifically designed to ensure both convergence of the Newton-Schulz iteration and performance of global covariance pooling networks. The meta-layer consists of three consecutive structured layers, performing pre-normalization, coupled matrix iteration and post-compensation, respectively. We derive the gradients associated with the involved non-linear layers based on matrix backpropagation theory [12]. The design of sandwiching the Newton-Schulz iteration between pre-normalization by the Frobenius norm or trace and post-compensation is essential and, as far as we know, did not appear in previous literature (e.g. in [9] or [24]). The pre-normalization guarantees convergence of the Newton-Schulz (NS) iteration, while post-compensation plays a key role in achieving state-of-the-art performance with prevalent deep ConvNet architectures, e.g. ResNet [8]. The main differences between our method and other related works are summarized in Tab. 1.


Figure 1. Proposed iterative matrix square root normalization of covariance pooling (iSQRT-COV) network. After the last convolution layer, we perform second-order pooling by estimating a covariance matrix. We design a meta-layer with loop-embedded directed graph structure for computing approximate square root of covariance matrix. The meta-layer consists of three nonlinear structured layers, performing pre-normalization, coupled Newton-Schulz iteration and post-compensation, respectively. See Sec. 3 for notations and details.
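To make the three steps concrete, here is a minimal NumPy sketch of the meta-layer described above: pre-normalization of the covariance matrix by its trace, a fixed number of coupled Newton-Schulz iterations, and post-compensation by the square root of the trace. The number of iterations and the choice of trace (rather than Frobenius norm) normalization are illustrative assumptions, and a trainable layer would additionally need the matrix backpropagation gradients derived in the paper.

```python
import numpy as np

def isqrt_cov_meta_layer(A, num_iters=5):
    """Approximate the matrix square root of an SPD covariance matrix A.

    Pre-normalization by the trace guarantees convergence of the coupled
    Newton-Schulz iteration; post-compensation restores the correct scale.
    A sketch of the meta-layer described in the text, not the authors' code.
    """
    d = A.shape[0]
    I = np.eye(d)

    # Pre-normalization: scale A so that its eigenvalues lie in (0, 1].
    trace = np.trace(A)
    Y = A / trace
    Z = I.copy()

    # Coupled Newton-Schulz iteration: Y_k -> (A/tr A)^{1/2}, Z_k -> (A/tr A)^{-1/2}.
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z

    # Post-compensation: undo the pre-normalization.
    return np.sqrt(trace) * Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((64, 16))                # 64 local features of dimension 16
    cov = X.T @ X / X.shape[0] + 1e-3 * np.eye(16)   # covariance estimate
    S = isqrt_cov_meta_layer(cov)
    print(np.abs(S @ S - cov).max())                 # residual of the approximate square root
```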

Conclusion
We presented an iterative matrix square root normalization of covariance pooling (iSQRT-COV) network which can be trained end-to-end. Compared to existing works depending heavily on GPU-unfriendly EIG or SVD, our method, based on the coupled Newton-Schulz iteration [9], runs much faster as it involves only matrix multiplications, suitable for parallel implementation on GPU. We validated our method on both the large-scale ImageNet dataset and challenging fine-grained benchmarks. Given the efficiency and promising performance of iSQRT-COV, we hope global covariance pooling can become a compelling alternative to global average pooling in other deep network architectures, e.g., ResNeXt [36], Inception [11] and DenseNet [10].

Interleaved Group Convolutions for Deep Neural Networks

Abstract
In this paper, we present a simple and modularized neural network architecture, named interleaved group convolutional neural networks (IGCNets). The main point lies in a novel building block, a pair of successive group convolutions: primary group convolution and secondary group convolution. The two group convolutions are complementary: (i) the convolution on each partition in the primary group convolution is a spatial convolution, while on each partition in the secondary group convolution it is a point-wise convolution; (ii) the channels in the same secondary partition come from different primary partitions. We discuss one representative advantage: an IGC block is wider than a regular convolution with the number of parameters and the computation complexity preserved. We also show that regular convolutions, group convolution with summation fusion, and the Xception block are special cases of interleaved group convolutions. Empirical results on standard benchmarks, CIFAR-10, CIFAR-100, SVHN and ImageNet, demonstrate that our networks are more efficient in parameters and computation while achieving similar or higher accuracy.

Introduction
Architecture design in deep convolutional neural networks has been attracting increasing interest. The basic design goal is efficiency in terms of computation and parameters together with high accuracy. Various design dimensions have been considered, ranging from small kernels [15, 35, 33, 4, 14] to identity mappings [10] or general multi-branch structures [38, 42, 22, 34, 35, 33] for easing the training of very deep networks, and multi-branch structures for increasing the width [34, 4, 14].

Our interest is to reduce the redundancy of convolutional kernels. The redundancy comes from two extents: the spatial extent and the channel extent. In the spatial extent, small kernels have been developed, such as 3 × 3, 3 × 1, and 1 × 3 [35, 29, 17, 26, 18]. In the channel extent, group convolutions [42, 40] and channel-wise convolutions or separable filters [28, 4, 14] have been studied. Our work belongs to kernel design in the channel extent.

In this paper, we present a novel network architecture, which is a stack of interleaved group convolution (IGC) blocks. Each block contains two group convolutions: primary group convolution and secondary group convolution, which are conducted on primary and secondary partitions, respectively. The primary partitions are obtained by simply splitting input channels, e.g., L partitions with each containing M channels, and there are M secondary partitions, each containing L channels that lie in different primary partitions. The primary group convolution performs the spatial convolution over each primary partition separately, and the secondary group convolution performs a 1 × 1 convolution (point-wise convolution) over each secondary partition, blending the channels across partitions outputted by primary group convolution. Figure 1 illustrates the interleaved group convolution block.
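For concreteness, the block can be sketched in a few lines of PyTorch; this is a minimal illustration based on the description above, not the authors' implementation. The primary group convolution is a 3×3 convolution with L groups, the channels are then interleaved so that each secondary partition takes one channel from every primary partition, and the secondary group convolution is a 1×1 convolution with M groups. Permuting the channels back after the secondary convolution is an extra step added here for composability, not necessarily the paper's exact ordering.

```python
import torch
import torch.nn as nn

class IGCBlock(nn.Module):
    """Sketch of an interleaved group convolution block with L primary and M secondary partitions."""

    def __init__(self, L=2, M=3, kernel_size=3):
        super().__init__()
        channels = L * M
        self.L, self.M = L, M
        # Primary group convolution: spatial (e.g. 3x3), L partitions of M channels each.
        self.primary = nn.Conv2d(channels, channels, kernel_size,
                                 padding=kernel_size // 2, groups=L, bias=False)
        # Secondary group convolution: point-wise (1x1), M partitions of L channels each.
        self.secondary = nn.Conv2d(channels, channels, 1, groups=M, bias=False)

    @staticmethod
    def _interleave(x, groups_in, groups_out):
        # Regroup channels so that each output partition takes one channel
        # from every input partition.
        b, c, h, w = x.shape
        return x.view(b, groups_in, groups_out, h, w).transpose(1, 2).reshape(b, c, h, w)

    def forward(self, x):
        x = self.primary(x)
        x = self._interleave(x, self.L, self.M)      # build the secondary partitions
        x = self.secondary(x)
        return self._interleave(x, self.M, self.L)   # restore the original channel grouping

x = torch.randn(1, 6, 8, 8)            # L * M = 6 channels
print(IGCBlock()(x).shape)             # torch.Size([1, 6, 8, 8])
```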

It is known that a group convolution is equivalent to a regular convolution with sparse kernels: there are no connections across the channels in different partitions. Accordingly, an IGC block is equivalent to a regular convolution whose kernel is composed from the product of two sparse kernels, resulting in a dense kernel. We show that under the same number of parameters/computation complexity, an IGC block (except in the extreme case where the number of primary partitions, L, is 1) is wider than a regular convolution with the same spatial kernel size as the primary group convolution. Empirically, we also observe that a network built by stacking IGC blocks performs better than a network with regular convolutions under the same computation complexity and the same number of parameters.
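As a quick numerical illustration of the width claim (with hypothetical sizes, not values from the paper's experiments): a 3×3 IGC block with L = 2 primary partitions of M = 48 channels has width L·M = 96 yet roughly the same parameter count as a plain 3×3 convolution of width only about 68.

```python
# Parameter counts per block (bias terms ignored); the sizes below are
# illustrative choices, not taken from the paper's experiments.
S, L, M = 3 * 3, 2, 48                   # spatial kernel size, primary partitions, channels per partition
width_igc = L * M                        # 96 channels
params_igc = L * M * M * S + M * L * L   # primary (grouped 3x3) + secondary (grouped 1x1)

# A regular 3x3 convolution with C input and C output channels costs S * C * C parameters.
# Solve S * C^2 = params_igc for C to find the comparable regular-convolution width.
width_regular = int((params_igc / S) ** 0.5)

print(params_igc)                 # 41664
print(width_igc, width_regular)   # 96 vs ~68
```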

We study the relations with existing related modules. (i) Regular convolution and group convolution with summation fusion [40, 42, 38] are both interleaved group convolutions in which the kernels of the secondary group convolution take special forms and are fixed. (ii) An IGC block in the extreme case where there is only one partition in the secondary group convolution is very close to Xception [4].

Our main contributions are summarized as follows.
• We present a novel building block, interleaved group convolutions, which is efficient in parameter and computation.
• We show that the proposed building block is wider than a regular convolution while keeping the network size and computational complexity, showing superior empirical performance.
• We discuss the connections to regular convolutions, the Xception block [4], and group convolution with summation fusion, and show that they are specific instances of interleaved group convolutions.


Figure 1. Illustrating the interleaved group convolution, with L = 2 primary partitions and M = 3 secondary partitions. The convolution for each primary partition in primary group convolution is spatial. The convolution for each secondary partition in secondary group convolution is point-wise (1 × 1). Details are given in Section 3.1.


Figure 2. (a) Regular convolution. (b) Four-branch representation of the regular convolution. The shaded part in (b), which we call cross-summation, is equivalent to a three-step transformation: permutation, secondary group convolution, and permutation back.

Conclusion
In this paper, we present a novel convolutional neural network architecture, which addresses the redundancy problem of convolutional filters in the channel domain. The main novelty lies in an interleaved group convolution block: channels in the same partition in the secondary group convolution come from different partitions used in the primary group convolution. Experimental results demonstrate that our network is efficient in parameter and computation.

Partial Transfer Learning with Selective Adversarial Networks

Abstract
Adversarial learning has been successfully embedded into deep networks to learn transferable features, which reduce the distribution discrepancy between the source and target domains. Existing domain adversarial networks assume a fully shared label space across domains. In the presence of big data, there is strong motivation to transfer both classification and representation models from existing big domains to unknown small domains. This paper introduces partial transfer learning, which relaxes the shared label space assumption by requiring only that the target label space be a subspace of the source label space. Previous methods typically match the whole source domain to the target domain, which is prone to negative transfer in the partial transfer problem. We present the Selective Adversarial Network (SAN), which simultaneously circumvents negative transfer by selecting out the outlier source classes and promotes positive transfer by maximally matching the data distributions in the shared label space. Experiments demonstrate that our models exceed state-of-the-art results for partial transfer learning tasks on several benchmark datasets.

Introduction
Deep networks have significantly improved the state of the art for a wide variety of machine learning problems and applications. At the moment, these impressive gains in performance come only when massive amounts of labeled data are available. Since manually labeling sufficient training data for diverse application domains on-the-fly is often prohibitive, for problems short of labeled data there is strong motivation to establish effective algorithms that reduce the labeling cost, typically by leveraging off-the-shelf labeled data from a different but related source domain. This promising transfer learning paradigm, however, suffers from the shift in data distributions across different domains, which poses a major obstacle to adapting classification models to target tasks [22].

Existing transfer learning methods assume a shared label space and different feature distributions across the source and target domains. These methods bridge different domains by learning domain-invariant feature representations without using target labels, so that the classifier learned from the source domain can be directly applied to the target domain. Recent studies have revealed that deep networks can learn more transferable features for transfer learning [4, 29] by disentangling explanatory factors of variation behind domains. The latest advances have been achieved by embedding transfer learning in the pipeline of deep feature learning to extract domain-invariant deep representations [26, 15, 6, 27, 17].

In the presence of big data, we can readily access large-scale labeled datasets such as ImageNet-1K. Thus, a natural ambition is to directly transfer both the representation and classification models from a large-scale dataset to our target dataset, such as Caltech-256, which is usually small-scale and has categories unknown at training and testing time. From the big data viewpoint, we can assume that the large-scale dataset is big enough to subsume all categories of the small-scale dataset. Thus, we introduce a novel partial transfer learning problem, which assumes that the target label space is a subspace of the source label space. As shown in Figure 1, this new problem is more general and challenging than standard transfer learning, since outlier source classes (“sofa”) will result in negative transfer when discriminating the target classes (“soccer-ball” and “binoculars”). Thus, matching the whole source and target domains, as previous methods do, is not an effective solution to this new problem.

This paper presents Selective Adversarial Networks (SAN), which largely extends the ability of deep adversarial adaptation [6] to address partial transfer learning from big domains to small domains. SAN aligns the distributions of source and target data in the shared label space and more importantly, selects out the source data in the outlier source classes. A key improvement over previous methods is the capability to simultaneously promote positive transfer of relevant data and alleviate negative transfer of irrelevant data, which can be trained in an end-to-end framework. Experiments show that our models exceed state-of-the-art results for partial transfer learning on public benchmark datasets.


Figure 2: The architecture of the proposed Selective Adversarial Networks (SAN) for partial transfer learning, where f is the extracted deep feature, ŷ is the predicted data label, and d̂ is the predicted domain label; Gf is the feature extractor, Gy and Ly are the label predictor and its loss, and Gd^k and Ld^k are the k-th domain discriminator and its loss; GRL stands for Gradient Reversal Layer. The blue part shows the class-wise adversarial networks (|Cs| in total) designed in this paper. Best viewed in color.
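As a rough sketch of the mechanism named in the caption, the snippet below implements a gradient reversal layer and weights per-class domain discriminators by the predicted class probabilities, so that outlier source classes receive little adversarial signal. The shapes, the discriminator architecture and the exact weighting are simplified assumptions for illustration, not the authors' precise formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient Reversal Layer: identity in the forward pass, negated (scaled) gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def classwise_domain_loss(features, class_probs, domain_labels, discriminators, lambd=1.0):
    """Weight each per-class domain discriminator by the predicted class probability."""
    reversed_feat = GradReverse.apply(features, lambd)
    loss = 0.0
    for k, disc_k in enumerate(discriminators):
        d_logit = disc_k(reversed_feat).squeeze(1)                     # (batch,)
        per_sample = F.binary_cross_entropy_with_logits(
            d_logit, domain_labels, reduction="none")
        loss = loss + (class_probs[:, k] * per_sample).mean()          # down-weights outlier classes
    return loss

# Toy usage with hypothetical sizes: 256-d features, 10 source classes.
feat_dim, num_classes, batch = 256, 10, 8
discs = nn.ModuleList([nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
                       for _ in range(num_classes)])
f = torch.randn(batch, feat_dim, requires_grad=True)
p = torch.softmax(torch.randn(batch, num_classes), dim=1)   # predicted class probabilities
d = torch.randint(0, 2, (batch,)).float()                   # 0 = source, 1 = target
classwise_domain_loss(f, p, d, discs).backward()
```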

Conclusion

This paper presented a novel selective adversarial network approach to partial transfer learning. Unlike previous adversarial adaptation methods that match the whole source and target domains based on the shared label space assumption, the proposed approach simultaneously circumvents negative transfer by selecting out the outlier source classes and promotes positive transfer by maximally matching the data distributions in the shared label space. Our approach successfully tackles partial transfer learning where the source label space subsumes the target label space, as validated by extensive experiments.

Weakly Supervised Coupled Networks for Visual Sentiment Analysis

Abstract
Automatic assessment of sentiment from visual content has gained considerable attention with the increasing tendency of expressing opinions on-line. In this paper, we solve the problem of visual sentiment analysis using the high-level abstraction in the recognition process. Existing methods based on convolutional neural networks learn sentiment representations from the holistic image appearance. However, different image regions can have a different influence on the intended expression. This paper presents a weakly supervised coupled convolutional network with two branches to leverage the localized information. The first branch detects a sentiment specific soft map by training a fully convolutional network with the cross spatial pooling strategy, which only requires image-level labels, thereby significantly reducing the annotation burden. The second branch utilizes both the holistic and localized information by coupling the sentiment map with deep features for robust classification. We integrate the sentiment detection and classification branches into a unified deep framework and optimize the network in an end-to-end manner. Extensive experiments on six benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art methods for visual sentiment analysis.

Introduction
Visual sentiment analysis from images has attracted significant attention with the increasing tendency to express opinions by posting images on social media like Flickr and Twitter. The automatic assessment of image sentiment has many applications, e.g. education, entertainment, advertisement, etc. Recently, with the advances of convolutional neural networks (CNNs), numerous deep approaches have been proposed to predict sentiment [20, 31]. The effectiveness of machine-learning-based deep features over hand-crafted features (e.g. color, texture, and composition) [17, 28, 34] has been demonstrated for visual sentiment prediction. However, several issues remain when using CNNs to address such an abstract task.

First, visual sentiment analysis is more challenging than conventional recognition tasks due to the higher level of subjectivity in the human recognition process [13]. It is necessary to take more cues into consideration for visual sentiment prediction. Figure 1 shows examples from the EmotionROI dataset [21], which provides bounding-box annotations of the regions that evoke sentiment, collected from 15 users. As can be seen, humans’ emotional responses to images are determined by local regions [29]. However, most existing methods employ CNNs to learn feature representations only from entire images [4, 30]. Second, providing more precise annotations (e.g. bounding boxes [11]) than image-level labeling for training generally leads to better performance in recognition tasks.

However, such precise annotation has two limitations for visual sentiment classification. On the one hand, the increased annotation cost prevents it from widespread use, especially for such a subjective task; on the other hand, different regions contribute differently to the viewer’s evoked sentiment, while crisp proposal boxes tend to find only the foreground objects in an image.

To address these problems, we propose a weakly supervised coupled framework (WSCNet) for joint sentiment detection and classification with two branches. The first branch is designed to generate region proposals evoking sentiment. Instead of extracting multiple crisp proposal boxes, we use a soft sentiment map to represent the probability of evoking the sentiment for each receptive field. In detail, we make use of a Fully Convolutional Network (FCN) followed by the proposed cross-spatial pooling strategy to preserve the spatial information of the convolutional feature maps. Based on this, the sentiment map is generated and utilized to highlight the regions of interest that are informative for classification. The second branch captures the localized representation by coupling the sentiment map with the deep features, which is then combined with the holistic representation to provide a more semantic vector. During the end-to-end training process, our approach only requires image-level sentiment labels, which significantly reduces the annotation burden.
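A rough sketch of how such a two-branch head could be wired up is given below. The exact cross-spatial pooling, the number of response maps per class, and the way the sentiment map modulates the features are simplified assumptions here rather than the paper's precise design: a 1×1 convolution produces class-wise response maps, spatial pooling over them yields image-level scores (so only image-level labels are needed), and a sentiment map built from the class-weighted response maps re-weights the convolutional features before fusion with the holistic representation.

```python
import torch
import torch.nn as nn

class WSCHead(nn.Module):
    """Sketch of a weakly supervised sentiment head with detection and classification branches."""

    def __init__(self, in_channels=2048, num_classes=8):
        super().__init__()
        # Detection branch: class-wise response maps from a 1x1 convolution.
        self.class_maps = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        # Classification branch: fuses holistic and localized representations.
        self.classifier = nn.Linear(2 * in_channels, num_classes)

    def forward(self, feat):                       # feat: (B, C, H, W) from the FCN backbone
        maps = self.class_maps(feat)               # (B, K, H, W) class response maps
        det_logits = maps.mean(dim=(2, 3))         # spatial pooling -> image-level class scores

        # Sentiment map: response maps weighted by the predicted class scores.
        weights = torch.softmax(det_logits, dim=1)                           # (B, K)
        sent_map = (weights[:, :, None, None] * maps).sum(1, keepdim=True)   # (B, 1, H, W)
        sent_map = torch.sigmoid(sent_map)

        holistic = feat.mean(dim=(2, 3))                       # global average pooling
        localized = (feat * sent_map).mean(dim=(2, 3))         # features re-weighted by the map
        cls_logits = self.classifier(torch.cat([holistic, localized], dim=1))
        return det_logits, cls_logits, sent_map

head = WSCHead()
det, cls, smap = head(torch.randn(2, 2048, 14, 14))
print(det.shape, cls.shape, smap.shape)
```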

Our contributions are summarized as follows. First, we present a weakly supervised coupled network that integrates visual sentiment classification and detection into a unified CNN framework, which learns a discriminative representation for visual sentiment analysis in an end-to-end manner. Second, we exploit the sentiment map to provide image-specific localized information using only image-level labels, with which both holistic and localized representations are fused for robust sentiment classification. Our proposed framework performs favorably against state-of-the-art methods and off-the-shelf CNN classifiers on six benchmark datasets for visual sentiment analysis.


Figure 1. Examples from the EmotionROI dataset [21]. The normalized bounding boxes indicate the regions that influence the evoked sentiments annotated by 15 users. The first two examples are joy images, and the last two examples are sadness and fear images, respectively. As can be seen, the sentiments can be evoked by specific regions.


Figure 2. Illustration of the proposed WSCNet for visual sentiment analysis. The input image is first fed into the convolutional layers of FCN ResNet-101, and the response feature maps with good spatial resolution are then delivered into two branches. The detection branch employs the cross-spatial pooling strategy to summarize all the information contained in the feature maps for each class. The end-to-end training results in the sentiment map, which is then coupled with the conv feature maps in the classification branch capturing the localized information. Finally, both holistic and localized representations are fused as a semantic vector for sentiment classification.


Figure 3. Overview of the sentiment map generation. The predicted class scores of the input image are mapped back to the classification branch to generate the sentiment map, which can highlight comprehensive sentiment regions.


Figure 5. Detected sentiment map of the proposed WSCNet on the EmotionROI. Given the input (a) with ground truth (b), the detection result and the metrics are shown in (c). The class activation maps and the corresponding predicted scores are given in (d).

Figure 6. Weakly supervised detection results using different methods on the EmotionROI testing set. The input images and the ground truth are given in (a) and (b). The detected regions and metrics of the weakly supervised methods (i.e. CAM, SPN, ours) are shown in the last three columns. By activating sentiment-related areas, our method is closer to the ground truth.

Conclusions
This paper addresses the problem of visual sentiment analysis based on convolutional neural networks, where sentiments are predicted using multiple affective cues. We present WSCNet, an end-to-end weakly supervised deep architecture, which consists of two branches for learning discriminative representations. The detection branch is designed to automatically exploit the sentiment map, which provides localized information about the affective image. The classification branch then leverages both holistic and localized representations to predict sentiments. Experimental results show the effectiveness of our method against the state of the art on six benchmark datasets.

GAN and Synthesis

DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Networks

Abstract
Unsupervised image translation, which aims at translating between two independent sets of images, is challenging in discovering the correct correspondences without paired data. Existing works build upon Generative Adversarial Networks (GANs) such that the distribution of the translated images is indistinguishable from the distribution of the target set. However, such set-level constraints cannot learn instance-level correspondences (e.g. aligned semantic parts in the object configuration task). This limitation often results in false positives (e.g. geometric or semantic artifacts) and further leads to the mode collapse problem. To address these issues, we propose a novel framework for instance-level image translation by Deep Attention GAN (DA-GAN). This design enables DA-GAN to decompose the task of translating samples between two sets into translating instances in a highly-structured latent space. Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences are consequently discovered through attending on the learned instance pairs. Therefore, the constraints can be exploited on both the set level and the instance level. Comparisons against several state-of-the-art methods demonstrate the superiority of our approach, and its broad application capability, e.g., pose morphing, data augmentation, etc., pushes the boundary of the domain translation problem.

Introduction
Can machines possess the human ability to relate different image domains and translate between them? This question can be formulated as the image translation problem: learning a mapping function from one image domain to the other by finding some underlying correspondences (e.g. similar semantics). Years of research have produced powerful translation systems in the supervised setting, where example pairs are available, e.g. [14]. However, obtaining paired training data is difficult and expensive.

Therefore, researchers have turned to developing unsupervised learning approaches that rely only on unpaired data. In the unsupervised setting, we only have two independent sets of samples. The lack of pairing relationships makes it much harder to find the correct correspondences, and therefore the problem is much more challenging. Existing works typically build upon Generative Adversarial Networks (GANs) such that the distribution of the translated samples is indistinguishable from the distribution of the target set. However, we point out that data itself is structured. Such a set-level constraint impedes these methods from finding meaningful instance-level correspondences. By 'instance-level correspondences', we refer to high-level content involving identifiable objects that are shared by a set of samples. These identifiable objects can be adaptively task-driven. For example, in Figure 1 (a), the words in the description correspond to particular parts and attributes of the bird image. Therefore, false positives often occur because instance-level correspondences are missing in existing works. For example, in object configuration, the results show only changes of color and texture while failing to produce geometric changes (Figure 1). In text-to-image synthesis, fine-grained details are often missing (Figure 1).

Driven by this important issue, a question arises: can we devise an algorithm capable of finding meaningful correspondences at both the set level and the instance level under the unsupervised setting? To resolve this issue, in this paper we introduce a dedicated unsupervised domain translation approach built upon Generative Adversarial Networks, DA-GAN, which succeeds in a large variety of translation tasks and achieves visually appealing results.

To achieve these results, we have to address two fundamental challenges. First, how can instance-level constraints be exploited when correct pairing relationships are lacking in the unsupervised setting? We take on this challenge and provide the first solution by decomposing the task of translating samples between two independent sets into translating instances in a highly-structured latent space. Specifically, we integrate an attention mechanism into the learning of the mapping function F, and use a compound loss that consists of a consistency term, a symmetry term and a multi-adversarial term. Through attending on meaningful instance-level correspondences of samples, the learned Deep Attention Encoder (DAE) projects samples into a latent space. The instance-level constraint can then be exploited in this latent space. We introduce a consistency loss to require the translated samples to correspond to the correct semantics of samples from the source domain in the latent space. To further enhance the constraint, we also consider the samples from the target domain by adding a symmetry loss that encourages a one-to-one mapping of F. As a result, the instance-level constraints enable the mapping function to find meaningful semantic correspondences, and therefore to produce true positives and visually appealing results.
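A rough sketch of how the consistency and symmetry terms might be written, treating DAE as the deep attention encoder and G as the generator; the norms, weights, and the exact spaces in which the losses are taken are assumptions for illustration rather than the paper's definitions.

```python
import torch
import torch.nn.functional as F

def consistency_loss(dae, g, s):
    """Translated source samples should keep the source semantics in the latent space."""
    s_translated = g(dae(s))
    return F.l1_loss(dae(s_translated), dae(s))

def symmetry_loss(dae, g, t):
    """Target samples passed through the mapping should come back to themselves (one-to-one F)."""
    return F.l1_loss(g(dae(t)), t)

# Toy usage with stand-in modules (the real DAE and G are deep networks).
dae = torch.nn.Conv2d(3, 8, 3, padding=1)
g = torch.nn.Conv2d(8, 3, 3, padding=1)
s = torch.randn(4, 3, 32, 32)
t = torch.randn(4, 3, 32, 32)
loss = consistency_loss(dae, g, s) + symmetry_loss(dae, g, t)
loss.backward()
```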

Second, how can the set-level constraints be further strengthened so that the mode collapse problem is mitigated? In practice, mode collapse occurs when all input samples map to the same output and optimization fails to make progress. To address this issue, we introduce a multi-adversarial training procedure that encourages different modes to receive a fair share of probability mass during training, thus providing an effective means of encouraging the mapping function to cover all modes in the target domain and to make progress towards the optimum. Our main contributions can be summarized as three-fold:
• We decompose the task into instance-level image translation such that the constraints can be exploited on both the instance level and the set level by adopting the proposed compound loss.
• To the best of our knowledge, we are the first to integrate the attention mechanism into Generative Adversarial Networks.
• We introduce a novel framework, DA-GAN, which produces visually appealing results and is applicable to a large variety of tasks.


Figure 1: (a) Text-to-image generation. (b) Object configuration. We can observe that the absence of instance-level correspondences results in semantic artifacts (labeled by red boxes) in StackGAN and geometric artifacts in CycleGAN. Our approach successfully produces the correct correspondences (labeled by yellow boxes) thanks to the proposed instance-level translation. Details can be found in Sec. 1.


Figure 2: A pose morphing example illustrating the pipeline of DA-GAN. Given two images of birds from the source domain S and the target domain T, the goal of pose morphing is to translate the pose of the source bird s into the pose of the target bird t, while preserving the identity of s. The feed-forward process is shown in (a), where the two input images are fed into DAE, which projects them into a latent space (labeled by the dashed box). Then G takes these highly-structured representations (DAE(s) and DAE(t)) from the latent space to generate the translated samples, i.e., s′ = G(DAE(s)) and t′ = G(DAE(t)). The details of the proposed DAE (labeled by the orange block) are shown in (b). Given an image X, a localization function floc first predicts the coordinates of N attention regions from the feature map of X (i.e., E(X), where E is an encoder, which can take any form). Then N attention masks are generated and activated on X to produce N attention regions {Ri} (i = 1, ..., N). Finally, each region's feature constitutes the instance-level representations {Insti} (i = 1, ..., N). By operating in the same way on both S and T, the instance-level correspondences can consequently be found in the latent space. We exploit constraints on both the instance level and the set level for optimization, as illustrated in (c). All of the notations are listed in (d). [Best viewed in color.]


Figure 4: The attention locations predicted by DAE on bird images and face images.

Conclusion
In this paper, we propose a novel framework for unsupervised image translation. Our intuition is to decompose the task of translating samples between two sets into translating instances in a highly-structured latent space. The instance-level correspondences can then be found by integrating an attention mechanism into GAN. Extensive quantitative and qualitative results validate that the proposed DA-GAN significantly improves over the state of the art in image-to-image translation. It scales to a broad range of applications and succeeds in generating visually appealing images. We find that some failure cases are caused by incorrect attention results. This is because the instances are learned by a weakly supervised attention mechanism, which sometimes shows a large gap compared with attention learned under full supervision. To tackle this challenge, we will seek more robust and effective algorithms in the future.

Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition

Attentive Generative Adversarial Network for Raindrop Removal from a Single Image

Abstract
Raindrops adhered to a glass window or camera lens can severely hamper the visibility of a background scene and degrade an image considerably. In this paper, we address the problem by visually removing raindrops, thus transforming a raindrop-degraded image into a clean one. The problem is intractable, since first the regions occluded by raindrops are not given, and second, the information about the background scene in the occluded regions is for the most part completely lost. To resolve the problem, we apply an attentive generative network using adversarial training. Our main idea is to inject visual attention into both the generative and discriminative networks. During training, our visual attention learns about raindrop regions and their surroundings. Hence, by injecting this information, the generative network will pay more attention to the raindrop regions and the surrounding structures, and the discriminative network will be able to assess the local consistency of the restored regions. This injection of visual attention into both the generative and discriminative networks is the main contribution of this paper. Our experiments show the effectiveness of our approach, which outperforms state-of-the-art methods quantitatively and qualitatively.

Introduction
Raindrops attached to a glass window, windscreen or lens can hamper the visibility of a background scene and degrade an image. Principally, the degradation occurs because raindrop regions contain different imageries from those without raindrops. Unlike non-raindrop regions, raindrop regions are formed by rays of reflected light from a wider environment, due to the shape of raindrops, which is similar to that of a fish-eye lens. Moreover, in most cases, the focus of the camera is on the background scene, making the appearance of raindrops blurred.

In this paper, we address this visibility degradation problem. Given an image impaired by raindrops, our goal is to remove the raindrops and produce a clean background as shown in Fig. 1. Our method is fully automatic. We consider that it will benefit image processing and computer vision applications, particularly for those suffering from raindrops, dirt, or similar artifacts.

A few methods have been proposed to tackle the raindrop detection and removal problems. Methods such as [17, 18, 12] are dedicated to detecting raindrops but not removing them. Other methods detect and remove raindrops using stereo [20], video [22, 25], or a specifically designed optical shutter [6], and thus are not applicable to a single input image taken by a normal camera. A method by Eigen et al. [1] has a similar setup to ours. It attempts to remove raindrops or dirt from a single image via a deep learning method. However, it can only handle small raindrops and produces blurry outputs [25]. In our experimental results (Sec. 6), we find that this method fails to handle relatively large and dense raindrops.

In contrast to [1], we intend to deal with a substantial presence of raindrops, like the ones shown in Fig. 1. Generally, the raindrop-removal problem is intractable, since first the regions occluded by raindrops are not given, and second, the information about the background scene in the occluded regions is for the most part completely lost. The problem gets worse when the raindrops are relatively large and distributed densely across the input image. To resolve the problem, we use a generative adversarial network, where our generated outputs are assessed by our discriminative network to ensure that they look like real images. To deal with the complexity of the problem, our generative network first attempts to produce an attention map. This attention map is the most critical part of our network, since it guides the subsequent processing in the generative network to focus on raindrop regions. This map is produced by a recurrent network consisting of deep residual networks (ResNets) [8] combined with a convolutional LSTM [21] and a few standard convolutional layers. We call this the attentive-recurrent network.

The second part of our generative network is an autoencoder, which takes both the input image and the attention map as the input. To obtain wider contextual information, in the decoder side of the autoencoder, we apply multi-scale losses. Each of these losses compares the difference between the output of the convolutional layers and the corresponding ground truth that has been downscaled accordingly. The input of the convolutional layers is the features from a decoder layer. Besides these losses, for the final output of the autoencoder, we apply a perceptual loss to obtain a more global similarity to the ground truth. This final output is also the output of our generative network.
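A minimal sketch of the multi-scale loss described above: each intermediate decoder output is compared with the ground truth downscaled to the same resolution. The choice of L1 distance and the per-scale weights here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def multiscale_loss(decoder_outputs, ground_truth, weights=(0.6, 0.8, 1.0)):
    """Sum of per-scale reconstruction losses between decoder outputs and resized ground truth."""
    loss = 0.0
    for out, w in zip(decoder_outputs, weights):
        gt = F.interpolate(ground_truth, size=out.shape[-2:],
                           mode="bilinear", align_corners=False)
        loss = loss + w * F.l1_loss(out, gt)
    return loss

# Toy usage: three decoder outputs at increasing resolutions, 256x256 ground truth.
gt = torch.randn(1, 3, 256, 256)
outs = [torch.randn(1, 3, s, s, requires_grad=True) for s in (64, 128, 256)]
multiscale_loss(outs, gt).backward()
```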

Having obtained the generative image output, our discriminative network will check if it is real enough. Like in a few inpainting methods (e.g. [9, 13]), our discriminative network validates the image both globally and locally. However, unlike the case of inpainting, in our problem and particularly in the testing stage, the target raindrop regions are not given. Thus, there is no information on the local regions that the discriminative network can focus on. To address this problem, we utilize our attention map to guide the discriminative network toward local target regions.

Overall, besides introducing a novel method of raindrop removal, our other main contribution is the injection of the attention map into both generative and discriminative networks, which is novel and works effectively in removing raindrops, as shown in our experiments in Sec. 6. We will release our code and dataset.

The rest of the paper is organized as follows. Section 2 discusses the related work in the fields of raindrop detection and removal, and in the fields of the CNN-based image inpainting. Section 3 explains the raindrop model in an image, which is the basis of our method. Section 4 describes our method, which is based on the generative adversarial network. Section 5 discusses how we obtain our synthetic and real images used for training our network. Section 6 shows our evaluations quantitatively and qualitatively. Finally, Section 7 concludes our paper.


Figure 2. The architecture of our proposed attentive GAN. The generator consists of an attentive-recurrent network and a contextual autoencoder with skip connections. The discriminator is formed by a series of convolution layers and guided by the attention map. Best viewed in color.


Figure 4. The architecture of our contextual autoencoder. Multi-scale loss and perceptual loss are used to help train the autoencoder.

Conclusion
We have proposed a single-image based raindrop removal method. The method utilizes a generative adversarial network, where the generative network produces an attention map via an attentive-recurrent network and applies this map, along with the input image, to generate a raindrop-free image through a contextual autoencoder. Our discriminative network then assesses the validity of the generated output globally and locally. To be able to validate locally, we inject the attention map into the network. Our novelty lies in the use of the attention map in both the generative and discriminative networks. We also consider our method to be the first that can handle a relatively severe presence of raindrops, which state-of-the-art methods in raindrop removal fail to handle.

Person Re-identification

Multi-shot Pedestrian Re-identification via Sequential Decision Making

Abstract
Multi-shot pedestrian re-identification is at the core of surveillance video analysis. It matches two tracks of pedestrians from different cameras. In contrast to existing works that aggregate single-frame features with time-series models such as recurrent neural networks, in this paper we propose an interpretable reinforcement learning based approach to this problem. In particular, we train an agent to verify a pair of images at each time step. The agent can choose to output a result (same or different) or to request another pair of images to verify (unsure). In this way, our model implicitly learns the difficulty of image pairs and postpones the decision when it has not accumulated enough evidence.

Moreover, by adjusting the reward for the unsure action, we can easily trade off between speed and accuracy. On three open benchmarks, our method is competitive with state-of-the-art methods while using only 3% to 6% of the images. These promising results demonstrate that our method is favorable in both efficiency and performance.

Introduction
Pedestrian Re-identification (re-id) aims at matching pedestrians in different tracks from multiple cameras. It helps to recover the trajectory of a certain person in a broad area across different non-overlapping cameras. Thus, it is a fundamental task in a wide range of applications such as video surveillance for security and sports video analysis. The most popular setting for this task is single-shot re-id, which judges whether two persons in different video frames are the same one. This setting has been extensively studied in recent years [7, 1, 16, 28, 17]. On the other hand, multi-shot re-id (or the stricter setting of video-based re-id) is a more realistic setting in practice; however, it is still in its early stages compared with the single-shot re-id task.

Currently, the mainstream approach to the multi-shot re-id task is to first extract features from single frames and then aggregate these image-level features. Consequently, the key lies in how to leverage the rich yet possibly redundant and noisy information residing in multiple frames to build track-level features from image-level features. A common choice is pooling [37] or bag of words [38]. Furthermore, if the input tracks are videos (namely, the temporal order of frames is preserved), optical flow [5] or recurrent neural networks (RNN) [24, 39] are commonly adopted to utilize the motion cues. However, most of these methods have two main problems: first, it is computationally inefficient to use all the frames in each track due to the redundancy; second, there could be noisy frames caused by occlusion, blur or incorrect detections, and these noisy frames may significantly deteriorate the performance.

To solve the aforementioned problems, we formulate the multi-shot re-id problem as a sequential decision making task. Intuitively, if the agent is confident enough about the existing evidence, it can output the result immediately. Otherwise, it needs to ask for another pair to verify. To model such a human-like decision process, we feed a pair of images from the two tracks to a verification agent at each time step. The agent then outputs one of three actions: same, different or unsure. By adjusting the rewards of these three actions, we can trade off between the number of images used and the final accuracy. We depict several examples in Fig. 1. For easy examples, the agent can decide using only one pair of images, while for hard cases, the agent chooses to see more pairs to accumulate evidence. In contrast to previous works that explicitly deduplicate redundant frames [6] or distinguish high-quality from low-quality frames [21], our method implicitly considers these factors in a data-driven, end-to-end manner. Moreover, our method is general enough to accommodate any single-shot re-id method as the image-level feature extractor, even non-deep-learning based methods.
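The test-time decision loop itself is simple; the sketch below shows how an agent producing Q-values over the three actions could be used. The agent network, the pair-feature representation and the step budget are placeholders (assumptions) for illustration, not the paper's exact implementation.

```python
import torch

SAME, DIFFERENT, UNSURE = 0, 1, 2

def verify_tracks(agent, pair_features, max_steps=8):
    """Sequentially feed image-pair features to the agent until it commits to a decision.

    agent: maps a pair-feature vector to Q-values over (same, different, unsure).
    pair_features: list of feature tensors, one per image pair drawn from the two tracks.
    Returns the decision and the number of pairs actually used.
    """
    for step, feat in enumerate(pair_features[:max_steps], start=1):
        q = agent(feat)                  # Q-values, shape (3,)
        action = int(q.argmax())
        if action != UNSURE:
            return action, step
    # Budget exhausted while still unsure: commit to the better of same/different.
    return int(q[:UNSURE].argmax()), step

# Toy usage with a hypothetical linear agent over 128-d pair features.
agent = torch.nn.Linear(128, 3)
pairs = [torch.randn(128) for _ in range(8)]
print(verify_tracks(agent, pairs))
```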

The main contributions of our work are listed as follows:
• We are the first to introduce reinforcement learning into multi-shot re-id problem. We train an agent to either output results or request to see more samples. Thus, the agent could early stop or postpone the decision as needed. Thanks to this behavior, we could balance speed and accuracy by only adjusting the rewards.
• We verify the effectiveness and efficiency of our method on three popular multi-shot re-id datasets. Along with a deliberately designed image feature extractor, our method outperforms state-of-the-art methods while using only 3% to 6% of the images, without resorting to other post-processing or additional metric learning methods.
• We empirically demonstrate that the Q function could implicitly indicate the difficulties of samples. This desirable property makes the results of our method more interpretable.


Figure 1: Examples demonstrating the motivation of our work. For most tracks, several or even only one pair of images is enough to make a confident prediction. However, in other hard cases, it is necessary to use more pairs to alleviate the influence of low-quality samples.


Figure 2: An illustration of our proposed method. First we train an image-level feature extractor (the left part) and then aggregate a sequence-level feature with an agent (the right part). The agent takes several kinds of features of one pair of images and takes one of three possible actions. If the action taken is “unsure”, the above process is repeated.

Conclusion
In this paper we have introduced a novel approach to the multi-shot pedestrian re-identification problem by casting it as a pair-by-pair decision making process. Thanks to reinforcement learning, we can train an agent for this task. Specifically, the agent receives image pairs sequentially and outputs one of three actions: same, different or unsure. By stopping early or postponing decisions, the agent adjusts the budget needed to make a confident decision according to the difficulty of the tracks.

We have tested our method on three different multi-shot pedestrian re-id datasets. Experimental results show that our model can yield competitive or even better results than state-of-the-art methods using only 3% to 6% of the images. Furthermore, the Q values output by the agent are a good indicator of the difficulty of image pairs, which makes our decision process more interpretable.

Currently, the weight for each frame is determined heuristically by the Q value, which means the weight is not fully guided by the final objective function. More advanced mechanisms such as attention can be easily incorporated into our framework. We leave this as future work.

Person Transfer GAN to Bridge Domain Gap for Person Re-Identification

Abstract
Although the performance of person Re-Identification (ReID) has been significantly boosted, many challenging issues in real scenarios have not been fully investigated, e.g., the complex scenes and lighting variations, viewpoint and pose changes, and the large number of identities in a camera network. To facilitate research towards conquering these issues, this paper contributes a new dataset called MSMT17 with many important features, e.g., 1) the raw videos are taken by a 15-camera network deployed in both indoor and outdoor scenes, 2) the videos cover a long period of time and present complex lighting variations, and 3) it contains currently the largest number of annotated identities, i.e., 4,101 identities and 126,441 bounding boxes. We also observe that a domain gap commonly exists between datasets, which essentially causes severe performance drops when training and testing on different datasets. As a result, available training data cannot be effectively leveraged for new testing domains. To relieve the expensive cost of annotating new training samples, we propose a Person Transfer Generative Adversarial Network (PTGAN) to bridge the domain gap. Comprehensive experiments show that the domain gap can be substantially narrowed down by PTGAN.

Introduction
Person Re-Identification (ReID) targets to match and return images of a probe person from a large-scale gallery set collected by camera networks. Because of its important applications in security and surveillance, person ReID has been drawing lots of attention from both academia and industry. Thanks to the development of deep learning and the availability of many datasets, person ReID performance has been significantly boosted. For example, the Rank-1 accuracy of single query on Market1501 [38] has been improved from 43.8% [21] to 89.9% [30]. The Rank-1 accuracy on CUHK03 [20] labeled dataset has been improved from 19.9% [20] to 88.5% [27]. A more detailed review of current approaches will be given in Sec. 2.

Although the performance on current person ReID datasets is pleasing, several open issues still hinder the application of person ReID. First, existing public datasets differ from the data collected in real scenarios. For example, current datasets either contain a limited number of identities or are taken under constrained environments. The currently largest, DukeMTMC-reID [40], contains fewer than 2,000 identities and presents simple lighting conditions. Those limitations simplify the person ReID task and help to achieve high accuracy. In real scenarios, person ReID is commonly executed within a camera network deployed in both indoor and outdoor scenes and processes videos taken over a long period of time. Accordingly, real applications have to cope with challenges like a large number of identities and complex lighting and scene variations, which current algorithms might fail to address.

Another challenge we observe is that there exists a domain gap between different person ReID datasets, i.e., training and testing on different person ReID datasets results in a severe performance drop. For example, the model trained on CUHK03 [20] only achieves a Rank-1 accuracy of 2.0% when tested on PRID [10]. As shown in Fig. 1, the domain gap could be caused by many factors like different lighting conditions, resolutions, human races, seasons, backgrounds, etc. This challenge also hinders the application of person ReID, because available training samples cannot be effectively leveraged for new testing domains. Since annotating person ID labels is expensive, research efforts are desired to narrow down or eliminate the domain gap.

Aiming to facilitate research towards applications in realistic scenarios, we collect a new Multi-Scene Multi-Time person ReID dataset (MSMT17). Different from existing datasets, MSMT17 is collected and annotated to present several new features. 1) The raw videos are taken by a 15-camera network deployed in both indoor and outdoor scenes. Therefore, it presents complex scene transformations and backgrounds. 2) The videos cover a long period of time, e.g., four days in a month and three hours in the morning, noon, and afternoon of each day, thus presenting complex lighting variations. 3) It contains currently the largest number of annotated identities and bounding boxes, i.e., 4,101 identities and 126,441 bounding boxes. To the best of our knowledge, MSMT17 is currently the largest and most challenging public dataset for person ReID. More detailed descriptions will be given in Sec. 3.

To address the second challenge, we propose to bridge the domain gap by transferring persons from dataset A to another dataset B. The transferred persons from A should keep their identities while presenting styles similar to those of persons in B, e.g., backgrounds, lightings, etc. We model this transfer procedure with a Person Transfer Generative Adversarial Network (PTGAN), which is inspired by Cycle-GAN [41]. Different from Cycle-GAN [41], PTGAN considers extra constraints on the person foregrounds to ensure the stability of their identities during transfer. Compared with Cycle-GAN, PTGAN generates high-quality person images, where person identities are kept and the styles are effectively transformed. Extensive experimental results on several datasets show that PTGAN effectively reduces the domain gap among datasets.
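A rough sketch of the extra foreground constraint mentioned above: on top of the usual Cycle-GAN objectives, an identity-style term penalizes changes inside the person foreground mask, so that background style transfers while the person's appearance stays stable. The mask source (e.g. an off-the-shelf segmentation model) and the L1 penalty are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def foreground_identity_loss(real, transferred, fg_mask):
    """Penalize appearance changes of the person foreground during transfer.

    real, transferred: (B, 3, H, W) images before and after person transfer.
    fg_mask: (B, 1, H, W) soft foreground mask of the person (assumed given,
             e.g. from an off-the-shelf segmentation model).
    """
    return F.l1_loss(transferred * fg_mask, real * fg_mask)

real = torch.randn(2, 3, 128, 64)
fake = torch.randn(2, 3, 128, 64, requires_grad=True)
mask = torch.rand(2, 1, 128, 64)
foreground_identity_loss(real, fake, mask).backward()
```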

Our contributions can be summarized into three aspects. 1) A new challenging large-scale MSMT17 dataset is collected and will be released. Compared with existing datasets, MSMT17 defines more realistic and challenging person ReID tasks. 2) We propose person transfer to take advantage of existing labeled data from different datasets. It has the potential to relieve the expensive data annotation on new datasets and make it easier to train person ReID systems in real scenarios. An effective PTGAN model is presented for person transfer. 3) This paper analyzes several issues hindering the application of person ReID. The proposed MSMT17 and algorithms have the potential to facilitate future research on person ReID.


Figure 1: Illustration of the domain gap between CUHK03 and PRID. It is obvious that CUHK03 and PRID present different styles, e.g., distinct lightings, resolutions, human races, seasons, backgrounds, etc., resulting in low accuracy when training on CUHK03 and testing on PRID.

Conclusions and Discussions
This paper contributes a large-scale MSMT17 dataset. MSMT17 presents substantial variations in lighting, scenes, backgrounds, human poses, etc., and is currently the largest person ReID dataset. Compared with existing datasets, MSMT17 defines a more realistic and challenging person ReID task.

PTGAN is proposed as an original work on person transfer to bridge the domain gap among datasets. Extensive experiments show PTGAN effectively reduces the domain gap. Different cameras may present different styles, making it difficult to perform multiple style transfer with one mapping function. Therefore, the person transfer strategy in Sec. 5.4.2 and Sec. 5.5 is not yet optimal. This also explains why PTGAN learned on each individual target camera performs better in Sec. 5.4.1. A better strategy is to consider the style differences among cameras to get more stable mapping functions. Our future work would continue to study more effective and efficient person transfer strategies for large datasets.

Vision and Language

Learning Semantic Concepts and Order for Image and Sentence Matching

Abstract
Image and sentence matching has made great progress recently, but it remains challenging due to the large visual-semantic discrepancy. This mainly arises from the fact that the pixel-level image representation usually lacks the high-level semantic information present in its matched sentence. In this work, we propose a semantic-enhanced image and sentence matching model, which can improve the image representation by learning semantic concepts and then organizing them in a correct semantic order. Given an image, we first use a multi-regional multi-label CNN to predict its semantic concepts, including objects, properties, actions, etc. Then, considering that different orders of semantic concepts lead to diverse semantic meanings, we use a context-gated sentence generation scheme for semantic order learning. It simultaneously uses the image global context containing concept relations as reference and the groundtruth semantic order in the matched sentence as supervision. After obtaining the improved image representation, we learn the sentence representation with a conventional LSTM, and then jointly perform image and sentence matching and sentence generation for model learning. Extensive experiments demonstrate the effectiveness of our learned semantic concepts and order, by achieving state-of-the-art results on two public benchmark datasets.

Introduction
The task of image and sentence matching refers to measuring the visual-semantic similarity between an image and a sentence. It has been widely applied to image-sentence cross-modal retrieval, e.g., finding similar sentences given an image query (image annotation), and retrieving matched images given a sentence query (text-based image search).

Although much progress in this area has been achieved, it is still nontrivial to accurately measure the similarity between image and sentence, due to the existing huge visual-semantic discrepancy. Taking an image and its matched sentence in Figure 1 as an example, the main objects, properties and actions appearing in the image are: {cheetah, gazelle, grass}, {quick, young, green} and {chasing, running}, respectively. These high-level semantic concepts are the essential content to be compared with the matched sentence, but they cannot be easily represented from the pixel-level image. Most existing methods [11, 14, 20] jointly represent all the concepts by extracting a global CNN [28] feature vector, in which the concepts are tangled with each other. As a result, some primary foreground concepts tend to be dominant, while other secondary background ones will probably be ignored, which is not optimal for fine-grained image and sentence matching. To comprehensively predict all the semantic concepts for the image, a possible way is to adaptively explore the attribute learning frameworks [6, 35, 33]. But such a method has not been well investigated in the context of image and sentence matching.

In addition to semantic concepts, how to correctly organize them, namely the semantic order, plays an even more important role in bridging the visual-semantic discrepancy. As illustrated in Figure 1, given the semantic concepts mentioned above, if we incorrectly set their semantic order as "a quick gazelle is chasing a young cheetah on grass", the meaning becomes completely different from the image content and the matched sentence. But directly learning the correct semantic order from the semantic concepts alone is very difficult, since there exist various incorrect orders that still make sense semantically. We could resort to the global image context, since it already indicates the correct semantic order through the spatial relations among semantic concepts, e.g., the cheetah is on the left of the gazelle. But it is unclear how to suitably combine this context with the semantic concepts and make them directly comparable to the semantic order in the sentence.

Alternatively, we could generate a descriptive sentence from the image as its representation. However, image-based sentence generation itself, namely image captioning, is also a very challenging problem. Even state-of-the-art image captioning methods cannot always generate very realistic sentences that capture all image details. These details are essential to the matching task, since the global image-sentence similarity is aggregated from local similarities in image details. Accordingly, such methods cannot achieve very high performance for image and sentence matching [30, 3].

In this work, to bridge the visual-semantic discrepancy between image and sentence, we propose a semantic-enhanced image and sentence matching model, which improves the image representation by learning semantic concepts and then organizing them in a correct semantic order. To learn the semantic concepts, we exploit a multi-regional multi-label CNN that can simultaneously predict multiple concepts in terms of objects, properties, actions, etc. The inputs of this CNN are multiple selectively extracted regions from the image, which can comprehensively capture all the concepts regardless of whether they are primary foreground ones. To organize the extracted semantic concepts in a correct semantic order, we first fuse them with the global context of the image in a gated manner. The context includes the spatial relations of all the semantic concepts, which can be used as a reference to facilitate semantic order learning. Then we use the groundtruth semantic order in the matched sentence as supervision, by forcing the fused image representation to generate the matched sentence.
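The gated fusion of concept features and global context described above can be sketched roughly as follows; the feature dimension and the specific gating form (a sigmoid gate interpolating between the two inputs) are assumptions for illustration, not the paper's exact definition.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of a gated fusion unit: a learned sigmoid gate decides, per dimension,
    how much of the concept feature versus the global context to keep."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, concept_feat, context_feat):
        g = torch.sigmoid(self.gate(torch.cat([concept_feat, context_feat], dim=-1)))
        return g * concept_feat + (1.0 - g) * context_feat

# usage: fuse a (batch, 1024) concept vector with a (batch, 1024) global image feature
fusion = GatedFusion(1024)
fused = fusion(torch.randn(2, 1024), torch.randn(2, 1024))
```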

After enhancing the image representation with both semantic concepts and order, we learn the sentence representation with a conventional LSTM [10]. Then the representations of image and sentence are matched with a structured objective, which is used in conjunction with another objective of sentence generation for joint model learning. To demonstrate the effectiveness of the proposed model, we perform several experiments of image annotation and retrieval on two publicly available datasets, and achieve state-of-the-art results.
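The structured matching objective mentioned above is typically a bidirectional hinge-based ranking loss over image and sentence embeddings. The sketch below shows that standard form, assuming L2-normalized embeddings, a fixed margin, and in-batch negatives; it is meant as an illustration rather than the paper's exact objective.

```python
import torch

def bidirectional_ranking_loss(img_emb, sent_emb, margin=0.2):
    """img_emb, sent_emb: (batch, dim) L2-normalized embeddings; row i of each matches."""
    scores = img_emb @ sent_emb.t()                       # pairwise cosine similarities
    pos = scores.diag().view(-1, 1)                       # matched-pair scores
    cost_s = (margin + scores - pos).clamp(min=0)         # image -> sentence hinge
    cost_i = (margin + scores - pos.t()).clamp(min=0)     # sentence -> image hinge
    mask = torch.eye(scores.size(0), dtype=torch.bool)    # ignore the positive pairs
    return cost_s.masked_fill(mask, 0).sum() + cost_i.masked_fill(mask, 0).sum()
```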

Conclusions and Future Work
In this work, we have proposed a semantic-enhanced image and sentence matching model. Our main contribution is improving the image representation by learning semantic concepts and then organizing them in a correct semantic order. This is accomplished by a series of model components in terms of multi-regional multi-label CNN, gated fusion unit, and joint matching and generation learning. We have systematically studied the impact of these components on the image and sentence matching, and demonstrated the effectiveness of our model by achieving significant performance improvements.

In the future, we will replace the VGGNet used in the multi-regional multi-label CNN with ResNet to predict the semantic concepts more accurately, and jointly train it with the rest of our model in an end-to-end manner. Our model can perform both image and sentence matching and sentence generation, so we would also like to extend it to the image captioning task. Although Pan et al. [24] have shown the effectiveness of visual-semantic embedding for video captioning, its effectiveness in the context of image captioning has not been well investigated.

Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning

Abstract
The Visual Dialogue task requires an agent to engage in a conversation about an image with a human. It represents an extension of the Visual Question Answering task in that the agent needs to answer a question about an image, but it needs to do so in light of the previous dialogue that has taken place. The key challenge in Visual Dialogue is thus maintaining a consistent and natural dialogue while continuing to answer questions correctly. We present a novel approach that combines Reinforcement Learning and Generative Adversarial Networks (GANs) to generate more human-like responses to questions. The GAN helps overcome the relative paucity of training data, and the tendency of the typical MLE-based approach to generate overly terse answers. Critically, the GAN is tightly integrated into the attention mechanism that generates human-interpretable reasons for each answer. This means that the discriminative model of the GAN has the task of assessing whether a candidate answer is generated by a human or not, given the provided reason. This is significant because it drives the generative model to produce high quality answers that are well supported by the associated reasoning. The method also achieves state-of-the-art results on the primary benchmark.

Introduction
The combined interpretation of vision and language has enabled the development of a range of applications that have made interesting steps towards Artificial Intelligence, including Image Captioning [11, 34, 37], Visual Question Answering (VQA) [1, 22, 38], and Referring Expressions [10, 12, 41]. VQA, for example, requires an agent to answer a previously unseen question about a previously unseen image, and is recognised as an AI-Complete problem [1]. Visual Dialogue [5] represents an extension to the VQA problem whereby an agent is required to engage in a dialogue about an image. This is significant because it demands that the agent be able to answer a series of questions, each of which may be predicated on the previous questions and answers in the dialogue. Visual Dialogue thus reflects one of the key challenges in AI and Robotics, which is to enable an agent capable of acting upon the world, with which we might collaborate through dialogue.

Due to the similarity between the VQA and Visual Dialog tasks, VQA methods [19, 40] have been directly applied to solve the Visual Dialog problem. The fact that the Visual Dialog challenge requires an ongoing conversation, however, demands more than just taking into consideration the state of the conversation thus far. Ideally, the agent should be an engaged participant in the conversation, cooperating towards a larger goal, rather than generating single word answers, even if they are easier to optimise. Figure 1 provides an example of the distinction between the type of responses a VQA agent might generate and the more involved responses that a human is likely to generate if they are engaged in the conversation. These more human-like responses are not only longer, they provide reasoning information that might be of use even though it is not specifically asked for.

Previous Visual Dialog systems [5] follow a neural translation mechanism that is often used in VQA, predicting the response given the image and the dialog history using a maximum likelihood estimation (MLE) objective function. However, because this over-simplified training objective only focuses on measuring word-level correctness, the produced responses tend to be generic and repetitive. For example, a simple response of 'yes', 'no', or 'I don't know' can safely answer a large number of questions and lead to a high MLE objective value. Generating more comprehensive answers, and achieving a deeper engagement of the agent in the dialogue, requires a more engaged training process.

A good dialogue generation model should generate responses indistinguishable from those a human might produce. In this paper, we introduce an adversarial learning strategy, motivated by the previous success of adversarial learning in many computer vision [3, 21] and sequence generation [4, 42] problems. In particular, we frame the task as a reinforcement learning problem in which we jointly train two sub-modules: a sequence generative model that produces response sentences on the basis of the image content and the dialog history, and a discriminator that leverages the generator's memories to distinguish between human-generated dialogues and machine-generated ones. The generator learns to generate responses that can fool the discriminator into believing they are human generated, while the output of the discriminative model is used as a reward to the generative model, encouraging it to generate more human-like dialogue.

Although our proposed framework is inspired by generative adversarial networks (GANs) [9], there are several technical contributions that lead to the final success on the visual dialog generation task. First, we propose a sequential co-attention generative model that aims to ensure that attention can be passed effectively across the image, question and dialog history. The co-attended multi-modal features are combined together to generate a response. Secondly, and significantly, within the structure we propose, the discriminator has access to the attention weights the generator used in generating its response. Note that the attention weights can be seen as a form of 'reason' for the generated response. For example, they indicate which region should be focused on and which dialog pairs are informative when generating the response. This structure is important as it allows the discriminator to assess the quality of the response given the reason. It also allows the discriminator to assess the response in the context of the dialogue thus far. Finally, as with most sequence generation problems, the quality of the response can only be assessed over the whole sequence. We follow [42] in applying Monte Carlo (MC) search to compute the intermediate rewards.
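The reward pathway described above can be sketched compactly: the discriminator's estimate that a sampled response is human-generated serves as the reward in a REINFORCE-style update of the generator. The Monte Carlo rollout for per-token intermediate rewards is simplified to a single sequence-level reward here, and the baseline value is an illustrative assumption.

```python
import torch

def adversarial_generator_step(log_probs, disc_prob_human, baseline=0.5):
    """log_probs: (batch, seq_len) log-probabilities of the sampled response tokens.
    disc_prob_human: (batch,) discriminator's probability that each response is human.
    Returns a REINFORCE-style loss whose gradient pushes the generator towards
    responses the discriminator scores as human-like."""
    reward = (disc_prob_human - baseline).detach()   # reward from D, no gradient into D
    return -(log_probs.sum(dim=1) * reward).mean()
```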

We evaluate our method on the VisDial dataset [5] and show that it outperforms the baseline methods by a large margin. We also outperform several state-of-the-art methods. Specifically, our adversarially learned generative model outperforms our strong MLE baseline by 1.87% on recall@5, improving over the previous best reported results by 2.14% on recall@5 and 2.50% on recall@10. Qualitative evaluation shows that our generative model generates more informative responses, and a human study shows that 49% of our responses pass the Turing Test. We additionally implement a model under the discriminative setting (where a candidate response list is given) and achieve state-of-the-art performance.

Figure 1: Human-like vs. Machine-like responses in a visual dialog. The human-like responses clearly answer the questions more comprehensively, and help to maintain a meaningful dialogue.

Figure 2: The adversarial learning framework of our proposed model. Our model is composed of two components, the first being a sequential co-attention generator that accepts image, question and dialog history tuples as input, and uses the co-attention encoder to jointly reason over them. The second component is a discriminator tasked with labelling whether each answer has been generated by a human or by the generative model, by considering the attention weights. The output from the discriminator is used as a reward to push the generator to generate responses that are indistinguishable from those a human might generate.

Figure 3: The sequential co-attention encoder. Each input feature is co-attended by the other two features in a sequential fashion, using Eqs. 1-3. The number on each function indicates the sequential order, and the final attended features ũ, ṽ and q̃ form the output of the encoder.

Conclusion
Visual Dialog generation is an interesting topic that requires a machine to understand visual content and natural language dialog, and to perform multi-modal reasoning. More importantly, as a human-computer interaction interface for future robotics and AI, the human-like quality of the generated responses is a significant index, apart from their correctness. In this paper, we have proposed an adversarial learning based approach to encourage the generator to generate more human-like dialogs. Technically, by combining a sequential co-attention generative model that can jointly reason over the image, dialog history and question, and a discriminator that can dynamically access the attention memories, with an intermediate reward, our final model achieves the state-of-the-art on the VisDial dataset. A Turing-Test-style study also shows that our model can produce more human-like visual dialog responses.

Segmentation, Detection

Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective

Abstract
The success of current deep saliency detection methods heavily depends on the availability of large-scale supervision in the form of per-pixel labeling. Such supervision, while labor-intensive and not always possible, also tends to hinder the generalization ability of the learned models. By contrast, traditional unsupervised saliency detection methods based on handcrafted features, even though they have been surpassed by deep supervised methods, are generally dataset-independent and can be applied in the wild. This raises a natural question: "Is it possible to learn saliency maps without using labeled data while improving the generalization ability?". To this end, we present a novel perspective on unsupervised saliency detection through learning from multiple noisy labelings generated by "weak" and "noisy" unsupervised handcrafted saliency methods. Our end-to-end deep learning framework for unsupervised saliency detection consists of a latent saliency prediction module and a noise modeling module that work collaboratively and are optimized jointly. Explicit noise modeling enables us to deal with noisy saliency maps in a probabilistic way. Extensive experimental results on various benchmarking datasets show that our model not only outperforms all unsupervised saliency methods by a large margin but also achieves comparable performance with recent state-of-the-art supervised deep saliency methods.

Figure 1. Unsupervised saliency learning from weak "noisy" saliency maps. Given an input image x_i and its corresponding unsupervised saliency maps y_ij, our framework learns the latent saliency map ȳ_i by jointly optimizing the saliency prediction module and the noise modeling module.

Compared with SBF [35], which also learns from unsupervised saliency but with a different strategy, our model achieves better performance.

Introduction
Saliency detection aims at identifying the visually interesting objects in images that are consistent with human perception, which is intrinsic to various vision tasks such as context-aware image editing [36] and image caption generation [31]. Depending on whether human annotations have been used, saliency detection methods can be roughly divided into unsupervised methods and supervised methods. The former compute saliency directly based on various priors (e.g., the center prior [9], the global contrast prior [6], the background connectivity prior [43], etc.), which are summarized and described with human knowledge. The latter learn a direct mapping from color images to saliency maps by exploiting the availability of large-scale human-annotated databases.
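As a toy illustration of what such a handcrafted prior looks like, the snippet below builds a center-prior saliency map that simply scores pixels by their distance to the image centre; it is purely illustrative and not the formulation of any cited method.

```python
import torch

def center_prior(h, w, sigma=0.3):
    """Toy center-prior saliency map: pixels closer to the image centre score higher."""
    ys = torch.arange(h, dtype=torch.float32).view(-1, 1).expand(h, w)
    xs = torch.arange(w, dtype=torch.float32).view(1, -1).expand(h, w)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    d2 = ((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2   # normalized squared distance
    return torch.exp(-d2 / (2 * sigma ** 2))           # (h, w) map in (0, 1]

# usage: a 240x320 prior map
prior = center_prior(240, 320)
```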

Building upon the powerful learning capacity of convolutional neural networks (CNNs), deep supervised saliency detection methods [42, 11, 40] achieve state-of-the-art performance, outperforming the unsupervised methods by a wide margin. The success of these deep saliency methods strongly depends on the availability of large-scale training datasets with pixel-level human annotations, which are not only labor-intensive but may also hinder the generalization ability of the learned network models. By contrast, the unsupervised saliency methods, even though they have been outperformed by the deep supervised methods, are generally dataset-independent and can be applied in the wild.

In this paper, we present a novel end-to-end deep learning framework for saliency detection that is free from human annotations, thus "unsupervised" (see Fig. 1 for a visualization). Our framework is built upon existing efficient and effective unsupervised saliency methods and the powerful capacity of deep neural networks. The unsupervised saliency methods are formulated with human knowledge, and different unsupervised saliency methods exploit different human-designed priors for saliency detection. They are noisy (compared with ground-truth human annotations) and may have method-specific biases in predicting saliency maps. By utilizing existing unsupervised saliency maps, we remove the need for labor-intensive human annotations; and by jointly learning the different priors from multiple unsupervised saliency methods, we obtain complementary information from those unsupervised saliency maps.

To effectively leverage these noisy but informative saliency maps, we propose a novel perspective on the problem: instead of removing the noise in the saliency labels from unsupervised saliency methods with different fusion strategies [35], we explicitly model the noise in the saliency maps. As illustrated in Fig. 2, our framework consists of two consecutive modules, namely a saliency prediction module that learns the mapping from a color image to the "latent" saliency map based on the current noise estimation and the noisy saliency maps, and a noise modeling module that fits the noise in the noisy saliency maps and updates the noise estimation for the different saliency maps based on the updated saliency prediction and the noisy saliency maps. In this way, our method takes advantage of both probabilistic and deterministic methods: the latent saliency prediction module works in a deterministic way, while the noise modeling module fits the noise distribution in a probabilistic manner. Experiments suggest that our strategy is very effective and takes only a few rounds to converge.
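A minimal sketch of such a joint objective follows, assuming the residual between each noisy saliency map and the latent prediction is zero-mean Gaussian noise with a per-method variance; the actual framework may parameterize the noise differently.

```python
import torch

def noisy_label_loss(pred_sal, noisy_maps, noise_logvar):
    """pred_sal: (B, 1, H, W) latent saliency prediction.
    noisy_maps: (B, M, H, W) saliency maps from M unsupervised methods.
    noise_logvar: (M,) learned log-variance of each method's noise.
    The residual between each noisy map and the latent prediction is treated as
    zero-mean Gaussian noise with a per-method variance (a simplifying assumption)."""
    resid = noisy_maps - pred_sal                        # (B, M, H, W)
    logvar = noise_logvar.view(1, -1, 1, 1)
    nll = 0.5 * (resid ** 2 / logvar.exp() + logvar)     # Gaussian NLL up to a constant
    return nll.mean()
```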

To the best of our knowledge, the idea of treating unsupervised saliency maps as multiple noisy labels to learn from is brand new and different from existing unsupervised deep saliency methods (e.g., [35]). Our main contributions can be summarized as:
1) We present a novel perspective on unsupervised deep saliency detection, learning saliency maps from multiple noisy unsupervised saliency methods. We formulate the problem as the joint optimization of a latent saliency prediction module and a noise modeling module.
2) Our deep saliency model is trained in an end-to-end manner without using any human annotations, leading to an extremely cheap solution.
3) Extensive performance evaluation on seven benchmarking datasets shows that our framework outperforms existing unsupervised methods by a wide margin while achieving comparable results with state-of-the-art deep supervised saliency detection methods [11, 40].

Figure 2. Conceptual illustration of our saliency detection framework, which consists of a "latent" saliency prediction module and a noise modeling module. Given an input image, noisy saliency maps are generated by handcrafted-feature-based unsupervised saliency detection methods. Our framework jointly optimizes both modules under a unified loss function. The saliency prediction module aims to learn latent saliency maps based on the current noise estimation and the noisy saliency maps. The noise modeling module updates the noise estimation for the different saliency maps based on the updated saliency prediction and the noisy saliency maps. In our experiments, the overall optimization converges in several rounds.

Conclusions
In this paper, we propose an end-to-end saliency learning framework that does not require human-annotated saliency maps for network training. We formulate unsupervised saliency learning as learning from multiple noisy saliency maps generated by various efficient and effective conventional unsupervised saliency detection methods. Our framework consists of a latent saliency prediction module and an explicit noise modeling module, which work collaboratively. Extensive experimental results on various benchmarking datasets demonstrate the superiority of our method, which not only outperforms traditional unsupervised methods by a wide margin but also achieves highly comparable performance with current state-of-the-art deep supervised saliency detection methods. In the future, we plan to investigate the challenging scenarios of multiple salient object detection and small salient object detection under our framework. Extending our framework to dense prediction tasks such as semantic segmentation [25] and monocular depth estimation [18] could also be an interesting direction.

Path Aggregation Network for Instance Segmentation

Abstract
The way information propagates in neural networks is of great importance. In this paper, we propose the Path Aggregation Network (PANet), aiming at boosting information flow in the proposal-based instance segmentation framework. Specifically, we enhance the entire feature hierarchy with accurate localization signals from lower layers by bottom-up path augmentation, which shortens the information path between the lower layers and the topmost features. We present adaptive feature pooling, which links the feature grid and all feature levels so that useful information in each feature level propagates directly to the following proposal subnetworks. A complementary branch capturing different views of each proposal is created to further improve mask prediction.

These improvements are simple to implement, with subtle extra computational overhead. Our PANet reaches the 1st place in the COCO 2017 Challenge Instance Segmentation task and the 2nd place in Object Detection task without large-batch training. It is also state-of-the-art on MVD and Cityscapes.

Introduction
Instance segmentation is one of the most important and challenging tasks. It aims to predict class labels and pixel-wise instance masks to localize the varying numbers of instances presented in images. This task widely benefits autonomous vehicles, robotics, and video surveillance, to name a few applications.

With the help of deep convolutional neural networks, several frameworks for instance segmentation, e.g., [21, 33, 3, 38], were proposed and performance grew rapidly [12]. Mask R-CNN [21] is a simple and effective system for instance segmentation. Based on Fast/Faster R-CNN [16, 51], a fully convolutional network (FCN) is used for mask prediction, along with box regression and classification. To achieve high performance, the feature pyramid network (FPN) [35] is utilized to extract the in-network feature hierarchy, where a top-down path with lateral connections is augmented to propagate semantically strong features.

Several newly released datasets [37, 7, 45] leave large room for algorithm improvement. COCO [37] consists of 200k images, with many instances of complex spatial layout captured in each image. Cityscapes [7] and MVD [45], in contrast, provide street scenes with a large number of traffic participants in each image. Blur, heavy occlusion and extremely small instances appear in these datasets.

There have been several principles proposed for designing networks in image classification that are also effective for object recognition. For example, shortening the information path and easing information propagation with clean residual connections [23, 24] and dense connections [26] are useful. Increasing the flexibility and diversity of information paths by creating parallel paths following the split-transform-merge strategy [61, 6] is also beneficial.

Our Findings Our research indicates that information propagation in the state-of-the-art Mask R-CNN can be further improved. Specifically, features in low levels are helpful for large instance identification, but there is a long path from low-level structure to the topmost features, increasing the difficulty of accessing accurate localization information. Further, each proposal is predicted based on feature grids pooled from one feature level, which is assigned heuristically. This process can be improved, since information discarded in other levels may be helpful for the final prediction. Finally, mask prediction is made on a single view, losing the chance to gather more diverse information.

Our Contributions Inspired by these principles and observations, we propose PANet, illustrated in Figure 1, for instance segmentation.
First, to shorten the information path and enhance the feature pyramid with the accurate localization signals existing in low levels, a bottom-up path augmentation is created. In fact, low-layer features were utilized in the systems of [44, 42, 13, 46, 35, 5, 31, 14]. But propagating low-level features to enhance the entire feature hierarchy for instance recognition was not explored.
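A rough sketch of the augmented bottom-up path: starting from the finest FPN level, each new level is a stride-2 3×3 convolution of the previous augmented level added to the corresponding FPN level, followed by another 3×3 convolution. The 256-channel width follows the usual FPN convention; treat the exact layer configuration as an assumption rather than the paper's precise design.

```python
import torch
import torch.nn as nn

class BottomUpAugmentation(nn.Module):
    """Sketch of PANet-style bottom-up path augmentation: N2 = P2, and each higher
    level N_{i+1} is a stride-2 3x3 conv of N_i added to P_{i+1}, then smoothed by
    another 3x3 conv."""
    def __init__(self, channels=256, num_levels=4):
        super().__init__()
        self.down = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, stride=2, padding=1) for _ in range(num_levels - 1)])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_levels - 1)])

    def forward(self, fpn_feats):                  # [P2, P3, P4, P5], fine -> coarse
        outs = [fpn_feats[0]]                      # N2 = P2
        for i in range(1, len(fpn_feats)):
            n = self.down[i - 1](outs[-1]) + fpn_feats[i]   # fuse with the FPN level
            outs.append(torch.relu(self.smooth[i - 1](n)))
        return outs                                # [N2, N3, N4, N5]

# usage: feature maps whose spatial sizes halve at each level
feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8)]
n_feats = BottomUpAugmentation()(feats)
```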

Second, to recover the broken information path between each proposal and all feature levels, we develop adaptive feature pooling. It is a simple component that aggregates features from all feature levels for each proposal, avoiding arbitrary assignment. With this operation, cleaner paths are created compared with those of [4, 62].
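Adaptive feature pooling can be sketched as RoIAligning every proposal on every pyramid level and fusing the results. The element-wise max fusion, the level set, and the fixed input resolution used to derive each level's scale are illustrative simplifications, not the paper's exact design.

```python
import torch
from torchvision.ops import roi_align

def adaptive_feature_pool(levels, boxes, image_size=512, output_size=7):
    """levels: list of (B, C, Hi, Wi) feature maps, one per pyramid level.
    boxes: list of (Ni, 4) proposal boxes per image, in input-image coordinates.
    Every proposal is RoIAligned on *every* level; results are fused element-wise."""
    pooled = None
    for feat in levels:
        scale = feat.shape[-1] / image_size          # assumes square inputs/features
        p = roi_align(feat, boxes, output_size, spatial_scale=scale, sampling_ratio=2)
        pooled = p if pooled is None else torch.max(pooled, p)
    return pooled                                    # (sum(Ni), C, output_size, output_size)
```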

Finally, to capture different views of each proposal, we augment mask prediction with tiny fully-connected (fc) layers, which possess properties complementary to the FCN originally used by Mask R-CNN. By fusing predictions from these two views, information diversity increases and masks of better quality are produced.

The first two components are shared by both object detection and instance segmentation, leading to much enhanced performance of both tasks.

Experimental Results With PANet, we achieve state-of-the-art performance on several datasets. With ResNet-50 [23] as the initial network, our PANet tested with a single scale already outperforms the champions of the COCO 2016 Challenge in both the object detection [27] and instance segmentation [33] tasks. Note that these previous results were achieved with larger models [23, 58] together with multi-scale and horizontal-flip testing.

We achieve the 1st place in COCO 2017 Challenge Instance Segmentation task and the 2nd place in Object Detection task without large-batch training. We also benchmark our system on Cityscapes and MVD, which similarly yields top-ranking results, manifesting that our PANet is a very practical and top-performing framework. Our code and models will be made publicly available.

Figure 1. Illustration of our framework. (a) FPN backbone. (b) Bottom-up path augmentation. (c) Adaptive feature pooling. (d) Box branch. (e) Fully-connected fusion. Note that we omit channel dimension of feature maps in (a) and (b) for brevity.

Conclusion
We have presented PANet for instance segmentation. We designed several simple yet effective components to enhance information propagation in representative pipelines. We pool features from all feature levels and shorten the distance between the lower and topmost feature levels for reliable information passing. A complementary path is augmented to enrich the features for each proposal. Impressive results are produced. Our future work will extend our method to video and RGBD data.

Context Encoding for Semantic Segmentation

Abstract
Recent work has made significant progress in improving spatial resolution for pixel-wise labeling within the Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent featuremaps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach has achieved new state-of-the-art results of 51.7% mIoU on PASCAL-Context and 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on the ADE20K test set, which surpasses the winning entry of the COCO-Place Challenge 2017. In addition, we also explore how the Context Encoding Module can improve the feature representation of relatively shallow networks for image classification on the CIFAR-10 dataset. Our 14-layer network has achieved an error rate of 3.45%, which is comparable with state-of-the-art approaches with over 10× more layers. The source code for the complete system is publicly available.

Introduction
Semantic segmentation assigns per-pixel predictions of object categories to a given image, which provides a comprehensive scene description including the information of object category, location and shape. State-of-the-art semantic segmentation approaches are typically based on the Fully Convolutional Network (FCN) framework [37]. The adaptation of Deep Convolutional Neural Networks (CNNs) [29] benefits from the rich information of object categories and scene semantics learned from a diverse set of images [10]. CNNs are able to capture informative representations with global receptive fields by stacking convolutional layers with non-linearities and downsampling. To overcome the problem of spatial resolution loss associated with downsampling, recent work uses a dilated/atrous convolution strategy to produce dense predictions from pretrained networks [4, 54]. However, this strategy also isolates the pixels from the global scene context, leading to misclassified pixels. For example, in the 3rd row of Figure 4, the baseline approach classifies some pixels in the windowpane as door.

Recent methods have achieved state-of-the-art performance by enlarging the receptive field using multi-resolution, pyramid-based representations. For example, PSPNet adopts Spatial Pyramid Pooling, which pools the featuremaps into different sizes and concatenates them after upsampling [59], and Deeplab proposes Atrous Spatial Pyramid Pooling, which employs large-rate dilated/atrous convolutions [5]. While these approaches do improve performance, the context representations are not explicit, leading to the question: is capturing contextual information the same as increasing the receptive field size? Consider labeling a new image for a large dataset (such as ADE20K [61], containing 150 categories) as shown in Figure 1. Suppose we have a tool allowing the annotator to first select the semantic context of the image (e.g., a bedroom). Then the tool could provide a much smaller sublist of relevant categories (e.g., bed, chair, etc.), which would dramatically reduce the search space of possible categories. Similarly, if we can design an approach that fully utilizes the strong correlation between scene context and the probabilities of categories, semantic segmentation becomes easier for the network.

Classic computer vision approaches have the advantage of capturing the semantic context of the scene. For a given input image, hand-engineered features are densely extracted using SIFT [38] or filter bank responses [30, 48]. Then a visual vocabulary (dictionary) is often learned, and the global feature statistics are described by classic encoders such as Bag-of-Words (BoW) [8, 13, 26, 46], VLAD [25] or Fisher Vector [44]. These classic representations encode global contextual information by capturing feature statistics. While hand-crafted features have been greatly surpassed by CNN methods, the overall encoding process of traditional methods was convenient and powerful. Can we leverage the context encoding of classic approaches with the power of deep learning? Recent work has made great progress in generalizing traditional encoders within a CNN framework [1, 58]. Zhang et al. introduce an Encoding Layer that integrates the entire dictionary learning and residual encoding pipeline into a single CNN layer to capture orderless representations. This method has achieved state-of-the-art results on texture classification [58]. In this work, we extend the Encoding Layer to capture global feature statistics for understanding semantic context.

As the first contribution of this paper, we introduce a Context Encoding Module incorporating a Semantic Encoding Loss (SE-loss), a simple unit that leverages global scene context information. The Context Encoding Module integrates an Encoding Layer to capture the global context and selectively highlight the class-dependent featuremaps. For intuition, consider that we would want to de-emphasize the probability of a vehicle appearing in an indoor scene. The standard training process employs only a per-pixel segmentation loss, which does not strongly utilize the global context of the scene. We introduce the Semantic Encoding Loss (SE-loss) to regularize the training: the network is asked to predict the presence of the object categories in the scene, which enforces the learning of semantic context. Unlike the per-pixel loss, the SE-loss gives equal contributions to both big and small objects, and we find that the performance on small objects is often improved in practice. The proposed Context Encoding Module and Semantic Encoding Loss are conceptually straightforward and compatible with existing FCN-based approaches.
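A compact sketch of the mechanism described above: a global descriptor of the featuremaps predicts channel-wise scaling factors that highlight class-dependent channels, and a presence head trained with a multi-label BCE loss plays the role of the SE-loss. Note that global average pooling stands in for the paper's learned Encoding Layer [58] here, which is a deliberate simplification.

```python
import torch
import torch.nn as nn

class ContextEncodingSketch(nn.Module):
    """Sketch of a Context Encoding Module: a global descriptor predicts channel-wise
    scaling factors that highlight class-dependent featuremaps, and a presence head
    provides the SE-loss supervision. Global average pooling replaces the paper's
    learned Encoding Layer (a simplifying assumption)."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.fc_gamma = nn.Linear(channels, channels)
        self.fc_presence = nn.Linear(channels, num_classes)

    def forward(self, feat):                           # feat: (B, C, H, W)
        context = feat.mean(dim=(2, 3))                # (B, C) global descriptor
        gamma = torch.sigmoid(self.fc_gamma(context))  # channel-wise scaling factors
        scaled = feat * gamma[:, :, None, None]        # highlight class-dependent channels
        presence_logits = self.fc_presence(context)    # "which categories are present?"
        return scaled, presence_logits

# SE-loss: multi-label BCE on the per-image presence of each category
se_criterion = nn.BCEWithLogitsLoss()
```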

The second contribution of this paper is the design and implementation of a new semantic segmentation framework, the Context Encoding Network (EncNet). EncNet augments a pre-trained Deep Residual Network (ResNet) [17] with a Context Encoding Module, as shown in Figure 2. We use the dilation strategy [4, 54] on the pre-trained networks. The proposed Context Encoding Network achieves state-of-the-art results of 85.9% mIoU on PASCAL VOC 2012 and 51.7% on PASCAL-Context. Our single model of EncNet-101 has achieved a score of 0.5567, which surpasses the winning entry of the COCO-Place Challenge 2017 [61]. In addition to semantic segmentation, we also study the power of our Context Encoding Module for visual recognition on the CIFAR-10 dataset [28], where the performance of a shallow network is significantly improved by the proposed Context Encoding Module. Our network has achieved an error rate of 3.96% using only 3.5M parameters. We release the complete system including state-of-the-art approaches together with our implementation of synchronized multi-GPU Batch Normalization [23] and a memory-efficient Encoding Layer [58].

Figure 1: Labeling a scene with accurate per-pixel labels is a challenge for semantic segmentation algorithms. Even humans find the task challenging. However, narrowing the list of probable categories based on scene context makes labeling much easier. Motivated by this, we introduce the Context Encoding Module which selectively highlights the class-dependent featuremaps and makes the semantic segmentation easier for the network. (Examples from ADE20K [61].)

Figure 2: Overview of the proposed EncNet. Given an input image, we first use a pre-trained CNN to extract dense convolutional featuremaps. We build a Context Encoding Module on top, including an Encoding Layer to capture the encoded semantics and predict scaling factors that are conditional on these encoded semantics. These learned factors selectively highlight class-dependent featuremaps (visualized in colors). In another branch, we employ the Semantic Encoding Loss (SE-loss) to regularize the training, which lets the Context Encoding Module predict the presence of the categories in the scene. Finally, the representation of the Context Encoding Module is fed into the last convolutional layer to make per-pixel predictions. (Notation: FC — fully connected layer; Conv — convolutional layer; Encode — Encoding Layer [58]; the remaining operator denotes channel-wise multiplication.)

Figure 3: Dilation strategy and losses. Each cube denotes a different network stage. We apply the dilation strategy to stages 3 and 4. The Semantic Encoding Losses (SE-loss) are added to both stage 3 and stage 4 of the base network. (D denotes the dilation rate; Seg-loss denotes the per-pixel segmentation loss.)

Conclusion
To capture and utilize contextual information for semantic segmentation, we introduce a Context Encoding Module, which selectively highlights the class-dependent featuremaps and "simplifies" the problem for the network. The proposed Context Encoding Module is conceptually straightforward, light-weight and compatible with existing FCN-based approaches. The experimental results have demonstrated the superior performance of the proposed EncNet. We hope the strategy of Context Encoding and our state-of-the-art implementation (including baselines, Synchronized Cross-GPU Batch Normalization and the Encoding Layer) can be beneficial to scene parsing and semantic segmentation work in the community.

Figure 4: Understanding the contextual information of the scene is important for semantic segmentation. For example, the baseline FCN classifies sand as earth without knowing the context, as in the 1st example. building, house and skyscraper are hard to distinguish without the semantics, as in the 2nd and 4th rows. In the 3rd example, the FCN identifies windowpane as door because it classifies isolated pixels without a global sense/view. (Visual examples from the ADE20K dataset.)

Figure 6: Visual examples on the PASCAL-Context dataset. EncNet produces more accurate predictions.
