【Collection of (Keras) reference implementations of popular deep learning models】 'Keras Applications - Reference implementations of popular deep learning models.' GitHub: link
【Webcam (human body) tracking with TensorFlow.js】 《Webcam Tracking with Tensorflow.js - YouTube》 by Siraj Raval link GitHub: link Repost ("Webcam (human body) tracking with TensorFlow.js"): link
【A visual guide to Bayesian thinking】 《A visual guide to Bayesian thinking - YouTube》 by Julia Galef link Repost: Miaopai video by 爱可可-爱生活
【Building a question-answering system from scratch】 《Building a Question-Answering System from Scratch》 by Alvira Swalin Part 1: link pdf: link
《Hyperbolic Neural Networks》O Ganea, G Bécigneul, T Hofmann [ETH Zürich] (2018) link view: link GitHub: link
《EcoRNN: Fused LSTM RNN Implementation with Data Layout Optimization》B Zheng, A Nair, Q Wu, N Vijaykumar, G Pekhimenko [University of Toronto & CMU] (2018) link view: link
《Virtuously Safe Reinforcement Learning》H Aslund, E M E Mhamdi, R Guerraoui, A Maurer [EPFL] (2018) link view: link
《Embedding Syntax and Semantics of Prepositions via Tensor Decomposition》H Gong, S Bhat, P Viswanath [University of Illinois at Urbana-Champaign] (2018) link view: link
《Estimating Carotid Pulse and Breathing Rate from Near-infrared Video of the Neck》W Chen, J Hernandez, R W. Picard [MIT] (2018) link view: link
《Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data》W Hsu, J Glass [MIT] (2018) link view: link
《Phrase Table as Recommendation Memory for Neural Machine Translation》Y Zhao, Y Wang, J Zhang, C Zong [CAS & University of Chinese Academy of Sciences] (2018) link view: link
《Polyglot Semantic Role Labeling》P Mulcaire, S Swayamdipta, N Smith [University of Washington & CMU] (2018) link view: link
《Rotation Equivariance and Invariance in Convolutional Neural Networks》B Chidester, M N. Do, J Ma [CMU & University of Illinois at Urbana-Champaign] (2018) link view: link GitHub: link
《Supervised Policy Update》Q H Vuong, Y Zhang, K W. Ross [New York University] (2018) link view: link
《How Important Is a Neuron?》K Dhamdhere, M Sundararajan, Q Yan [Google AI] (2018) link view: link
《Depth and nonlinearity induce implicit exploration for RL》J Dauparas, R Tomioka, K Hofmann [University of Cambridge & Microsoft Research] (2018) link view: link
《Pathology Segmentation using Distributional Differences to Images of Healthy Origin》S Andermatt, A Horváth, S Pezold, P Cattin [University of Basel] (2018) link view: link
《A Unified Particle-Optimization Framework for Scalable Bayesian Sampling》C Chen, R Zhang, W Wang, B Li, L Chen [University at Buffalo & Duke University] (2018) link view: link
《Multi-Resolution 3D Convolutional Neural Networks for Object Recognition》S Ghadai, X Lee, A Balu, S Sarkar, A Krishnamurthy [Iowa State University] (2018) link view: link
《Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization》S Liu, B Kailkhura, P Chen, P Ting, S Chang, L Amini [IBM Research & Lawrence Livermore National Laboratory & University of Michigan] (2018) link view: link
《Large Data and Zero Noise Limits of Graph-Based Semi-Supervised Learning Algorithms》M M. Dunlop, D Slepčev, A M. Stuart, M Thorpe [Caltech & CMU & University of Cambridge] (2018) link view: link
《Pushing the bounds of dropout》G Melis, C Blundell, T Kočiský, K M Hermann, C Dyer, P Blunsom [DeepMind] (2018) link view: link
《A Generalized Active Learning Approach for Unsupervised Anomaly Detection》T Pimentel, M Monteiro, J Viana, A Veloso, N Ziviani [Kunumi & UFMG] (2018) link view: link
《Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces》A Coucke, A Saade, A Ball, T Bluche... [Snips] (2018) link view: link
《Deployment of Customized Deep Learning based Video Analytics On Surveillance Cameras》P Dubal, R Mahadev, S Kothawade, K Dargan, R Iyer [AitoeLabs] (2018) link view: link
《Deep Functional Dictionaries: Learning Consistent Semantic Structures on 3D Models from Functions》M Sung, H Su, R Yu, L Guibas [Stanford University & University of California San Diego] (2018) link view: link
《A Unified Probabilistic Model for Learning Latent Factors and Their Connectivities from High-Dimensional Data》R P Monti, A Hyvärinen [University College London] (2018) link view: link
《Meta-Gradient Reinforcement Learning》Z Xu, H v Hasselt, D Silver [DeepMind] (2018) link view: link
《Unsupervised Alignment of Embeddings with Wasserstein Procrustes》E Grave, A Joulin, Q Berthet [New York City & Facebook AI Research & University of Cambridge] (2018) link view: link
《Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting》Y Chen, M Bansal [UNC Chapel Hill] (2018) link view: link GitHub: link
arXiv Papers | Data Analytics & R
[1805.11643v1] High Dimensional Robust Sparse Regression
We provide a novel -- and to the best of our knowledge, the first -- algorithm for high dimensional sparse regression with corruptions in explanatory and/or response variables. Our algorithm recovers the true sparse parameters in the presence of a constant fraction of arbitrary corruptions. Our main contribution is a robust variant of Iterative Hard Thresholding. Using this, we provide accurate estimators with sub-linear sample complexity. Our algorithm consists of a novel randomized outlier removal technique for robust sparse mean estimation that may be of interest in its own right: it is orderwise more efficient computationally than existing algorithms, and succeeds with high probability, thus making it suitable for general use in iterative algorithms. We demonstrate the effectiveness on large-scale sparse regression problems with arbitrary corruptions.
[1805.11653v1] LSTMs Exploit Linguistic Attributes of Data
While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data. We investigate how the properties of natural language data affect an LSTM's ability to learn a nonlinguistic task: recalling elements from its input. We find that models trained on natural language data are able to recall tokens from much longer sequences than models trained on non-language sequential data. Furthermore, we show that the LSTM learns to solve the memorization task by explicitly using a subset of its neurons to count timesteps in the input. We hypothesize that the patterns and structure in natural language data enable LSTMs to learn by providing approximate ways of reducing loss, but understanding the effect of different training data on the learnability of LSTMs remains an open question.
[1805.11686v1] Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition
The design of a reward function often poses a major practical challenge to real-world applications of reinforcement learning. Approaches such as inverse reinforcement learning attempt to overcome this challenge, but require expert demonstrations, which can be difficult or expensive to obtain in practice. We propose variational inverse control with events (VICE), which generalizes inverse reinforcement learning methods to cases where full demonstrations are not needed, such as when only samples of desired goal states are available. Our method is grounded in an alternative perspective on control and reinforcement learning, where an agent's goal is to maximize the probability that one or more events will happen at some point in the future, rather than maximizing cumulative rewards. We demonstrate the effectiveness of our methods on continuous control tasks, with a focus on high-dimensional observations like images where rewards are hard or even impossible to specify.
[1805.11706v1] Supervised Policy Update
We propose a new sample-efficient methodology, called Supervised Policy Update (SPU), for deep reinforcement learning. Starting with data generated by the current policy, SPU optimizes over the proximal policy space to find a non-parameterized policy. It then solves a supervised regression problem to convert the non-parameterized policy to a parameterized policy, from which it draws new samples. There is significant flexibility in setting the labels in the supervised regression problem, with different settings corresponding to different underlying optimization problems. We develop a methodology for finding an optimal policy in the non-parameterized policy space, and show how Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) can be addressed by this methodology. In terms of sample efficiency, our experiments show SPU can outperform PPO for simulated robotic locomotion tasks.
[1805.11724v1] Rethinking Knowledge Graph Propagation for Zero-Shot Learning
The potential of graph convolutional neural networks for the task of zero-shot learning has been demonstrated recently. These models are highly sample efficient as related concepts in the graph structure share statistical strength allowing generalization to new classes when faced with a lack of data. However, knowledge from distant nodes can get diluted when propagating through intermediate nodes, because current approaches to zero-shot learning use graph propagation schemes that perform Laplacian smoothing at each layer. We show that extensive smoothing does not help the task of regressing classifier weights in zero-shot learning. In order to still incorporate information from distant nodes and utilize the graph structure, we propose an Attentive Dense Graph Propagation Module (ADGPM). ADGPM allows us to exploit the hierarchical graph structure of the knowledge graph through additional connections. These connections are added based on a node's relationship to its ancestors and descendants and an attention scheme is further used to weigh their contribution depending on the distance to the node. Finally, we illustrate that finetuning of the feature representation after training the ADGPM leads to considerable improvements. Our method achieves competitive results, outperforming previous zero-shot learning approaches.
[1805.11730v1] Learn to Combine Modalities in Multimodal Deep Learning
Combining complementary information from multiple modalities is intuitively appealing for improving the performance of learning-based approaches. However, it is challenging to fully leverage different modalities due to practical challenges such as varying levels of noise and conflicts between modalities. Existing methods do not adopt a joint approach to capturing synergies between the modalities while simultaneously filtering noise and resolving conflicts on a per sample basis. In this work we propose a novel deep neural network based technique that multiplicatively combines information from different source modalities. Thus the model training process automatically focuses on information from more reliable modalities while reducing emphasis on the less reliable modalities. Furthermore, we propose an extension that multiplicatively combines not only the single-source modalities, but a set of mixtured source modalities to better capture cross-modal signal correlations. We demonstrate the effectiveness of our proposed technique by presenting empirical results on three multimodal classification tasks from different domains. The results show consistent accuracy improvements on all three tasks.
[1805.11761v1] Collaborative Learning for Deep Neural Networks
We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. It acquires the strengths from auxiliary training, multi-task learning and knowledge distillation. There are two important mechanisms involved in collaborative learning. First, the consensus of multiple views from different classifier heads on the same example provides supplementary information as well as regularization to each classifier, thereby improving generalization. Second, intermediate-level representation (ILR) sharing with backpropagation rescaling aggregates the gradient flows from all heads, which not only reduces training computational complexity, but also facilitates supervision to the shared layers. The empirical results on CIFAR and ImageNet datasets demonstrate that deep neural networks learned as a group in a collaborative way significantly reduce the generalization error and increase the robustness to label noise.
[1805.11797v1] Grow and Prune Compact, Fast, and Accurate LSTMs
Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to LSTM's original one level non-linear control gates. H-LSTM increases accuracy while employing fewer external stacked layers, thus reducing the number of parameters and run-time latency significantly. We employ grow-and-prune (GP) training to iteratively adjust the hidden layers through gradient-based growth and magnitude-based pruning of connections. This learns both the weights and the compact architecture of H-LSTM control gates. We have GP-trained H-LSTMs for image captioning and speech recognition applications. For the NeuralTalk architecture on the MSCOCO dataset, our three models reduce the number of parameters by 38.7x [floating-point operations (FLOPs) by 45.5x], run-time latency by 4.5x, and improve the CIDEr score by 2.6. For the DeepSpeech2 architecture on the AN4 dataset, our two models reduce the number of parameters by 19.4x (FLOPs by 23.5x), run-time latency by 15.7%, and the word error rate from 12.9% to 8.7%. Thus, GP-trained H-LSTMs can be seen to be compact, fast, and accurate.
[1805.11917v1] The Dynamics of Learning: A Random Matrix Approach
Understanding the learning dynamics of neural networks is one of the key issues for the improvement of optimization algorithms as well as for the theoretical comprehension of why deep neural nets work so well today. In this paper, we introduce a random matrix-based framework to analyze the learning dynamics of a single-layer linear network on a binary classification problem, for data of simultaneously large dimension and size, trained by gradient descent. Our results provide rich insights into common questions in neural nets, such as overfitting, early stopping and the initialization of training, thereby opening the door for future studies of more elaborate structures and models appearing in today's neural networks.
[1805.11648v1] Teaching Meaningful Explanations
The adoption of machine learning in high-stakes applications such as healthcare and law has lagged in part because predictions are not accompanied by explanations comprehensible to the domain user, who often holds ultimate responsibility for decisions and outcomes. In this paper, we propose an approach to generate such explanations in which training data is augmented to include, in addition to features and labels, explanations elicited from domain users. A joint model is then learned to produce both labels and explanations from the input features. This simple idea ensures that explanations are tailored to the complexity expectations and domain knowledge of the consumer. Evaluation spans multiple modeling techniques on a simple game dataset, an image dataset, and a chemical odor dataset, showing that our approach is generalizable across domains and algorithms. Results demonstrate that meaningful explanations can be reliably taught to machine learning algorithms, and in some cases, improve modeling accuracy.
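As an illustration of the "joint model" idea in the abstract above, here is a minimal Python sketch. The abstract does not say how the joint model is built, so everything here is an assumption: the sketch treats each (label, explanation) pair seen in training as one combined class, and uses a 1-nearest-neighbor rule as a stand-in for whatever classifier one would actually train. The function names and the toy loan data are hypothetical.

```python
def encode_pairs(labels, explanations):
    """Map each distinct (label, explanation) pair to one combined class id.

    Assumption: pairing labels with explanations into a single class is one
    simple way to learn both jointly; the abstract does not prescribe it.
    """
    pairs = sorted(set(zip(labels, explanations)))
    to_id = {pair: i for i, pair in enumerate(pairs)}
    ids = [to_id[pair] for pair in zip(labels, explanations)]
    return ids, pairs


def nearest_neighbor_predict(train_x, train_ids, x):
    """Stand-in classifier: return the class id of the closest training point."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    best = min(range(len(train_x)), key=lambda i: squared_distance(train_x[i], x))
    return train_ids[best]


def predict_label_and_explanation(train_x, labels, explanations, x):
    """Predict a combined class for x, then decode it into (label, explanation)."""
    ids, pairs = encode_pairs(labels, explanations)
    return pairs[nearest_neighbor_predict(train_x, ids, x)]


# Hypothetical toy data: features are (income, debt) on an arbitrary scale.
train_x = [(9.0, 1.0), (8.0, 2.0), (2.0, 1.0), (7.0, 9.0)]
labels = ["approve", "approve", "deny", "deny"]
explanations = ["income is high", "income is high", "income is low", "debt is high"]

if __name__ == "__main__":
    print(predict_label_and_explanation(train_x, labels, explanations, (6.5, 8.0)))
```

Because the prediction is decoded from a single combined class, the returned explanation is always one that a domain user supplied for that label during training, which is the property the abstract emphasizes.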