What's new on arXiv

Decomposing an information stream into the principal components

We propose an approach to decomposing a thematic information stream into principal components. Each principal component is related to a narrow topic extracted from the information stream. The essence of the approach arises from an analogy with the Fourier transform. We examine methods for analyzing the principal components and propose using multifractal analysis to identify similar topics. The decomposition technique is applied to an information stream dedicated to Brexit. We compare the principal components obtained by applying the decomposition to the Brexit stream with the related topics extracted by Google Trends.
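
As a rough illustration of the general idea only (not the authors' pipeline), the sketch below decomposes a synthetic topic-term time series with truncated SVD, so that each component pairs a per-day activity profile with a set of term loadings. The matrix construction, vocabulary, and number of components are assumptions.

```python
# Hypothetical sketch: decompose a daily term-frequency matrix of a news
# stream into principal components, each loosely corresponding to a topic.
# Data, shapes, and the component count are illustrative, not from the paper.
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
# rows = days of the stream, columns = terms (e.g. "backstop", "article 50", ...)
stream = rng.poisson(lam=3.0, size=(365, 500)).astype(float)

svd = TruncatedSVD(n_components=10, random_state=0)
scores = svd.fit_transform(stream)   # per-day activity of each component
topics = svd.components_             # term loadings of each component

for k in range(3):
    top_terms = np.argsort(topics[k])[::-1][:5]
    print(f"component {k}: top term indices {top_terms.tolist()}")
```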

The Deep Kernelized Autoencoder

Autoencoders learn data representations (codes) in such a way that the input is reproduced at the output of the network. However, it is not always clear what kind of properties of the input data need to be captured by the codes. Kernel machines have experienced great success by operating via inner products in a theoretically well-defined reproducing kernel Hilbert space, hence capturing topological properties of input data. In this paper, we enhance the autoencoder's ability to learn effective data representations by aligning inner products between codes with respect to a kernel matrix. By doing so, the proposed kernelized autoencoder allows learning similarity-preserving embeddings of input data, where the notion of similarity is explicitly controlled by the user and encoded in a positive semi-definite kernel matrix. Experiments evaluate both reconstruction and kernel alignment performance on classification tasks and on visualization of high-dimensional data. Additionally, we show that our method is capable of emulating kernel principal component analysis on a denoising task, obtaining competitive results at a much lower computational cost.
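
A minimal sketch of the kernel-alignment idea, assuming a small MLP encoder/decoder and a Frobenius-norm penalty between the normalized code Gram matrix and a user-supplied kernel matrix; the architecture, normalization, and weight lam are illustrative choices rather than the authors' exact setup.

```python
# Sketch (not the authors' code): penalize the difference between the Gram
# matrix of the batch codes and a user-supplied PSD kernel matrix K.
import torch
import torch.nn as nn

class KernelizedAE(nn.Module):
    def __init__(self, d_in=784, d_code=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_code))
        self.dec = nn.Sequential(nn.Linear(d_code, 128), nn.ReLU(), nn.Linear(128, d_in))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

def loss_fn(x, x_hat, z, K, lam=0.1):
    # Reconstruction loss plus alignment of the normalized code Gram matrix with K.
    C = z @ z.t()
    C = C / (C.norm() + 1e-8)
    K = K / (K.norm() + 1e-8)
    return nn.functional.mse_loss(x_hat, x) + lam * (C - K).pow(2).sum()

x = torch.randn(16, 784)
K = torch.exp(-torch.cdist(x, x) ** 2 / 784)   # e.g. an RBF kernel on the inputs
model = KernelizedAE()
x_hat, z = model(x)
print(loss_fn(x, x_hat, z, K).item())
```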

Doubly Stochastic Adversarial Autoencoder

Any autoencoder network can be turned into a generative model by imposing an arbitrary prior distribution on its hidden code vector. The Variational Autoencoder (VAE) [2] uses a KL divergence penalty to impose the prior, whereas the Adversarial Autoencoder (AAE) [1] uses a generative adversarial network (GAN) [3]. The GAN trades the complexities of sampling algorithms for the complexities of searching for a Nash equilibrium in minimax games. Such minimax architectures are trained with the help of data examples and gradients flowing through a generator and an adversary. A straightforward modification of AAE is to replace the adversary with the maximum mean discrepancy (MMD) test [4-5]. This replacement leads to a new type of probabilistic autoencoder, which is also discussed in our paper. We propose a novel probabilistic autoencoder in which the adversary of AAE is replaced with a space of stochastic functions. This replacement introduces a new source of randomness, which can be considered a continuous control for encouraging exploration. This prevents the adversary from fitting too closely to the generator and therefore leads to a more diverse set of generated samples.
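
The MMD-based variant mentioned above can be sketched in a few lines: the adversary is replaced by a maximum mean discrepancy penalty between a batch of codes and samples drawn from the imposed prior. The RBF kernel and its bandwidth are assumptions.

```python
# Sketch of the MMD alternative to the adversary: match the distribution of
# encoder codes to the prior by minimizing an RBF-kernel MMD estimate,
# added to the reconstruction loss. Kernel choice and bandwidth are assumptions.
import torch

def rbf_mmd(x, y, sigma=1.0):
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

codes = torch.randn(64, 8) * 2.0 + 1.0   # stand-in for encoder outputs
prior = torch.randn(64, 8)               # samples from the imposed prior
print(rbf_mmd(codes, prior).item())
```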

Sparse space-time models: Concentration Inequalities and Lasso

Inspired by Kalikow-type decompositions, we introduce a new stochastic model of infinite neuronal networks, for which we establish oracle inequalities for Lasso methods and restricted eigenvalue properties for the associated Gram matrix with high probability. These results hold even if the network is only partially observed. The main argument relies on the fact that concentration inequalities can easily be derived whenever the transition probabilities of the underlying process admit a sparse space-time representation.

Rapid Time Series Prediction with a Hardware-Based Reservoir Computer

Reservoir computing is a neural network approach for processing time-dependent signals that has seen rapid development in recent years. Physical implementations of the technique using optical reservoirs have demonstrated remarkable accuracy and processing speed at benchmark tasks. However, these approaches require an electronic output layer to maintain high performance, which limits their use in tasks such as time-series prediction, where the output is fed back into the reservoir. We present here a reservoir computing scheme with rapid processing speed in both the reservoir and the output layer. The reservoir is realized by an autonomous, time-delay, Boolean network configured on a field-programmable gate array. We investigate the dynamical properties of the network and observe the fading memory property that is critical for successful reservoir computing. We demonstrate the utility of the technique by training a reservoir to learn the short- and long-term behavior of a chaotic system. We find accuracy comparable to state-of-the-art software approaches of similar network size, but with a superior real-time prediction rate of up to 160 MHz.
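
The reservoir here is an autonomous, time-delay Boolean network on an FPGA; purely for intuition, the following is a generic software echo-state reservoir showing the same pattern of a fixed random recurrent layer with fading memory plus a trained linear readout. It is not the hardware scheme of the paper.

```python
# Generic echo-state reservoir sketch: fixed random recurrence scaled for
# fading memory, ridge-regression readout trained to predict the next sample.
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 2000
W_in = rng.uniform(-0.5, 0.5, size=(N, 1))
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1 (fading memory)

u = np.sin(0.3 * np.arange(T))[:, None]           # toy input signal
x = np.zeros(N)
states = np.zeros((T, N))
for t in range(T):
    x = np.tanh(W @ x + W_in @ u[t])
    states[t] = x

# Train the readout to predict the next input sample (ridge regression).
X, y = states[:-1], u[1:, 0]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```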

gSMat: A Scalable Sparse Matrix-based Join for SPARQL Query Processing

Resource Description Framework (RDF) has been widely used to represent information on the web, while SPARQL is a standard query language for manipulating RDF data. Given a SPARQL query, there often exist many joins, which are the bottleneck of query processing efficiency. Moreover, real RDF datasets often exhibit strong data sparsity: a resource typically relates to only a few other resources, even when the total number of resources is large. In this paper, we propose a sparse matrix-based (SM-based) SPARQL query processing approach over RDF datasets which considers both join optimization and data sparsity. Firstly, we present an SM-based storage scheme for RDF datasets to improve storage efficiency, where only valid edges are stored, and then introduce a predicate-based hash index on the storage. Secondly, we develop a scalable SM-based join algorithm for SPARQL query processing. Finally, we analyze the overall cost by accumulating all intermediate results and design a query plan generation algorithm. In addition, we extend our SM-based join algorithm to GPUs for parallelizing SPARQL query processing. We have evaluated our approach against state-of-the-art RDF engines over benchmark RDF datasets, and the experimental results show that our proposal can significantly improve SPARQL query processing with high scalability.
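
A hedged sketch of the sparse-matrix view of joins (not gSMat itself): each predicate is stored as a sparse subject-object matrix, and a two-hop pattern such as ?x :p1 ?y . ?y :p2 ?z reduces to a sparse matrix product over those matrices. The toy graph below is made up.

```python
# Each predicate as a sparse subject-object adjacency matrix; a chain join
# becomes a sparse matrix product whose nonzeros are the (x, z) bindings.
from scipy.sparse import csr_matrix

n = 6  # number of resources in the toy graph
p1 = csr_matrix(([1, 1, 1], ([0, 1, 2], [3, 3, 4])), shape=(n, n))  # :p1 edges
p2 = csr_matrix(([1, 1], ([3, 4], [5, 0])), shape=(n, n))           # :p2 edges

join = p1 @ p2                    # nonzero (x, z) pairs connected through some y
xs, zs = join.nonzero()
print(list(zip(xs.tolist(), zs.tolist())))   # [(0, 5), (1, 5), (2, 0)]
```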

Apache Spark Streaming and HarmonicIO: A Performance and Architecture Comparison

Studies have demonstrated that Apache Spark, Flink and related frameworks can perform stream processing at very high frequencies, whilst tending to focus on small messages with a computationally light 'map' stage for each message, a common enterprise use case. We add to these benchmarks by broadening the domain to include loads with larger messages (leading to network-bound throughput), and that are computationally intensive (leading to CPU-bound throughput) in the map phase, in order to evaluate the applicability of these frameworks to scientific computing applications. We present a performance benchmark comparison between Apache Spark Streaming (ASS), under both file and TCP streaming modes, and HarmonicIO, comparing maximum throughput over a broad domain of message sizes and CPU loads. We find that relative performance varies considerably across this domain, with the chosen means of stream source integration having a big impact. We offer recommendations for choosing and configuring the frameworks, and present a benchmarking toolset developed for this study.
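
For context, a minimal example of the kind of workload the benchmark targets: a CPU-bound map over a TCP text stream using the legacy Spark Streaming DStream API. The host, port, batch interval, and per-message work are placeholders, not the benchmark code used in the study.

```python
# Placeholder CPU-bound map over a TCP text stream with Spark Streaming
# (DStream API). Requires a running Spark installation and a TCP source.
import hashlib
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

def heavy_map(msg):
    h = msg.encode()
    for _ in range(10_000):          # stand-in for a CPU-bound per-message task
        h = hashlib.sha256(h).digest()
    return h.hex()

sc = SparkContext(appName="cpu-bound-stream")
ssc = StreamingContext(sc, 1)                      # 1-second batch interval
lines = ssc.socketTextStream("localhost", 9999)    # TCP streaming mode
lines.map(heavy_map).count().pprint()              # messages processed per batch
ssc.start()
ssc.awaitTermination()
```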

Learning Representations for Soft Skill Matching

Employers actively look for talent with not only specific hard skills but also various soft skills. To analyze soft skill demand on the job market, it is important to be able to detect soft skill phrases in job advertisements automatically. However, naive matching of soft skill phrases can lead to false positive matches when a soft skill phrase, such as "friendly", is used to describe a company, a team, or another entity rather than the desired candidate. In this paper, we propose a phrase-matching-based approach which differentiates between soft skill phrases referring to a candidate and those referring to something else. The disambiguation is formulated as a binary text classification problem where the prediction is made for a potential soft skill based on the context in which it occurs. To inform the model about the soft skill for which the prediction is made, we develop several approaches, including soft skill masking and soft skill tagging. We compare several neural-network-based approaches, including a CNN, an LSTM and a Hierarchical Attention Model. The proposed tagging-based input representation using an LSTM achieved the highest recall, 83.92%, on the job dataset when precision is fixed at 95%.
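
The masking and tagging input representations can be sketched as simple token-level transformations; the marker tokens below are illustrative, not the paper's exact vocabulary.

```python
# "Masking" replaces the candidate soft-skill phrase with a placeholder token,
# "tagging" wraps it in marker tokens, so the classifier knows which phrase
# the prediction is about. Marker token names are illustrative.
def mask_phrase(tokens, start, end):
    return tokens[:start] + ["<SKILL>"] + tokens[end:]

def tag_phrase(tokens, start, end):
    return tokens[:start] + ["<S>"] + tokens[start:end] + ["</S>"] + tokens[end:]

sent = "our friendly team is looking for a proactive engineer".split()
print(mask_phrase(sent, 1, 2))   # candidate phrase "friendly" (describes the team)
print(tag_phrase(sent, 7, 8))    # candidate phrase "proactive" (describes the candidate)
```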

Twitter Sentiment Analysis System

Social media is increasingly used by people to express their feelings and opinions in the form of short text messages. Detecting sentiment in text has a wide range of applications, including identifying anxiety or depression in individuals and measuring the well-being or mood of a community. Sentiments can be expressed in many ways, such as facial expressions and gestures, speech, and written text. Sentiment analysis of text documents is essentially a content-based classification problem involving concepts from the domains of Natural Language Processing and Machine Learning. In this paper, sentiment recognition based on textual data and the techniques used in sentiment analysis are discussed.
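
As a generic illustration of this content-based classification framing (not a system from the paper), a TF-IDF plus linear-classifier baseline on toy data might look as follows.

```python
# Toy content-based sentiment classifier: TF-IDF features + logistic regression.
# The data below is a tiny stand-in, purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["I love this phone", "worst service ever", "feeling great today", "so disappointed"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["this is great", "I hate waiting"]))
```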

A Generalized Vector Space Model for Ontology-Based Information Retrieval

Named entities (NE) are objects that are referred to by names such as people, organizations and locations. Named entities and keywords are important to the meaning of a document. We propose a generalized vector space model that combines named entities and keywords. In the model, we take into account different ontological features of named entities, namely, aliases, classes and identifiers. Moreover, we use entity classes to represent the latent information of interrogative words in Wh-queries, which are ignored in traditional keyword-based searching. We have implemented and tested the proposed model on a TREC dataset, as presented and discussed in the paper.
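
One way to picture the combined representation (an illustration, not the paper's exact weighting scheme) is to expand each document's keyword features with ontological features of its named entities, i.e. alias, class, and identifier.

```python
# Hypothetical expansion: keywords plus per-entity (alias, class, identifier)
# features, indexed together in one vector space.
from sklearn.feature_extraction.text import CountVectorizer

def expand(doc_keywords, entities):
    # entities: list of (alias, cls, identifier) triples recognised in the document
    feats = list(doc_keywords)
    for alias, cls, ident in entities:
        feats += [f"name:{alias}", f"class:{cls}", f"id:{ident}"]
    return " ".join(feats)

doc = expand(["summit", "trade"],
             [("Brussels", "City", "geo:2800866"), ("EU", "Organization", "org:eu")])
vec = CountVectorizer(token_pattern=r"\S+")
print(vec.fit_transform([doc]).toarray())
print(vec.get_feature_names_out())
```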

Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorithm and Generalization Bounds

Similarity and metric learning provides a principled approach to construct a task-specific similarity from weakly supervised data. However, these methods are subject to the curse of dimensionality: as the number of features grows large, poor generalization is to be expected and training becomes intractable due to high computational and memory costs. In this paper, we propose a similarity learning method that can efficiently deal with high-dimensional sparse data. This is achieved through a parameterization of similarity functions by convex combinations of sparse rank-one matrices, together with the use of a greedy approximate Frank-Wolfe algorithm which provides an efficient way to control the number of active features. We show that the convergence rate of the algorithm, as well as its time and memory complexity, are independent of the data dimension. We further provide a theoretical justification of our modeling choices through an analysis of the generalization error, which depends logarithmically on the sparsity of the solution rather than on the number of features. Our experiments on datasets with up to one million features demonstrate the ability of our approach to generalize well despite the high dimensionality as well as its superiority compared to several competing methods.
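
A generic Frank-Wolfe iteration over the convex hull of rank-one basis matrices looks like the sketch below: each step calls a linear minimization oracle that returns a single atom, so the iterate after T steps combines at most T atoms. The toy loss and basis are not the paper's construction.

```python
# Generic Frank-Wolfe over conv{B_1, ..., B_d}: pick the atom most aligned
# with the negative gradient, then take a convex step toward it.
import numpy as np

rng = np.random.default_rng(0)
d = 50
target = np.diag(rng.random(d))                         # toy "ground truth" matrix
bases = [np.outer(np.eye(d)[i], np.eye(d)[i]) for i in range(d)]

def grad(M):                                            # gradient of 0.5*||M - target||_F^2
    return M - target

M = np.zeros((d, d))
for t in range(20):
    G = grad(M)
    k = int(np.argmin([np.sum(G * B) for B in bases]))  # linear minimization oracle
    gamma = 2.0 / (t + 2)                               # standard FW step size
    M = (1 - gamma) * M + gamma * bases[k]
print("active atoms:", int(np.sum(np.abs(np.diag(M)) > 1e-12)))
```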

The Sliding Window Discrete Fourier Transform

This paper introduces a new tool for time-series analysis: the Sliding Window Discrete Fourier Transform (SWDFT). The SWDFT is especially useful for time-series with local-in-time periodic components. We define a 5-parameter model for noiseless local periodic signals, then study the SWDFT of this model. Our study illustrates several key concepts crucial to analyzing time-series with the SWDFT, in particular Aliasing, Leakage, and Ringing. We also show how these ideas extend to R > 1 local periodic components, using the linearity property of the Fourier transform. Next, we propose a simple procedure for estimating the 5 parameters of our local periodic signal model using the SWDFT. Our estimation procedure speeds up computation by using a trigonometric identity that linearizes estimation of 2 of the 5 parameters. We conclude with a very small Monte Carlo simulation study of our estimation procedure under different levels of noise.
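
By definition, the SWDFT is just the DFT of every length-n window of the series; a direct (non-recursive) computation can be sketched as follows, applied to a local-in-time cosine. This is the plain definition, not the paper's estimation procedure.

```python
# Direct SWDFT: take the DFT of every length-n segment of the series.
# (Faster recursive updates exist; this is only the definition.)
import numpy as np

def swdft(x, n):
    x = np.asarray(x, dtype=float)
    T = len(x)
    # columns indexed by window end position p = n-1, ..., T-1
    return np.stack([np.fft.fft(x[p - n + 1 : p + 1]) for p in range(n - 1, T)], axis=1)

t = np.arange(256)
signal = np.where((t >= 80) & (t < 160), np.cos(2 * np.pi * t / 8), 0.0)  # local-in-time cosine
S = swdft(signal, n=32)
print(S.shape)                      # (32 frequency bins, 225 window positions)
print(np.abs(S[4]).max())           # energy at the bin matching the local component
```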

Competition vs. Concatenation in Skip Connections of Fully Convolutional Networks

Increased information sharing through short- and long-range skip connections between layers in fully convolutional networks has demonstrated significant improvement in performance for semantic segmentation. In this paper, we propose Competitive Dense Fully Convolutional Networks (CDFNet) by introducing competitive maxout activations in place of naive feature concatenation for inducing competition amongst layers. Within CDFNet, we propose two architectural contributions, namely the competitive dense block (CDB) and the competitive unpooling block (CUB), to induce competition at local and global scales for short- and long-range skip connections respectively. This extension is demonstrated to boost learning of specialized sub-networks targeted at segmenting specific anatomies, which in turn eases the training of complex tasks. We present a proof-of-concept on the challenging task of whole-body segmentation in the publicly available VISCERAL benchmark and demonstrate improved performance over multiple learning- and registration-based state-of-the-art methods.
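
The core substitution can be illustrated in a few lines: an element-wise maxout over the skip and upsampled feature maps keeps the channel count fixed and makes the two paths compete, whereas concatenation stacks them. The tensor shapes below are arbitrary.

```python
# Concatenation vs. element-wise maxout of a skip connection and an upsampled path.
import torch

def concat_merge(skip, up):
    return torch.cat([skip, up], dim=1)        # channels add up: C_skip + C_up

def maxout_merge(skip, up):
    return torch.maximum(skip, up)             # channels stay at C; paths compete

skip = torch.randn(1, 64, 32, 32)
up = torch.randn(1, 64, 32, 32)
print(concat_merge(skip, up).shape)            # torch.Size([1, 128, 32, 32])
print(maxout_merge(skip, up).shape)            # torch.Size([1, 64, 32, 32])
```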

Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis

Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges, in particular rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with thousands of dependencies. Careful tuning of launches and prepositioning of applications overcome these challenges and allow the launching of thousands of tasks in seconds on a 40,000-core supercomputer. Specifically, this work demonstrates launching 32,000 TensorFlow processes in 4 seconds and launching 262,000 Octave processes in 40 seconds. These capabilities allow researchers to rapidly explore novel machine learning architectures and data analysis algorithms.

An Operational Approach to Information Leakage

Semi-Generative Modelling: Domain Adaptation with Cause and Effect Features

This paper presents a novel, causally inspired approach to domain adaptation which aims to also include unlabelled data in the model fitting when labelled data is scarce. We consider a case of covariate-shift adaptation with cause and effect features and, drawing from recent ideas in causal modelling and invariant prediction, show how this setting leads to what we will refer to as a semi-generative model: P(Y, X_eff | X_cau; θ). Our proposed approach is robust to changes in the distribution over causal features and naturally allows model constraints to be imposed through unsupervised learning of a map from causes to effects. In experiments on synthetic datasets, we demonstrate a significant improvement in classification performance of our semi-generative model over purely supervised and importance-weighting baselines when the amount of labelled data is small. Moreover, we apply our approach to regression on real-world protein-count data and compare it to feature transformation methods.
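
One natural reading of this factorization, written out explicitly (our reconstruction, not necessarily the paper's exact parameterization), is:

```latex
% A plausible factorization of the semi-generative model (an illustration,
% not necessarily the paper's exact form):
P(Y, X_{\mathrm{eff}} \mid X_{\mathrm{cau}}; \theta)
  = P(Y \mid X_{\mathrm{cau}}; \theta_1)\,
    P(X_{\mathrm{eff}} \mid Y, X_{\mathrm{cau}}; \theta_2)
```

Under this reading, the factor P(X_eff | Y, X_cau) plays the role of the map from causes to effects that unlabelled data can inform, while conditioning on X_cau is what makes the model robust to shifts in the causal-feature distribution.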

Boosting algorithms for uplift modeling

Uplift modeling is an area of machine learning which aims at predicting the causal effect of some action on a given individual. The action may be a medical procedure, marketing campaign, or any other circumstance controlled by the experimenter. Building an uplift model requires two training sets: the treatment group, where individuals have been subject to the action, and the control group, where no action has been performed. An uplift model then allows one to assess the gain resulting from taking the action on a given individual, such as the increase in probability of patient recovery or of a product being purchased. This paper describes an adaptation of the well-known boosting techniques to the uplift modeling case. We formulate three desirable properties which an uplift boosting algorithm should have. Since all three properties cannot be satisfied simultaneously, we propose three uplift boosting algorithms, each satisfying two of them. Experiments demonstrate the usefulness of the proposed methods, which often dramatically improve performance of the base models and are thus new and powerful tools for uplift modeling.
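
The quantity being modelled can be made concrete with the standard two-model baseline (not the paper's boosting algorithms): fit separate classifiers on the treatment and control groups and take the difference of their predicted probabilities.

```python
# Two-model baseline for uplift: uplift(x) = P(y | x, treated) - P(y | x, control).
# Data is synthetic; treatment helps only when x[1] > 0.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
treated = rng.integers(0, 2, size=n).astype(bool)
p = 1 / (1 + np.exp(-(X[:, 0] + treated * (X[:, 1] > 0))))
y = rng.random(n) < p

m_t = GradientBoostingClassifier().fit(X[treated], y[treated])     # treatment model
m_c = GradientBoostingClassifier().fit(X[~treated], y[~treated])   # control model
uplift = m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]
print("mean predicted uplift, x1 > 0:", uplift[X[:, 1] > 0].mean().round(3))
print("mean predicted uplift, x1 <= 0:", uplift[X[:, 1] <= 0].mean().round(3))
```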

Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy

Deep convolutional neural networks have achieved great success in many artificial intelligence applications. However, their enormous model size and massive computation cost have become the main obstacle to deploying such powerful algorithms in low-power, resource-limited embedded systems. As a countermeasure to this problem, in this work we propose statistical weight scaling and residual expansion methods to reduce the bit-width of the whole network's weight parameters to ternary values (i.e. -1, 0, +1), with the objective of greatly reducing model size and computation cost while limiting the accuracy degradation caused by model compression. With about a 16x model compression rate, our ternarized ResNet-32/44/56 outperforms its full-precision counterparts by 0.12%, 0.24% and 0.18% on the CIFAR-10 dataset. We also test our ternarization method with AlexNet and ResNet-18 on the ImageNet dataset, both of which achieve the best top-1 accuracy compared to recent similar works at the same 16x compression rate. If we further incorporate our residual expansion method, compared to the full-precision counterpart, our ternarized ResNet-18 even improves top-5 accuracy by 0.61% and degrades top-1 accuracy by only 0.42% on the ImageNet dataset, with an 8x model compression rate. It outperforms the recent ABC-Net by 1.03% in top-1 accuracy and 1.78% in top-5 accuracy, with around 1.25x higher compression rate and more than 6x computation reduction due to the weight sparsity.
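
A hedged sketch of ternarizing a weight tensor to {-1, 0, +1} with a per-layer scaling factor; the threshold and scaling rule below follow a common recipe and are not necessarily the paper's statistical weight scaling.

```python
# Ternarize a weight tensor: threshold small weights to 0, keep the sign of the
# rest, and rescale by the mean magnitude of the surviving weights.
import torch

def ternarize(w, delta_ratio=0.7):
    delta = delta_ratio * w.abs().mean()              # threshold from weight statistics
    mask = w.abs() > delta
    alpha = w.abs()[mask].mean()                      # per-layer scaling factor
    q = torch.where(mask, torch.sign(w), torch.zeros_like(w))   # codes in {-1, 0, +1}
    return alpha * q, q, alpha

w = torch.randn(256, 256)
w_t, q, alpha = ternarize(w)
print("codes:", torch.unique(q).tolist())             # [-1.0, 0.0, 1.0]
print("relative error:", ((w - w_t).norm() / w.norm()).item())
```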

• Explicit inverse of nonsingular Jacobi matrices
• Field-Trial of Machine Learning-Assisted Quantum Key Distribution (QKD) Networking with SDN
• Socio-Material Network Analysis: A Mixed Method Study of Five European Artistic Collectives
• Battery Swapping Station as an Energy Storage for Capturing Distribution-Integrated Solar Variability
• When Do Gomory-Hu Subtrees Exist
• About BIRDS project (Bioinformatics and Information Retrieval Data Structures Analysis and Design)
• Output Selection and Observer Design for Boolean Control Networks: A Sub-Optimal Polynomial-Complexity Algorithm
• Fully Convolutional Pixel Adaptive Image Denoiser
• Thouless and Relaxation Time Scales in Many-Body Quantum Systems
• Spectral characterizations of anti-regular graphs
• The colored longest common prefix array computed via sequential scans
• W^{α,p} and C^{0,γ} regularity of solutions to (μ − Δ + b·∇)u = f with form-bounded vector fields
• Avoiding 3-Term Geometric Progressions in Non-Commutative Settings
• Unrolling Swiss Cheese: Metric repair on manifolds with holes
• Adaptive Variational Particle Filtering in Non-stationary Environments
• Derivation degree sequences of non-free arrangements
• Generalized Metric Repair on Graphs
• Approximate Collapsed Gibbs Clustering with Expectation Propagation
• Distributed Power Control Schemes for In-Band Full-Duplex Energy Harvesting Wireless Networks
• An Optimal Algorithm for Stochastic and Adversarial Bandits
• The Limiting Eigenvalue Distribution of Iterated k-Regular Graph Cylinders
• Quasi-polynomial time approximation schemes for packing and covering problems in planar graphs
• Multi-Resolution Hashing for Fast Pairwise Summations
• Coloring in Graph Streams
• Distributed approximation algorithms for maximum matching in graphs and hypergraphs
• On Chebotarëv's nonvanishing minors theorem and the Biró-Meshulam-Tao discrete uncertainty principle
• Near-Optimal Distributed Estimation for a Network of Sensing Units Operating Under Communication Constraints
• Deriving star cluster parameters by convolutional neural networks. I. Age, mass, and size
• Automatically Designing CNN Architectures for Medical Image Segmentation
• Multitask Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies
• Statistical generalized derivative applied to the profile likelihood estimation in a mixture of semiparametric models
• Bounding Box Embedding for Single Shot Person Instance Segmentation
• Arithmetic aspects of symmetric edge polytopes
• Generalized Stochastic Frank-Wolfe Algorithm with Stochastic "Substitute" Gradient for Structured Convex Optimization
• Optimal Co-design of Industrial Networked Control Systems with State-dependent Correlated Fading Channels
• Exact minimum number of bits to stabilize a linear system
• Toward Characteristic-Preserving Image-based Virtual Try-On Network
• Automatic Semantic Content Removal by Learning to Neglect
• Wild Residual Bootstrap Inference for Penalized Quantile Regression with Heteroscedastic Errors
• Editable Generative Adversarial Networks: Generating and Editing Faces Simultaneously
• PhaseStain: Digital staining of label-free quantitative phase microscopy images using deep learning
• Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model
• Bi-Directional Cooperative NOMA without Full CSIT
• Blind Signal Classification for Non-Orthogonal Multiple Access in Internet-of-Things Networks
• Brain Tumor Segmentation and Tractographic Feature Extraction from Structural MR Images for Overall Survival Prediction
• Efficient Facial Representations for Age, Gender and Identity Recognition in Organizing Photo Albums using Multi-output CNN
• On Euclidean Methods for Cubic and Quartic Jacobi Symbols
• Hitting time, access time and optimal transport on graphs
• Stability in EMU
• Impurity coupled to a lattice with disorder
• Simplicial complexes and complex systems
• Breaking of ensemble equivalence for perturbed Erdös-Rényi random graphs
• Learning the effect of latent variables in Gaussian Graphical models with unobserved variables
• Stable MPC Design for Hybrid Mixed Logical Dynamical Systems: ℓ∞-based Lyapunov Approach
• Improving Image Clustering With Multiple Pretrained CNN Feature Extractors
• Controllability of Social Networks and the Strategic Use of Random Information
• Novel flow field design for vanadium redox flow batteries via topology optimization
• Concentration of measure for finite spin systems
• Above Modelling and Adaptive Control of a Double Winded Induction Generator
• Physical Adversarial Examples for Object Detectors
• Wind Energy Conversion System – a Laboratory Setup
• Principal Flow Patterns across renewable electricity networks
• Non universality of fluctuations of outlier eigenvectors for block diagonal deformations of Wigner matrices
• Semantic Document Clustering on Named Entity Features
• Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X
• Lesion Saliency for Weakly Supervised Lesion Detection from Breast DCE-MRI
• Fast transforms over finite fields of characteristic two
• Kripke Semantics of the Perfectly Transparent Equilibrium
• 3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point Cloud Reconstruction from a Single Image
• Finding Structure in Dynamic Networks
• Continuous-Time Accelerated Methods via a Hybrid Control Lens
• A characterization and an application of weight-regular partitions of graphs
• Semiparametric Inference and Lower Bounds for Real Elliptically Symmetric Distributions
• A Theil-like Class of Inequality Measures, its Asymptotic Normality Theory and Applications
• Effect of disorder and noise in shaping the dynamics of power grids
• On SOCP/SDP formulation of the extended trust region subproblem
• Wireless Multi-Sensor Networks for Smart Cities: A Prototype System with Statistical Data Analysis
• A Norm-Bounded based MPC strategy for uncertain systems under partial state availability
• Towards functorial language-games
• Biclustering Using Modified Matrix Bandwidth Minimization and Biogeography-based Optimization
• Rank Minimization for Snapshot Compressive Imaging
• TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time
• Distance-based Kernels for Surrogate Model-based Neuroevolution
• On Synchronization of Dynamical Systems over Directed Switching Topologies: An Algebraic and Geometric Perspective
• Self-regulation promotes cooperation in social networks
• Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features
• Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
• From Face Recognition to Models of Identity: A Bayesian Approach to Learning about Unknown Identities from Unsupervised Data
• On Secure Transmission Design: An Information Leakage Perspective
• On a Loss-based prior for the number of components in mixture models
• Complex Economic Activities Concentrate in Large Cities
• Submodular Maximization with Optimal Approximation, Adaptivity and Query Complexity
• Self-stabilization Overhead: an Experimental Case Study on Coded Atomic Storage
• The Iterated Weighted Least-Squares Fit
• Optimal Bounds on the VC-dimension
• Asymptotic results under multiway clustering
• Photorealistic Video Super Resolution
• On discrete-time semi-Markov processes
• Large scale evaluation of local image feature detectors on homography datasets
• (k, λ)-Anti-Powers and Other Patterns in Words
• Future Semantic Segmentation with Convolutional LSTM
• Decentralized Task Allocation in Multi-Robot Systems via Bipartite Graph Matching Augmented with Fuzzy Clustering
