ICDE 2017 paper list, with notes and a few thoughts

ICDE 2017

1.1 Research Session 1A: Graphs

UniWalk: Unidirectional Random Walk Based Scalable SimRank Computation over Large Graph


A Fast Order-Based Approach for Core Maintenance


Scalable and Interactive Graph Clustering Algorithm on Multicore CPUs


Fast Computation of Dense Temporal Subgraphs

1.2 Research Session 1B: Keyword Search, Text, and Strings

Reverse Keyword-Based Location Search
Reverse location search based on keywords

Abstract
The proliferation of geo-textual data gives prominence to spatial keyword search. The basic top-k spatial keyword query returns k geo-textual objects that rank the highest according to their textual relevance and spatial proximity to query keywords and a query location. We define, study, and provide means of computing the reverse top-k keyword-based location query. This new type of query takes a set of keywords, a query object q, and a number k as arguments, and it returns a spatial region such that any top-k spatial keyword query with the query keywords and a location in this region would contain object q in its result. This query targets applications in market analysis, geographical planning, and location optimization, and it may support applications related to safe zones and influence zones that are used widely in location-based services.
We show that computing an exact query result requires evaluating and merging a set of weighted Voronoi cells, which is expensive. We therefore devise effective algorithms that approximate result regions with quality guarantees. We develop novel pruning techniques on top of an index, and we offer a series of optimization techniques that aim to further accelerate query processing. Empirical studies suggest that the proposed query processing is efficient and scalable.

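Summary
The growth of geo-textual data has made spatial keyword search prominent. A basic top-k spatial keyword query returns the k geo-textual objects that rank highest by textual relevance to the query keywords and spatial proximity to the query location. The paper defines and studies the reverse problem, the reverse top-k keyword-based location query: given a set of keywords, a query object q, and a number k, it returns a spatial region such that any top-k spatial keyword query that uses those keywords from a location inside the region has q in its result. Target applications are market analysis, geographical planning, and location optimization, and the query may also support the safe-zone and influence-zone applications that are common in location-based services.

The exact result requires evaluating and merging a set of weighted Voronoi cells, which is expensive. The authors therefore design algorithms that approximate the result region with quality guarantees, add pruning techniques on top of an index, and apply a series of further optimizations to speed up query processing. Empirical studies suggest that the approach is efficient and scalable.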
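
To make the query definition concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: the scoring function (a linear mix of keyword overlap and normalized Euclidean proximity, weighted by `alpha`), the `max_dist` normalizer, the toy `objects` data, and the grid sampling in `sample_region` are all assumptions made for illustration. The sampling has no quality guarantee and ignores the weighted Voronoi structure mentioned in the abstract.

```python
import math

def score(obj, keywords, loc, alpha=0.5, max_dist=10.0):
    """Assumed ranking function: higher is better. `keywords` must be non-empty."""
    text_rel = len(keywords & obj["terms"]) / len(keywords)
    proximity = 1.0 - min(math.dist(loc, obj["loc"]) / max_dist, 1.0)
    return alpha * text_rel + (1.0 - alpha) * proximity

def top_k(objects, keywords, loc, k):
    """Basic top-k spatial keyword query: ids of the k best-scoring objects."""
    ranked = sorted(objects, key=lambda o: (-score(o, keywords, loc), o["id"]))
    return [o["id"] for o in ranked[:k]]

def in_result_region(objects, keywords, q_id, loc, k):
    """True if a top-k query issued at `loc` would return object q_id."""
    return q_id in top_k(objects, keywords, loc, k)

def sample_region(objects, keywords, q_id, k, xs, ys):
    """Naive approximation of the reverse query: test every grid point."""
    return [(x, y) for x in xs for y in ys
            if in_result_region(objects, keywords, q_id, (x, y), k)]

# Hypothetical toy data: three objects on a line.
objects = [
    {"id": "a", "loc": (0.0, 0.0), "terms": {"coffee", "wifi"}},
    {"id": "b", "loc": (5.0, 0.0), "terms": {"coffee"}},
    {"id": "c", "loc": (10.0, 0.0), "terms": {"tea"}},
]

region = sample_region(objects, {"coffee"}, "b", 1, [0.0, 2.0, 4.0, 6.0], [0.0])
print(region)
```

Every grid point costs one full top-k evaluation, which is exactly the kind of expense that motivates index-based pruning.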

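[Vocabulary]
proliferation: n. rapid increase in number; spread
devise: vt. to design, think up, or invent; (in law) to bequeath
empirical studies: studies based on observation or experiment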


Reverse Top-k Geo-Social Keyword Queries in Road Networks
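Reverse top-k geo-social keyword queries over road networks


Mismatching Trees and BWT Arrays: A New Way for String Matching with k-Mismatches
Mismatching trees and BWT arrays: a new method for string matching that tolerates up to k mismatches


Source-LDA: Enhancing probabilistic topic models using prior knowledge sources
Source-LDA: improving probabilistic topic models with prior knowledge sources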

1.3 Research Session 2A: Data Mining

Network Backboning with Noisy Data
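Extracting a network's backbone from noisy data


Scalable Informative Rule Mining
Mining informative rules at scale


Streaming k-Means Clustering with Fast Queries
k-means clustering over data streams, with fast query answering


Density based Clustering over Location Based Services
Density-based clustering on top of location-based services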

1.4 Research Session 2B: Query Optimization and Provenance

Provenance-aware Query Optimization
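Query optimization that takes provenance into account


A SQL-Middleware Unifying Why and Why-Not Provenance for First-Order Queries
A SQL middleware that unifies why and why-not provenance for first-order queries
[Vocabulary]
unifying: v. bringing together into one
provenance: n. origin, source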


Extended Characteristic Sets: Graph Indexing for SPARQL Query Optimization


TT-Join: Efficient Set Containment Join
TT-Join: an efficient join on set containment
[Vocabulary]
containment: n. the state of being contained; inclusion

1.5 Research Session 3A: Systems for New Analytics

Scalable Linear Algebra on a Relational Database System

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics

Parallel SPARQL Query Optimization

Efficient Scalable Accurate Regression Queries in In-DBMS Analytics

Towards Unified Data and Lifecycle Management for Deep Learning

1.6 Research Session 3B: Top-k, kNN and Skyline Querying

Monitoring the Top-m Aggregation in a Sliding Window of Spatial Queries

Abstract—In this paper, we propose and study the problem of top-m rank aggregation of spatial objects in streaming queries, where, given a set of objects O, a stream of spatial queries (kNN or range), the goal is to report the m objects with the highest aggregate rank. The rank of an object with respect to an individual query is computed based on its distance from the query location, and the aggregate rank is computed from all of the individual rank orderings. In order to solve this problem, we show how to upper and lower bound the rank of an object for any unseen query. Then we propose an approximation solution to continuously monitor the top-m objects efficiently, for which we design an Inverted Rank File (IRF) index to guarantee the error bound of the solution. In particular, we propose the notion of safe ranking to determine whether the current result is still valid or not when new queries arrive, and propose the notion of validation objects to limit the number of objects to update in the top-m results. We also propose an exact solution for applications where an approximate solution is not sufficient. Last, we conduct extensive experiments to verify the efficiency and effectiveness of our solutions. This is a fundamental problem that draws inspiration from three different domains: rank aggregation, continuous queries and spatial databases, and the solution can be used to monitor the importance / popularity of spatial objects, which in turn can provide new analytical tools for spatial data.

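Summary: The paper proposes and studies top-m rank aggregation of spatial objects over streaming queries: given a set of objects O and a stream of spatial queries (kNN or range), report the m objects with the highest aggregate rank. An object's rank for a single query is computed from its distance to the query location, and the aggregate rank is computed from all the individual rank orderings. To solve the problem, the authors show how to bound, from above and below, the rank of an object for any unseen query. They then propose an approximate solution that continuously monitors the top-m objects efficiently, backed by an Inverted Rank File (IRF) index that guarantees the solution's error bound. In particular, safe ranking decides whether the current result is still valid when new queries arrive, and validation objects limit how many objects must be updated in the top-m result. An exact solution is also given for applications where an approximation is not enough. Extensive experiments verify efficiency and effectiveness. The problem draws on three areas (rank aggregation, continuous queries, and spatial databases), and the solution can be used to monitor the importance or popularity of spatial objects, which in turn provides new analytical tools for spatial data.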
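
As a reference point for the problem statement, the following Python sketch recomputes the top-m result from scratch for the queries currently in a sliding window. It is a brute-force baseline, not the paper's approximate or exact method, and it uses neither the IRF index nor safe ranking. The aggregate function (sum of rank positions, smaller is better) and the tie-breaking by object id are assumptions; the abstract does not fix them.

```python
import math
from collections import defaultdict, deque

def ranks_for_query(objects, query_loc):
    """Rank objects by distance to one query location (rank 1 = nearest)."""
    ordered = sorted(objects, key=lambda oid: (math.dist(objects[oid], query_loc), oid))
    return {oid: pos for pos, oid in enumerate(ordered, start=1)}

def top_m(objects, window, m):
    """Sum each object's rank over all queries in the window; keep the m best."""
    total = defaultdict(int)
    for q in window:
        for oid, r in ranks_for_query(objects, q).items():
            total[oid] += r
    return sorted(total, key=lambda oid: (total[oid], oid))[:m]

# Hypothetical data: object id -> location, and a window holding the last 2 queries.
objects = {"a": (0, 0), "b": (4, 0), "c": (10, 0)}
window = deque(maxlen=2)

results = []
for query in [(1, 0), (9, 0), (5, 0)]:
    window.append(query)          # the oldest query drops out automatically
    results.append(top_m(objects, window, 2))
print(results)
```

Each arrival re-ranks every object against every query in the window. Avoiding that recomputation when the answer cannot have changed is the role the abstract gives to safe ranking and validation objects.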


Answering Top-k Exemplar Trajectory Queries
Answering top-k queries for exemplar trajectories

Abstract: We study a new type of spatial-textual trajectory search: the Exemplar Trajectory Query (ETQ), which specifies one or more places to visit, and descriptions of activities at each place. Our goal is to efficiently find the top-k trajectories by computing spatial and textual similarity at each point. The computational cost for pointwise matching is significantly higher than previous approaches. Therefore, we introduce an incremental pruning baseline and explore how to adaptively tune our approach, introducing a gap-based optimization and a novel two-level threshold algorithm to improve efficiency. Our proposed methods support order-sensitive ETQ with a minor extension. Experiments on two datasets verify the efficiency and scalability of our proposed solution.

Summary: The paper studies a new kind of spatial-textual trajectory search, the Exemplar Trajectory Query (ETQ), which specifies one or more places to visit together with a description of the activities at each place. The goal is to find the top-k trajectories efficiently by computing spatial and textual similarity at each point. Pointwise matching costs significantly more than previous approaches, so the authors introduce an incremental pruning baseline and explore how to tune it adaptively, adding a gap-based optimization and a novel two-level threshold algorithm to improve efficiency. With a minor extension, the methods also support order-sensitive ETQ. Experiments on two datasets verify the efficiency and scalability of the solution.

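[Vocabulary]
textual: adj. relating to text
exemplar: n. a typical example; a model
trajectory: n. the path followed by a moving object
specifies: v. states explicitly
pointwise: adj. point by point
tune: v. to adjust
novel: adj. new and original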


V-Tree: Efficient kNN Search on Moving Objects with Road-Network Constraints

Abstract: Intelligent transportation systems, e.g., Uber, have become an important tool for urban transportation. An important problem is k nearest neighbor (kNN) search on moving objects with road-network constraints, which, given moving objects on the road networks and a query, finds k nearest objects to the query location. Existing studies focus on either kNN search on static objects or continuous kNN search with Euclidean-distance constraints. The former cannot support dynamic updates of moving objects while the latter cannot support road networks. Since the objects are dynamically moving on the road networks, there are two main challenges. The first is how to index the moving objects on road networks and the second is how to find the k nearest moving objects. To address these challenges, in this paper we propose a new index, V-Tree, which has two salient features. Firstly, it is a balanced search tree and can support efficient kNN search. Secondly, it can support dynamic updates of moving objects. To build a V-Tree, we iteratively partition the road network into sub-networks and build a tree structure on top of the sub-networks. Then we associate the moving objects on their nearest vertices in the V-Tree. When the location of an object is updated, we only need to update the tree nodes on the path from the corresponding leaf node to the root. We design a novel kNN search algorithm using V-Tree by pruning large numbers of irrelevant vertices in the road network. Experimental results on real datasets show that our method significantly outperforms baseline approaches by 2-3 orders of magnitude.

Summary: Intelligent transportation systems such as Uber have become an important tool for urban transportation. A key problem is k nearest neighbor (kNN) search on moving objects under road-network constraints: given moving objects on a road network and a query, find the k objects nearest to the query location. Existing work covers either kNN search on static objects or continuous kNN search under Euclidean distance. The former cannot handle dynamic updates of moving objects, and the latter cannot handle road networks. This leaves two challenges: how to index moving objects on a road network, and how to find the k nearest moving objects. The paper proposes a new index, V-Tree, with two salient features: it is a balanced search tree that supports efficient kNN search, and it supports dynamic updates of moving objects. A V-Tree is built by iteratively partitioning the road network into sub-networks and building a tree structure on top of them. Each moving object is then associated with its nearest vertex in the V-Tree. When an object's location changes, only the tree nodes on the path from the corresponding leaf to the root need to be updated. The kNN search algorithm uses the V-Tree to prune large numbers of irrelevant vertices in the road network. On real datasets, the method is reported to outperform baseline approaches by 2-3 orders of magnitude.
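
For orientation, here is the kind of index-free baseline that a structure like V-Tree is meant to beat, sketched in Python. It is not V-Tree: it runs Dijkstra's algorithm outward from the query vertex and collects objects as their vertices are settled. The adjacency-list `graph`, the `objects_at` map from vertex to object ids, and the assumption that every object sits exactly on a vertex are illustrative choices, not details from the paper.

```python
import heapq

def knn_on_road_network(graph, objects_at, source, k):
    """Return up to k (object id, network distance) pairs nearest to `source`.

    graph: {vertex: [(neighbor, edge_length), ...]} with both directions listed.
    objects_at: {vertex: [object ids currently attached to that vertex]}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue
        settled.add(v)
        # Vertices are settled in non-decreasing distance, so objects found
        # here are at least as near as any object found later.
        for oid in sorted(objects_at.get(v, [])):
            if len(result) < k:
                result.append((oid, d))
        for nbr, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return result

# Hypothetical road network with four intersections and three cars.
graph = {
    "A": [("B", 2), ("D", 5)],
    "B": [("A", 2), ("C", 2)],
    "C": [("B", 2), ("D", 1)],
    "D": [("A", 5), ("C", 1)],
}
objects_at = {"B": ["car1"], "C": ["car3"], "D": ["car2"]}

print(knn_on_road_network(graph, objects_at, "A", 2))
```

An object update here is just moving an id between two lists in `objects_at`, which is cheap, but every query may expand a large part of the network. The abstract's index instead spends a leaf-to-root update per move so that queries can prune irrelevant vertices.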

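[Vocabulary]
constraints: n. limits, restrictions
salient: adj. prominent, most noticeable
iteratively: adv. by repeated steps
partition: vt. to divide into parts; n. a division
sub-networks: n. smaller networks that form part of a larger network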


Sweet KNN: An Efficient KNN on GPU through Reconciliation of Redundancy and Regularity
Sweet KNN: an efficient GPU implementation of KNN that reconciles redundancy removal with regularity

Abstract—Finding the k nearest neighbors of a query point or a set of query points (KNN) is a fundamental problem in many application domains. It is expensive to do. Prior efforts in improving its speed have followed two directions with conflicting considerations: One tries to minimize the redundant distance computations but often introduces irregularities into computations, the other tries to exploit the regularity in computations to best exert the power of GPU-like massively parallel processors, which often introduces even extra distance computations. This work gives a detailed study on how to effectively combine the strengths of both approaches. It manages to reconcile the polar opposite effects of the two directions through elastic algorithmic designs, adaptive runtime configurations, and a set of careful implementation-level optimizations. The efforts finally lead to a new KNN on GPU named Sweet KNN, the first high-performance triangular-inequality-based KNN on GPU that manages to reach a sweet point between redundancy minimization and regularity preservation for various datasets. Experiments on a set of datasets show that Sweet KNN outperforms existing GPU implementations on KNN by up to 120X (11X on average).

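Summary: Finding the k nearest neighbors of a query point or a set of query points (KNN) is a fundamental problem in many application domains, and it is expensive. Earlier work on speeding it up went in two directions with conflicting considerations. One minimizes redundant distance computations but often makes the computation irregular. The other exploits regularity in the computation to get the most out of massively parallel processors such as GPUs, but often performs extra distance computations. This paper studies in detail how to combine the strengths of both. It reconciles their opposing effects through elastic algorithmic designs, adaptive runtime configurations, and a set of careful implementation-level optimizations. The result is Sweet KNN, described as the first high-performance triangle-inequality-based KNN on GPU that reaches a sweet spot between redundancy minimization and regularity preservation for various datasets. In experiments on a set of datasets, Sweet KNN outperforms existing GPU implementations of KNN by up to 120X (11X on average).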
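
The "redundant distance computation" side of this trade-off can be illustrated with a small CPU-only Python sketch. It is not Sweet KNN and contains no GPU code. It uses one pivot point (the name `pivot` and the single-pivot design are assumptions for illustration) and the triangle inequality d(q, p) >= |d(q, v) - d(p, v)| to skip points that cannot enter the current top k.

```python
import heapq
import math

def knn_with_pivot(points, pivot, query, k):
    """Exact kNN; returns (sorted (distance, index) pairs, number of distances computed)."""
    to_pivot = [math.dist(p, pivot) for p in points]  # can be precomputed once per dataset
    q_to_pivot = math.dist(query, pivot)
    best = []      # max-heap of the k nearest so far, stored as (-distance, index)
    computed = 0   # how many query-to-point distances were actually evaluated
    for i, p in enumerate(points):
        lower = abs(q_to_pivot - to_pivot[i])  # triangle-inequality lower bound on d(query, p)
        if len(best) == k and lower >= -best[0][0]:
            continue  # p cannot beat the current k-th nearest neighbor: skip it
        d = math.dist(query, p)
        computed += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-nd, i) for nd, i in best), computed

# Hypothetical data: two clusters on a line, pivot at the origin.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
neighbors, computed = knn_with_pivot(points, (0.0, 0.0), (1.2, 0.0), 2)
print(neighbors, computed)
```

The `continue` branch is where the irregularity comes from: different points do different amounts of work, which is cheap on a CPU but causes divergence when many threads run in lockstep. That tension is what the abstract says Sweet KNN sets out to reconcile.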

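[Vocabulary]
reconciliation: n. making two opposing things compatible; restoring harmony
regularity: n. the quality of following a uniform pattern
redundancy: n. unnecessary repetition
irregularities: n. departures from a uniform pattern; also, improper practices
reconcile: v. to make consistent; to settle a difference
triangular: adj. relating to or shaped like a triangle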


Secure Skyline Queries on Cloud Platform
Secure skyline queries on a cloud platform

Abstract: Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multicriteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.

Summary: Outsourcing data and computation to a cloud server is a cost-effective way to support large-scale data storage and query processing. Because of security and privacy concerns, however, sensitive data (e.g., medical records) must be protected from the cloud server and from other unauthorized users. One approach is to outsource encrypted data and have the cloud server process queries on the encrypted data only. Supporting various queries over encrypted data securely and efficiently, so that the server learns nothing about the data, the query, or the query result, remains challenging. The paper studies secure skyline queries over encrypted data. Skyline queries are particularly important for multi-criteria decision making, but their complex computations make them hard to secure. The authors propose a fully secure skyline query protocol on data encrypted with semantically secure encryption. Its key subroutine is a new secure dominance protocol, which can also serve as a building block for other queries. Serial and parallelized implementations are studied empirically for efficiency and scalability under different parameter settings, verifying the feasibility of the proposed solutions.
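
To show what the protocol has to compute, here is a plaintext skyline query in Python. It contains no encryption and is not the paper's protocol. It only spells out the dominance test that a secure dominance subroutine would have to evaluate without revealing the values. The convention that smaller is better in every dimension and the hotel data are assumptions for illustration.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Points that no other point dominates (quadratic-time brute force)."""
    return [p for p in points if not any(dominates(o, p) for o in points)]

# Hypothetical hotels as (price, distance to beach); lower is better in both.
hotels = [(50, 8), (80, 2), (90, 5), (60, 9)]
print(skyline(hotels))
```

Every pair comparison reveals which record is better in which dimension, so a server that ran this code on plaintext would learn a great deal about the data, and that leakage is what a secure version has to prevent.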

[Vocabulary]
secure: adj. safe; free from danger or risk

1.7 Research Session 4A: New Hardware

Accelerating multi-column selection predicates in main-memory - the Elf approach
Speeding up multi-column selection predicates in main memory: the Elf approach


Revisiting the Design of Data Stream Processing Systems on Multi-Core Processors

DIDO: Dynamic Pipelines for In-Memory Key-Value Stores on Coupled CPU-GPU Architectures

On Log-Structured Merge for Solid-State Drives

1.8 Research Session 4B: Security and Encryption

Adaptively Secure Conjunctive Query Processing over Encrypted Data for Cloud Computing
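Adaptively secure processing of conjunctive queries over encrypted data in cloud computing


Towards a unifying Attribute Based Access Control approach for NoSQL datastores
Toward a unified attribute-based access control approach for NoSQL data stores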

Abstract: NoSQL datastores allow the efficient management of high volumes of heterogeneous and unstructured data, meeting the requirements of a variety of today's ICT applications. However, most of these systems poorly support data security, and recent surveys show that their simplistic support for data protection is considered as a reason not to use them. In recent years, Attribute Based Access Control (ABAC) is getting more and more popularity, for its ability to provide highly flexible and customized forms of data protection at different granularity levels. In the current work, with the aim to raise users’ confidence in the protection of data managed by NoSQL systems, we define a general approach to enforce ABAC within NoSQL systems. Our approach relies on SQL++[20], a unifying query language for NoSQL platforms. In particular, we develop a novel SQL++ query rewriting mechanism able to enforce heterogeneous types of ABAC policies specified up to cell level. Experimental results show an overhead which is not negligible for policies covering high percentage of the fields characterizing the protected documents, but which is far more contained when field level policies are more sparsely specified.

Summary: NoSQL datastores manage high volumes of heterogeneous, unstructured data efficiently and so meet the requirements of many current ICT applications. Most of these systems support data security poorly, however, and recent surveys show that their simplistic support for data protection is seen as a reason not to use them. Attribute Based Access Control (ABAC) has grown popular in recent years because it offers highly flexible, customized data protection at different granularity levels. To raise users' confidence in the protection of data managed by NoSQL systems, the paper defines a general approach to enforcing ABAC within them. The approach relies on SQL++[20], a unifying query language for NoSQL platforms. In particular, the authors develop a novel SQL++ query rewriting mechanism that can enforce heterogeneous types of ABAC policies specified down to the cell level. In the experiments, the overhead is not negligible when policies cover a high percentage of the fields of the protected documents, but it is far more contained when field-level policies are specified more sparsely.
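[Vocabulary]
simplistic: adj. oversimplified
customized: adj. made to order, tailored; v. (past tense of customize) made or altered to individual requirements
granularity: n. the level of detail, or the size of the units considered
unifying: v. bringing together into one
mechanism: n. the way something works; a process or means
heterogeneous: adj. made up of different kinds
specified up to: specified as far as (a given level)
overhead: n. the extra cost in time or resources that a mechanism adds (the sense used in the abstract)
negligible: adj. small enough to ignore
sparsely: adv. thinly; in few places
specified: adj. stated explicitly; v. (past tense of specify) stated in detail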


Frequency-hiding Dependency-preserving Encryption for Outsourced Databases
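Encryption for outsourced databases that hides frequencies and preserves dependencies


Secure and Efficient Query Processing over Hybrid Clouds
Secure and efficient query processing across hybrid clouds

Capturing the Moment: Lightweight Similarity Computations
Capturing the moment: lightweight similarity computation


An Efficient Framework for Exact Set Similarity Search using Tree Structure Indexes
An efficient framework for exact set similarity search using tree-structured indexes


Role Discovery in Graphs using Global Features: Algorithms, Applications and a Novel Evaluation Strategy
Discovering roles in graphs from global features: algorithms, applications, and a new evaluation strategy


Similarity Search in Graph Databases: A Multi-layered Indexing Approach
Similarity search in graph databases: a multi-layered indexing approach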

1.10 Research Session 6A: Pot Pourri

Posterior Snapshot Isolation
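Posterior snapshot isolation


PrivSuper: a Superset-First Approach to Frequent Itemset Mining under Differential Privacy
PrivSuper: a superset-first approach to mining frequent itemsets under differential privacy


Quantifying Differential Privacy under Temporal Correlations
Quantifying differential privacy under temporal correlations
[Vocabulary]
quantifying: v. measuring or expressing as a quantity


Tracking matrix approximations over distributed sliding windows
Tracking matrix approximations over distributed sliding windows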


1.11 Research Session 6B: Social Networks

Temporal Influence Blocking: Minimizing the Effect of Misinformation in Social Networks
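Temporal influence blocking: minimizing the effect of misinformation in social networks


Complex Event-Participant Planning and Its Incremental Variant
Complex event-participant planning and its incremental variant


Most Influential Community Search over Large Social Networks
Searching for the most influential community in large social networks


Boosting Information Spread: An Algorithmic Approach
Boosting the spread of information: an algorithmic approach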


1.12 Research Session 7A: Data Cleaning

Cleaning Data with Forbidden Itemsets

Parallel Progressive Approach to Entity Resolution Using MapReduce

A Collective, Probabilistic Approach to Schema Mapping

Cleaning Relations using Knowledge Bases

1.13 Research Session 7B: Learning and Outlier Detection

Time Series Classification by Sequence Learning in All-Subsequence Space

Multi-tactic Distance-based Outlier Detection

Link Prediction across Aligned Networks with Sparse and Low Rank Matrix Estimation

LSHiForest: A Generic Framework for Fast Tree Isolation based Ensemble Anomaly Analysis

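1.14 Research Session 8A: Crowdsourcing and Recommender Systems

Prediction-Based Task Assignment in Spatial Crowdsourcing
Prediction-based task assignment in spatial crowdsourcing


Trichromatic Online Matching in Real-time Spatial Crowdsourcing
Trichromatic (three-color) online matching in real-time spatial crowdsourcing


Tuning Crowdsourced Human Computation
Tuning human computation performed by the crowd


Scalable and interpretable product recommendations via overlapping co-clustering
Scalable, interpretable product recommendations through overlapping co-clustering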

1.15 Research Session 8B: Distributed Processing

In-memory Distributed Matrix Computation Processing and Optimization
Processing and optimizing distributed in-memory matrix computation


Fast and Scalable Distributed Set Similarity Joins for Big Data Analytics
Fast, scalable distributed set similarity joins for big data analytics


Fast and Scalable Distributed Boolean Tensor Factorization
Fast, scalable distributed factorization of Boolean tensors


Spinner: Scalable Graph Partitioning in the Cloud
Spinner: scalable graph partitioning in the cloud


Distributed Publish/Subscribe Query Processing on the Spatio-Textual Data Stream
Distributed publish/subscribe query processing on a spatio-textual data stream
