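Over the past two weeks I skimmed the titles of the 2016 papers from several data-mining-related conferences: KDD, IJCAI, WWW, and ICDM. By reading the titles and some of the abstracts, I got a rough idea of what each paper is about and grouped the papers by the kind of work they do.
Survey statistics:
The results are summarized below as keyword tables (note: I only counted keywords that appear frequently, not all of them):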
Table 1: IJCAI-16 paper statistics (551 papers in total)

| Keyword | Papers | Breakdown | Keyword | Papers |
|---|---|---|---|---|
| planning | 23 | classical planning 4 | multi-agent | 7 |
| knowledge | 23 | knowledge base 6, knowledge graph 5, knowledge acquisition 2 | matrix factorization | 7 |
| representation | 21 | | topic | 6 |
| recognition | 20 | facial expression, human activity, behavior, object, emotion, face, group activity | large scale | 6 |
| graph | 19 | | track | 5 |
| games | 19 | | social network | 5 |
| feature | 19 | feature learning 7, feature selection 6 | sampling | 5 |
| semantic | 18 | | regression | 5 |
| neural network | 17 | | natural language | 5 |
| clustering | 17 | | multi-task | 5 |
| query | 15 | query answering 5 | deep learning | 5 |
| detection | 14 | community detection 2, anomaly detection 1 | retrieval | 4 |
| recommendation | 13 | | ranking | 4 |
| classification | 12 | | person re-identification | 4 |
| robot | 11 | | coding | 4 |
| Bayesian | 11 | | time series | 3 |
| tree | 10 | Monte-Carlo tree search 4 | probabilistic matrix | 3 |
| text | 8 | | random forest | 2 |
| hashing | 8 | | dictionary learning | 2 |
| filtering | 8 | collaborative filtering 7 | advertisements | 2 |
| multi-view | 7 | | | |
Table 2: WWW-16 (about 100 papers in total)

| Keyword | Papers | Keyword | Papers | Keyword | Papers |
|---|---|---|---|---|---|
| web (web search, web cookies, web shells, web tables, web tracking, web queries, web application) | 17 | topic | 4 | knowledge bases | 2 |
| recommendation | 10 | search engines | 4 | clustering | 1 |
| detection | 9 | advertisement | 3 | representation | 1 |
| query | 7 | filter | 3 | Bayesian | 2 |
| mobile | 6 | large-scale | 3 | documents | 3 |
| text | 2 | Bayesian | 2 | graph | 1 |
| semantic | 4 | track | 2 | | |
Table 3: KDD-16 (208 papers in total)

| Keyword | Papers | Keyword | Papers | Keyword | Papers |
|---|---|---|---|---|---|
| graph | 12 | document | 4 | hashing | 2 |
| recommendation | 10 | neural network | 4 | multi-task | 2 |
| clustering | 9 | social network | 4 | multi-view | 2 |
| optimization | 9 | text | 4 | query | 2 |
| feature | 8 | classification | 3 | regression | 2 |
| large scale | 8 | sampling | 3 | semantic | 2 |
| feature | 7 | topic | 3 | community detection | 1 |
| anomaly | 6 | attributed network | 2 | matrix completion | 1 |
| rank | 5 | filtering | 2 | PageRank | 1 |
Table 4: ICDM-16 (201 papers in total, including Demonstrations)

| Keyword | Papers | Keyword | Papers | Keyword | Papers |
|---|---|---|---|---|---|
| graph (graph decomposition, supergraph search, community detection, knowledge graph) | 27 | feature | 4 | tree | 3 |
| query | 14 | semantic | 4 | advertisement | 3 |
| clustering | 13 | text | 4 | filtering | 2 |
| KNN | 7 | hashing | 3 | topic | 2 |
| recommendation | 6 | large-scale | 3 | factorization | 1 |
| classification | 5 | neural network | 3 | multi-task | 1 |
| group search | 5 | sampling | 3 | multi-view | 1 |
| social network | 5 | | | | |
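The counts in the tables were tallied by hand while reading. As a rough illustration only, a similar tally could be sketched with a small script. The function name `count_keywords` and the case-insensitive substring matching rule are my own assumptions, not the procedure actually used for the tables; the sample titles are taken from the list of interesting papers in this post.

```python
from collections import Counter


def count_keywords(titles, keywords):
    """Count how many titles mention each keyword.

    Hypothetical helper: matching is a plain case-insensitive
    substring test, which is an assumption made for this sketch.
    """
    counts = Counter()
    for title in titles:
        lowered = title.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                counts[kw] += 1  # a title counts at most once per keyword
    return counts


# Sample titles copied from the list of interesting papers in this post.
titles = [
    "Improving Top-N Recommendation with Heterogeneous Losses",
    "A Framework for Recommending Relevant and Diverse Items",
    "Matrix Factorization+ for Movie Recommendation",
    "Scaling up Dynamic Topic Models",
]
keywords = ["recommendation", "matrix factorization", "topic"]

counts = count_keywords(titles, keywords)
for kw, n in counts.most_common():
    print(kw, n)
```

Note that plain substring matching misses variants such as "Recommending", so a script like this would undercount compared with reading the titles by hand; stemming or a list of variants per keyword would be needed to close that gap.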
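Findings and interpretation:
From these tables, the work people are doing falls mainly into a few broad areas:
1. features 2. clustering 3. classification 4. detection 5. representation 6. recognition 7. planning (I do not yet really understand what this one is)
And also:
8. filtering 9. query (I do not really understand this one either) 10. regression 11. ranking 12. sampling 13. track
On the application side:
The most common is still 1. recommendation (ad recommendation, news recommendation);
followed by 2. social network 3. topic analysis 4. search
Data mining of course has a very wide range of applications. The application areas I noted down while reading include:
taxis, advertising, news, commerce, healthcare, broadcasting, transportation, investment and lending, urban planning, genomics, economics,
e-commerce, litigation, social networks, physics, school evaluation, data-center disk replacement, policing, law, sports (football), aviation,
email processing, recruitment, catching pickpockets, real estate, fires, rap lyrics, headhunting and talent detection, job recommendation, online procurement, virus detection,
and even the election of political party leaders.
Judging from the statistics, the main data structures used in data mining are graphs and trees (graphs seem to be very popular). As for methods, neural networks are still hot, followed by Bayesian methods, which many people also use. Hashing is discussed by quite a few people as well, and matrix operations are common.
Some other interesting observations:
1. Every conference has some papers discussing "large-scale" data or related problems.
2. multi-task, multi-view, and multi-agent also show up at every conference.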
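Because the conferences differ a lot in size, raw counts are hard to compare directly. As a small worked example, the "graph" counts can be turned into shares of each conference's papers. The numbers are copied from Tables 1, 3 and 4; the helper name `share` is made up for this sketch, and WWW-16 is left out because its total is only given approximately.

```python
# (papers mentioning "graph", total papers), copied from Tables 1, 3 and 4.
# The ICDM-16 count also covers the sub-topics listed in its table row.
graph_counts = {
    "IJCAI-16": (19, 551),
    "KDD-16": (12, 208),
    "ICDM-16": (27, 201),
}


def share(count, total):
    """Return count as a percentage of total (hypothetical helper)."""
    return 100.0 * count / total


shares = {conf: share(c, t) for conf, (c, t) in graph_counts.items()}

# Print conferences from the highest share to the lowest.
for conf, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{conf}: {pct:.1f}% of papers mention 'graph'")
```

Measured this way, "graph" is proportionally most common at ICDM-16, even though IJCAI-16 has more graph papers than KDD-16 in absolute terms.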
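Problems and solutions
I also ran into some problems while reading: there were titles I could not understand, and sometimes I did not even know what individual terms meant (I wrote all of those terms down). I also noted the papers that interested me, to study and understand later.
The questions are as follows:
1. Is large-scale data processing getting more and more popular?
2. Even Baidu's HR department can publish at KDD.
3. With so many papers on airlines, why are there none on railway companies? Many people do work on transportation networks and road networks, which also cover railways.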
Maximum Weight Clique Problem?
ASP? An ASP Semantics for Default Reasoning with Constraints
470
SMT?
Proving the Incompatibility of Efficiency and Strategyproofness via SMT Solving
How to Build Your Network? A Structural Analysis
Item Recommendation for Emerging Online Businesses
NP hard ?
belief? embedding?
Out-of-Sample Data?
Distributed?
Heuristics?
Directional Statistics?
Plan Recognition as Planning Revisited? planning?
logics?
Hierarchical? Hierarchical model, hierarchical planning
*-based?
query?
Temporal Graph?
Graph Streams?
join?
KNN?
Embedding?
time series? The time series studied in time-series analysis?
Papers of interest:
KDD: http://www.kdd.org/kdd2016/program/accepted-papers
A Real Linear and Parallel Multiple Longest Common Subsequences (MLCS) Algorithm
Taxi Driving Behavior Analysis in Latent Vehicle-to-Vehicle Networks: A Social Influence Perspective
Predicting Socio-Economic Indicators using News Events
Minimizing Legal Exposure for High-Tech Companies through Collaborative Filtering Methods
Identifying Decision Makers from Professional Social Networks
Catch Me If You Can: Detecting Pickpocket Suspects from Large-Scale Transit Records
Ranking Universities Based on Career Outcomes of Graduates
DopeLearning: A Computational Approach to Rap Lyrics Generation
Analyzing Volleyball Match Data from the 2014 World Championships Using Machine Learning Techniques
Collective Evolution Inference in Heterogeneous Information Networks
Lexis: An Optimization Framework for Discovering the Hierarchical Structure of Sequential Data
IJCAI: http://ijcai-16.org/index.php/welcome/view/accepted_papers
Multi-view Exclusive Unsupervised Dimension Reduction for Video-based Facial Expression Recognition
Dimensionally Guided Synthesis of Mathematical Word Problems
Truncating Shortest Path Search for Efficient Map-matching
Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations
Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition
A Neural Network for Document Summarization
Improving Top-N Recommendation with Heterogeneous Losses
Transductive Optimization of Top k Precision
Chinese Song Iambics Generation with Neural Attention-based Model
Derivative-Free Optimization of High-Dimensional Non-Convex Functions by Sequential Random Embeddings
Fear and Hope Emerge from Anticipation in Model-Based Reinforcement Learning
Driver Frustration Detection From Audio and Video in the Wild
Moving in a Crowd: Safe and Efficient Navigation among Heterogeneous Agents
Predictive models of malicious behavior in human negotiations
A Framework for Recommending Relevant and Diverse Items
Matrix Factorization+ for Movie Recommendation
Context-Aware Advertisement Recommendation for High-Speed Social News Feeding
A Novel Fast and Memory Efficient Parallel MLCS Algorithm for Longer and Large-Scale Sequences Alignments
A Framework for Enabling User Preference Profiling through Wi-Fi Logs
WWW: http://www2016.ca/2-home/72-accepted-papers.html
Immersive Recommendation: News and Event Recommendations Using Personal Digital Traces
Exploiting Green Energy to Reduce Operational Costs of Multi-Center Web Search Engines
Scaling up Dynamic Topic Models (Jun Zhu)
The Lifecycle and Cascade of Social Messaging Groups (Jie Tang)
Impact, Characteristics, and Detection of Wikipedia Hoaxes