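To begin research on traffic analysis and classification, relevant conference papers were surveyed first to establish the current state of research. The following papers on traffic classification and traffic analysis were collected from CCF-recommended conferences.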
SIGCOMM 2019 (ACM Special Interest Group on Data Communication), CCF class A
A large-scale analysis of deployed traffic differentiation practices
SIGCOMM '19 Proceedings of the ACM Special Interest Group on Data Communication
Pages 130-144
Authors:
Fangfan Li (Northeastern University)
Arian Akhavan Niaki (University of Massachusetts Amherst)
David Choffnes (Northeastern University)
Phillipa Gill (University of Massachusetts Amherst)
Alan Mislove (Northeastern University)
Abstract:
Net neutrality has been the subject of considerable public debate over the past decade. Despite the potential impact on content providers and users, there is currently a lack of tools or data for stakeholders to independently audit the net neutrality policies of network providers. In this work, we address this issue by conducting a one-year study of content-based traffic differentiation policies deployed in operational networks, using results from 1,045,413 crowdsourced measurements conducted by 126,249 users across 2,735 ISPs in 183 countries/regions. We develop and evaluate a methodology that combines individual per-device measurements to form high-confidence, statistically significant inferences of differentiation practices, including fixed-rate bandwidth limits (i.e., throttling) and delayed throttling practices. Using this approach, we identify differentiation in both cellular and WiFi networks, comprising 30 ISPs in 7 countries. We also investigate the impact of throttling practices on video streaming resolution for several popular video streaming providers.
https://dl.acm.org/citation.cfm?id=3342092
INFOCOM 2019 (IEEE Conference on Computer Communications), CCF class B
Transaction Clustering Using Network Traffic Analysis for Bitcoin and Derived Blockchains
IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)
Date of Conference: 29 April-2 May 2019
Conference Location: Paris, France
IEEE Keywords:
Peer-to-peer computing, Bitcoin, Blockchain, IP networks, Privacy, Clustering algorithms
Authors:
Alex Biryukov (University of Luxembourg)
Sergei Tikhomirov (University of Luxembourg)
Abstract:
Bitcoin is a decentralized digital currency introduced in 2008 and launched in 2009. Bitcoin provides a way to transact without any trusted intermediary, but its privacy guarantees are questionable, and multiple deanonymization attacks have been proposed. Cryptocurrency privacy research has been mostly focused on blockchain analysis, i.e., extracting information from the transaction graph. We focus on another vector for privacy attacks: network analysis. We describe the message propagation mechanics in Bitcoin and propose a novel technique for transaction clustering based on network traffic analysis. We show that timings of transaction messages leak information about their origin, which can be exploited by a well connected adversarial node. We implement and evaluate our method in the Bitcoin testnet with a high level of accuracy, deanonymizing our own transactions issued from a desktop wallet (Bitcoin Core) and from a mobile (Mycelium) wallet. Compared to existing approaches, we leverage the propagation information from multiple peers, which allows us to overcome an anti-deanonymization technique (“diffusion”) used in Bitcoin.
https://ieeexplore.ieee.org/document/8845213
INFOCOM 2019 (IEEE Conference on Computer Communications), CCF class B
Early Online Classification of Encrypted Traffic Streams using Multi-fractal Features
IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)
IEEE Keywords:
Feature extraction, Mathematical model, Time series analysis, Cryptography, Fractals, Quality of experience, Streaming media
Authors:
Erik Areström; Niklas Carlsson
Abstract:
Timely and accurate flow classification is important for identifying flows with different service requirements, optimized network management, and for helping network operators simultaneously operate networks at higher utilization while providing end users good quality of experience (QoE). With most services starting to use end-to-end encryption (HTTPS and QUIC), traditional Deep Packet Inspection (DPI) and port-based approaches are no longer applicable. Furthermore, most flow-level-based approaches ignore the complex non-linear characteristics of internet traffic (e.g., self similarity). To address this challenge, in this paper, we present and evaluate a classification framework that combines multi-fractal feature extraction based on time series data (which captures these non-linear characteristics), principal component analysis (PCA) based feature selection, and man-in-the-middle (MITM) based flow labeling. Our detailed evaluation shows that the method is able to quickly and effectively classify traffic belonging to the six most popular traffic types (video streaming, web browsing, social networking, audio communication, text communication, and bulk download) and to distinguish between video-on-demand (VoD) and live streaming sessions delivered from the same services. Our results show that good accuracy can be achieved with only information about the timing of the packets within a flow.
https://ieeexplore.ieee.org/document/8845127
INFOCOM 2018 (IEEE Conference on Computer Communications), CCF class B
Analysis of malicious flows via SIS epidemic model in CCN
IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)
Conference Location: Honolulu, HI, USA
IEEE Keywords:
Mathematical model, Conferences, Computational modeling, Floods, Edge computing, Next generation networking, Analytical models
Authors:
Weihong Yang; Yang Qin; Yuanyuan Yang
Abstract:
Content Centric Networking (CCN) is a novel network architecture that attempts to overcome the limitations of today's network. Security is supported fundamentally by CCN. In this paper, the spreading of malicious flows is modelled via a modified N-intertwined epidemic model. We consider three types of malicious flows: Interest flooding attack flow, content poisoning flow, and rumor. To the best of our knowledge, this paper is a first attempt to apply epidemic model to flow analysis in CCN. We introduce a forwarding matrix to the original epidemic model, which can characterize the forwarding strategy chosen by nodes. Based on modified epidemic model, we derive the upper bound for the number of nodes that affected by malicious flows, and conclude that forwarding strategy can affect the spreading of malicious flows. Matlab-based study and packet-level simulation are performed to verify our model, and the results show that spreading of malicious flows is related to forwarding strategy, and our epidemic model can better characterize malicious flows than original model.
https://ieeexplore.ieee.org/document/8406860
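To make the epidemic modelling in the abstract above more concrete, the following is a minimal sketch of the standard N-intertwined SIS mean-field model, in which the infection probability v_i of node i evolves as dv_i/dt = beta * (1 - v_i) * sum_j a_ij * v_j - delta * v_i. This is the unmodified textbook model, not the paper's variant: the forwarding matrix the authors introduce is not reproduced here, and the function name, the Euler integration, and all parameter values are assumptions chosen for illustration.

```python
def simulate_sis(adj, beta, delta, v0, dt=0.01, steps=1000):
    """Euler-integrate the standard N-intertwined SIS mean-field model.

    adj   -- adjacency matrix as a list of lists (adj[i][j] = 1 if j can infect i)
    beta  -- per-link infection rate
    delta -- per-node curing rate
    v0    -- initial infection probability of each node
    """
    n = len(adj)
    v = list(v0)
    for _ in range(steps):
        nxt = []
        for i in range(n):
            # Total infection pressure from the neighbours of node i.
            pressure = sum(adj[i][j] * v[j] for j in range(n))
            dv = beta * (1.0 - v[i]) * pressure - delta * v[i]
            # Clamp to [0, 1] so numerical error cannot leave the valid range.
            nxt.append(min(1.0, max(0.0, v[i] + dt * dv)))
        v = nxt
    return v


# Hypothetical 4-node ring topology, with one node partially infected at the start.
RING = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
```

Calling `simulate_sis(RING, beta=1.0, delta=0.1, v0=[0.1, 0, 0, 0], steps=5000)` lets the infection spread around the ring, while setting `beta=0.0` makes every probability decay towards zero. Replacing `adj` with a weighted matrix would be one way to experiment with different forwarding behaviour, although that is only a loose analogy to the paper's approach.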
INFOCOM 2018 (IEEE Conference on Computer Communications), CCF class A
Mining Long-Term Stealthy User Behaviors on High Speed Links
IEEE INFOCOM 2018 - IEEE Conference on Computer Communications
Conference Location: Honolulu, HI, USA
IEEE Keywords:
Monitoring, IP networks, Data collection, Probabilistic logic, Conferences, Information filtering
Authors:
Pinghui Wang; Peng Jia; Jing Tao; Xiaohong Guan
Abstract:
Mining user behaviors over high speed links is important for applications such as network anomaly detection. Previous work focuses on monitoring anomalies such as extremely frequent users occurring in a short timeslot such as 1 minute. Little attention has been paid to detect users with stealthy behaviors such as persistent frequent and co-occurrence behaviors over a long period of time at the timeslot granularity (e.g., 1 minute granularity level). Unlike frequent users, persistent users do not necessarily occur more frequently than other users in a single timeslot, but persist and occur in a larger number of timeslots. Due to limited computation and storage resources on routers, it is prohibitive to collect massive network traffic in a long period of time. We develop an end-to-end method for solving challenges in both long-term online traffic collection and offline user behavior analysis. To achieve this goal, we design a user embedding (UE) method to fast build compact sketches of user-occurrence events over time. To reduce the estimation error introduced by Bloom Filter, we model UE as a sampling method and propose methods to accurately mine a variety of user behaviors from user-occurrence events rebuilt from UE sketches. In addition, we introduce another new embedding method reversible UE (RUE) to detect persistent frequent behaviors when monitored users' IDs are not given in advance for offline analysis. We conduct extensive experiments on real-world traffic, and the results demonstrate that our methods significantly outperform state-of-the-art methods.
https://ieeexplore.ieee.org/document/8485858
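The abstract above mentions Bloom filters as a source of estimation error when recording user-occurrence events. As a rough illustration of why, the sketch below keeps one generic Bloom filter per timeslot and estimates a user's persistence as the number of timeslots whose filter reports the user. This is not the paper's UE or RUE method; the class, the `persistence` helper, and the filter sizes are hypothetical and chosen only to show the idea.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: no false negatives, but false positives are possible."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def persistence(user, slot_filters):
    """Estimate in how many timeslots the user occurred (may overestimate)."""
    return sum(1 for bf in slot_filters if user in bf)
```

Because a Bloom filter can report a user who was never inserted, the count returned by `persistence` is an upper bound on the true number of timeslots. The smaller the filters relative to the number of users per slot, the larger this overestimate tends to be.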
INFOCOM 2018 (IEEE Conference on Computer Communications), CCF class A
Can We Learn what People are Doing from Raw DNS Queries?
IEEE INFOCOM 2018 - IEEE Conference on Computer Communications
IEEE Keywords:
Conferences, Monitoring, Forestry, Google, Training, Internet, Encryption
Authors:
Jianfeng Li (MOE KLINNS Lab, Xi'an Jiaotong University, Xi'an, China)
Xiaobo Ma (MOE KLINNS Lab, Xi'an Jiaotong University, Xi'an, China)
Li Guodong (MOE KLINNS Lab, Xi'an Jiaotong University, Xi'an, China)
Xiapu Luo (Department of Computing, The Hong Kong Polytechnic University, Hong Kong)
Junjie Zhang (Department of Computer Science and Engineering, Wright State University, Dayton, USA)
Wei Li (MOE KLINNS Lab, Xi'an Jiaotong University, Xi'an, China)
Xiaohong Guan (MOE KLINNS Lab, Xi'an Jiaotong University, Xi'an, China)
Abstract:
Domain Name System (DNS) is one of the pillars of today's Internet. Due to its appealing properties such as low data volume, wide-ranging applications and encryption free, DNS traffic has been extensively utilized for network monitoring. Most existing studies of DNS traffic, however, focus on domain name reputation. Little attention has been paid to understanding and profiling what people are doing from DNS traffic, a fundamental problem in the areas including Internet demographics and network behavior analysis. Consequently, simple questions like “How to determine whether a DNS query for www.google.com means searching or any other behaviors?” cannot be answered by existing studies. In this paper, we take the first step to identify user activities from raw DNS queries. We advance a multiscale hierarchical framework to tackle two practical challenges, i.e., behavior ambiguity and behavior polymorphism. Under this framework, a series of novel methods, such as pattern upward mapping and multi-scale random forest classifier, are proposed to characterize and identify user activities of interest. Evaluation using both synthetic and real-world DNS traces demonstrates the effectiveness of our method.
https://ieeexplore.ieee.org/document/8486210
IMC 2017 (Internet Measurement Conference), CCF class B
Detection, classification, and analysis of inter-domain traffic with spoofed source IP addresses
IMC '17 Proceedings of the 2017 Internet Measurement Conference
Pages 86-99
Authors:
Franziska Lichtblau (TU Berlin)
Florian Streibelt (TU Berlin)
Thorben Krüger (TU Berlin)
Philipp Richter (TU Berlin)
Anja Feldmann (TU Berlin)
Abstract:
IP traffic with forged source addresses (i.e., spoofed traffic) enables a series of threats ranging from the impersonation of remote hosts to massive denial-of-service attacks. Consequently, IP address spoofing received considerable attention with efforts to either suppress spoofing, to mitigate its consequences, or to actively measure the ability to spoof in individual networks. However, as of today, we still lack a comprehensive understanding both of the prevalence and the characteristics of spoofed traffic "in the wild" as well as of the networks that inject spoofed traffic into the Internet.
In this paper, we propose and evaluate a method to passively detect spoofed packets in traffic exchanged between networks in the inter-domain Internet. Our detection mechanism identifies both source IP addresses that should never be visible in the inter-domain Internet (i.e., unrouted and bogon sources) as well as source addresses that should not be sourced by individual networks, as inferred from BGP routing information. We apply our method to classify the traffic exchanged between more than 700 networks at a large European IXP. We find that the majority of connected networks do not, or not consistently, filter their outgoing traffic. Filtering strategies and contributions of spoofed traffic vary heavily across networks of different types and sizes. Finally, we study qualitative characteristics of spoofed traffic, regarding both application popularity as well as structural properties of addresses. Combining our observations, we identify and study dominant attack patterns.
https://dl.acm.org/citation.cfm?id=3131367
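The detection idea in the abstract above (flagging bogon and unrouted sources, plus sources that a given network should not be originating) can be sketched with Python's standard `ipaddress` module. This is a simplified illustration rather than the authors' implementation: the bogon list is a small subset of well-known reserved IPv4 ranges, the routed prefixes and per-network prefixes are hard-coded hypothetical values where the paper infers them from BGP routing information, and the class labels and function name are assumptions.

```python
import ipaddress

# A small subset of well-known IPv4 bogon ranges (private, loopback, link-local, etc.).
BOGON_PREFIXES = [ipaddress.ip_network(p) for p in (
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.168.0.0/16",
    "224.0.0.0/4", "240.0.0.0/4",
)]

# Hypothetical routed prefixes; a real study would take these from BGP data.
ROUTED_PREFIXES = [ipaddress.ip_network(p) for p in ("8.8.8.0/24", "1.1.1.0/24")]


def classify_source(addr, member_prefixes):
    """Classify a packet's source address as seen from one connected network.

    member_prefixes -- prefixes this network may legitimately originate (assumed input)
    """
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in BOGON_PREFIXES):
        return "bogon"      # should never appear in inter-domain traffic
    if not any(ip in net for net in ROUTED_PREFIXES):
        return "unrouted"   # not covered by any announced prefix
    if not any(ip in net for net in member_prefixes):
        return "invalid"    # routed, but not a valid source for this network
    return "regular"
```

For example, `classify_source("8.8.8.8", [ipaddress.ip_network("1.1.1.0/24")])` yields `"invalid"` under these assumed tables, because the address is routed but lies outside the prefixes attributed to the sending network.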
ICNP 2017 (International Conference on Network Protocols), CCF class B
Spatio-temporal analysis and prediction of cellular traffic in metropolis
2017 IEEE 25th International Conference on Network Protocols (ICNP)
Date of Conference: 10-13 Oct. 2017
Conference Location: Toronto, ON, Canada
IEEE Keywords:
Poles and towers, Mobile communication, Urban areas, Computer architecture, Monitoring, Predictive models, Mobile handsets
Authors:
Xu Wang (School of Software and TNList, Tsinghua University)
Zimu Zhou (Computer Engineering and Networks Laboratory, ETH Zurich)
Zheng Yang (School of Software and TNList, Tsinghua University)
Yunhao Liu (School of Software and TNList, Tsinghua University)
Chunyi Peng (Dept. CSE, The Ohio State University)
Abstract:
Understanding and predicting cellular traffic at large-scale and fine-granularity is beneficial and valuable to mobile users, wireless carriers and city authorities. Predicting cellular traffic in modern metropolis is particularly challenging because of the tremendous temporal and spatial dynamics introduced by diverse user Internet behaviours and frequent user mobility citywide. In this paper, we characterize and investigate the root causes of such dynamics in cellular traffic through a big cellular usage dataset covering 1.5 million users and 5,929 cell towers in a major city of China. We reveal intensive spatio-temporal dependency even among distant cell towers, which is largely overlooked in previous works. To explicitly characterize and effectively model the spatio-temporal dependency of urban cellular traffic, we propose a novel decomposition of in-cell and inter-cell data traffic, and apply a graph-based deep learning approach to accurate cellular traffic prediction. Experimental results demonstrate that our method consistently outperforms the state-of-the-art time-series based approaches and we also show through an example study how the decomposition of cellular traffic can be used for event inference.
https://ieeexplore.ieee.org/document/8117559