Preface:
Authors: Yang Liu, Armin Sarabi, Jing Zhang, and Parinaz Naghizadeh, University of Michigan; Manish Karir, QuadMetrics, Inc.; Michael Bailey, University of Illinois at Urbana-Champaign; Mingyan Liu, University of Michigan and QuadMetrics, Inc.
Venue: USENIX Security, 2015
Paper link: https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-liu.pdf
Abstract: In this study we characterize the extent to which cyber security incidents, such as those referenced by Verizon in its annual Data Breach Investigations Reports (DBIR), can be predicted based on externally observable properties of an organization’s network. We seek to proactively forecast an organization’s breaches and to do so without cooperation of the organization itself. To accomplish this goal, we collect 258 externally measurable features about an organization’s network from two main categories: mismanagement symptoms, such as misconfigured DNS or BGP within a network, and malicious activity time series, which include spam, phishing, and scanning activity sourced from these organizations. Using these features we train and test a Random Forest (RF) classifier against more than 1,000 incident reports taken from the VERIS community database, Hackmageddon, and the Web Hacking Incidents Database that cover events from mid-2013 to the end of 2014. The resulting classifier is able to achieve a 90% True Positive (TP) rate, a 10% False Positive (FP) rate, and an overall 90% accuracy.
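To make the reported metrics concrete, the following minimal sketch computes the true positive rate, false positive rate, and accuracy from binary labels and predictions. The function name `classification_rates` and the toy label vectors are assumptions for illustration only; they are not taken from the paper, and the numbers are chosen merely to reproduce a 90% / 10% / 90% outcome on a balanced sample.

```python
def classification_rates(y_true, y_pred):
    """Return (TP rate, FP rate, accuracy) for binary labels (1 = incident)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tp_rate = tp / (tp + fn)          # share of real incidents that were flagged
    fp_rate = fp / (fp + tn)          # share of clean organizations wrongly flagged
    accuracy = (tp + tn) / len(pairs)
    return tp_rate, fp_rate, accuracy


# Hypothetical toy data: 10 organizations with incidents, 10 without.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [1] + [0] * 9  # one missed incident, one false alarm

tp_rate, fp_rate, accuracy = classification_rates(y_true, y_pred)
print(tp_rate, fp_rate, accuracy)
```

Note that accuracy equals the average of the TP rate and (1 - FP rate) only when the two classes are the same size, as in this toy sample.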
Authors: Yufei Han and Matteo Dell’Amico
Venue: CCS, 2017
Paper link: https://dl.acm.org/doi/10.1145/3133956.3134022
Abstract: The current evolution of the cyber-threat ecosystem shows that no system can be considered invulnerable. It is therefore important to quantify the risk level within a system and devise risk prediction methods such that proactive measures can be taken to reduce the damage of cyber attacks. We present RiskTeller, a system that analyzes binary file appearance logs of machines to predict which machines are at risk of infection months in advance. Risk prediction models are built by creating, for each machine, a comprehensive profile capturing its usage patterns, and then associating each profile to a risk level through both fully and semi-supervised learning methods. We evaluate RiskTeller on a year-long dataset containing information about all the binaries appearing on machines of 18 enterprises. We show that RiskTeller can use the machine profile computed for a given machine to predict subsequent infections with the highest prediction precision achieved to date.
Contributions:
Authors: Yun Shen, Enrico Mariconti, Pierre-Antoine Vervier and Gianluca Stringhini
Venue: CCS, 2018
Paper link: https://seclab.bu.edu/people/gianluca/papers/tiresias-ccs2018.pdf
Abstract: With the increased complexity of modern computer attacks, there is a need for defenders not only to detect malicious activity as it happens, but also to predict the specific steps that will be taken by an adversary when performing an attack. However this is still an open research problem, and previous research in predicting malicious events only looked at binary outcomes (e.g., whether an attack would happen or not), but not at the specific steps that an attacker would undertake. To fill this gap we present Tiresias, a system that leverages Recurrent Neural Networks (RNNs) to predict future events on a machine, based on previous observations. We test Tiresias on a dataset of 3.4 billion security events collected from a commercial intrusion prevention system, and show that our approach is effective in predicting the next event that will occur on a machine with a precision of up to 0.93. We also show that the models learned by Tiresias are reasonably stable over time, and provide a mechanism that can identify sudden drops in precision and trigger a retraining of the system. Finally, we show that the long-term memory typical of RNNs is key in performing event prediction, rendering simpler methods not up to the task.
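Key points: This paper goes beyond predicting whether a security event will occur, so it is not a binary classification task. Instead, it predicts the specific actions an attacker will take during an attack, such as the CVEs the attacker will exploit in a multi-step attack, and it can assess the potential severity of an attack while the attack is still in its early stages.
The authors build sequences of security events ordered by time of occurrence, and use the events already observed in a sequence to predict the events that will follow.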
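The sequence framing described above can be sketched as follows. This is only an illustration of how time-ordered event sequences and next-event prediction fit together. It uses a simple first-order frequency count rather than the RNN that Tiresias relies on, and the paper itself reports that such simpler methods are not up to the task. The record layout `(machine_id, timestamp, event_id)`, the event names, and all function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_sequences(records):
    """Group (machine_id, timestamp, event_id) records into per-machine
    event sequences ordered by timestamp."""
    per_machine = defaultdict(list)
    for machine, timestamp, event in records:
        per_machine[machine].append((timestamp, event))
    return {
        machine: [event for _, event in sorted(items, key=lambda x: x[0])]
        for machine, items in per_machine.items()
    }


def fit_next_event_counts(sequences):
    """Count how often each event directly follows another (first-order baseline)."""
    counts = defaultdict(Counter)
    for seq in sequences.values():
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, last_event):
    """Return the most frequent follower of last_event, or None if unseen."""
    if last_event not in counts:
        return None
    return counts[last_event].most_common(1)[0][0]


# Hypothetical records; timestamps arrive out of order on purpose.
records = [
    ("m1", 3, "exploit_attempt"),
    ("m1", 1, "port_scan"),
    ("m1", 2, "login_bruteforce"),
    ("m2", 1, "port_scan"),
    ("m2", 2, "login_bruteforce"),
    ("m2", 3, "exploit_attempt"),
]

sequences = build_sequences(records)
counts = fit_next_event_counts(sequences)
print(predict_next(counts, "login_bruteforce"))
```

A baseline like this conditions only on the single most recent event, whereas the long-term memory of an RNN lets the prediction depend on the whole observed history, which is the property the abstract identifies as key.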
Authors: Liu, Y., Zhang, J., Sarabi, A., Liu, M., Karir, M., Bailey, M.
Venue: IWSPA, 2015
Paper link: https://www.researchgate.net/publication/295351303_Predicting_Cyber_Security_Incidents_Using_Feature-Based_Characterization_of_Network-Level_Malicious_Activities
Abstract: This study offers a first step toward understanding the extent to which we may be able to predict cyber security incidents (which can be of one of many types) by applying machine learning techniques and using externally observed malicious activities associated with network entities, including spamming, phishing, and scanning, each of which may or may not have direct bearing on a specific attack mechanism or incident type. Our hypothesis is that when viewed collectively, malicious activities originating from a network are indicative of the general cleanness of a network and how well it is run, and that furthermore, collectively they exhibit fairly stable and thus predictive behavior over time. To test this hypothesis, we utilize two datasets in this study: (1) a collection of commonly used IP address-based/host reputation blacklists (RBLs) collected over more than a year, and (2) a set of security incident reports collected over roughly the same period. Specifically, we first aggregate the RBL data at a prefix level and then introduce a set of features that capture the dynamics of this aggregated temporal process. A comparison between the distribution of these feature values taken from the incident dataset and from the general population of prefixes shows distinct differences, suggesting their value in distinguishing between the two while also highlighting the importance of capturing dynamic behavior (second order statistics) in the malicious activities. These features are then used to train a support vector machine (SVM) for prediction. Our preliminary results show that we can achieve reasonably good prediction performance over a forecasting window of a few months.
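The aggregation step in this abstract can be illustrated with a small sketch: blacklist listings are grouped by prefix into a daily count series, and a few summary features are computed over that series. The fixed /24 prefix length, the `(day_index, ip)` input layout, and the three features shown (mean, variance, mean absolute day-to-day change) are assumptions chosen for illustration. The abstract does not specify the paper's actual feature set, and the resulting feature vectors would still need to be passed to an SVM for prediction.

```python
import ipaddress
from collections import defaultdict
from statistics import mean, pvariance


def prefix_of(ip, prefix_len=24):
    """Map an IP address to its enclosing prefix (fixed /24 is an assumption)."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def daily_counts(listings, num_days):
    """Turn (day_index, ip) blacklist listings into, per prefix,
    a series of distinct listed IPs per day."""
    per_prefix = defaultdict(lambda: [set() for _ in range(num_days)])
    for day, ip in listings:
        per_prefix[prefix_of(ip)][day].add(ip)
    return {p: [len(ips) for ips in days] for p, days in per_prefix.items()}


def dynamics_features(series):
    """Summarize level and variability of one prefix's daily series."""
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return {
        "mean": mean(series),
        "variance": pvariance(series),                 # second-order statistic
        "mean_abs_change": mean(diffs) if diffs else 0.0,
    }


# Hypothetical listings using documentation-reserved address ranges.
listings = [
    (0, "192.0.2.1"),
    (0, "192.0.2.2"),
    (1, "192.0.2.1"),
    (2, "198.51.100.7"),
]

series_by_prefix = daily_counts(listings, num_days=3)
features = {p: dynamics_features(s) for p, s in series_by_prefix.items()}
print(series_by_prefix)
print(features["192.0.2.0/24"])
```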