Machine Learning Case Study: The Case for Machine Learning in Network Security Analysis

Security tends to scale badly with complexity. As information, applications and systems become more sophisticated, so too do the challenges faced in assuring Confidentiality, Integrity and Availability. The role of machine assistance is emerging as one of the most important areas in data science, much of which is underpinned by techniques from machine learning and statistics. Using these techniques we are developing characterisation engines that provide baseline and anomaly detection capabilities, alleviating the need for explicit signatures to be written. Such techniques will become increasingly important as the range and capability of networked devices increases – as evidenced by the recent exploitation of IoT devices to mount large-scale Distributed Denial of Service (DDoS) campaigns. Encompassing all areas of systems design and engineering, information security presents a unique problem space in which these techniques can be developed and tested.

The case for machine learning in network security analysis

In the context of information security, the possible applications of machine learning are many and varied. Security analysis of network data is complex, involving datasets that are large in terms of both their volume and density. We have been working on how machine learning can be used to support human analysts in making sense of the data produced by the systems they monitor, and in helping them to make decisions based on the insights they draw from this analysis. We have developed a set of bespoke classification, clustering and anomaly detection techniques in Python, SPL and C to characterise networks based on the events taking place within them. These include:

  • Applying multiple learning models, including semi-supervised, unsupervised and reinforcement learning;
  • A classifier to detect interactive webshell traffic using only low resolution data sources, such as netflow or IPFIX;
  • Domain-specific distance functions to express
  • A heuristic anomaly detection algorithm built using probabilistic techniques and operating at medium-dimensionality; and
  • A set of clustering techniques built purely for network datasets, capable of processing both summary and full-capture (such as PCAP) data.

The overriding purpose of these techniques is to develop reusable but domain-specific tools that help the analyst to make greater use of their monitoring data. An important aspect of this process is computing in environments where data can be incomplete or multivariate, and can represent very different types of infrastructure.
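To make the idea of baselining incomplete, multivariate data more concrete, here is a minimal sketch of a probabilistic anomaly score. It is not the algorithm described above, whose details the article does not give. It assumes each feature is independent and roughly Gaussian after a log transform, and the field names (`bytes`, `packets`, `duration`) and sample records are hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical flow-record fields; a real netflow/IPFIX export has many more.
FEATURES = ("bytes", "packets", "duration")


def fit_baseline(flows):
    """Per-feature mean and std of log1p values, skipping missing fields."""
    baseline = {}
    for name in FEATURES:
        values = [math.log1p(f[name]) for f in flows if f.get(name) is not None]
        # Floor the spread so scoring never divides by zero.
        baseline[name] = (mean(values), max(pstdev(values), 1e-6))
    return baseline


def anomaly_score(flow, baseline):
    """Mean squared z-score over the features present in this flow."""
    squared = []
    for name, (mu, sigma) in baseline.items():
        value = flow.get(name)
        if value is None:
            continue  # incomplete record: score only what is available
        z = (math.log1p(value) - mu) / sigma
        squared.append(z * z)
    return sum(squared) / len(squared) if squared else 0.0


# Made-up "normal" traffic, including one record with a missing field.
normal = [
    {"bytes": 1200, "packets": 10, "duration": 0.8},
    {"bytes": 1500, "packets": 12, "duration": 1.1},
    {"bytes": 900, "packets": 8, "duration": None},
    {"bytes": 1100, "packets": 9, "duration": 0.9},
]
baseline = fit_baseline(normal)
typical = anomaly_score({"bytes": 1000, "packets": 9, "duration": 1.0}, baseline)
unusual = anomaly_score({"bytes": 5_000_000, "packets": 4000, "duration": None}, baseline)
print(typical, unusual)
```

Averaging over only the fields that are present keeps scores comparable between complete and incomplete records. The price is that the score ignores correlations between features, which a fuller model would want to capture.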

The Curse of Dimensionality

In common with many other ‘Big Data’-like problems, security analysis can quickly fall foul of the Curse of Dimensionality – that is to say, the problem escalates into (potentially very) high-dimensional spaces, bringing with it significant computability and scalability challenges. Analysis of network datasets often leads to a comparatively large number of inferred dimensions being built atop native data points such as IP addresses, directionality, byte counts, protocols used, etc. In Emerging Technology we have developed a set of corresponding techniques for dimension reduction, specifically for use within this area – notably non-linear manifold learning techniques, built by domain-tuning Principal Component Analysis and t-distributed Stochastic Neighbour Embedding (t-SNE). These techniques allow the user to better manage their data, and also open up the problem space to techniques that would otherwise be infeasible.
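As a rough illustration of dimension reduction on flow-like features, the sketch below applies plain Principal Component Analysis using NumPy's SVD. It is not the domain-tuned variant mentioned above, and it does not cover t-SNE. The data is synthetic: six made-up feature columns driven by two hidden factors, standing in for inferred dimensions built on top of native fields.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components via SVD."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    Z = (X - X.mean(axis=0)) / std  # standardise each feature
    # Rows of Vt are the principal directions, ordered by variance explained.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return Z @ Vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic stand-in for flow features: 6 columns driven by 2 hidden factors.
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = hidden @ mixing + 0.05 * rng.normal(size=(200, 6))

embedded, explained = pca_project(X)
print(embedded.shape, explained.sum())
```

Standardising first stops features with large raw ranges, such as byte counts, from dominating the projection. The returned explained-variance ratios give a quick check on how much structure the reduced space keeps.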

Next steps…

Translated from: https://www.pybloggers.com/2016/11/the-case-for-machine-learning-in-network-security-analysis/
