Apache Metron Meetup May 4, 2016 - Big data cybersecurity

Original source:

http://slideshare.neatcn.com/hortonworks/apache-metron-meetup-may-4-2016-big-data-cybersecurity

For more information, visit:

http://hortonworks.com/apache/metron/ 




  1. 1. Apache Metron Meetup & Code Lab

Presenters:

  George Vetticaden, Principal Architect @ Hortonworks, Apache Metron Committer;

  James Sirota, Engineering Lead & Chief Data Scientist @ Hortonworks, Apache Metron Committer;

  1. 2. Agenda

Part 1 – Overview of Apache Metron

  Challenges with today's security tools in combating cyber attacks;

  Introduction to Apache Metron;

  Metron architecture;

  Personas and core themes;

  Why Apache Metron?

Part 2 – Code Lab: Adding a Net New Telemetry Data Source into Metron

  Setting up the use case for the code lab: tracing a Squid telemetry event through the platform;

  Get your Metron Vagrant VM started;

  Use Case 1: Adding a net new telemetry data source to Metron;

  Use Case 2: Enriching telemetry data;

  Use Case 3: Adding/enriching/validating with threat intel feeds;

  Use Case 4: Setting up your IDE and writing tests;

  1. 3. Metron

  1. 4. Page4 The Good Guys

Security Practitioner:

  I have too many tools I need to learn;

  I don't have a centralized view of my data;

  I have too many manual tasks;

  Most of my alerts are false positives;

  I can't keep relying on static rules;

  I need to discover bad stuff quicker;

Metron will make it easier and faster to find the real issues I need to act on.

SOC Manager:

  My tools are too expensive;

  I can't find enough talent;

  Threat landscape too dynamic;

  More assets/users to manage;

  Attack surface increases;

  Legacy techniques don't work anymore;

Metron is a more cost-effective way for my team to deal with the fast-moving threat landscape.

  1. 5. Page5 The Bad Guys

Script Kiddie:

  My techniques are predictable and known;

  My attack vectors are also known;

  I fumble around a lot;

  I set off a large number of alerts;

  You are not the only person I've attacked;

  I brag about what I did or will do;

Metron can take everything that is known about me and check for it in real time.

Advanced Persistent Threat (APT):

  I am very unique in the way I do things;

  I live on your network for about 300 days;

  I know what I am after and I look for it, slowly;

  Your rules will not detect me, I am too smart;

  I impersonate a legitimate user, but I don't act like one;

Metron can model the historical behavior of whoever I am impersonating and flag me as I try to deviate.

  1. 6. Page6 Problems With Existing Tools

Security Information Management System:

  I am prohibitively expensive;

  I have vendor lock-in;

  I can't deal with big data;

  I am not open;

  I am not extensible enough;

Legacy Point Tools:

  I was built for 1995;

  I am super specialized;

  I don't scale horizontally;

  I have a proprietary format;

  You need a PhD to operate me;

Behavioral Analytics Tools:

  I am mostly vaporware;

  I was built by a small startup;

  I was modeled after a data set from 1999;

  I spam you with false positives;

  1. 7. Page7 Apache Metron Vision

  "Apache Metron is a Security Data Analytics Platform (SDAP). As a next generation security analytics framework, it is designed to consume and monitor network traffic and machine data within an enterprise. Apache Metron is extensible and is designed to work at a massive scale. It is not a SIEM but rather the next evolution of a SIEM."

Apache Metron provides the following capabilities:

  Extensible spouts and parsers for attaching Apache Metron to monitor any telemetry source;

  Extensible enrichment framework for any telemetry stream;

  Hadoop-backed storage for telemetry streams with a customizable retention time;

  Automated real-time index for telemetry streams, enabling real-time search;

  Telemetry correlation and SQL query capability for data stored in Hadoop, backed by Hive;

  ODBC/JDBC compatibility and integration with existing analytics tools;

  1. 8. Challenges that Apache Metron Solves

Figures from the 2015 Verizon Data Breach Investigations Report:

  60%: percent of breaches that happened in minutes;

  8 months: average time an advanced security breach goes unnoticed;

  $400 million: estimated financial loss in 2015;

  70%-90%: percentage of malware in a breach that is unique to the organization;

Challenges:

  Too expensive to keep data for enough time to understand history;

  Not enough of the right data to provide context;

  Too expensive to collect all the desired data to understand context;

  Not sure if a targeted event can be detected;

  Too many events to review in a timely manner;

  Not enough staff to review events in a timely manner;


  1. 10. Apache Metron Logical Architecture

  Diagram labels, telemetry sources: PCAP, NETFLOW, DPI, IDS, AV, EMAIL, FIREWALL, HOST LOGS, Network Tap, Custom;

  Diagram labels, real-time processing engine: PARSE, NORMALIZE, TAG, VALIDATE, PROCESS; ENRICH (USER, ASSET, GEO, WHOIS, CONN); LABEL (STIX, Flat Files, Aggregators, Model As A Service, Cloud Services); ALERT; PERSIST (PCAP Store, Alert, Security Data Vault);

  Diagram labels, data and integration services: Metron UI/Portals, Real-Time Search, Interactive Dashboards, Data Modelling, Integration Layer, PCAP Replay, Security Layer;


  1. 11. Page11 Physical Architecture

  Diagram: Sensors A, B ... N publish events in their native format to Apache Kafka topics A, B ... N; in Apache Storm, one normalizing topology per topic (A, B ... N) converts them to the normalized Metron format; the enrichment/threat intel topology enriches the normalized messages and sends them out to the index and HDFS;

  Diagram, PCAP path: a PCAP probe feeds the PCAP topology, which stores PCAP on HDFS for the Metron PCAP Service;


  

  1. 12. Page12 Parsing/Normalization Topology

  Diagram: Sensor A (native format), then Apache Kafka (Topic A), then Apache Storm normalizing topology A (Kafka Spout, Parser, Kafka Bolt), then Enriched Metron JSON;

Key points:

  Each new telemetry data source will have its own parser topology;

  Two types of parsers are available: Grok and Java;

  1. 13. Page13 2 Types of Parsers

Grok:

  A grok is a collection of named regular expressions;

  Provides a declarative way to write new parsers without any code;

  A parser takes an input, which is usually a byte array coming from the Kafka Spout, and turns it into a Metron JSON Object;

  The Grok parser does this by utilizing the Grok library inside of the Parser Kafka Bolt Adapter;

  Use this parser when telemetry is simple to parse or low in volume;

Java:

  Java-based approach to writing a custom parser;

  Use this parser when telemetry is complex to parse or high in volume;
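
To make the idea of "a collection of named regular expressions" concrete, here is a minimal Python sketch of how a Grok-style expression could be expanded into an ordinary regular expression with named groups. It is not the Grok library that Metron uses; the pattern table, the `compile_grok` and `parse` helpers, and the field names are all assumptions made for illustration.

```python
import re

# Hypothetical pattern library: pattern name -> regular expression fragment.
PATTERNS = {
    "NUMBER": r"\d+(?:\.\d+)?",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
}

# Matches %{PATTERN:field} references, in the style of Grok expressions.
_REF = re.compile(r"%\{(\w+):(\w+)\}")


def compile_grok(expression: str) -> re.Pattern:
    """Expand every %{NAME:field} reference into a named regex group."""
    def expand(match: re.Match) -> str:
        pattern_name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[pattern_name]})"
    return re.compile(_REF.sub(expand, expression))


def parse(expression: str, raw: bytes) -> dict:
    """Turn a raw byte array into a dict of named fields (empty if no match)."""
    m = compile_grok(expression).match(raw.decode("utf-8"))
    return m.groupdict() if m else {}


event = parse(
    "%{NUMBER:timestamp} %{IP:ip_src_addr} %{WORD:method}",
    b"1461576382.642 98.220.218.158 GET",
)
print(event)
```

The parser author only writes the declarative expression on the first argument line; the regex machinery stays generic, which is why this style suits telemetry that is simple to parse.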

  1. 14. Page14 Metron JSON Object

  Numerous sensors log in different formats;

  The parser should normalize at least a common subset of fields to the Metron JSON naming conventions (the field table from the slide is not included in this transcript);

  1. 15. Page15 Enrichment Topology

  Diagram (Apache Storm, fed from Apache Kafka), enrichment stage: Message Splitter: Enrichment, then Enrichment Bolt (a) ... Enrichment Bolt (n), each with a fast cache in front of a data store filled by the Metron Enrichment Loader Framework, then Enrichment Joiner;

  Diagram, threat intel stage: Message Splitter: Threat Intel, then Threat Intel Bolt (n) and Model Bolt (n), each with a fast cache in front of a data store filled by the Metron Threat Loader Framework, then Threat Intel Joiner, then Enriched Writer Bolt;

  Legend: one line style marks the message stream, the other the enrichment stream;
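
The splitter/joiner shape of the enrichment topology can be sketched in a few lines of Python. This is only an in-process illustration under stated assumptions: the adapter functions, their lookup data, the `ENRICHMENTS` mapping and the `enrichments.<name>.<field>.<key>` key layout are hypothetical, and real bolts would run in parallel inside Storm rather than in a loop.

```python
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical enrichment adapters: each maps a field value to extra key/values.
def geo_enrichment(ip: str) -> Dict[str, str]:
    fake_geo = {"199.27.79.73": "US"}  # made-up lookup data
    return {"country": fake_geo.get(ip, "unknown")}


def host_enrichment(ip: str) -> Dict[str, str]:
    fake_hosts = {"98.220.218.158": "workstation"}  # made-up lookup data
    return {"type": fake_hosts.get(ip, "unknown")}


# Which enrichments apply to which message field (hard-coded for this sketch).
ENRICHMENTS: Dict[str, Dict[str, Callable[[str], Dict[str, str]]]] = {
    "ip_dst_addr": {"geo": geo_enrichment},
    "ip_src_addr": {"host": host_enrichment},
}


def split(message: dict) -> Iterator[Tuple[str, str, Callable, str]]:
    """Splitter: emit one unit of work per (field, enrichment) pair."""
    for field, adapters in ENRICHMENTS.items():
        if field in message:
            for name, adapter in adapters.items():
                yield field, name, adapter, message[field]


def enrich(message: dict) -> dict:
    """Run each enrichment, then join the results back onto a copy of the message."""
    enriched = dict(message)
    for field, name, adapter, value in split(message):
        for key, result in adapter(value).items():
            enriched[f"enrichments.{name}.{field}.{key}"] = result
    return enriched


msg = {"ip_src_addr": "98.220.218.158", "ip_dst_addr": "199.27.79.73"}
out = enrich(msg)
print(out)
```

Keeping the split and join steps separate is what lets each enrichment scale independently; the joiner only needs to know how to merge keyed results.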


  1. 17. Page17 Personas

  SOC Analyst;

  SOC Investigator;

  SOC Manager;

  Forensic Investigator;

  Security Platform Engineer;

  Security Data Scientist;



  1. 18. Page18 Metron's Key Functional Themes

Platform:

  Work done to harden the platform for performance, scale, extensibility and maintainability. This also includes capabilities around provisioning, managing and monitoring the application;

Data Collection:

  Set of data sources that Metron provides capabilities to stream, ingest and parse into the platform;

Data Processing:

  A set of Storm topologies to perform various actions in real time, including: normalization of telemetry data, enrichment, cross-reference with threat intel feeds, alerting, indexing, and persisting into historical stores;

UI:

  Set of portals, dashboards and user interfaces for the different personas;

  1. 19. Page19 Target Personas and Themes for Apache Metron 0.1 (Tech Preview 1 - Intro)

  Themes: Platform; Data Collection; Data Processing; UI;

  Personas targeted across these themes: Security Platform Engineer, SOC Investigator, Forensic Investigator, SOC Analyst, SOC Manager;

  1. 20. Page20 Key Features of Apache Metron 0.1

Platform:

  Fully automated Vagrant install of Metron on a single VM;

  Fully automated install of Metron on a multi-node HDP cluster via Ansible scripts, Ambari blueprints and APIs, including: multi-node Elastic Search cluster; Metron-UI web application; deployment of the Metron Storm topology; deployment of telemetry sensors (PCAP, Bro, YAF (Netflow), Snort);

  OpenSOC redesign (new topology structure, extensible enrichments, threat intel, data loads, configs, ease of adding new topologies);

Data Collection:

  Ingestion of the following data sources: PCAP via pycapa or C++ DPDK probe, Bro, Netflow via YAF, Snort;

  Parsers for the following data sources: PCAP, Bro, Netflow and Snort;

Data Processing:

  Support for the following enrichment services: Geo, WhoIs, Host;

  Threat intelligence message enrichment: enrich messages with fields that match the threat intelligence data in HBase;

  Support for the following persistence services: HDFS, HBase and Elastic Search;

  Indexing events and alerts into the Elastic Search cluster;

  Support for Soltra (CIF) threat aggregator services via STIX and Taxii feed;

  Ability to replay PCAP files for testing;

UI:

  Metron Investigator UI to search across indexed events and alerts, for SOC Analysts and Investigators;

  Histogram panels for each of the data sources (YAF, Bro, Snort);

  Table views for alerts (YAF, Bro, Snort);

  Customize new panels with different data sources and different panel types;



  1. 22. Page22 Why Metron? SOC Analyst Perspective

Analyst workflow (share of time):

  Looking through alerts: 25%;

  Collecting contextual data: 25%;

  Formulating a hypothesis: 5%;

  Investigate: 20%;

  Remediate: 15%;

  Update workflow: 5%;

  Write report: 5%;

What Metron offers along this workflow:

  Alerts relevancy engine; smarter ML alerts; centralized alerts console; enriched with threat intel data;

  Fully enriched messages; single pane of glass UI; centralized real-time search; all logs in one place;

  Granular access to PCAP; replay old PCAP against new signatures; tag behavior for modelling by data scientists;

  Raw messages used as evidentiary store; mine investigation history;

  Asset inventory as an enrichment; user identity as an enrichment;

  Workflow engine; ticket clustering;

Everything you need to know in one place.


  1. 23. Page23 Why Metron? Data Scientist Perspective

Data science workflow (share of time):

  Formulating a hypothesis: 5%;

  Finding data: 20%;

  Cleaning data: 20%;

  Munging data: 20%;

  Visualizing data: 20%;

  Modelling data: 10%;

  Validating model: 5%;

What Metron offers along this workflow:

  All my data is in the same place; data exposed through a variety of APIs; standard access control policies; quickly see what I have;

  Metron normalizes objects; partial schema validation on ingest; tagging on ingest;

  Automatic data enrichment; automatic application of class labels; common Metron objects;

  Massively parallel computation framework; reusable Zeppelin dashboards; real-time search + UI;

  Integration with Python/R; integration with analytics tools;

Reducing time from hypothesis to model.


  1. 24. Page24 Agenda

Part 2 – Code Lab: Adding a Net New Telemetry Data Source into Metron

  1. 25. Page25 Use Case Setup

Scenario:

  Customer Foo has installed Metron TP1 and is using the out-of-the-box data sources (PCAP, YAF/Netflow, Snort and Bro). They love Metron!

  But now they want to add a new data source to the platform: Squid proxy logs;

Customer Foo's requirements:

  1. Need to ingest the proxy events from Squid logs in real time;

  2. The proxy logs have to be parsed into a standardized JSON structure that Metron can understand;

  3. In real time, the Squid proxy event needs to be enriched with domain/whois information (domain, cert, country, company);

  4. In real time, the domain of the proxy event must be checked against threat intel feeds;

  5. If there is a threat intel hit, an alert needs to be raised;

  6. The end user must be able to see the new telemetry events and the alerts from the new data source;

  1. 26. Page26 Squid & its Telemetry Event

What is Squid?

  Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently requested web pages;

What does a Squid access log look like?

  When you make an outbound HTTP connection to http://www.cnn.com/, the following entry gets added to a file called access.log:

  1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html

  Callouts on the slide: 1461576382.642 is the Unix epoch time; 98.220.218.158 is the IP of the host where the connection was made; http://www.cnn.com/ carries the domain name of the outbound connection;
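
As a rough illustration of what parsing this entry involves, the following Python sketch splits the sample line into named fields with a single regular expression and derives the domain from the URL. The field names (`ip_src_addr`, `ip_dst_addr`, `elapsed` and so on) and the `parse_squid_line` helper are assumptions for this example, not the parser shipped with Metron.

```python
import re
from urllib.parse import urlparse

# One named group per column of the sample access.log entry.
SQUID_LINE = re.compile(
    r"(?P<timestamp>\d+\.\d+)\s+(?P<elapsed>\d+)\s+(?P<ip_src_addr>\S+)\s+"
    r"(?P<action>[A-Z_]+)/(?P<code>\d+)\s+(?P<bytes>\d+)\s+(?P<method>\S+)\s+"
    r"(?P<url>\S+)\s+(?P<user>\S+)\s+(?P<hierarchy>[A-Z_]+)/(?P<ip_dst_addr>\S+)\s+"
    r"(?P<content_type>\S+)"
)


def parse_squid_line(line: str) -> dict:
    """Parse one access.log entry into a flat dict of string fields."""
    match = SQUID_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a squid access.log entry: {line!r}")
    event = match.groupdict()
    # The domain is what later gets enriched and checked against threat intel.
    event["domain"] = urlparse(event["url"]).hostname
    return event


SAMPLE = (
    "1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 GET "
    "http://www.cnn.com/ - DIRECT/199.27.79.73 text/html"
)
event = parse_squid_line(SAMPLE)
print(event)
```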

  1. 27. Page27 What Metron Does to the Squid Telemetry Event in Real Time

  1461576382.642 161 127.0.0.1 TCP_MISS/200 103701 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html

  Convert the time from Unix epoch to a timestamp;

  Use Metron's asset enrichment to enrich the IP (hostname, type of device);

  Use Metron's WhoIs enrichment to look up domain name information (e.g:

  Use Metron's Threat Intel Services to cross-reference the IP with the threat intel feed to see if there is a hit;

  Index the event into Elastic and persist it into HDFS (Security Data Vault);
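
The first of these steps, converting the Squid epoch value into a timestamp, can be sketched as follows. The helper names are made up for this example, and storing the time as integer milliseconds is an assumption about the target representation.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def squid_time_to_millis(raw: str) -> int:
    """Convert Squid's 'seconds.millis' string to integer epoch milliseconds.

    Decimal avoids binary floating-point rounding on the fractional part.
    """
    return int(Decimal(raw) * 1000)


def millis_to_utc(millis: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


millis = squid_time_to_millis("1461576382.642")
print(millis)                             # 1461576382642
print(millis_to_utc(millis).isoformat())  # 2016-04-25T09:26:22.642000+00:00
```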


  1. 28. Page28 Tracing the Squid Event across the Platform

  Diagram: the logical architecture of slide 10 with Squid logs as the telemetry source, passing through the real-time processing engine (PARSE, NORMALIZE, TAG, VALIDATE, PROCESS; ENRICH; LABEL; ALERT; PERSIST) into the data and integration services and the custom Metron UI/portals;

  1. 29. Page29 Step 1: Telemetry Ingest (Tracing an Event)

  1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html

  1. 30. Page30 Step 2 – Process/Parse (Tracing an Event)

  1. 31. Page31 Step 3 – Enrich (Tracing an Event)

  1. 32. Page32 Enriching Data Architecture

  Diagram: an enrichment source feeds the Enrichment Loader Framework (bulk load or polling), which writes to the Metron Enrichment Store (HBase/); a Storm bolt with a cache reads from the store and turns Metron streaming messages into enriched Metron streaming messages;
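
The role of the cache between the Storm bolt and the enrichment store can be illustrated with a small Python sketch. Everything here is a stand-in: `EnrichmentStore` is an in-memory dict playing the part of the store, `CachingEnricher` plays the bolt, and the WhoIs-style rows and the `whois.` key prefix are invented sample data.

```python
from collections import OrderedDict


class EnrichmentStore:
    """In-memory stand-in for the enrichment store; counts reads for illustration."""

    def __init__(self, rows: dict):
        self.rows = rows
        self.reads = 0

    def get(self, key: str) -> dict:
        self.reads += 1
        return self.rows.get(key, {})


class CachingEnricher:
    """Looks up enrichment rows through a small LRU cache in front of the store."""

    def __init__(self, store: EnrichmentStore, max_entries: int = 10_000):
        self.store = store
        self.max_entries = max_entries
        self.cache: "OrderedDict[str, dict]" = OrderedDict()

    def lookup(self, key: str) -> dict:
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as most recently used
            return self.cache[key]
        row = self.store.get(key)            # cache miss: go to the store
        self.cache[key] = row
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)   # evict the least recently used entry
        return row

    def enrich(self, message: dict, field: str) -> dict:
        enriched = dict(message)
        for name, value in self.lookup(message[field]).items():
            enriched[f"whois.{name}"] = value
        return enriched


store = EnrichmentStore({"cnn.com": {"country": "US", "company": "Example Registrant"}})
enricher = CachingEnricher(store)
first = enricher.enrich({"domain": "cnn.com"}, "domain")
second = enricher.enrich({"domain": "cnn.com"}, "domain")
print(first, store.reads)
```

Because streaming telemetry tends to repeat the same hosts and domains, most lookups hit the cache and the store is read only once per distinct key.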

  1. 33. Page33 Step 4 – Label/Threat Intel (Tracing an Event)

  Diagram: a threat intel feed source, optionally through a threat intel aggregator, feeds the Threat Intel Loader Framework (bulk load or polling), which writes to the Threat Intel Store (HBase); a Storm bolt with a cache reads from the store and turns enriched Metron streaming messages into enriched messages plus threat intel hits;
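
A minimal Python sketch of the labeling step: cross-reference selected fields of an enriched message against a threat intel table and raise an alert flag on a hit. The `THREAT_INTEL` table, the `threatintels.` key prefix, the `is_alert` field and the `label_threats` function are assumptions for illustration only.

```python
# Hypothetical threat intel table: indicator -> name of the feed that listed it.
THREAT_INTEL = {"malicious.example": "hypothetical_feed"}


def label_threats(message: dict, fields=("domain",)) -> dict:
    """Return a copy of the message with threat intel hits and an alert flag."""
    labeled = dict(message)
    hits = {
        field: THREAT_INTEL[message[field]]
        for field in fields
        if message.get(field) in THREAT_INTEL
    }
    for field, feed in hits.items():
        labeled[f"threatintels.{field}"] = feed
    labeled["is_alert"] = bool(hits)   # a hit on any field raises an alert
    return labeled


bad = label_threats({"domain": "malicious.example"})
good = label_threats({"domain": "www.cnn.com"})
print(bad)
print(good)
```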

  1. 34. Page34 High-Level Steps: How to Add the New Telemetry

  1. Create a new Kafka topic for the new telemetry source, called "squid";

  2. Create and validate a grok statement file that parses the Squid event log into a format that Metron can understand;

  3. Store that grok statement in HDFS;

  4. Create a new flux configuration for the new Squid parser Storm topology;

  5. Update Zookeeper with configuration to mark which fields in the telemetry to enrich and which fields to cross-reference with threat intel feeds;

  6. Move the flux configuration to the host where you will deploy the topology;

  7. Deploy the new Squid Storm parser topology using the new flux configuration;

  8. Load WhoIs enrichment data and configure the enrichment mapping;

  9. Load threat intel data and configure the threat intel matching mapping;

  10. Use Apache Nifi to capture the Squid events and push them into Metron;

  11. Create a new panel in Kibana and see the telemetry events;

Key points:

  Easy Extensibility – The ability to add new data source without writing any code and in an easy mann

  Repeatable Pattern - The following represents a repeatable pattern that you can apply to most data s

