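The April 2014 issue of Information Security magazine carried an interview on big data security analytics. Excerpts are reproduced here:

Key takeaway:

Big data security analytics is not as rosy as people imagine. It is the right direction, but a lot of work remains, and for now it still calls for caution.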

Marcus Ranum: Anton, today I thought we could talk about big data and one of the first questions I should ask: Is it still just marketing hype? What do you think big data is?

Anton Chuvakin: As I mentioned in a recent blog post, if you fertilize the field of big data with enough marketing hype, something will grow. Well, keep waiting for it. Use of big data analytics approaches for security seems like the most 'BS-rich' area of the entire InfoSec realm. However, there are definitely end-user organizations doing it for real.


Ranum: You and I both come from backgrounds involving a lot of trolling through data -- specifically, system logs -- so I tend to see big data as a sort of 'buy a great big backhoe because you can do anything with a big enough backhoe' approach to data exploration, rather than data analysis.

Once you've figured out what fields you want to analyze and what you want to do with them, precomputing the data as it comes into your input stream makes more sense. It seems to me, big data is predicated on you not knowing what you're going to do with your data, so you should just throw lots of storage and CPUs at it. Does that sound right?

Chuvakin: Big data is predicated on you not knowing what to do with it in advance, but that is actually a good thing. The magic here comes from so-called late schema binding. If you have nasty, messy data and you do know what to do with it, you can come up with a schema based on that knowledge, normalize the data to that schema and then toss it into an RDBMS. On the other hand, nasty, messy data that you want to explore somehow may not be easy to normalize, at least not at once. Thus, big data does often mean exploration and flexibility.

【Don't do big data for the sake of big data. Otherwise, you will fall into another black hole.】
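
To make the contrast Chuvakin draws a little more concrete, here is a minimal sketch of early versus late schema binding. It is not taken from the interview: the log lines, field names and both helper functions are invented for illustration, and a real platform would do this at a very different scale. The first function fixes one schema at ingest time and loses whatever does not fit; the second keeps the raw lines and applies a schema only when a question is asked.

```python
import re

# Hypothetical raw log lines; the formats are invented for illustration.
RAW_LOGS = [
    "2014-04-01T10:00:00 sshd failed login user=root src=10.0.0.5",
    "2014-04-01T10:00:03 sshd failed login user=admin src=10.0.0.5",
    "2014-04-01T10:01:10 httpd GET /index.html status=200 src=10.0.0.9",
]

# Early binding: a single schema decided before any data arrives.
LOGIN_SCHEMA = re.compile(
    r"(?P<ts>\S+) sshd failed login user=(?P<user>\S+) src=(?P<src>\S+)"
)


def ingest_with_schema(lines):
    """Normalize lines to the login schema; lines that do not fit are dropped."""
    rows = []
    for line in lines:
        match = LOGIN_SCHEMA.fullmatch(line)
        if match:
            rows.append(match.groupdict())
    return rows


def query_late_bound(lines, pattern):
    """Keep the raw lines and apply a schema (a regex here) only at query time."""
    regex = re.compile(pattern)
    return [m.groupdict() for m in map(regex.search, lines) if m]


# Early binding keeps only the two sshd rows; the httpd line is gone.
early_rows = ingest_with_schema(RAW_LOGS)

# Late binding can still answer a question nobody planned for at ingest time:
# which source addresses show up in any kind of log line?
sources = query_late_bound(RAW_LOGS, r"src=(?P<src>\S+)")
```

The trade-off is the one discussed in the interview: the early-bound rows are cheap to query but only answer the questions the schema anticipated, while the late-bound approach pays the parsing cost on every query in exchange for flexibility.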


Ranum: Is big data only going to appeal to large businesses that are retrofitting new analysis atop old data dumps? It seems to me that it's something an organization can avoid, if IT departments actually think about what data they're collecting, what it means and then preprocessing it accordingly.

Chuvakin: At this point, building your own big data platform is not just for the large, mature Type A organizations. At Gartner, we say that big data analytics for security is for the 'Type A of Type A.'【Only the most advanced of the advanced enterprises should consider big data security analytics (BDSA).】

Our research shows that big data use for security will continue to be populated by the most advanced, mature, Type A organizations for the near future. Security may well be becoming a big data problem, but riding that big data wave will stay difficult and expensive for most organizations, at least for the next one to two years.

To add to this, several factors will make any semblance of mass adoption of big data technology for security unlikely in the near term.

More informally, you think Oracle/SQL is hard and scary? Don't even come within a mile radius of Hadoop.【Using Hadoop is actually more complex than using a traditional relational database. Can you handle it?】


Ranum: 1) Load your data into Hadoop; 2) !?!?; and 3) Profit! Ultimately, it seems like big data isn't going to solve the age-old problem: If you don't know what you're looking for you won't know how to look for it.

We've both been bumping up against this issue for a very long time in our system log analysis efforts. Do you see anything coming down the pike that's promising?

Chuvakin: Well, if you phrase it like that, it starts to sound pessimistic. However, if I insert 'data exploration' as step 2, it changes now, doesn't it? Big data approaches often do go by that flow: collect->explore->profit. And big data tools make this possible, even if it's not easy.

Exploring unstructured big data piles, however, is much harder than running SIEM reports and may involve text analytics, hard-core statistical methods and other esoteric disciplines that are far removed from traditional security skill sets. It is not all about the keyword search.

Apart from exploration, more goal-driven approaches were also found to work for big data. Start thinking of clear goals and then testing them on data. Some organizations report success from using this model on security data as well as other big data.【Exploring the data is the key to big data analytics, but it is no easy task and is by no means as simple as full-text search. Analysis methods, algorithms and patterns matter.】
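
As a toy illustration of the goal-driven flow described above (state a clear goal, then test it on data), the sketch below starts from one hypothesis: a source whose failed-login count sits far from the typical count deserves a closer look. Nothing here comes from the interview. The events, the `outlier_sources` helper and the threshold of 5.0 are assumptions chosen only to show the shape of the approach, and the median absolute deviation is just one simple robust statistic among many that could be used.

```python
from collections import Counter
from statistics import median

# Hypothetical failed-login events as (source address, user) pairs.
EVENTS = (
    [("10.0.0.5", "root")] * 40
    + [("10.0.0.7", "alice")] * 3
    + [("10.0.0.8", "bob")] * 2
    + [("10.0.0.9", "carol")] * 4
    + [("10.0.0.10", "dave")] * 3
)


def outlier_sources(events, threshold=5.0):
    """Flag sources whose event count is far from the median count.

    Spread is measured with the median absolute deviation (MAD), which a
    single extreme value cannot inflate. The default threshold is an
    arbitrary illustrative choice, not a recommended setting.
    """
    counts = Counter(src for src, _user in events)
    values = list(counts.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    return sorted(
        src for src, n in counts.items() if abs(n - center) / mad > threshold
    )


suspects = outlier_sources(EVENTS)
print(suspects)  # ['10.0.0.5']
```

Even this small example goes beyond keyword search: the interesting source is found by comparing it with the behavior of the others, not by matching a string.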


【Reference】

Gartner: Cautiously optimistic about the use of big data security analytics in 2014