14,000 blocked reqs/sec
1.2 billion blocked reqs/day
Goal: exec all rules <= 1ms
actual execution ~400µs
1,937 string matches
5,682 general rules
102 Cloudflare Rules
CloudFlare wants to provide a WAF to a very large number of customers. Doing so meant being compatible with the existing mod_security WAF, for two reasons: so that we could leverage existing rulesets, and so that people familiar with mod_security (both CloudFlare people and customers) could write new rules.
CloudFlare’s WAF stops attacks at the network edge, protecting your website from common web threats and specialized attacks before they reach your servers. It covers both desktop and mobile websites as well as applications.
The Web Application Firewall (WAF) works by examining HTTP requests to your website. It looks at both GET and POST requests and applies rules to help filter out illegitimate traffic from legitimate website visitors. You can decide whether to block, challenge or simulate an attack. With blocking and challenging, CloudFlare’s WAF will block any traffic identified as illegitimate before it reaches your origin web server.
CloudFlare’s Web Application Firewall (WAF) automatically protects your website from these types of attacks:
• SQL injection, comment spam
• Cross-site scripting (XSS)
• Distributed denial of service (DDoS) attacks
• Application-specific attacks (WordPress, CoreCommerce)
Using www.jgc.org, it’s very easy to see the CloudFlare WAF in action. Using a simple GET operation with a dummy variable that contains a basic XSS script will trigger the security feature and show a page saying that you have been blocked.
GET /?user=<script>alert("test")</script> HTTP/1.1
Host: jgc.org
Connection: keep-alive
...
HTTP/1.1 403 Forbidden
Date: Wed, 10 Dec 2014 06:56:35 GMT
Content-Type: text/html; charset=UTF-8
...
We use both the open source OWASP ruleset and our own internal rules, which we developed based on attack traffic against CloudFlare customers. Today the majority of blocked requests are stopped by our custom rules.
We develop rules internally based on attacks or vulnerabilities and then build a test suite (positive and negative tests to ensure that the rules are blocking only what we want). We have a large automatic test suite for the WAF which gets run across the entire rule set to ensure that it’s working correctly.
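The positive/negative testing idea can be sketched roughly as follows. This is a minimal illustration in Python, not CloudFlare's test suite: the rule, its regular expression, the helper names and the sample payloads are all hypothetical.

```python
import re

# Hypothetical rule: flag query strings containing an opening <script> tag.
XSS_RULE = re.compile(r"<\s*script\b", re.IGNORECASE)

def rule_blocks(query_string):
    """Return True if the hypothetical rule would block this query string."""
    return XSS_RULE.search(query_string) is not None

# Positive tests: attack payloads the rule must block.
POSITIVE = [
    'user=<script>alert("test")</script>',
    "q=<SCRIPT src=//example.invalid/x.js>",
]

# Negative tests: legitimate traffic the rule must leave alone.
NEGATIVE = [
    "user=alice",
    "q=how+to+write+a+movie+script",
]

def run_suite():
    """Run both test sets and return a list of failure descriptions."""
    failures = []
    for payload in POSITIVE:
        if not rule_blocks(payload):
            failures.append("missed attack: " + payload)
    for payload in NEGATIVE:
        if rule_blocks(payload):
            failures.append("false positive: " + payload)
    return failures
```

An empty list from `run_suite()` means the rule blocks every attack sample and none of the legitimate ones; running the same check across every rule is what catches a rule that has become too broad or too narrow.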
Recently added WAF Rules

Description | Exploit | Blog Post |
---|---|---|
Drupal 7 SQL injection | SA-CORE-2014-005 | Drupal 7 SA-CORE-2014-005 SQL Injection Protection |
Shellshock | Shellshock (software bug) | Inside Shellshock: How hackers are using it to exploit systems; Shellshock protection enabled for all customers |
WHMCS Zero Day Vulnerability | WHMCS Security Advisory for 5.x | Patching a WHMCS zero day on day zero; Protect Your Sites With Rapidly Deployed WAF Rules |
We process all requests (GETs, POSTs, etc.) and the bodies that go with them. We have a custom routine inside the WAF that looks at POST data (for example) and identifies it both by its MIME type and by sniffing the actual bytes to see what the data really is.
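As a rough illustration of identifying a body both by its declared MIME type and by its bytes, here is a minimal Python sketch. It is not the WAF's actual routine: the function names and the sniffing heuristics are assumptions chosen to keep the example short.

```python
import json

def sniff_body(body):
    """Guess the body format from its bytes (hypothetical heuristics)."""
    stripped = body.lstrip()
    if stripped[:1] in (b"{", b"["):
        try:
            json.loads(stripped.decode("utf-8"))
            return "json"
        except (UnicodeDecodeError, ValueError):
            pass  # looked like JSON but did not parse; keep checking
    if stripped[:1] == b"<":
        return "xml"
    if b"=" in body and b" " not in body and b"\n" not in body:
        return "form"
    return "unknown"

def classify(content_type, body):
    """Return (declared, sniffed) so a caller can compare the two views."""
    declared = content_type.split(";", 1)[0].strip().lower()
    return declared, sniff_body(body)
```

For example, `classify("text/plain", b'{"a": 1}')` returns `("text/plain", "json")`: the header and the bytes disagree, which is exactly the case that looking at the header alone would miss.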
The WAF is not enabled for all customers. Only paying customers receive the WAF.
We work with our customers to define site-specific rules for them and regularly put in place WAF rules to block site-specific attacks. In the future, we plan to roll out a user interface where customers can write and upload their own rules for their sites.
Speed matters enormously because of the scale of CloudFlare and because part of our service is performance. We have a variety of benchmarking tools, but perhaps more important is our metrics system, which allows us to examine real-time and historical performance information (including WAF performance).
Our goal is to run, on average, in under 1ms for each request processed by the WAF. Currently we are in the hundreds of microseconds (tenths of a millisecond) per request. As an example, in the last 24 hours we have blocked 1.2 billion HTTP requests (that’s about 14,000 per second).
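The per-second figure follows directly from the daily total. A quick check in Python, using only the number quoted above (the variable names are just for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds
blocked_per_day = 1_200_000_000     # 1.2 billion blocked requests

blocked_per_second = blocked_per_day / SECONDS_PER_DAY
print(round(blocked_per_second))    # 13889, i.e. roughly 14,000
```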
When the code was first written and tested we were seeing about 10ms latency on a laptop. That was optimized using techniques like function memoization, followed by some architectural changes (mostly eliminating the use of closures), which brought the latency close to 1ms. After that the WAF was put into production and work was done using systemtap and internal tools to analyze LuaJIT and PCRE performance. We worked closely with Mike Pall (the LuaJIT maintainer) to ensure that the WAF-specific functions we need are JIT-compiled.
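Function memoization means caching the result of a pure function so that repeated calls with the same argument skip the expensive work. The WAF itself is written in Lua; the sketch below shows the idea in Python, and the `normalize` function and its decoding steps are hypothetical stand-ins for whatever the real code caches.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive body actually runs

@lru_cache(maxsize=4096)
def normalize(value):
    """Hypothetical expensive transformation applied before rule matching."""
    global calls
    calls += 1
    return value.lower().replace("%3c", "<").replace("%3e", ">")

# Many rules asking for the same normalized value trigger one computation.
for _ in range(1000):
    normalize("%3Cscript%3E")
```

After the loop `calls` is 1: the first call does the work and the remaining 999 are served from the cache.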
The difference with LuaJIT is night and day. We would never use plain Lua in production. LuaJIT is far faster than Lua on x64 hardware (see http://luajit.org/performance_x86.html).
For the initial tuning of the WAF code we used Lua-based profiling tools (and wrote one ourselves) to look at the performance of the Lua code that implements the WAF. Once in production we used systemtap and flamegraphs to identify hotspots and optimize them. When launching into production, we did not need to change anything in our physical infrastructure. We did not purchase or use any new hardware. The WAF is mostly CPU intensive.
Before we implemented the new WAF, CloudFlare had been running Apache alongside nginx just to be able to use mod_security. This combination was very slow and cumbersome. Ultimately it didn’t scale with CloudFlare’s growing business, so we started working on a new WAF using nginx + LuaJIT.
CloudFlare is operating one of the world’s largest deployments of nginx + LuaJIT. Every fraction of a microsecond that can be shaved off the processing of a request has a significant impact, so we decided to sponsor some changes to the LuaJIT open source project.
The overall goal of the project was to get the median WAF block/allow decision made in under 1ms in real world scenarios. Optimizations were made by examining the WAF’s performance under a test harness with line-level timing information. We ran the WAF in CloudFlare’s network with very detailed systemtap-based instrumentation.
Information from systemtap is fed into a pastebin, which parses it and produces a flame graph showing where the code is spending its time.
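Flame graph tools such as flamegraph.pl typically consume "folded" input: one line per unique stack, with frames joined by semicolons and followed by a sample count. The Python sketch below shows that collapsing step under the assumption that samples arrive as lists of frame names; the frame names are invented, and this is not CloudFlare's internal tool.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse stack samples (outermost frame first) into folded lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return ["%s %d" % (stack, n) for stack, n in sorted(counts.items())]

# Hypothetical samples, one list of frames per profiler tick.
samples = [
    ["waf_run", "match_rule", "regex_exec"],
    ["waf_run", "match_rule", "regex_exec"],
    ["waf_run", "match_rule", "string_find"],
    ["waf_run", "parse_body"],
]

for line in fold_stacks(samples):
    print(line)
```

Stacks that appear in more samples become wider boxes in the rendered graph, which is how hot code paths stand out.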
Early on, the flamegraphs showed extensive use of closures, which was causing slowness in LuaJIT. Some parts of the compiler were rewritten to remove the closures and make it run faster.
Here’s another view generated from the same information which identified hot functions. Here it shows that string matching and regular expressions are the most expensive operations.
To make these matching functions run faster, we have implemented our own version of the Aho-Corasick algorithm. Aho-Corasick is a fast string matching algorithm that can match a large set of keywords simultaneously against incoming text. Its advantage is that it matches multiple strings in a single pass over a large body of text, whereas searching for the strings individually with Boyer-Moore requires a separate pass over the text for each string. In this article, the author shows how Aho-Corasick is implemented using Haskell. CloudFlare has also open-sourced a custom Aho-Corasick implementation in Golang and C++ with Lua.
Optimizations in the Lua language, the LuaJIT compiler and the WAF core made for a very fast and flexible all-Lua WAF which runs within nginx’s core.
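To make the single-pass idea concrete, here is a compact textbook-style Aho-Corasick sketch in Python. It is not CloudFlare's implementation (which is in Lua, Go and C++); the class name, the keyword list and the sample text are assumptions for illustration only.

```python
from collections import deque

class AhoCorasick:
    def __init__(self, keywords):
        # Per node: outgoing transitions, failure link, keywords ending here.
        self.goto = [{}]
        self.fail = [0]
        self.out = [[]]
        for word in keywords:
            self._insert(word)
        self._build_failure_links()

    def _insert(self, word):
        node = 0
        for ch in word:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(word)

    def _build_failure_links(self):
        # Breadth-first walk; depth-1 nodes keep their failure link to root.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit keywords that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def search(self, text):
        """Return (start_index, keyword) for every match, in one pass."""
        node, hits = 0, []
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for word in self.out[node]:
                hits.append((i - len(word) + 1, word))
        return hits

# Hypothetical keyword set and input.
matcher = AhoCorasick(["select", "union", "script", "on"])
hits = matcher.search("union select <script>")
```

Here `hits` is `[(0, 'union'), (3, 'on'), (6, 'select'), (14, 'script')]`. The text is scanned once regardless of how many keywords are loaded, and the overlapping match of "on" inside "union" comes from following failure links rather than rescanning.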
See an example in LUA »
Watch John’s presentation on “Building a low-latency WAF inside NGINX using Lua” on YouTube. You can also download the presentation used in this video here.