Context Management in OpenTelemetry Node.js

This post shares some of our experience gained by writing nodejs plugins and debugging context issues at Aspecto.


You might find it interesting if:


  • You are developing an instrumentation plugin for OpenTelemetry in Node.js.
  • You get broken traces, or the structure of the trace tree is not what you expected.
  • You would like a deeper understanding of how the magic works under the hood.

So you started using OpenTelemetry in your JavaScript application to collect traces. Since you are reading an article about context management, I assume that you have already read about what OpenTelemetry is and understand the basics of how to set up tracing for your application. You have probably already run an example application such as basic-tracer-node, found under the “examples” directory in the opentelemetry-js repository, and seen the resulting trace in Jaeger.

In that trace there is one main span with 10 doWork child spans. This parent-child connection between main and doWork is what we want to talk about: how does doWork know to link to its parent?

What is Span Context

A trace is a collection of spans, with some relation between the spans (parent <> children). In the example above we have 11 different spans which all belong to the same trace.


In OpenTelemetry, each span is handled autonomously: it starts, ends, and is processed independently of other spans. Jaeger received each of the above spans individually, and might receive many spans from other traces in between. Since those spans share the same traceId, Jaeger can fuse them into one trace. It draws the spans as a tree because each span records the span id of its parent. We'll call the pair of trace id and span id the SpanContext. It is captured each time a span is created via the tracer.startSpan() function.

Span context is the glue that connects those separate spans into a trace. The SpanContext interface is defined in @opentelemetry/api:


export interface SpanContext {
  traceId: string;
  spanId: string;
  // and a few other context attributes
}
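
To make the role of these two ids concrete, here is a minimal sketch of how a backend could rebuild trace trees from spans that arrive individually and out of order. The span records, the parentSpanId field name, and the buildTraces helper are assumptions made for illustration; this is not how Jaeger is actually implemented.

```javascript
// Hypothetical flat span records, as a backend might receive them in any order.
const spans = [
  { traceId: "t1", spanId: "b", parentSpanId: "a", name: "doWork" },
  { traceId: "t2", spanId: "x", parentSpanId: undefined, name: "other" },
  { traceId: "t1", spanId: "a", parentSpanId: undefined, name: "main" },
  { traceId: "t1", spanId: "c", parentSpanId: "a", name: "doWork" },
];

// Group spans by traceId, then attach each span to its parent by span id.
function buildTraces(records) {
  const traces = new Map(); // traceId -> { roots: [], byId: Map }
  for (const record of records) {
    if (!traces.has(record.traceId)) {
      traces.set(record.traceId, { roots: [], byId: new Map() });
    }
    const node = { ...record, children: [] };
    traces.get(record.traceId).byId.set(record.spanId, node);
  }
  for (const trace of traces.values()) {
    for (const node of trace.byId.values()) {
      const parent = trace.byId.get(node.parentSpanId);
      // A span with no known parent becomes a root of the tree.
      if (parent) parent.children.push(node);
      else trace.roots.push(node);
    }
  }
  return traces;
}

const traces = buildTraces(spans);
const main = traces.get("t1").roots[0];
console.log(main.name, main.children.map((child) => child.spanId));
```

If a child span carried the wrong traceId, it would land in a different group, which is exactly what a broken trace looks like.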

The Goal

The goal is to have a complete, well-structured trace tree.


If everything works properly, all you need to do is install the OpenTelemetry plugins for the libraries you use and want to auto-instrument, then set up and configure OpenTelemetry tracing. You should not have to bother with context yourself.

When things don’t work well (for various reasons), you might encounter:


  • Broken traces: you expect to see one trace for a single execution, but receive multiple traces, each one a sub-part of the full trace. For example, your application receives an HTTP call and writes something to the database, and you get two traces: one with the HTTP span and another with the database span.
  • Wrong tree hierarchy: you logically expect span A to be the child of span B, but instead it sits under another span C, giving the impression that the flow of the code is different from what it really is.

How Does a Span Get Its Context?

The most straightforward way to link a span to its parent is to set the parent option on SpanOptions when starting the span:

const span = tracer.startSpan('doWork', { parent });

The parent span stores its own id and the id of the trace it belongs to, which become the parentSpanId and traceId of the new span. This method is sometimes useful, but it does not suit general instrumentation plugins, which patch module functions and usually cannot pass the parent span object around to other plugins or to the places where it is needed.
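
As a rough illustration of what passing an explicit parent achieves, the sketch below uses a toy startSpan function. It is a stand-in written for this post, not the OpenTelemetry implementation, and the id formats are assumptions.

```javascript
const crypto = require("crypto");

// Toy stand-in for tracer.startSpan; NOT the OpenTelemetry implementation.
function startSpan(name, options = {}) {
  const { parent } = options;
  return {
    name,
    // A child reuses its parent's trace id; a root span starts a new trace.
    traceId: parent ? parent.traceId : crypto.randomBytes(16).toString("hex"),
    spanId: crypto.randomBytes(8).toString("hex"),
    // The parent's own span id becomes the child's parentSpanId.
    parentSpanId: parent ? parent.spanId : undefined,
  };
}

const parent = startSpan("main");
const child = startSpan("doWork", { parent });
const orphan = startSpan("doWork"); // no parent passed: a separate trace

console.log("child shares the trace:", child.traceId === parent.traceId);
console.log("orphan shares the trace:", orphan.traceId === parent.traceId);
```

The orphan span is what a plugin produces when it has no way to get hold of the parent span object.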

Let's illustrate with an example. We'll take the basic-tracer-node example that we used earlier and change it a bit:

  • Replace the doWork function with a more realistic one that uses async/await and axios to GET some data from an external HTTP endpoint.
  • Install the https plugin, so that outgoing HTTP traffic from the doWork function creates a span per request.
  • Reduce the work spans from 10 to 2, so Jaeger is less noisy, and await their completion with `Promise.all()`.

The code after those modifications becomes:


"use strict";


const opentelemetry = require("@opentelemetry/api");
const { NodeTracerProvider } = require("@opentelemetry/node");
const { SimpleSpanProcessor } = require("@opentelemetry/tracing");
const { JaegerExporter } = require("@opentelemetry/exporter-jaeger");


// Use NodeTracerProvider which instruments https module automatically
const provider = new NodeTracerProvider();
const axios = require("axios").default;


const exporter = new JaegerExporter({ serviceName: "basic-service" });
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
provider.register();
const tracer = opentelemetry.trace.getTracer("example-basic-tracer-node");


const parentSpan = tracer.startSpan("main");
// call the doWork async function twice and wait for both results
Promise.all([doWork(1), doWork(2)]).then(() => {
  parentSpan.end();
  exporter.shutdown();
});


async function doWork(index) {
  await axios.get(`https://jsonplaceholder.typicode.com/todos/${index}`);
}

If we run the above example and check Jaeger, we'll see broken traces: there are 3 traces instead of one! Our old friend, the “main” span, appears as a one-span trace, and there are two additional single-span traces for the two HTTP operations. The issue is that the two HTTP spans were created internally in the @opentelemetry/https plugin and cannot tell that their parent is the “main” span. To fix the issue, we need to set a scope for the “main” span with the `tracer.withSpan` function:

const parentSpan = tracer.startSpan("main");
tracer.withSpan(parentSpan, () => {
  // call the doWork async function twice and wait for both results
  Promise.all([doWork(1), doWork(2)]).then(() => {
    parentSpan.end();
    exporter.shutdown();
  });
});

The withSpan function executes its callback in such a way that when a new span is started inside that callback, or in any of the sync or async calls cascading from it, the new span's parent is correctly set to parentSpan. Since we called doWork() from within the scope of withSpan, and it calls axios.get(), which calls the instrumented https module, the HTTP spans are now created with the right traceId and parent span:
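
To show the idea behind this scoping without any OpenTelemetry packages, here is a minimal sketch built on Node's AsyncLocalStorage. The toy startSpan and withSpan functions and the fakeHttpGet helper are assumptions made up for this example; the real tracer is more involved.

```javascript
const { AsyncLocalStorage } = require("async_hooks");

const activeSpan = new AsyncLocalStorage(); // holds the span that is in scope
let nextId = 1;

// Toy tracer: a new span takes the span currently in scope as its parent.
function startSpan(name) {
  const parent = activeSpan.getStore();
  return {
    name,
    spanId: nextId++,
    parentSpanId: parent ? parent.spanId : undefined,
  };
}

// Toy withSpan: runs fn with `span` in scope for the sync and async calls it triggers.
function withSpan(span, fn) {
  return activeSpan.run(span, fn);
}

// Stands in for the instrumented https module: it never receives a parent span.
async function fakeHttpGet(url) {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return startSpan(`GET ${url}`);
}

const main = startSpan("main");
const done = withSpan(main, () =>
  Promise.all([fakeHttpGet("/todos/1"), fakeHttpGet("/todos/2")])
);

done.then((httpSpans) => {
  console.log(httpSpans.map((span) => span.parentSpanId === main.spanId));
});
```

Note that fakeHttpGet creates its span after an await and a timer, yet it still finds main as its parent, because the scope follows the asynchronous call chain rather than the synchronous call stack.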


We can see one complete trace, and also that the two requests execute concurrently, since they are async functions run with Promise.all. We fixed the broken-traces issue by setting the scope for the span.

ContextManager

ContextManager is a global singleton service of the OpenTelemetry JS framework that tracks the currently active context at any time. The tracer uses it to query the currently active Context with the active() function, and to set a scoped Context for subsequent function calls with the with() and bind() functions.
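
The three functions can be sketched as a small class. The method names follow the description above, but the class name, its internals, and the use of AsyncLocalStorage are assumptions for illustration only; the real bind() also handles EventEmitter objects, which this sketch does not.

```javascript
const { AsyncLocalStorage } = require("async_hooks");

// Toy context manager; the method names mirror the article, the body is an assumption.
class ToyContextManager {
  constructor() {
    this.storage = new AsyncLocalStorage();
    this.root = Object.freeze({}); // the empty context used when nothing is set
  }

  // Return the context that is active for the currently running code.
  active() {
    return this.storage.getStore() || this.root;
  }

  // Run fn with `context` active for everything fn calls, sync or async.
  with(context, fn) {
    return this.storage.run(context, fn);
  }

  // Return a wrapper that always runs fn under `context`, whenever it is called.
  bind(context, fn) {
    return (...args) => this.with(context, () => fn(...args));
  }
}

const manager = new ToyContextManager();
const ctxA = { span: "A" };
const ctxB = { span: "B" };
const bound = manager.bind(ctxA, () => manager.active().span);

console.log(manager.active() === manager.root); // true: nothing is active here
console.log(manager.with(ctxB, () => manager.active().span)); // "B"
console.log(manager.with(ctxB, () => bound())); // "A": the bound context wins
```

The last line is worth remembering when debugging: a bound function runs under the context it was bound to, not the context of the code that happens to call it.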

AsyncHooksContextManager

OpenTelemetry takes care of setting the correct context for any function called from inside the with() scope, even for async functions and promises that start new spans after being awaited.

For example, the following code will produce a correct trace where the three HTTP spans are children of the main span:


const parentSpan = tracer.startSpan("main");
tracer.withSpan(parentSpan, async () => {
  setTimeout(() => {
    axios.get(`https://jsonplaceholder.typicode.com/todos/1`);
  }, 1000);
  await axios.get(`https://jsonplaceholder.typicode.com/todos/2`);
  axios.get(`https://jsonplaceholder.typicode.com/todos/3`);
});


parentSpan.end();

The context propagates correctly to the timer callback executed after 1 second, to the awaited `axios.get` call, and to the non-awaited `axios.get` call that follows it. This is done by subscribing to various lifecycle events of asynchronous resources using Node's async_hooks module.
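
Here is a simplified sketch of that mechanism using async_hooks directly. It captures the active context when an async resource is created and restores it around the resource's callback. The contexts map, the stack, and the withContext helper are our own names, and this is an approximation of the idea rather than the actual AsyncHooksContextManager code; it only demonstrates timers.

```javascript
const asyncHooks = require("async_hooks");

const contexts = new Map(); // asyncId -> context captured when the resource was created
const stack = []; // contexts entered by the code that is running right now

// The active context is whatever sits on top of the stack.
const active = () => stack[stack.length - 1];

asyncHooks
  .createHook({
    // A new async resource remembers the context of the code that created it.
    init(asyncId) {
      const context = active();
      if (context !== undefined) contexts.set(asyncId, context);
    },
    // Just before the resource's callback runs, re-enter the remembered context.
    before(asyncId) {
      stack.push(contexts.get(asyncId));
    },
    // After the callback, leave it again.
    after() {
      stack.pop();
    },
    // Forget the context once the resource is gone.
    destroy(asyncId) {
      contexts.delete(asyncId);
    },
  })
  .enable();

// Toy equivalent of with(): make `context` active while fn runs synchronously.
function withContext(context, fn) {
  stack.push(context);
  try {
    return fn();
  } finally {
    stack.pop();
  }
}

const seen = [];
withContext({ span: "main" }, () => {
  setTimeout(() => seen.push(active()), 10); // created inside the scope
});
setTimeout(() => seen.push(active()), 20); // created outside any scope
setTimeout(() => console.log(seen), 30);
```

The first timer callback runs long after withContext has returned, yet it still sees the main context, because the init hook stored it against the timer's asyncId.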

Debugging

If you are writing a plugin, or need to debug an issue with span context not being set correctly, here are some tips from our experience at Aspecto:


  • To see the currently active context, examine the return value of opentelemetry.context.active(), where opentelemetry is imported as const opentelemetry = require("@opentelemetry/api"). This function returns a Context object, which is an empty object if there is no active context, and an object with a key such as “Symbol(OpenTelemetry Context Key ACTIVE_SPAN)” if a context is set.

  • If the code uses thenables (objects that have a then() method and behave like a promise, but are not real promises), then due to a bug in V8, context will not propagate across the thenable call. The following example creates the “child span” as a new trace and does not link it under the main parent, even though it is awaited from inside the withSpan call.

const parentSpan = tracer.startSpan("main");
tracer.withSpan(parentSpan, async () => {
  const thenable = {
    then: (onFulfilled) => {
      const span = tracer.startSpan("child span");
      onFulfilled(42);
      span.end();
    },
  };
  await thenable;
});


parentSpan.end();
  • The withSpan function is commonly used in JS plugins. It scopes the part of the code that should run under the context of a specific span. Another common practice is to use the bind() function, which binds a context to a later run of an arbitrary function or EventEmitter. As a result, a function can run with a context different from the one set by the enclosing withSpan, which can be surprising. Keep that in mind if you see some voodoo context being set and are not sure where it comes from.

  • Make sure your spans are scoped correctly. Many modules have complicated callback logic and interactions with other modules. If your trace is not structured correctly, check which parts of the code reside under the withSpan() and bind() functions.

If you have any questions on how to leverage OpenTelemetry in your environment, feel free to reach out to me or the Aspecto team.

Original article: https://medium.com/aspecto/context-managment-in-opentelemetry-nodejs-e5cddf22a09e
