WebRTC Native Source Code, Part 1: Camera Capture Implementation Analysis

Agora provides a mirror of the WebRTC source for developers in China, making it faster to download and build WebRTC.

About the mirror

Supported build targets: Linux, Android, iOS, Windows

The mirror is updated from time to time, following upstream releases.

Mirror address

The configuration and usage guide is on the same page; just follow the steps there.

The WebRTC codebase is large, and understanding it all in one pass is unrealistic. In this series I will try to answer three questions:

1. How do clients establish a connection with each other?

2. How do clients transfer data to each other?

3. What is the complete pipeline for audio/video capture, preview, encoding, transmission, decoding and rendering?

This first article starts from the Android platform and analyzes how WebRTC implements camera capture.
Code location: webrtc/src/sdk/android/
The Java API classes (VideoCapturer, CameraVideoCapturer, the enumerators, etc.) live under api/org/webrtc/, the camera session classes under src/java/org/webrtc/, and the JNI glue under src/jni/.

Main video capture classes

VideoCapturer

As shown below, the main interface WebRTC exposes for video capture is VideoCapturer. Its implementations are ScreenCapturerAndroid, FileVideoCapturer and CameraCapturer, corresponding to three video sources: screen, file and camera. Because Android provides both the Camera1 and Camera2 APIs, CameraCapturer in turn has two subclasses, Camera1Capturer and Camera2Capturer.


[Figure: VideoCapturer class hierarchy]

VideoCapturer defines the CapturerObserver interface, shown below; implementing it is how frame data is received.

// Interface used for providing callbacks to an observer.
  public interface CapturerObserver {
    // Notify if the camera have been started successfully or not.
    // Called on a Java thread owned by VideoCapturer.
    void onCapturerStarted(boolean success);
    void onCapturerStopped();

    // Delivers a captured frame. Called on a Java thread owned by VideoCapturer.
    void onByteBufferFrameCaptured(
        byte[] data, int width, int height, int rotation, long timeStamp);

    // Delivers a captured frame in a texture with id |oesTextureId|. Called on a Java thread
    // owned by VideoCapturer.
    void onTextureFrameCaptured(int width, int height, int oesTextureId, float[] transformMatrix,
        int rotation, long timestamp);

    // Delivers a captured frame. Called on a Java thread owned by VideoCapturer.
    void onFrameCaptured(VideoFrame frame);
  }

In WebRTC, the AndroidVideoTrackSourceObserver class implements this interface and forwards the frames on to the sink modules that need them, such as the encoder and the local preview.
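To make the callback flow concrete, here is a minimal, hypothetical observer that only logs what it receives. It is not part of WebRTC; the real AndroidVideoTrackSourceObserver instead forwards every frame to the native track source. It implements the interface version quoted above, where CapturerObserver is nested inside VideoCapturer.

import org.webrtc.VideoCapturer;
import org.webrtc.VideoFrame;

// Hypothetical observer used only for illustration; an application would normally
// rely on the AndroidVideoTrackSourceObserver created by the SDK.
public class LoggingCapturerObserver implements VideoCapturer.CapturerObserver {
  @Override
  public void onCapturerStarted(boolean success) {
    System.out.println("capturer started: " + success);
  }

  @Override
  public void onCapturerStopped() {
    System.out.println("capturer stopped");
  }

  @Override
  public void onByteBufferFrameCaptured(
      byte[] data, int width, int height, int rotation, long timeStamp) {
    System.out.println("byte[] frame " + width + "x" + height + ", rotation " + rotation);
  }

  @Override
  public void onTextureFrameCaptured(int width, int height, int oesTextureId,
      float[] transformMatrix, int rotation, long timestamp) {
    System.out.println("texture frame " + width + "x" + height + ", texture id " + oesTextureId);
  }

  @Override
  public void onFrameCaptured(VideoFrame frame) {
    System.out.println("VideoFrame " + frame.getRotatedWidth() + "x" + frame.getRotatedHeight());
  }
}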

CameraEnumerator

CameraCapturer objects are created by a CameraEnumerator. Because a device has front and back cameras, the enumerator assigns each camera a specific name and creates a CameraCapturer for a given name: Camera1Enumerator creates Camera1Capturer objects and Camera2Enumerator creates Camera2Capturer objects.

Camera1

The typical Camera1 call sequence (a minimal code sketch follows the list):

  • Create the Camera object: Camera.open
  • Set the preview SurfaceTexture that receives frames (in GPU memory): camera.setPreviewTexture
  • Choose suitable preview parameters (size, frame rate, focus): Camera.Parameters + camera.setParameters
  • To receive frame data in memory, set a callback buffer and listener: camera.addCallbackBuffer + camera.setPreviewCallbackWithBuffer
  • To have the camera service rotate the data for us, set the display orientation: camera.setDisplayOrientation
  • Start preview: camera.startPreview
  • Stop preview: camera.stopPreview + camera.release
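A minimal sketch of that sequence, outside WebRTC: the SurfaceTexture is assumed to come from elsewhere (in WebRTC it is owned by SurfaceTextureHelper), and the size, format and orientation values are only illustrative.

import android.graphics.ImageFormat;
import android.graphics.SurfaceTexture;
import android.hardware.Camera;
import java.io.IOException;

public class Camera1PreviewSketch {
  public Camera startPreview(SurfaceTexture surfaceTexture) throws IOException {
    Camera camera = Camera.open(0);                       // open the first camera
    Camera.Parameters parameters = camera.getParameters();
    parameters.setPreviewSize(640, 480);                  // must be a size the device supports
    parameters.setPreviewFormat(ImageFormat.NV21);
    camera.setParameters(parameters);
    camera.setPreviewTexture(surfaceTexture);             // frames land in this GPU texture

    // Optional memory path: hand the camera a buffer and get NV21 bytes back.
    byte[] buffer = new byte[640 * 480 * 3 / 2];
    camera.addCallbackBuffer(buffer);
    camera.setPreviewCallbackWithBuffer((data, cam) -> {
      // consume 'data' here, then return the buffer so it can be reused
      cam.addCallbackBuffer(data);
    });

    camera.setDisplayOrientation(90);                     // let the camera service rotate for us
    camera.startPreview();
    return camera;                                        // caller stops with stopPreview() + release()
  }
}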

Camera2

The typical Camera2 call sequence (a minimal code sketch follows the list):

  • Create the CameraManager; every camera operation starts with this "camera butler": context.getSystemService(Context.CAMERA_SERVICE)
  • Create the CameraDevice: cameraManager.openCamera
  • Unlike Camera1, Camera2 operations are asynchronous: openCamera takes a callback through which we receive the camera state events
  • Opened successfully: CameraDevice.StateCallback#onOpened
  • After the device is created, start the preview session and set the data callback: camera.createCaptureSession, which again takes a callback
  • Session configured: CameraCaptureSession.StateCallback#onConfigured
  • Once the session is up, set the data format (size, frame rate, focus) and issue the repeating request: CaptureRequest.Builder + session.setRepeatingRequest
  • Stop preview: cameraCaptureSession.close + cameraDevice.close
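A minimal sketch of the Camera2 sequence, again outside WebRTC: permission checks, error handling and Handler/thread management are omitted, and the preview Surface is assumed to already exist.

import android.content.Context;
import android.hardware.camera2.CameraAccessException;
import android.hardware.camera2.CameraCaptureSession;
import android.hardware.camera2.CameraDevice;
import android.hardware.camera2.CameraManager;
import android.hardware.camera2.CaptureRequest;
import android.view.Surface;
import java.util.Arrays;

public class Camera2PreviewSketch {
  // Assumes the CAMERA permission has already been granted.
  public void startPreview(Context context, Surface surface) throws CameraAccessException {
    CameraManager manager = (CameraManager) context.getSystemService(Context.CAMERA_SERVICE);
    String cameraId = manager.getCameraIdList()[0];        // pick the first camera for simplicity

    manager.openCamera(cameraId, new CameraDevice.StateCallback() {
      @Override
      public void onOpened(CameraDevice device) {          // camera is ready, start the session
        try {
          device.createCaptureSession(Arrays.asList(surface),
              new CameraCaptureSession.StateCallback() {
                @Override
                public void onConfigured(CameraCaptureSession session) {
                  try {
                    CaptureRequest.Builder builder =
                        device.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);
                    builder.addTarget(surface);            // frames go to the preview Surface
                    session.setRepeatingRequest(builder.build(), null, null);
                  } catch (CameraAccessException e) {
                    // ignored in this sketch
                  }
                }

                @Override
                public void onConfigureFailed(CameraCaptureSession session) {}
              }, null);
        } catch (CameraAccessException e) {
          // ignored in this sketch
        }
      }

      @Override
      public void onDisconnected(CameraDevice device) { device.close(); }

      @Override
      public void onError(CameraDevice device, int error) { device.close(); }
    }, null);
  }
}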

The CameraEnumerator interface is as follows:

public interface CameraEnumerator {
  public String[] getDeviceNames();
  public boolean isFrontFacing(String deviceName);
  public boolean isBackFacing(String deviceName);
  public List<CaptureFormat> getSupportedFormats(String deviceName);
  public CameraVideoCapturer createCapturer(
      String deviceName, CameraVideoCapturer.CameraEventsHandler eventsHandler);
}
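A hedged usage sketch of this interface, following the pattern commonly used in WebRTC Android demos: prefer Camera2 when the device supports it, then create a capturer for the first front-facing camera.

import android.content.Context;
import org.webrtc.Camera1Enumerator;
import org.webrtc.Camera2Enumerator;
import org.webrtc.CameraEnumerator;
import org.webrtc.CameraVideoCapturer;

public class CapturerSelectionSketch {
  public static CameraVideoCapturer createFrontCapturer(Context context) {
    CameraEnumerator enumerator = Camera2Enumerator.isSupported(context)
        ? new Camera2Enumerator(context)
        : new Camera1Enumerator(/* captureToTexture= */ true);
    for (String deviceName : enumerator.getDeviceNames()) {
      if (enumerator.isFrontFacing(deviceName)) {
        // The events handler is optional; pass null if camera events are not needed.
        return enumerator.createCapturer(deviceName, null);
      }
    }
    return null; // no front-facing camera on this device
  }
}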

CameraSession

CameraCapturer mainly implements the interface and maintains state; the actual interaction with the Android camera APIs is delegated to a CameraSession object it creates. Correspondingly there are two implementations: Camera1Session wraps the Camera1 API (android.hardware.Camera) and Camera2Session wraps the Camera2 API (android.hardware.camera2). Both Camera1 and Camera2 can deliver frames through a SurfaceTexture, and WebRTC provides SurfaceTextureHelper to manage it; Camera1 can additionally deliver YUV data by registering a PreviewCallback. A sketch of the typical wiring follows.
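A minimal sketch of that wiring, using the interface version quoted in this article (where CapturerObserver is nested inside VideoCapturer); the EglBase instance and the capture size/frame rate are placeholders.

import android.content.Context;
import org.webrtc.EglBase;
import org.webrtc.SurfaceTextureHelper;
import org.webrtc.VideoCapturer;

public class CaptureWiringSketch {
  public static SurfaceTextureHelper startCapture(VideoCapturer capturer, Context appContext,
      EglBase eglBase, VideoCapturer.CapturerObserver observer) {
    // The helper owns the SurfaceTexture and the capture thread that
    // Camera1Session/Camera2Session render into.
    SurfaceTextureHelper helper =
        SurfaceTextureHelper.create("CaptureThread", eglBase.getEglBaseContext());
    capturer.initialize(helper, appContext, observer);
    capturer.startCapture(1280, 720, 30);   // width, height, fps are illustrative
    return helper;                          // dispose() it after capturer.stopCapture()
  }
}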

CameraSession distributes events and frame data through the CreateSessionCallback and Events interfaces shown below; CameraCapturer implements both to receive and forward them.

 public interface CreateSessionCallback {
    void onDone(CameraSession session);
    void onFailure(FailureType failureType, String error);
  }
  // Events are fired on the camera thread.
  public interface Events {
    void onCameraOpening();
    void onCameraError(CameraSession session, String error);
    void onCameraDisconnected(CameraSession session);
    void onCameraClosed(CameraSession session);
    void onFrameCaptured(CameraSession session, VideoFrame frame);

    // The old way of passing frames. Will be removed eventually.
    void onByteBufferFrameCaptured(
        CameraSession session, byte[] data, int width, int height, int rotation, long timestamp);
    void onTextureFrameCaptured(CameraSession session, int width, int height, int oesTextureId,
        float[] transformMatrix, int rotation, long timestamp);
  }

Video data representation

A video is a sequence of frames. Image formats fall into two families, RGB and YUV, each with several variants. WebRTC represents all of them uniformly with VideoFrame: whatever the format, a frame is essentially a buffer whose layout differs. Because frame data has to cross between the Java layer and the native layer, VideoFrame is defined in both.

The main Java-layer classes are:

[Figure: Java-layer VideoFrame / Buffer classes]

The Buffer interface is defined as follows:

public interface Buffer {
    /**
     * Resolution of the buffer in pixels.
     */
    @CalledByNative("Buffer") int getWidth();
    @CalledByNative("Buffer") int getHeight();

    /**
     * Returns a memory-backed frame in I420 format. If the pixel data is in another format, a
     * conversion will take place. All implementations must provide a fallback to I420 for
     * compatibility with e.g. the internal WebRTC software encoders.
     */
    @CalledByNative("Buffer") I420Buffer toI420();

    /**
     * Reference counting is needed since a video buffer can be shared between multiple VideoSinks,
     * and the buffer needs to be returned to the VideoSource as soon as all references are gone.
     */
    @CalledByNative("Buffer") void retain();
    @CalledByNative("Buffer") void release();

    /**
     * Crops a region defined by |cropx|, |cropY|, |cropWidth| and |cropHeight|. Scales it to size
     * |scaleWidth| x |scaleHeight|.
     */
    @CalledByNative("Buffer")
    Buffer cropAndScale(
        int cropX, int cropY, int cropWidth, int cropHeight, int scaleWidth, int scaleHeight);
  }
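A hedged sketch of how this interface is typically used: whatever concrete Buffer a frame carries (texture or YUV), toI420() yields a CPU-accessible I420 view whose planes can be read directly.

import java.nio.ByteBuffer;
import org.webrtc.VideoFrame;

public class I420AccessSketch {
  public static void dumpLuma(VideoFrame frame) {
    VideoFrame.I420Buffer i420 = frame.getBuffer().toI420(); // converts/copies if necessary
    ByteBuffer y = i420.getDataY();
    System.out.println("Y plane " + i420.getWidth() + "x" + i420.getHeight()
        + ", stride " + i420.getStrideY() + ", first byte " + (y.get(0) & 0xff));
    i420.release(); // toI420() hands back its own reference; release it when done
  }
}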

The main native-layer classes are:

[Figure: native-layer VideoFrame / VideoFrameBuffer classes]

The VideoFrameBuffer interface is defined as follows:

class VideoFrameBuffer : public rtc::RefCountInterface {
 public:
  // New frame buffer types will be added conservatively when there is an
  // opportunity to optimize the path between some pair of video source and
  // video sink.
  enum class Type {
    kNative,
    kI420,
    kI420A,
    kI444,
  };

  // This function specifies in what pixel format the data is stored in.
  virtual Type type() const = 0;

  // The resolution of the frame in pixels. For formats where some planes are
  // subsampled, this is the highest-resolution plane.
  virtual int width() const = 0;
  virtual int height() const = 0;

  // Returns a memory-backed frame buffer in I420 format. If the pixel data is
  // in another format, a conversion will take place. All implementations must
  // provide a fallback to I420 for compatibility with e.g. the internal WebRTC
  // software encoders.
  virtual rtc::scoped_refptr<I420BufferInterface> ToI420() = 0;

  // These functions should only be called if type() is of the correct type.
  // Calling with a different type will result in a crash.
  // TODO(magjed): Return raw pointers for GetI420 once deprecated interface is
  // removed.
  rtc::scoped_refptr<I420BufferInterface> GetI420();
  rtc::scoped_refptr<const I420BufferInterface> GetI420() const;
  I420ABufferInterface* GetI420A();
  const I420ABufferInterface* GetI420A() const;
  I444BufferInterface* GetI444();
  const I444BufferInterface* GetI444() const;

 protected:
  ~VideoFrameBuffer() override {}
};

As the two diagrams above show, the Java and native definitions are quite similar, and both provide a conversion to the I420 format (toI420 / ToI420). Taking the Java side as an example, there are several YUV buffer classes and a texture buffer class. A YUV buffer's main member is a ByteBuffer whose contents obviously differ from frame to frame. A texture buffer belongs to the OpenGL side: its members are essentially a texture id and a transform matrix, both of which stay the same across frames; the id refers to a native-layer object that owns a mutable buffer, again in YUV or RGB, whose contents change with every frame.

Video capture

Taking the Android Camera1 PreviewCallback path as an example, the capture and distribution flow looks like this:

[Figure: capture and distribution flow for the Camera1 PreviewCallback path]

The main classes in the distribution flow are shown below. VideoBroadcaster has a sinks_ member of type std::vector<SinkPair> that stores the sinks to distribute to; sinks are added and removed through AddOrUpdateSink and RemoveSink.

[Figure: main classes in the frame distribution path]

As the diagram shows, once a frame is obtained from the camera it is passed via AndroidVideoTrackSourceObserver to the native AndroidVideoTrackSource object, and then distributed by VideoBroadcaster to the different sinks: to the encoder through VideoStreamEncoder, and to Java-layer VideoSink objects, such as the SurfaceViewRenderer used for local preview, through VideoSinkWrapper.

The Java-layer VideoSink is defined as follows; onFrame is called back from the native VideoSinkWrapper.

public interface VideoSink {
  /**
   * Implementations should call frame.retain() if they need to hold a reference to the frame after
   * this function returns. Each call to retain() should be followed by a call to frame.release()
   * when the reference is no longer needed.
   */
  @CalledByNative void onFrame(VideoFrame frame);
}
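As an example, a custom Java sink can be attached with videoTrack.addSink(...) alongside the preview renderer; this hypothetical one just counts frames.

import org.webrtc.VideoFrame;
import org.webrtc.VideoSink;

public class FrameCountingSink implements VideoSink {
  private int frameCount;

  @Override
  public void onFrame(VideoFrame frame) {
    // Called from a native thread via VideoSinkWrapper; avoid blocking here.
    frameCount++;
    if (frameCount % 30 == 0) {
      System.out.println("frames: " + frameCount + ", last size "
          + frame.getRotatedWidth() + "x" + frame.getRotatedHeight());
    }
    // No retain()/release() needed because the frame is not kept after onFrame returns.
  }
}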

The path from the camera to AndroidVideoTrackSourceObserver is fairly simple, so let's analyze what happens after AndroidVideoTrackSourceObserver.

The native implementation of AndroidVideoTrackSourceObserver's nativeOnByteBufferFrameCaptured is as follows:

static void JNI_AndroidVideoTrackSourceObserver_OnByteBufferFrameCaptured(
    JNIEnv* jni,
    const JavaParamRef<jclass>&,
    jlong j_source,
    const JavaParamRef<jbyteArray>& j_frame,
    jint length,
    jint width,
    jint height,
    jint rotation,
    jlong timestamp) {
  AndroidVideoTrackSource* source =
      AndroidVideoTrackSourceFromJavaProxy(j_source);
  jbyte* bytes = jni->GetByteArrayElements(j_frame.obj(), nullptr);
  source->OnByteBufferFrameCaptured(bytes, length, width, height,
                                    jintToVideoRotation(rotation), timestamp);
  jni->ReleaseByteArrayElements(j_frame.obj(), bytes, JNI_ABORT);
}

Here source is an AndroidVideoTrackSource object; its OnByteBufferFrameCaptured eventually calls OnFrame of its parent class AdaptedVideoTrackSource, defined as follows:

void AdaptedVideoTrackSource::OnFrame(const webrtc::VideoFrame& frame) {
  rtc::scoped_refptr<webrtc::VideoFrameBuffer> buffer(
      frame.video_frame_buffer());
  /* Note that this is a "best effort" approach to
     wants.rotation_applied; apply_rotation_ can change from false to
     true between the check of apply_rotation() and the call to
     broadcaster_.OnFrame(), in which case we generate a frame with
     pending rotation despite some sink with wants.rotation_applied ==
     true was just added. The VideoBroadcaster enforces
     synchronization for us in this case, by not passing the frame on
     to sinks which don't want it. */
  if (apply_rotation() && frame.rotation() != webrtc::kVideoRotation_0 &&
      buffer->type() == webrtc::VideoFrameBuffer::Type::kI420) {
    /* Apply pending rotation. */
    broadcaster_.OnFrame(webrtc::VideoFrame(
        webrtc::I420Buffer::Rotate(*buffer->GetI420(), frame.rotation()),
        webrtc::kVideoRotation_0, frame.timestamp_us()));
  } else {
    broadcaster_.OnFrame(frame);
  }
}

broadcaster_ is a VideoBroadcaster object; its OnFrame loops over the registered sinks and hands the frame to each of them, as shown below:

void VideoBroadcaster::OnFrame(const webrtc::VideoFrame& frame) {
  rtc::CritScope cs(&sinks_and_wants_lock_);
  for (auto& sink_pair : sink_pairs()) {
    if (sink_pair.wants.rotation_applied &&
        frame.rotation() != webrtc::kVideoRotation_0) {
      // Calls to OnFrame are not synchronized with changes to the sink wants.
      // When rotation_applied is set to true, one or a few frames may get here
      // with rotation still pending. Protect sinks that don't expect any
      // pending rotation.
      RTC_LOG(LS_VERBOSE) << "Discarding frame with unexpected rotation.";
      continue;
    }
    if (sink_pair.wants.black_frames) {
      sink_pair.sink->OnFrame(webrtc::VideoFrame(
          GetBlackFrameBuffer(frame.width(), frame.height()), frame.rotation(),
          frame.timestamp_us()));
    } else {
      sink_pair.sink->OnFrame(frame);
    }
  }
}

If the sink is a VideoStreamEncoder, the frame goes to the encoder; if it is a VideoSinkWrapper, it is forwarded to the Java-layer VideoSink, for example the SurfaceViewRenderer used for local preview.

VideoSinkWrapper::OnFrame is defined as follows:

void VideoSinkWrapper::OnFrame(const VideoFrame& frame) {
  JNIEnv* jni = AttachCurrentThreadIfNeeded();
  Java_VideoSink_onFrame(jni, j_sink_, NativeToJavaFrame(jni, frame));
}

Java_VideoSink_onFrame performs the callback from the native layer into the Java-layer VideoSink object. The mechanism is standard JNI: look up the jmethodID from the Java method's signature, then invoke the corresponding Java method through the JNI call interfaces. Searching the source tree for Java_VideoSink_onFrame turns up nothing, though, because this function is generated automatically. In C/C++ development there are generally two ways to generate code:

Macros. The generation is actually done by the preprocessor at compile time. This is fairly common in C/C++ development, but it is not very flexible and struggles once the generated code gets at all complicated.

External tools. Here a separate tool generates the code before compilation from configuration files (an IDL, for example). This is much more flexible: writing the configuration is far simpler than writing piles of mechanical, highly repetitive code. aidl, protobuf and gSOAP all fall into this category.

WebRTC takes the second approach. After a build, searching under the top-level source directory turns up the generated file at ./out/Debug/gen/sdk/android/generated_video_jni/jni/VideoSink_jni.h.

Its contents are as follows:

// Copyright 2019 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

// This file is autogenerated by
//     base/android/jni_generator/jni_generator.py
// For
//     org/webrtc/VideoSink

#ifndef org_webrtc_VideoSink_JNI
#define org_webrtc_VideoSink_JNI

#include <jni.h>

#include "../../../../../../../sdk/android/src/jni/jni_generator_helper.h"

// Step 1: forward declarations.
JNI_REGISTRATION_EXPORT extern const char kClassPath_org_webrtc_VideoSink[];
const char kClassPath_org_webrtc_VideoSink[] = "org/webrtc/VideoSink";

// Leaking this jclass as we cannot use LazyInstance from some threads.
JNI_REGISTRATION_EXPORT base::subtle::AtomicWord g_org_webrtc_VideoSink_clazz =
    0;
#ifndef org_webrtc_VideoSink_clazz_defined
#define org_webrtc_VideoSink_clazz_defined
inline jclass org_webrtc_VideoSink_clazz(JNIEnv* env) {
  return base::android::LazyGetClass(env, kClassPath_org_webrtc_VideoSink,
      &g_org_webrtc_VideoSink_clazz);
}
#endif

// Step 2: method stubs.

static base::subtle::AtomicWord g_org_webrtc_VideoSink_onFrame = 0;
static void Java_VideoSink_onFrame(JNIEnv* env, const
    base::android::JavaRef<jobject>& obj, const base::android::JavaRef<jobject>&
    frame) {
  CHECK_CLAZZ(env, obj.obj(),
      org_webrtc_VideoSink_clazz(env));
  jmethodID method_id =
      base::android::MethodID::LazyGet<
      base::android::MethodID::TYPE_INSTANCE>(
      env, org_webrtc_VideoSink_clazz(env),
      "onFrame",
"("
"Lorg/webrtc/VideoFrame;"
")"
"V",
      &g_org_webrtc_VideoSink_onFrame);

     env->CallVoidMethod(obj.obj(),
          method_id, frame.obj());
  jni_generator::CheckException(env);
}

#endif  // org_webrtc_VideoSink_JNI

As the header comments show, the file is generated by the jni_generator.py script.

Setting up the distribution flow

The creation and wiring of the main objects in the video distribution path is done by the following code in PeerConnectionClient:

  private VideoTrack createVideoTrack(VideoCapturer capturer) {
    videoSource = factory.createVideoSource(capturer);
    capturer.startCapture(videoWidth, videoHeight, videoFps);
    localVideoTrack = factory.createVideoTrack(VIDEO_TRACK_ID, videoSource);
    localVideoTrack.setEnabled(renderVideo);
    localVideoTrack.addSink(localRender);
    return localVideoTrack;
  }

The creation sequence and the simplified set of main objects are shown below:


[Figures: object creation sequence and simplified main objects]

Java-layer classes are shown in yellow and native-layer classes in purple.
Through this flow, the main source, track and sink objects are created in both the Java and native layers and linked together. The main steps are:

Steps 2-13: a VideoSource is created from the VideoCapturer. The two are linked through AndroidVideoTrackSourceObserver; both the observer and the VideoSource's nativeSource member point to the same native AndroidVideoTrackSource object. From then on, frames produced by the VideoCapturer flow through AndroidVideoTrackSourceObserver into the native AndroidVideoTrackSource and are finally distributed by VideoBroadcaster to the sink objects.

Steps 13-18: a VideoTrack is created from the VideoSource. This mainly creates the native VideoTrack object, whose video_source_ member is the AndroidVideoTrackSource created in the previous steps; the track is thus tied to the source, and adding a sink to the track in fact adds it to the source.

Steps 19-25: a VideoSink is added to the VideoTrack. The flow ultimately adds a newly created native VideoSinkWrapper object to the broadcaster_ member of the AndroidVideoTrackSource; the wrapper's j_sink_ member references the Java-layer VideoSink, which is how the native layer calls back into Java. Multiple VideoSink objects can be added to a VideoTrack. The demo adds one for local preview, a SurfaceViewRenderer (what is really added to the track is its proxy, a ProxyVideoSink; a sketch of it closes this article). The VideoStreamEncoder used for encoding and transmission, on the other hand, is created and added by PeerConnection when the RTP transport session is established, which is why the VideoTrack has to be registered with the PeerConnection object, as the following code shows:

    mediaStream = factory.createLocalMediaStream("ARDAMS");
    if (videoCallEnabled) {
      mediaStream.addTrack(createVideoTrack(videoCapturer));
    }
    mediaStream.addTrack(createAudioTrack());
    peerConnection.addStream(mediaStream);

So the VideoTrack is first added to a MediaStream, and the MediaStream is then added to the PeerConnection object.
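For reference, the proxy sink mentioned in steps 19-25 can be sketched as follows, modeled on the AppRTC demo's ProxyVideoSink: a small level of indirection so the actual render target can be attached or swapped after the track has been wired up.

import org.webrtc.VideoFrame;
import org.webrtc.VideoSink;

public class ProxyVideoSinkSketch implements VideoSink {
  private VideoSink target;

  @Override
  public synchronized void onFrame(VideoFrame frame) {
    if (target == null) {
      return; // drop frames until a real renderer (e.g. SurfaceViewRenderer) is attached
    }
    target.onFrame(frame);
  }

  public synchronized void setTarget(VideoSink target) {
    this.target = target;
  }
}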
