Beauty Camera (1): Analyzing and Optimizing GPUImage/OpenGL Camera Preview Performance

  • Preface

The main features of our mobile camera SDK are now essentially complete, and the product has been live for quite some time. Looking back over the development cycle, a beauty camera touches on a lot of technical topics. The basic features are simple to implement, and there are plenty of ready-made open-source projects online, but building a beauty camera that hits its performance targets in real product scenarios, covers all the required business features, and remains easy to maintain is by no means trivial. The purpose of this series is to collect the technical points and difficulties encountered during development. Most of the solutions can be found online, but I also add my own practical results and supplementary notes.

  • Beauty Camera Features

[Figure 1: overview of the implemented feature set]

The features implemented so far are shown in the figure above; the core framework is built on GPUImage.

I will occasionally write up the more valuable technical points behind these features in this blog series; readers are welcome to leave comments with topics they would like analyzed.

  • The Problem

GPUImage is by default an iOS solution; the Android version is merely another team's port of the iOS implementation and is no longer maintained. The first problem we ran into when integrating the camera was that GPUImage's camera texture upload is slow on low- and mid-range devices, falling far short of the required 720p@30fps and causing severe frame drops.

In the GPUImageRenderer class we find the onPreviewFrame function:

[Figure 2: the onPreviewFrame implementation in GPUImageRenderer]

The purpose of this function is to convert the camera frame data into texture data and return glTextureId to the GPU. It performs two tasks:

  1. Convert the camera data to RGBA data
  2. Upload the RGBA data to the texture glTextureId

Tracing into the GPUImageNativeLibrary_YUVtoRBGA function, we find that GPUImage performs the NV21-to-RGBA conversion in native code via JNI.

[Figure 3: the native YUVtoRBGA implementation]

This function walks over the NV21 buffer and converts it byte by byte according to the YUV-to-RGB conversion formula.
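For reference, here is a minimal Java sketch of the per-pixel work that loop performs, using the widely known BT.601 integer approximation; it is purely illustrative and not GPUImage's exact native code:

// Illustrative NV21 -> ARGB conversion (BT.601 integer approximation).
// This mirrors the kind of per-pixel work the native loop does.
public static void nv21ToArgb(byte[] nv21, int width, int height, int[] outArgb) {
    int frameSize = width * height;
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            int y = (nv21[row * width + col] & 0xFF) - 16;
            if (y < 0) y = 0;
            // One interleaved V/U pair is shared by a 2x2 block of Y samples.
            int uvIndex = frameSize + (row >> 1) * width + (col & ~1);
            int v = (nv21[uvIndex] & 0xFF) - 128;
            int u = (nv21[uvIndex + 1] & 0xFF) - 128;

            int r = (1192 * y + 1634 * v) >> 10;
            int g = (1192 * y - 833 * v - 400 * u) >> 10;
            int b = (1192 * y + 2066 * u) >> 10;

            outArgb[row * width + col] = 0xFF000000
                    | (Math.min(255, Math.max(0, r)) << 16)
                    | (Math.min(255, Math.max(0, g)) << 8)
                    | Math.min(255, Math.max(0, b));
        }
    }
}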

Back on the Java side, OpenGlUtils.loadTexture simply loads the RGBA data into a texture:

[Figure 4: OpenGlUtils.loadTexture]

GPUImage's handling is simple: if no texture exists yet, it creates one and fills it with glTexImage2D; if the texture already exists, it updates the contents with glTexSubImage2D, avoiding repeated allocation of texture storage.

There is considerable room for optimization here, and I tried optimizations of varying depth during development. The first was to use libyuv for the format conversion: libyuv is accelerated with SIMD instructions (the NEON instruction set on ARM), and the performance gain is very noticeable.
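As a rough sketch of the integration, libyuv's NV21ToARGB entry point (a real libyuv API) can be exposed through a thin JNI wrapper. The Java binding below is hypothetical: the class, method, and library names are mine.

// Hypothetical JNI binding; only the underlying libyuv::NV21ToARGB call is a real API.
public final class YuvConvert {
    static {
        System.loadLibrary("yuvconvert"); // hypothetical native library name
    }

    // The native side is expected to forward to:
    //   libyuv::NV21ToARGB(srcY, width, srcVU, width, dstArgb, width * 4, width, height)
    public static native int nv21ToArgb(byte[] nv21, int width, int height, byte[] dstArgb);
}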

  • Analysis

At this point we know the overall flow of texture upload to the GPU and where its bottlenecks lie: one is the NV21-to-RGBA conversion, the other is uploading the RGBA data to the texture id.

  1. NV21 to RGBA
    Iterating byte by byte and applying the color-space formula per pixel is about the least efficient approach possible; the method burns a large amount of CPU, and the computational cost rises steeply as image resolution grows. The natural first step is algorithmic optimization, and the standard route is SIMD acceleration, i.e. NEON on Android. We can hand-optimize the for loop shown above to lower its cost, or, better still, integrate the libyuv library for the conversion (see the JNI sketch above); the speedup is substantial, especially at high resolutions.
  2. RGBA texture upload
    OpenGL texture upload is a classic performance-optimization topic. Uploading RGBA data directly causes glTexSubImage2D to stall the CPU for a long time. Following the OpenGL ES 3.0 Programming Guide, large, frequently refreshed texture data should be uploaded through a PBO (pixel buffer object). A PBO establishes a mapping between system memory and GPU memory, and the actual copy is carried out by DMA on the GPU side, saving a great deal of CPU time. To get the most out of PBOs, you also need to combine them with memory alignment and double buffering; see the sketch after this list. I will devote a separate post to PBO texture uploads later.
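As referenced above, here is a minimal double-buffered PBO upload sketch for GLES 3.0. It assumes a current GL context and that the target GL_TEXTURE_2D texture is already created, sized with glTexImage2D, and bound; the class and method names are mine:

import java.nio.ByteBuffer;
import android.opengl.GLES30;

public class PboUploader {
    private final int[] pbos = new int[2];
    private int index = 0;

    public void init(int byteSize) {
        GLES30.glGenBuffers(2, pbos, 0);
        for (int pbo : pbos) {
            GLES30.glBindBuffer(GLES30.GL_PIXEL_UNPACK_BUFFER, pbo);
            // Allocate PBO storage once; GL_STREAM_DRAW hints at per-frame updates.
            GLES30.glBufferData(GLES30.GL_PIXEL_UNPACK_BUFFER, byteSize, null,
                    GLES30.GL_STREAM_DRAW);
        }
        GLES30.glBindBuffer(GLES30.GL_PIXEL_UNPACK_BUFFER, 0);
    }

    public void upload(byte[] rgba, int width, int height) {
        int uploadPbo = pbos[index];
        int fillPbo = pbos[(index + 1) % 2];

        // 1) Kick off the texture update from the PBO filled on the previous frame;
        //    the copy is DMA-driven instead of stalling the CPU. (On the very first
        //    frame this reads an empty buffer; real code should prime both PBOs.)
        GLES30.glBindBuffer(GLES30.GL_PIXEL_UNPACK_BUFFER, uploadPbo);
        GLES30.glTexSubImage2D(GLES30.GL_TEXTURE_2D, 0, 0, 0, width, height,
                GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, 0); // 0 = byte offset into the PBO

        // 2) Meanwhile, map the other PBO and write this frame's pixels into it.
        GLES30.glBindBuffer(GLES30.GL_PIXEL_UNPACK_BUFFER, fillPbo);
        ByteBuffer mapped = (ByteBuffer) GLES30.glMapBufferRange(
                GLES30.GL_PIXEL_UNPACK_BUFFER, 0, rgba.length,
                GLES30.GL_MAP_WRITE_BIT | GLES30.GL_MAP_INVALIDATE_BUFFER_BIT);
        if (mapped != null) {
            mapped.put(rgba);
            GLES30.glUnmapBuffer(GLES30.GL_PIXEL_UNPACK_BUFFER);
        }
        GLES30.glBindBuffer(GLES30.GL_PIXEL_UNPACK_BUFFER, 0);
        index = (index + 1) % 2;
    }
}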

With the two issues above addressed, performance in the actual build did improve considerably; most mid-range devices can now reach 720p@30fps. Low-end devices still fell short, however, so I continued investigating and refined the approach further, finding that bottlenecks remained in a few other places:

  1. Uploading NV21 data directly to textures
    This has several benefits (see the Solution section below). First, it avoids doing the color conversion on the CPU and moves that work to the GPU, which has a natural advantage for this kind of data-parallel conversion. Second, it greatly reduces the volume of data exchanged between system memory and GPU memory: RGBA is 32 bits per pixel while NV21 is 12 bits per pixel, so the transfer volume drops by more than 60%.
  2. Overdraw
    Testing showed that the layout hosting GPUImage also has a large impact on performance. With overdraw detection enabled, we found that a layout with a background color or stacked view layers noticeably hurts low-end devices, so check your own project for overdraw (Android's "Debug GPU overdraw" developer option is the standard tool).
  3. GPU bottleneck from the number of filters
    As mentioned above, the data conversion now runs on the GPU; a real project may stack many filters, so the GPU itself can hit its limit. This leads to off-screen rendering with FBOs and multi-threaded processing with shared textures (to be analyzed in depth in a later post).
  • Solution

Back to the main topic: the best approach for uploading camera preview data is to upload the NV21 data directly. Concretely, we can first load the raw planes into the GPU as OpenGL luminance textures (GL_LUMINANCE for the Y plane, GL_LUMINANCE_ALPHA for the interleaved chroma plane), then write GLSL to convert the NV21 data to RGBA on the GPU.

public static int loadLuminanceTexture(final ByteBuffer byteBuffer,final int width,final int height, final int usedTexId, int glFormat){
        int textures[] = new int[1];
        if (usedTexId == NO_TEXTURE) {
            GLES20.glGenTextures(1, textures, 0);
            GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textures[0]);
            GLES20.glTexParameterf(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
            GLES20.glTexParameterf(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);
            GLES20.glTexParameterf(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE);
            GLES20.glTexParameterf(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE);
            GLES20.glTexImage2D(
                    GLES20.GL_TEXTURE_2D, 0,
                    glFormat, width, height, 0,
                    glFormat, GLES20.GL_UNSIGNED_BYTE, byteBuffer);
        } else {
            GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, usedTexId);
            GLES20.glTexSubImage2D(
                    GLES20.GL_TEXTURE_2D, 0,
                    0,0, width, height,
                    glFormat,
                    GLES20.GL_UNSIGNED_BYTE, byteBuffer);
            textures[0] = usedTexId;
        }
        return textures[0];
    }

Above is the simple upload helper I wrapped: pass in an NV21 plane buffer and specify the texture format. It is called like this:

public void loadNV21ByteBuffer(ByteBuffer yByteBuffer, ByteBuffer uvByteBuffer, int width, int height) {
        if (isPush){
            Log.w(TAG, "loadNV21ByteBuffer: isPush false");
            return;
        }
        isPush = true;
        runOnDraw(new Runnable() {
            @Override
            public void run() {
                samplerYTexture = OpenGlUtils.loadLuminanceTexture(yByteBuffer, width, height,
                        samplerYTexture, GLES20.GL_LUMINANCE);
                samplerUVTexture = OpenGlUtils.loadLuminanceTexture(uvByteBuffer, width / 2, height / 2,
                        samplerUVTexture, GLES20.GL_LUMINANCE_ALPHA);

                isPush = false;
            }
        });
    }

With the call above, the NV21 data has been uploaded to the GPU; what remains is the GPU-side conversion:

void nv12ToRGB(){
    vec3 yuv = vec3(
    1.1643 * (texture2D(yTexture, textureCoordinate).r - 0.0625),
    texture2D(uvTexture, textureCoordinate).r - 0.5,
    texture2D(uvTexture, textureCoordinate).a - 0.5
    );
    vec3 rgb = yuv * yuv2rgb;
    gl_FragColor = vec4(rgb, 1.0);
}

void main() {
    if(blankMode == 0){
        gl_FragColor = vec4(0.0,0.0,0.0,1.0);
    }
    else{
        if(yuvType == 0){
            I420ToRGB();
        }else if(yuvType == 1){
            nv12ToRGB();
        }else{
            nv21ToRGB();
        }
    }
}

At this point the NV21 data has been converted to RGBA on the GPU. The returned texture id then needs to be bound as the data-source input of your GPUImageFilterGroup for the subsequent filter processing.
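Below is a hedged wiring sketch, assuming frames arrive in NV21 via the old Camera API's onPreviewFrame and that the GLSurfaceView uses RENDERMODE_WHEN_DIRTY; the variable names and the brightness-filter stand-in are mine, while GPUImageFilterGroup and GPUImageBrightnessFilter are real GPUImage classes:

// Put the YUV filter at the head of the filter chain.
GPUImageYUVFilter yuvFilter =
        new GPUImageYUVFilter(context, GPUImageYUVFilter.YUVType.NV21);
List<GPUImageFilter> filters = new ArrayList<>();
filters.add(yuvFilter);                          // NV21 -> RGBA on the GPU
filters.add(new GPUImageBrightnessFilter(0.1f)); // stand-in for your own beauty filters
gpuImage.setFilter(new GPUImageFilterGroup(filters));

// In Camera.PreviewCallback#onPreviewFrame(byte[] data, Camera camera):
yuvFilter.loadNV21Bytes(data, previewWidth, previewHeight);
glSurfaceView.requestRender(); // assumes RENDERMODE_WHEN_DIRTY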

  • Full Code
package com.litalk.media.core.filter;

import android.content.Context;
import android.opengl.GLES20;
import android.util.Log;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

import jp.co.cyberagent.android.gpuimage.filter.GPUImageFilter;
import jp.co.cyberagent.android.gpuimage.util.OpenGlUtils;

public class GPUImageYUVFilter extends GPUImageFilter {
    private final String TAG = GPUImageYUVFilter.class.getSimpleName();
    public static final String fragmentShaderCode = "precision mediump float;" +
            "uniform sampler2D yTexture;" +
            "uniform sampler2D uTexture;" +
            "uniform sampler2D vTexture;" +
            "uniform sampler2D uvTexture;" +
            "uniform int yuvType;" +
            "varying vec2 textureCoordinate;" +
            "uniform sampler2D inputImageTexture;" +
            "void main() {" +
            "  vec4 c = vec4((texture2D(yTexture, textureCoordinate).r - 0.0627) * 1.1643);" +
            "  vec4 U; vec4 V;" +
            "  if (yuvType == 0){" +
            "    U = vec4(texture2D(uTexture, textureCoordinate).r - 0.5);" +
            "    V = vec4(texture2D(vTexture, textureCoordinate).r - 0.5);" +
            "  } else if (yuvType == 1){" +
            "    U = vec4(texture2D(uvTexture, textureCoordinate).r - 0.5);" +
            "    V = vec4(texture2D(uvTexture, textureCoordinate).a - 0.5);" +
            "  } else {" +
            "    U = vec4(texture2D(uvTexture, textureCoordinate).a - 0.5);" +
            "    V = vec4(texture2D(uvTexture, textureCoordinate).r - 0.5);" +
            "  } " +
            "  c += V * vec4(1.596, -0.813, 0, 0);" +
            "  c += U * vec4(0, -0.392, 2.017, 0);" +
            "  c.a = 1.0;" +
            "  gl_FragColor = c;" +
            "}";

    private String FRAGMENT_SHADER_NAME = "shaders/fragment_yuv2rgb.glsl";

    private YUVType yuvType;

    private int yuvTypeHandle;
    private int blankModeHandle;
    private int yTextureHandle;
    private int uTextureHandle;
    private int vTextureHandle;
    private int uvTextureHandle;

    private int samplerYTexture = OpenGlUtils.NO_TEXTURE;
    private int samplerUTexture = OpenGlUtils.NO_TEXTURE;
    private int samplerVTexture = OpenGlUtils.NO_TEXTURE;
    private int samplerUVTexture = OpenGlUtils.NO_TEXTURE;

    public enum YUVType {
        I420,
        NV12,
        NV21
    }

    volatile boolean isPush = false;
    Object nv21BufferLock = new Object();

    //0:i420 1:nv12 2:nv21
    public GPUImageYUVFilter(Context appContext, YUVType yuvType) {
        super(NO_FILTER_VERTEX_SHADER, fragmentShaderCode);

        //Replace the inline shader with the version loaded from assets
        String fragmentShaderCode = null;
        try {
            fragmentShaderCode = GPUImageFilter.readShaderFileFromAssets(appContext, FRAGMENT_SHADER_NAME);
        } catch (IOException e) {
            e.printStackTrace();
        }
        setFragmentShader(fragmentShaderCode);

        this.yuvType = yuvType;
    }

    @Override
    public void onInit() {
        super.onInit();

        blankModeHandle = GLES20.glGetUniformLocation(getProgram(), "blankMode");
        yuvTypeHandle = GLES20.glGetUniformLocation(getProgram(), "yuvType");
        yTextureHandle = GLES20.glGetUniformLocation(getProgram(), "yTexture");
        uTextureHandle = GLES20.glGetUniformLocation(getProgram(), "uTexture");
        vTextureHandle = GLES20.glGetUniformLocation(getProgram(), "vTexture");
        uvTextureHandle = GLES20.glGetUniformLocation(getProgram(), "uvTexture");

        int type = 0;
        switch (yuvType) {
            case I420:
                type = 0;
                break;
            case NV12:
                type = 1;
                break;
            case NV21:
                type = 2;
                break;
            default:
                break;
        }
        setInteger(yuvTypeHandle, type);
    }

    public void loadNV21Bytes(byte[] nv21Bytes, int width, int height) {
        //Wrap the Y plane directly to save one copy
        ByteBuffer yByteBuffer = ByteBuffer.wrap(nv21Bytes, 0, width * height);
        ByteBuffer uvByteBuffer = ByteBuffer.allocate(width * height >> 1);
        System.arraycopy(nv21Bytes,width * height,uvByteBuffer.array(),0,uvByteBuffer.capacity());

        //Test:
//        Bitmap bitmap = Bitmap.createBitmap(width,height, Bitmap.Config.ARGB_8888);
//        int ret = VideoConvertNative.ltNV21ToBitmap(yByteBuffer,uvByteBuffer,bitmap);

        loadNV21ByteBuffer(yByteBuffer, uvByteBuffer, width, height);
    }

    public void loadNV21ByteBuffer(ByteBuffer yByteBuffer, ByteBuffer uvByteBuffer, int width, int height) {
        if (isPush){
            Log.w(TAG, "loadNV21ByteBuffer: isPush false");
            return;
        }
        isPush = true;
        runOnDraw(new Runnable() {
            @Override
            public void run() {
                samplerYTexture = OpenGlUtils.loadLuminanceTexture(yByteBuffer, width, height,
                        samplerYTexture, GLES20.GL_LUMINANCE);
                samplerUVTexture = OpenGlUtils.loadLuminanceTexture(uvByteBuffer, width / 2, height / 2,
                        samplerUVTexture, GLES20.GL_LUMINANCE_ALPHA);

                isPush = false;
            }
        });
    }

    public void syncLoadNV21ByteBuffer(ByteBuffer yByteBuffer, ByteBuffer uvByteBuffer, int width, int height) {
        runOnDraw(new Runnable() {
            @Override
            public void run() {
                samplerYTexture = OpenGlUtils.loadLuminanceTexture(yByteBuffer, width, height,
                        samplerYTexture, GLES20.GL_LUMINANCE);
                samplerUVTexture = OpenGlUtils.loadLuminanceTexture(uvByteBuffer, width / 2, height / 2,
                        samplerUVTexture, GLES20.GL_LUMINANCE_ALPHA);

                synchronized (nv21BufferLock){
                    nv21BufferLock.notify();
                }
            }
        });
    }

    public void syncWaitNV21BufferLoad(){
        synchronized (nv21BufferLock){
            try {
                nv21BufferLock.wait();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public void loadI420Buffer(ByteBuffer yByteBuffer, ByteBuffer uByteBuffer, ByteBuffer vByteBuffer, int width, int height) {
        if (isPush){
            Log.w(TAG, "loadI420Buffer: isPush false");
            return;
        }
        isPush = true;
        runOnDraw(new Runnable() {
            @Override
            public void run() {
                samplerYTexture = OpenGlUtils.loadLuminanceTexture(yByteBuffer, width, height,
                        samplerYTexture, GLES20.GL_LUMINANCE);
                samplerUTexture = OpenGlUtils.loadLuminanceTexture(uByteBuffer, width / 2, height / 2,
                        samplerUTexture, GLES20.GL_LUMINANCE);
                samplerVTexture = OpenGlUtils.loadLuminanceTexture(vByteBuffer, width / 2, height / 2,
                        samplerVTexture, GLES20.GL_LUMINANCE);
                isPush = false;
            }
        });
    }

    @Override
    protected void onDrawArraysPre() {
        super.onDrawArraysPre();

        //Bind the textures to their units
        if (yuvType == YUVType.I420){
            if (samplerYTexture != OpenGlUtils.NO_TEXTURE){
                GLES20.glUniform1i(blankModeHandle, 1);

                GLES20.glActiveTexture(GLES20.GL_TEXTURE1); // select texture unit 1
                GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, samplerYTexture);
                GLES20.glUniform1i(yTextureHandle, 1);

                GLES20.glActiveTexture(GLES20.GL_TEXTURE2);
                GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, samplerUTexture);
                GLES20.glUniform1i(uTextureHandle, 2);

                GLES20.glActiveTexture(GLES20.GL_TEXTURE3);
                GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, samplerVTexture);
                GLES20.glUniform1i(vTextureHandle, 3);
            }
            else {
                GLES20.glUniform1i(blankModeHandle, 0);
            }
        }
        else {
            if (samplerYTexture != OpenGlUtils.NO_TEXTURE) {
                GLES20.glUniform1i(blankModeHandle, 1);

                GLES20.glActiveTexture(GLES20.GL_TEXTURE1); // select texture unit 1
                GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, samplerYTexture);
                GLES20.glUniform1i(yTextureHandle, 1);

                GLES20.glActiveTexture(GLES20.GL_TEXTURE2);
                GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, samplerUVTexture);
                GLES20.glUniform1i(uvTextureHandle, 2);
            }
            else {
                GLES20.glUniform1i(blankModeHandle, 0);
            }
        }
    }

    @Override
    protected void onDrawArraysAfter(int textureId, FloatBuffer textureBuffer) {
        super.onDrawArraysAfter(textureId, textureBuffer);
//        checkGLError(TAG, "onDrawArraysAfter");
    }
}
// Fragment shader source (shaders/fragment_yuv2rgb.glsl)

precision mediump float;
varying vec2 textureCoordinate;

uniform int blankMode;
uniform int yuvType;
uniform sampler2D yTexture;
uniform sampler2D uTexture;
uniform sampler2D vTexture;
uniform sampler2D uvTexture;

const mat3 yuv2rgb = mat3(
1, 0, 1.2802,
1, -0.214821, -0.380589,
1, 2.127982, 0
);

void I420ToRGB(){
    vec3 yuv = vec3(
    1.1643 * (texture2D(yTexture, textureCoordinate).r - 0.0625),
    texture2D(uTexture, textureCoordinate).r - 0.5,
    texture2D(vTexture, textureCoordinate).r - 0.5
    );
    vec3 rgb = yuv * yuv2rgb;
    gl_FragColor = vec4(rgb, 1.0);
}

void nv12ToRGB(){
    vec3 yuv = vec3(
    1.1643 * (texture2D(yTexture, textureCoordinate).r - 0.0625),
    texture2D(uvTexture, textureCoordinate).r - 0.5,
    texture2D(uvTexture, textureCoordinate).a - 0.5
    );
    vec3 rgb = yuv * yuv2rgb;
    gl_FragColor = vec4(rgb, 1.0);
}

void nv21ToRGB(){
    vec3 yuv = vec3(
    1.1643 * (texture2D(yTexture, textureCoordinate).r - 0.0625),
    texture2D(uvTexture, textureCoordinate).a - 0.5,
    texture2D(uvTexture, textureCoordinate).r - 0.5
    );
    vec3 rgb = yuv * yuv2rgb;
    gl_FragColor = vec4(rgb, 1.0);
}

void main() {
    if(blankMode == 0){
        gl_FragColor = vec4(0.0,0.0,0.0,1.0);
    }
    else{
        if(yuvType == 0){
            I420ToRGB();
        }else if(yuvType == 1){
            nv12ToRGB();
        }else{
            nv21ToRGB();
        }
    }
}
// Vertex shader source
public static final String NO_FILTER_VERTEX_SHADER = "" +
            "attribute vec4 position;\n" +
            "attribute vec4 inputTextureCoordinate;\n" +
            " \n" +
            "varying vec2 textureCoordinate;\n" +
            " \n" +
            "void main()\n" +
            "{\n" +
            "    gl_Position = position;\n" +
            "    textureCoordinate = inputTextureCoordinate.xy;\n" +
            "}";
