Using CoreML and Vision: Machine Learning Integration on iOS

This article was written against Xcode 9 beta 2 and iOS 11 beta 2.

1. CoreML

  1. CoreML is a framework Apple announced at WWDC 2017. It makes it straightforward to integrate and use machine learning on Apple's own platforms, and Apple also provides the Python package coremltools to convert existing models from the major open source training tools into the MLModel format.

  2. Model training tools whose models can be converted:

    • Caffe
    • Keras
    • XGBoost
    • scikit-learn

2. Vision

Vision is a new, powerful, easy-to-use framework that Apple introduced at WWDC 2017 to work hand in hand with CoreML. It provides fast, efficient detection of faces, facial landmarks, text, rectangles, barcodes, and objects.
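
As a quick taste of the Vision API on its own (separate from the CoreML integration below), here is a minimal sketch of detecting face rectangles in a UIImage; the function name and the image passed in are placeholders, not part of the original article:

    import UIKit
    import Vision

    // Sketch: detect face rectangles in a UIImage with Vision alone.
    func detectFaces(in image: UIImage) {
        guard let cgImage = image.cgImage else { return }

        let request = VNDetectFaceRectanglesRequest { request, error in
            guard let faces = request.results as? [VNFaceObservation] else { return }
            // Bounding boxes are normalized (0...1), origin at the bottom-left.
            for face in faces {
                print("Found a face at \(face.boundingBox)")
            }
        }

        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        do {
            try handler.perform([request])
        } catch {
            print("Face detection failed: \(error)")
        }
    }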

3. Integrating machine learning

We will build a demo that captures the current camera frame with an AVCaptureSession, runs it through an MLModel, and displays the name of the object that best matches the image.
The interface is essentially a full-screen camera preview with a label showing the classification result.

Let's get started.

1. Create a Single View App project

2. Download the Inception v3 model from Apple's machine learning page

3. Add the Privacy - Camera Usage Description key (NSCameraUsageDescription) to Info.plist

4. Writing the code

  1. First, create an AVCaptureSession to grab frames from the camera (remember to import AVFoundation):

    
        // The capture session and a preview layer that renders the camera feed.
        lazy var avSession: AVCaptureSession = AVCaptureSession()
        lazy var preViewLayer: AVCaptureVideoPreviewLayer = {
            return AVCaptureVideoPreviewLayer(session: self.avSession)
        }()

        override func viewDidLoad() {
            super.viewDidLoad()

            setupAVSession()

            // Put the camera preview behind all other views.
            preViewLayer.frame = view.bounds
            self.view.layer.insertSublayer(preViewLayer, at: 0)

            avSession.startRunning()
        }

        fileprivate func setupAVSession() {

            guard let device = AVCaptureDevice.default(for: .video) else {
                fatalError("this application cannot be run on simulator")
            }

            do {
                // Camera input.
                let input = try AVCaptureDeviceInput(device: device)
                avSession.addInput(input)

                // Video frame output, delivered to our delegate on a background queue.
                let output = AVCaptureVideoDataOutput()
                avSession.addOutput(output)

                let queue = DispatchQueue(label: "video queue", qos: .userInteractive)
                output.setSampleBufferDelegate(self, queue: queue)
            } catch let error {
                print(error)
            }
        }
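
    In a real app you would typically also check that the session accepts the input/output before adding them, and stop the session when the view goes away. A small sketch of both (not in the original code, assuming the same avSession property):

        // Sketch: defensive configuration and teardown (assumes the same avSession).
        fileprivate func add(_ input: AVCaptureDeviceInput, _ output: AVCaptureVideoDataOutput) {
            if avSession.canAddInput(input) { avSession.addInput(input) }
            if avSession.canAddOutput(output) { avSession.addOutput(output) }
        }

        override func viewWillDisappear(_ animated: Bool) {
            super.viewWillDisappear(animated)
            avSession.stopRunning()
        }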
    
  2. Implement the AVCaptureVideoDataOutputSampleBufferDelegate protocol

    extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    
        func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    
            // FIXME: CoreML-related logic goes here (added in step 5)
        }
    }
    
  3. Add the MLModel to the project
    Simply drag the downloaded model file into the project, then click Inceptionv3 in the project navigator.

    You can see the model's detailed information there.
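
    Xcode also generates a Swift class (Inceptionv3) for the model. Just to illustrate what that class gives you, here is a minimal sketch (not part of the original article) of calling it directly without Vision; it assumes you already have a CVPixelBuffer at the model's expected 299×299 input size:

        // Sketch: calling the generated class directly, without Vision.
        // Inceptionv3 expects a 299x299 color image, which is one reason the
        // article goes through Vision below - VNCoreMLRequest takes care of
        // scaling and cropping the camera frame for us.
        func classifyDirectly(_ pixelBuffer: CVPixelBuffer) {
            do {
                let output = try Inceptionv3().prediction(image: pixelBuffer)
                print("\(output.classLabel) (\(output.classLabelProbs[output.classLabel] ?? 0))")
            } catch {
                print("Prediction failed: \(error)")
            }
        }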

  4. Add the model handling code (remember to import Vision):

    lazy var inceptionv3ClassificationRequest: VNCoreMLRequest = {
        // Load the ML model through its generated class and create a Vision request for it.
        do {
            let model = try VNCoreMLModel(for: Inceptionv3().model)
            return VNCoreMLRequest(model: model, completionHandler: self.inceptionv3ClassificationHandler)
        } catch {
            fatalError("can't load Vision ML model: \(error)")
        }
    }()
        
    
    extension ViewController {
    
        func inceptionv3ClassificationHandler(request: VNRequest, error: Error?) {
            // Take the top observation - classification results are ordered by confidence.
            guard let observations = request.results as? [VNClassificationObservation]
                else { fatalError("unexpected result type from VNCoreMLRequest") }

            guard let best = observations.first
                else { fatalError("can't get best result") }
        
            DispatchQueue.main.async {
                print("Classification: \"\(best.identifier)\" Confidence: \(best.confidence)")
                self.classifyLabel.text = best.identifier
            }
        }
    }
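
    Vision resizes and crops the camera frame to the model's input for us. If you want to control how that happens, VNCoreMLRequest exposes an imageCropAndScaleOption property; an optional variation of the lazy request above (not in the original code) could look like:

    lazy var inceptionv3ClassificationRequest: VNCoreMLRequest = {
        do {
            let model = try VNCoreMLModel(for: Inceptionv3().model)
            let request = VNCoreMLRequest(model: model, completionHandler: self.inceptionv3ClassificationHandler)
            // .centerCrop keeps the central square of the frame;
            // .scaleFit / .scaleFill use the whole frame instead.
            request.imageCropAndScaleOption = .centerCrop
            return request
        } catch {
            fatalError("can't load Vision ML model: \(error)")
        }
    }()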
    
  5. Pass the camera frames to the MLModel

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return
        }
        
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        
        do {
            try handler.perform([inceptionv3ClassificationRequest])
        } catch {
            print("Vision request failed: \(error)")
        }
    }
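
    A note on orientation: the handler above assumes the pixel buffer is already upright. If results look worse in portrait, you can tell Vision how the frame is oriented; a hedged sketch (the back camera in portrait typically delivers frames that should be read as .right):

        // Sketch: pass the frame orientation explicitly when creating the handler.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                            orientation: .right,
                                            options: [:])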
    

4. Results

That completes the machine learning integration. The code has been uploaded to git.

References

  1. WWDC 2017 Session 506: Vision Framework: Building on Core ML
  2. WWDC 2017 Session 703: Introducing Core ML
  3. WWDC 2017 Session 710: Core ML in Depth
