Contents
Adding the Model to Our Application
Running Object Detection on Captured Video Frames
Drawing Bounding Boxes
Real-Life Application
Next Steps?
Previous article in this series: Converting an ONNX Object Detection Model to iOS Core ML (Part 1)
In this article, we combine the Core ML version of the YOLO v2 model with the video stream capture functionality of our iOS application and add object detection to the app.
This series assumes that you are familiar with Python, Conda, and ONNX, and have some experience developing iOS applications in Xcode. We run the code with macOS 10.15+, Xcode 11.7+, and iOS 13+.
The first thing we need to do is copy yolov2-pipeline.mlmodel (which we saved earlier) to the Models folder of our ObjectDetectionDemo iOS application and add it to the project files. Once added, Xcode generates the yolov2_pipeline Swift class that we use below.
To use our model, we need to make a few changes to the code from the previous article.
The startCapture method of the VideoCapture class needs to accept and store a Vision framework request parameter, VNRequest:
public func startCapture(_ visionRequest: VNRequest?) {
if visionRequest != nil {
self.visionRequests = [visionRequest!]
} else {
self.visionRequests = []
}
if !captureSession.isRunning {
captureSession.startRunning()
}
}
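This snippet assumes the VideoCapture class from the previous article already holds the capture session and an array for the Vision requests. A rough sketch of those stored properties (the names here are assumptions for illustration, not necessarily the original code) might look like this:

import AVFoundation
import Vision

public class VideoCapture: NSObject {
    // Capture session configured in the previous article
    let captureSession = AVCaptureSession()
    // Vision requests to run on each captured frame
    var visionRequests = [VNRequest]()
}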
Now, add the ObjectDetection class with the following createObjectDetectionVisionRequest method:
public func createObjectDetectionVisionRequest() -> VNRequest? {
    do {
        // Load the Core ML model generated by Xcode from yolov2-pipeline.mlmodel
        let model = yolov2_pipeline().model
        let visionModel = try VNCoreMLModel(for: model)

        // The completion handler runs for every processed frame;
        // drawing must happen on the main queue
        let objectRecognition = VNCoreMLRequest(model: visionModel, completionHandler: { (request, error) in
            DispatchQueue.main.async(execute: {
                if let results = request.results {
                    self.processVisionRequestResults(results)
                }
            })
        })

        // Stretch the captured frame to the model's 416 x 416 input
        objectRecognition.imageCropAndScaleOption = .scaleFill
        return objectRecognition
    } catch let error as NSError {
        print("Model loading error: \(error)")
        return nil
    }
}
Note that we use the .scaleFill value for imageCropAndScaleOption. This introduces a slight distortion of the image when scaling it from the captured 480 x 640 frame to the 416 x 416 size the model requires, but it does not have a significant impact on the results. On the other hand, it keeps the subsequent scaling operations simple.
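To get a feel for how small the distortion is, here is a quick back-of-the-envelope check (the 480 x 640 and 416 x 416 sizes come from the article; the rest is illustrative arithmetic only):

import CoreGraphics

// Captured frame and model input sizes
let captureSize = CGSize(width: 480, height: 640)
let modelInputSize = CGSize(width: 416, height: 416)

// .scaleFill stretches each axis independently
let scaleX = modelInputSize.width / captureSize.width    // ~0.87
let scaleY = modelInputSize.height / captureSize.height  // ~0.65

// The aspect ratio changes by roughly a factor of 1.33 (objects look slightly
// squashed vertically), but the whole frame is kept, so no detections are
// cropped away and mapping boxes back to view coordinates stays straightforward.
print("scaleX:", scaleX, "scaleY:", scaleY)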
The code introduced above needs to be used in the main ViewController class:
self.videoCapture = VideoCapture(self.cameraView.layer)
self.objectDetection = ObjectDetection(self.cameraView.layer, videoFrameSize: self.videoCapture.getCaptureFrameSize())
let visionRequest = self.objectDetection.createObjectDetectionVisionRequest()
self.videoCapture.startCapture(visionRequest)
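For context, here is a minimal sketch of how this wiring might sit inside the view controller, assuming cameraView is a storyboard outlet and viewDidLoad is the setup point (these details are assumptions, not shown in the original snippet):

import UIKit

class ViewController: UIViewController {
    @IBOutlet weak var cameraView: UIView!

    private var videoCapture: VideoCapture!
    private var objectDetection: ObjectDetection!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Hook the capture pipeline up to the Vision request created above
        self.videoCapture = VideoCapture(self.cameraView.layer)
        self.objectDetection = ObjectDetection(self.cameraView.layer,
                                               videoFrameSize: self.videoCapture.getCaptureFrameSize())

        let visionRequest = self.objectDetection.createObjectDetectionVisionRequest()
        self.videoCapture.startCapture(visionRequest)
    }
}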
With this framework in place, we can execute the logic defined in visionRequest each time a video frame is captured:
public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // Extract the pixel buffer of the captured frame
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return
    }

    // Frames are assumed to arrive upright (portrait)
    let frameOrientation: CGImagePropertyOrientation = .up
    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: frameOrientation, options: [:])
    do {
        // Run the stored Vision requests (our YOLO v2 request) on the frame
        try imageRequestHandler.perform(self.visionRequests)
    } catch {
        print(error)
    }
}
With the above changes, the yolov2_pipeline model is used on every captured frame, and the detection results are passed to the ObjectDetection.processVisionRequestResults method. Because we implemented the model_decoder and model_nms components of the pipeline model earlier, no decoding logic is required on the iOS side. We simply read the most likely observation (objectObservation) and draw the corresponding box on the captured frame (calling createBoundingBoxLayer and addSublayer):
private func processVisionRequestResults(_ results: [Any]) {
    CATransaction.begin()
    CATransaction.setValue(kCFBooleanTrue, forKey: kCATransactionDisableActions)

    // Remove the boxes drawn for the previous frame
    self.objectDetectionLayer.sublayers = nil

    for observation in results where observation is VNRecognizedObjectObservation {
        guard let objectObservation = observation as? VNRecognizedObjectObservation else {
            continue
        }

        // Labels are sorted by confidence, so the first one is the best guess
        let topLabelObservation = objectObservation.labels[0]

        // Convert the normalized bounding box to layer coordinates
        let objectBounds = VNImageRectForNormalizedRect(
            objectObservation.boundingBox,
            Int(self.objectDetectionLayer.bounds.width), Int(self.objectDetectionLayer.bounds.height))

        let bbLayer = self.createBoundingBoxLayer(objectBounds, identifier: topLabelObservation.identifier, confidence: topLabelObservation.confidence)
        self.objectDetectionLayer.addSublayer(bbLayer)
    }
    CATransaction.commit()
}
Drawing the boxes is relatively simple and not specific to the machine learning part of our application. The main difficulty here is using the proper scale and coordinate system: "0, 0" means the top-left corner for the model, but the bottom-left corner for iOS and the Vision framework.
Two methods of the ObjectDetection class handle this: setupObjectDetectionLayer and createBoundingBoxLayer. The former prepares the layer the boxes are drawn on:
private func setupObjectDetectionLayer(_ viewLayer: CALayer, _ videoFrameSize: CGSize) {
    // The detection layer matches the captured video frame size
    self.objectDetectionLayer = CALayer()
    self.objectDetectionLayer.name = "ObjectDetectionLayer"
    self.objectDetectionLayer.bounds = CGRect(x: 0.0,
                                              y: 0.0,
                                              width: videoFrameSize.width,
                                              height: videoFrameSize.height)
    self.objectDetectionLayer.position = CGPoint(x: viewLayer.bounds.midX, y: viewLayer.bounds.midY)
    viewLayer.addSublayer(self.objectDetectionLayer)

    // Scale the layer so the video frame fills the view
    let bounds = viewLayer.bounds
    let scale = fmax(bounds.size.width / videoFrameSize.width, bounds.size.height / videoFrameSize.height)

    CATransaction.begin()
    CATransaction.setValue(kCFBooleanTrue, forKey: kCATransactionDisableActions)
    // The negative y scale flips Vision's bottom-left origin to the layer's top-left origin
    self.objectDetectionLayer.setAffineTransform(CGAffineTransform(scaleX: scale, y: -scale))
    self.objectDetectionLayer.position = CGPoint(x: bounds.midX, y: bounds.midY)
    CATransaction.commit()
}
The createBoundingBoxLayer method creates the shapes to draw:
private func createBoundingBoxLayer(_ bounds: CGRect, identifier: String, confidence: VNConfidence) -> CALayer {
    let path = UIBezierPath(rect: bounds)

    // Semi-transparent rectangle with a red outline around the detected object
    let boxLayer = CAShapeLayer()
    boxLayer.path = path.cgPath
    boxLayer.strokeColor = UIColor.red.cgColor
    boxLayer.lineWidth = 2
    boxLayer.fillColor = CGColor(colorSpace: CGColorSpaceCreateDeviceRGB(), components: [0.0, 0.0, 0.0, 0.0])
    boxLayer.bounds = bounds
    boxLayer.position = CGPoint(x: bounds.midX, y: bounds.midY)
    boxLayer.name = "Detected Object Box"
    boxLayer.backgroundColor = CGColor(colorSpace: CGColorSpaceCreateDeviceRGB(), components: [0.5, 0.5, 0.2, 0.3])
    boxLayer.cornerRadius = 6

    // Label with the class name and confidence
    let textLayer = CATextLayer()
    textLayer.name = "Detected Object Label"
    textLayer.string = String(format: "\(identifier)\n(%.2f)", confidence)
    textLayer.fontSize = CGFloat(16.0)
    textLayer.bounds = CGRect(x: 0, y: 0, width: bounds.size.width - 10, height: bounds.size.height - 10)
    textLayer.position = CGPoint(x: bounds.midX, y: bounds.midY)
    textLayer.alignmentMode = .center
    textLayer.foregroundColor = UIColor.red.cgColor
    textLayer.contentsScale = 2.0
    // Flip the text back because the parent layer's transform uses a negative y scale
    textLayer.setAffineTransform(CGAffineTransform(scaleX: 1.0, y: -1.0))

    boxLayer.addSublayer(textLayer)
    return boxLayer
}
Congratulations! We now have a working object detection application that can be tested in real life or, in our case, using the free clips available at the Pixels portal.
Note that the YOLO v2 model is very sensitive to image orientation, at least for some object classes (the Person class, for example). If a frame is rotated before processing, the detection results get noticeably worse.
This can be illustrated with any sample image from the Open Images dataset.
Exactly the same image, and two very different results. We need to keep this in mind and make sure the images fed to the model have the correct orientation.
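Because our captureOutput implementation hard-codes the frame orientation to .up, one simple safeguard is to lock the capture connection to portrait so that frames really do reach the model upright. This is a sketch under the assumption that the previous article's VideoCapture exposes its AVCaptureVideoDataOutput, here called videoOutput:

import AVFoundation

// Lock the video connection to portrait so the .up orientation passed to
// VNImageRequestHandler matches the frames the model actually receives.
// "videoOutput" stands in for the AVCaptureVideoDataOutput configured in
// the previous article's VideoCapture class.
func lockCaptureOrientationToPortrait(for videoOutput: AVCaptureVideoDataOutput) {
    if let connection = videoOutput.connection(with: .video),
       connection.isVideoOrientationSupported {
        connection.videoOrientation = .portrait
    }
}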
That's it! It has been a long journey, but we have finally reached the end: a working iOS application that detects objects in a live video stream.
If you are trying to decide where to go next, you are limited only by your imagination.
https://www.codeproject.com/Articles/5286805/Building-an-Object-Detection-iOS-App-with-YOLO-Cor