In this article, I will walk through the steps for building your own real-time object recognition application with TensorFlow’s (TF) new Object Detection API and OpenCV in Python 3 (specifically 3.5). The focus will be on the challenges that I faced when building it. You can find the full code on my repo.
And here is the app in action:
Google has just released their new TensorFlow Object Detection API. The first release contains:
I wanted to lay my hands on this new cool stuff and had some time to build a simple real-time object recognition demo.
First, I pulled the TensorFlow models repo and then had a look at the notebook that they released as well. It basically walks through all the steps of using a pre-trained model. In their example, they used the “SSD with Mobilenet” model, but you can also download several other pre-trained models from what they call the “Tensorflow detection model zoo”. Those models are, by the way, trained on the COCO dataset and vary in speed (slow, medium and fast) and in performance (mAP, mean average precision).
What I did next was to run the example. The example is actually well documented. Essentially this is what it does:
Note: Before running the example, make sure to read the setup notes. In particular, the protobuf compilation section is important:
# From tensorflow/models/research/
protoc object_detection/protos/*.proto --python_out=.
Without running this command, the example won’t work.
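If you would rather trigger this step from a Python setup script, the command can be wrapped roughly as follows. This is only a sketch: the helper names `build_protoc_command` and `compile_protos` are made up for illustration, and it assumes `protoc` is installed and on your PATH.

```python
import glob
import os
import subprocess


def build_protoc_command(proto_files, python_out="."):
    """Return the argument list for compiling the given .proto files to Python."""
    if not proto_files:
        raise ValueError("no .proto files found; is the path correct?")
    return ["protoc", *sorted(proto_files), "--python_out=" + python_out]


def compile_protos(research_dir):
    """Run protoc from research_dir (your tensorflow/models/research/ checkout)."""
    # A shell would expand the *.proto wildcard for us; subprocess does not,
    # so the pattern is expanded here and made relative to research_dir.
    pattern = os.path.join(research_dir, "object_detection", "protos", "*.proto")
    proto_files = [os.path.relpath(p, research_dir) for p in glob.glob(pattern)]
    subprocess.run(build_protoc_command(proto_files), cwd=research_dir, check=True)
```

Calling `compile_protos("path/to/models/research")` should then be equivalent to running the command above by hand from that directory.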
I then took their code and modified it accordingly:
Then, I used OpenCV to connect it with my webcam. There are many examples out there that explain how to do this, including the official documentation, so I won’t dig deeper into it. The more interesting part is the optimization that I did to increase the performance of the application. In my case, that meant getting a good fps (frames per second) rate.
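To make "good fps" measurable, a small counter around the capture loop is enough. The following is a minimal sketch rather than the code from the repo: `FPSCounter` and `run_webcam` are illustrative names, and the OpenCV part uses only the standard `VideoCapture`, `imshow` and `waitKey` calls.

```python
import time


class FPSCounter:
    """Measure the average frames per second over the frames seen so far."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = None
        self._frames = 0

    def start(self):
        self._start = self._clock()
        self._frames = 0
        return self

    def update(self):
        """Call once per processed frame."""
        self._frames += 1

    def fps(self):
        if self._start is None:
            raise RuntimeError("call start() first")
        elapsed = self._clock() - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0


def run_webcam(device_index=0):
    """Show the webcam feed and print the measured fps; press 'q' to quit."""
    import cv2  # imported lazily so FPSCounter works without OpenCV installed

    capture = cv2.VideoCapture(device_index)
    counter = FPSCounter().start()
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            # Object detection on `frame` would go here.
            cv2.imshow("Video", frame)
            counter.update()
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
    print("average fps: {:.2f}".format(counter.fps()))
```

The clock is injectable so the counter can be checked without a camera. In a real run you would simply call `run_webcam()`.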
Generally, plain vanilla/naive implementations of many OpenCV examples are not really optimal. For example, some of the functions in OpenCV are heavily I/O bound. So I had to come up with various solutions to counter this:
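One common way to keep a blocking read out of the main loop is to move it onto a background thread and only ever hand out the newest frame. The sketch below illustrates that general idea and is not necessarily the exact code in the repo. `ThreadedFrameReader` is a hypothetical name, and `source` is assumed to be any object whose `read()` returns `(ok, frame)`, as `cv2.VideoCapture` does.

```python
import threading


class ThreadedFrameReader:
    """Read frames on a background thread and keep only the newest one."""

    def __init__(self, source):
        # `source` is anything with read() -> (ok, frame), e.g. cv2.VideoCapture.
        self._source = source
        self._lock = threading.Lock()
        self._frame = None
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        while not self._stopped.is_set():
            ok, frame = self._source.read()  # the blocking call lives here
            if not ok:
                self._stopped.set()
                break
            with self._lock:
                self._frame = frame

    def latest(self):
        """Return the most recent frame, or None if none has arrived yet."""
        with self._lock:
            return self._frame

    def stop(self):
        """Signal the reader to finish and wait for it (call after start())."""
        self._stopped.set()
        self._thread.join()
```

The main loop then calls `latest()` instead of `read()`, so detection and display never wait on the camera. The trade-off is that frames arriving faster than they are processed get silently dropped, which is usually what you want for a live feed.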
Note: If you are on Mac OS X like me and you’re using OpenCV 3.1, there is a chance that OpenCV’s VideoCapture crashes after a while. There is already an issue filed. Switching back to OpenCV 3.0 solved the issue for me, though.
Give me a ❤️ if you liked this post :) Pull the code and try it out yourself. And definitely have a look at the TensorFlow Object Detection API. At first glance, it looks pretty neat and simple. The next thing I want to try is to train a model on my own dataset with the API and also use the pre-trained models for other applications that I have in mind. I’m also not fully satisfied with the performance of the application. The fps rate is still not optimal. There are still many bottlenecks in OpenCV that I can’t influence, but there are alternatives that I can try out, like WebRTC, although that one is web-based. Moreover, I’m thinking of using asynchronous method calls (async) to improve my fps rate. Stay tuned!
Follow me on twitter: @datitran
https://towardsdatascience.com/building-a-real-time-object-recognition-app-with-tensorflow-and-opencv-b7a2b4ebdc32
https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9