TensorFlow + ImageNet Inception-v3: Image Recognition on Webcam Video


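Preparation

Environment: Python 3.6.1, Windows 7, x64
Materials

  1. The model Google trained on the 2012 ImageNet Challenge data: inception-2015-12-05.tgz
  2. Unofficial Python packages from http://www.lfd.uci.edu/~gohlke/pythonlibs/ (there were no official builds for Python 3.6 yet; on a slightly older Python version, look for an official pip or conda package first). Download these two files from that page: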

Pillow-4.2.1-cp36-cp36m-win_amd64.whl

VideoCapture-0.9.5-cp36-cp36m-win_amd64.whl

Installation
On the command line, change to the directory that holds the two packages and install Pillow (PIL) and VideoCapture:

pip install Pillow-4.2.1-cp36-cp36m-win_amd64.whl
pip install VideoCapture-0.9.5-cp36-cp36m-win_amd64.whl

You may see:
PermissionError: [WinError 5] Access is denied: ...
Invoking pip as a Python module works around it:

python -m pip install Pillow-4.2.1-cp36-cp36m-win_amd64.whl
python -m pip install VideoCapture-0.9.5-cp36-cp36m-win_amd64.whl
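After installing, a quick import check can confirm that both packages landed in the interpreter you intend to use. The sketch below is only an illustration: the helper name missing_modules is made up for this example, and it assumes the import names are PIL (for Pillow) and VideoCapture.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # PIL is the import name of Pillow; VideoCapture is the webcam wrapper.
    missing = missing_modules(["PIL", "VideoCapture"])
    print("missing:", ", ".join(missing) if missing else "none")
```

If a package shows up as missing even though pip reported success, pip and python probably belong to different installations, which is another reason to prefer python -m pip.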

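Extract inception-2015-12-05.tgz into a new folder under the current directory. I named mine model; if you pick a different name, change the path that the script's FLAGS point to (the model_dir default) to match.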

The model folder contains five files:

classify_image_graph_def.pb
cropped_panda.jpg
imagenet_2012_challenge_label_map_proto.pbtxt
imagenet_synset_to_human_label_map.txt
LICENSE
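The extraction step and the file check can also be done from Python. This is a minimal sketch using only the standard library; the function names extract_model and missing_model_files are hypothetical helpers for this example, and the archive is assumed to sit in the current directory.

```python
import os
import tarfile

# The five files listed above.
EXPECTED_FILES = [
    "classify_image_graph_def.pb",
    "cropped_panda.jpg",
    "imagenet_2012_challenge_label_map_proto.pbtxt",
    "imagenet_synset_to_human_label_map.txt",
    "LICENSE",
]


def extract_model(archive_path, model_dir="model"):
    """Unpack the .tgz archive into model_dir, creating the folder if needed."""
    os.makedirs(model_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(model_dir)


def missing_model_files(model_dir="model"):
    """Return the expected file names that are not present in model_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]


if __name__ == "__main__":
    # To unpack first: extract_model("inception-2015-12-05.tgz")
    print("missing files:", missing_model_files("model"))
```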

Test in Python that the camera can take a snapshot and save it as a JPG file:

from VideoCapture import Device
cam=Device()
cam.saveSnapshot('image_cap.jpg')

You may get the following error:

Exception: fromstring() has been removed. Please call frombytes() instead.

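The textbook fix would be to upgrade the PIL package... but isn't mine already the latest? In fact a newer Pillow cannot help here: Pillow removed Image.fromstring() in favor of Image.frombytes(), and it is VideoCapture that still calls the old name.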


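So the decision is made: patch the source directly. The file to change is inside the VideoCapture package, and its path appears in the error traceback:
C:\Anaconda3\lib\site-packages\VideoCapture\__init__.py: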


......

def getImage(self, timestamp=0, boldfont=0, textpos=default_textpos):

        ......

        if timestamp:
            #text = now()
            text = time.asctime(time.localtime(time.time()))
        buffer, width, height = self.getBuffer()
        if buffer:
            #im = Image.fromstring
            im = Image.frombytes(
                'RGB', (width, height), buffer, 'raw', 'BGR', 0, -1)
            if timestamp:

            ......

            return im

......

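In the getImage code above, the line that originally read im = Image.fromstring(...) (left in as a comment) is changed to im = Image.frombytes(...); the arguments stay the same.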
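The trailing arguments 'raw', 'BGR', 0, -1 tell Pillow's raw decoder how the camera buffer is laid out: pixels are stored as blue, green, red; a stride of 0 means rows are tightly packed; and an orientation of -1 means the first row in the buffer is the bottom row of the image. The pure-Python sketch below only illustrates that layout and is not how Pillow implements it; bgr_bottom_up_to_rgb is a name invented for this example.

```python
def bgr_bottom_up_to_rgb(buffer, width, height):
    """Convert a packed BGR, bottom-up pixel buffer to RGB, top-down bytes."""
    row_size = width * 3
    if len(buffer) != row_size * height:
        raise ValueError("buffer size does not match width * height * 3")
    out = bytearray()
    # Walk the stored rows from last to first so the result is top-down.
    for row in range(height - 1, -1, -1):
        line = buffer[row * row_size:(row + 1) * row_size]
        # Swap the first and third channel of every pixel: BGR -> RGB.
        for x in range(0, row_size, 3):
            b, g, r = line[x], line[x + 1], line[x + 2]
            out += bytes((r, g, b))
    return bytes(out)


if __name__ == "__main__":
    # A 1x2 image: the bottom pixel is stored first, then the top pixel.
    raw = bytes([1, 2, 3, 4, 5, 6])
    print(list(bgr_bottom_up_to_rgb(raw, width=1, height=2)))
```

Getting either detail wrong would show up immediately in the saved snapshot: swapped channels turn skin tones blue, and the wrong orientation flips the picture upside down.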


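Recognition Program

This program started out as TensorFlow's official script for testing the model trained on ImageNet, and it originally also contained functions for downloading the official trained model, among others. Here the unused parts are removed and webcam capture code is added: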

classify_image.py

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os.path
import re
from VideoCapture import Device
import numpy as np
import tensorflow as tf
import warnings
warnings.filterwarnings("ignore")

FLAGS = tf.app.flags.FLAGS

# classify_image_graph_def.pb:
#   Binary representation of the GraphDef protocol buffer.
# imagenet_synset_to_human_label_map.txt:
#   Map from synset ID to a human readable string.
# imagenet_2012_challenge_label_map_proto.pbtxt:
#   Text representation of a protocol buffer mapping a label to synset ID.
tf.app.flags.DEFINE_string(
    'model_dir', 'model',
    """Path to classify_image_graph_def.pb, """
    """imagenet_synset_to_human_label_map.txt, and """
    """imagenet_2012_challenge_label_map_proto.pbtxt.""")
#tf.app.flags.DEFINE_string('image_file', '',
#                           """Absolute path to image file.""")
tf.app.flags.DEFINE_integer('num_top_predictions', 5,
                            """Display this many predictions.""")

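As you can see, the program takes two input flags:

  • model_dir: the directory that holds the ImageNet-trained model files, here the model folder
  • num_top_predictions: how many results to print, i.e. the number of most likely classes reported for each image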
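The same two flags can be expressed with the standard library's argparse, which may be handy on TensorFlow versions where tf.app.flags is no longer available at that name. This is a sketch of an equivalent parser, not part of the original script; build_parser is a hypothetical helper.

```python
import argparse


def build_parser():
    """Build a parser with the same two flags and defaults as the script."""
    parser = argparse.ArgumentParser(description="Classify webcam snapshots.")
    parser.add_argument("--model_dir", type=str, default="model",
                        help="Folder with the .pb graph and the label maps.")
    parser.add_argument("--num_top_predictions", type=int, default=5,
                        help="Display this many predictions.")
    return parser


if __name__ == "__main__":
    # An explicit argument list stands in for the real command line here.
    args = build_parser().parse_args(["--num_top_predictions", "3"])
    print(args.model_dir, args.num_top_predictions)
```

With either flag library, the values are overridden on the command line in the same way, for example: python classify_image.py --model_dir=model --num_top_predictions=3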

# pylint: disable=line-too-long
DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
# pylint: enable=line-too-long


class NodeLookup(object):
  """Converts integer node ID's to human readable labels."""

  def __init__(self,
               label_lookup_path=None,
               uid_lookup_path=None):
    if not label_lookup_path:
      label_lookup_path = os.path.join(
          FLAGS.model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
    if not uid_lookup_path:
      uid_lookup_path = os.path.join(
          FLAGS.model_dir, 'imagenet_synset_to_human_label_map.txt')
    self.node_lookup = self.load(label_lookup_path, uid_lookup_path)

  def load(self, label_lookup_path, uid_lookup_path):
    """Loads a human readable English name for each softmax node.

    Args:
      label_lookup_path: string UID to integer node ID.
      uid_lookup_path: string UID to human-readable string.

    Returns:
      dict from integer node ID to human-readable string.
    """
    if not tf.gfile.Exists(uid_lookup_path):
      tf.logging.fatal('File does not exist %s', uid_lookup_path)
    if not tf.gfile.Exists(label_lookup_path):
      tf.logging.fatal('File does not exist %s', label_lookup_path)

    # Loads mapping from string UID to human-readable string
    proto_as_ascii_lines = tf.gfile.GFile(uid_lookup_path).readlines()
    uid_to_human = {}
    p = re.compile(r'[n\d]*[ \S,]*')
    for line in proto_as_ascii_lines:
      parsed_items = p.findall(line)
      uid = parsed_items[0]
      human_string = parsed_items[2]
      uid_to_human[uid] = human_string

    # Loads mapping from string UID to integer node ID.
    node_id_to_uid = {}
    proto_as_ascii = tf.gfile.GFile(label_lookup_path).readlines()
    for line in proto_as_ascii:
      if line.startswith('  target_class:'):
        target_class = int(line.split(': ')[1])
      if line.startswith('  target_class_string:'):
        target_class_string = line.split(': ')[1]
        node_id_to_uid[target_class] = target_class_string[1:-2]

    # Loads the final mapping of integer node ID to human-readable string
    node_id_to_name = {}
    for key, val in node_id_to_uid.items():
      if val not in uid_to_human:
        tf.logging.fatal('Failed to locate: %s', val)
      name = uid_to_human[val]
      node_id_to_name[key] = name

    return node_id_to_name

  def id_to_string(self, node_id):
    if node_id not in self.node_lookup:
      return ''
    return self.node_lookup[node_id]


def create_graph():
  """Imports the saved GraphDef file into the default graph."""
  # Creates graph from saved graph_def.pb.
  with tf.gfile.FastGFile(os.path.join(
      FLAGS.model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')


def run_inference_on_image(image):
  """Runs inference on an image.

  Args:
    image: Image file name.

  Returns:
    Nothing
  """
  if not tf.gfile.Exists(image):
    tf.logging.fatal('File does not exist %s', image)
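  # Read the snapshot and close the file so the next capture can overwrite it.
  with tf.gfile.FastGFile(image, 'rb') as f:
    image_data = f.read()

  # Creates graph from saved GraphDef on the first call only; importing it
  # again for every frame would keep adding duplicate nodes to the graph.
  if not tf.get_default_graph().get_operations():
    create_graph()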

  with tf.Session() as sess:
    # Some useful tensors:
    # 'softmax:0': A tensor containing the normalized prediction across
    #   1000 labels.
    # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
    #   float description of the image.
    # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
    #   encoding of the image.
    # Runs the softmax tensor by feeding the image_data as input to the graph.
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)

    # Creates node ID --> English string lookup.
    node_lookup = NodeLookup()

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = node_lookup.id_to_string(node_id)
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))



def main(_):
    cam=Device()
    try:
        while True:
            cam.saveSnapshot('image_cap.jpg')  
            image = 'image_cap.jpg'
            run_inference_on_image(image)
            print('===========================')
    except KeyboardInterrupt:
        pass


if __name__ == '__main__':
    tf.app.run()

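Testing

Run classify_image.py. By default the program saves each captured frame to image_cap.jpg in the current directory, overwriting it every time. The recognition function reads this constantly refreshed file and prints the five most likely classes.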
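The script picks those classes with predictions.argsort()[-FLAGS.num_top_predictions:][::-1], which sorts the indices by score and keeps the last k in reverse order. The same selection is sketched below in plain Python, with made-up scores; top_k is a name chosen for this example, and ties may be ordered differently than NumPy would order them.

```python
def top_k(scores, k):
    """Return (index, score) pairs for the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


if __name__ == "__main__":
    # Hypothetical softmax output for four classes.
    scores = [0.10, 0.60, 0.05, 0.25]
    for index, score in top_k(scores, 2):
        print("class %d (score = %.5f)" % (index, score))
```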

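How close to real time the recognition gets depends on how long inference takes, which comes down to the machine's performance. The computer I ran this on has no GPU; in a word, it stutters like a slideshow.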
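To put a number on the lag, the time per frame can be measured with a small helper like the one below. It is a generic sketch: average_seconds is a hypothetical name, and the commented call assumes the run_inference_on_image function from the script above.

```python
import time


def average_seconds(step, repeats=5):
    """Call step() `repeats` times and return the mean wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        step()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # In the real script this would wrap one recognition pass, for example:
    #   average_seconds(lambda: run_inference_on_image('image_cap.jpg'))
    # A short sleep stands in for inference here.
    seconds = average_seconds(lambda: time.sleep(0.01))
    print("%.3f s per frame" % seconds)
```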

Sample output:

===========================
computer keyboard, keypad (score = 0.62623)
space bar (score = 0.30992)
typewriter keyboard (score = 0.02321)
mouse, computer mouse (score = 0.00303)
notebook, notebook computer (score = 0.00076)
===========================

At that moment I was holding my beat-up keyboard up to the camera...


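Recognition works reasonably well. When I pointed the camera straight at myself, though, the results were poor, probably because nothing like that image appeared in the training set. (The closest class was a shower cap. Ugh!)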


References

https://stackoverflow.com/questions/38497924/capturing-image-from-webcam-using-python-on-windows

https://stackoverflow.com/questions/31172719/pip-install-access-denied-on-windows

Python 调用摄像头并保存图片 (Calling the camera from Python and saving the image), by 王小涛_同學
