Spark 2.4 Deep Learning Pipelines (Keras) Image Classifier

(Original link) - This is a demo from a Spark Summit 2018 talk. It walks through image classification with Keras and shows how to do the same classification with Spark, and is shared here for learning purposes.

keras_dlp_image_classifier(Python)


Part 1: Exploring and Classifying Images with Pretrained Models

We will use Keras with TensorFlow as the backend, and download VGG16 from Keras.

VGG16

from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions, VGG16
import numpy as np
import os

vgg16Model = VGG16(weights='imagenet')
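
As an optional sanity check (a sketch, not part of the original notebook), you can confirm that Keras is indeed running on the TensorFlow backend and inspect the downloaded network; the summary ends in the 1000-way ImageNet softmax that decode_predictions expects.

from keras import backend as K

# Optional sanity check (sketch): confirm the backend and inspect the architecture
print(K.backend())    # expected: 'tensorflow'
vgg16Model.summary()  # layer stack ending in a 1000-way softmax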

Helper function to predict the category of each image

def predict_images(images, m):
  for i in images:
    print('processing image:', i)
    img = image.load_img(i, target_size=(224, 224))
    # convert to a NumPy array in the Keras image format
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    preds = m.predict(x)
    # decode the results into a list of tuples (class, description, probability)
    print('Predicted:', decode_predictions(preds, top=3)[0], '\n')

Classify African and Indian Elephants

Load and predict images using the pretrained VGG16 model


elephants_img_paths = ["/dbfs/brooke/spark-summit-sf/elephants/" + path for path in os.listdir("/dbfs/brooke/spark-summit-sf/elephants/")]
predict_images(elephants_img_paths, vgg16Model)

processing image: /dbfs/brooke/spark-summit-sf/elephants/african_elephant_1.jpg
Downloading data from https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json

...

processing image: /dbfs/brooke/spark-summit-sf/elephants/indian_elephant_1.jpeg

Predicted: [('n01871265', 'tusker', 0.63030255), ('n02504013', 'Indian_elephant', 0.3172723), ('n02504458', 'African_elephant', 0.052417696)]

processing image: /dbfs/brooke/spark-summit-sf/elephants/indian_elephant_3.jpeg

Predicted: [('n03980874', 'poncho', 0.27748922), ('n02504013', 'Indian_elephant', 0.15591854), ('n03884397', 'panpipe', 0.118131705)]

hotdog_img_paths = ["/dbfs/brooke/spark-summit-sf/hotdog/" + path for path in os.listdir("/dbfs/brooke/spark-summit-sf/hotdog/")]
predict_images(hotdog_img_paths, vgg16Model)

processing image: /dbfs/brooke/spark-summit-sf/hotdog/hamburger_1.jpeg

Predicted: [('n07697313', 'cheeseburger', 0.98272187), ('n07693725', 'bagel', 0.0061851414), ('n07613480', 'trifle', 0.005066536)]

processing image: /dbfs/brooke/spark-summit-sf/hotdog/hotdog_1.jpeg

Predicted: [('n07697537', 'hotdog', 0.9997451), ('n07697313', 'cheeseburger', 8.424303e-05), ('n07615774', 'ice_lolly', 4.0920193e-05)] ...

DeepImagePredictor

Let's make these predictions in parallel on our Spark cluster!

from pyspark.ml.image import ImageSchema
from sparkdl.image import imageIO
from sparkdl import DeepImagePredictor

nerds_df = ImageSchema.readImages("brooke/spark-summit-sf/nerds/")

predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="VGG16", decodePredictions=True, topK=5)
predictions_df = predictor.transform(nerds_df).cache()
predictions_df.count()

INFO:tensorflow:Froze 32 variables. Converted 32 variables to const ops.
INFO:tensorflow:Froze 0 variables. Converted 0 variables to const ops.
Out[5]: 8

display(predictions_df)
image predicted_labels
[{"class":"n03045698","description":"cloak","probability":0.39211166},{"class":"n03787032","description":"mortarboard","probability":0.091029376},{"class":"n03404251","description":"fur_coat","probability":0.08471853},{"class":"n04371774","description":"swing","probability":0.056981083},{"class":"n04370456","description":"sweatshirt","probability":0.028172707}]
[{"class":"n04350905","description":"suit","probability":0.41138414},{"class":"n02916936","description":"bulletproof_vest","probability":0.10152984},{"class":"n03763968","description":"military_uniform","probability":0.09318812},{"class":"n04591157","description":"Windsor_tie","probability":0.07702819},{"class":"n02669723","description":"academic_gown","probability":0.03608404}]
[{"class":"n03630383","description":"lab_coat","probability":0.23492548},{"class":"n04591157","description":"Windsor_tie","probability":0.09487544},{"class":"n03838899","description":"oboe","probability":0.049194943},{"class":"n04350905","description":"suit","probability":0.043708242},{"class":"n03832673","description":"notebook","probability":0.041520175}]
  [{"class":"n04350905","description":"suit","probability":0.25813994},{"class":"n01440764","description":"tench","probability":0.03799466},{"class":"n03838899","description":"oboe","probability":0.03496751},{"class":"n02883205","description":"bow_tie","probability":0.033893984},{"class":"n03394916","description":"French_horn","probability":0.03332546}]
[{"class":"n03595614","description":"jersey","probability":0.3530513},{"class":"n04370456","description":"sweatshirt","probability":0.13232166},{"class":"n03942813","description":"ping-pong_ball","probability":0.097091846},{"class":"n03141823","description":"crutch","probability":0.018438742},{"class":"n04270147","description":"spatula","probability":0.017245641}]
[{"class":"n03000247","description":"chain_mail","probability":0.14295407},{"class":"n02672831","description":"accordion","probability":0.10376813},{"class":"n02787622","description":"banjo","probability":0.069579415},{"class":"n02804610","description":"bassoon","probability":0.061210092},{"class":"n03838899","description":"oboe","probability":0.058611386}]
[{"class":"n03630383","description":"lab_coat","probability":0.6628539},{"class":"n04317175","description":"stethoscope","probability":0.12459004},{"class":"n04370456","description":"sweatshirt","probability":0.038363792},{"class":"n04039381","description":"racket","probability":0.0132558},{"class":"n03595614","description":"jersey","probability":0.008680802}]
[{"class":"n04350905","description":"suit","probability":0.17892571},{"class":"n02883205","description":"bow_tie","probability":0.0947369},{"class":"n04591157","description":"Windsor_tie","probability":0.08621924},{"class":"n04162706","description":"seat_belt","probability":0.07562429},{"class":"n03630383","description":"lab_coat","probability":0.06052583}]

 

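As the table above shows, the decoded predictions come back as an array of (class, description, probability) entries per image. A small follow-up sketch (not in the original notebook, and assuming exactly that schema plus the origin path that ImageSchema attaches to each image) pulls out just the top-1 description and its probability:

from pyspark.sql.functions import col

# Sketch: extract the top-1 label per image, keyed on the image origin path
top1_df = predictions_df.select(
  col("image.origin").alias("origin"),
  col("predicted_labels")[0]["description"].alias("top1_label"),
  col("predicted_labels")[0]["probability"].alias("top1_prob"))
display(top1_df)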

Let's change the model

inception = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="InceptionV3", decodePredictions=True, topK=5)
inception_df = inception.transform(nerds_df).cache()
inception_df.count()

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.5/inception_v3_weights_tf_dim_ordering_tf_kernels.h5

0us/step
INFO:tensorflow:Froze 378 variables. Converted 378 variables to const ops.
INFO:tensorflow:Froze 0 variables. Converted 0 variables to const ops.
Out[7]: 8

display(inception_df)
image predicted_labels
[{"class":"n04350905","description":"suit","probability":0.14165702},{"class":"n04479046","description":"trench_coat","probability":0.11945703},{"class":"n03404251","description":"fur_coat","probability":0.027757034},{"class":"n04370456","description":"sweatshirt","probability":0.024130477},{"class":"n02837789","description":"bikini","probability":0.021668304}]
[{"class":"n02916936","description":"bulletproof_vest","probability":0.81811774},{"class":"n03787032","description":"mortarboard","probability":0.036147255},{"class":"n03763968","description":"military_uniform","probability":0.02605114},{"class":"n02669723","description":"academic_gown","probability":0.01372046},{"class":"n04350905","description":"suit","probability":0.008794671}]
[{"class":"n04479046","description":"trench_coat","probability":0.054543473},{"class":"n03838899","description":"oboe","probability":0.042113375},{"class":"n03630383","description":"lab_coat","probability":0.0317195},{"class":"n02787622","description":"banjo","probability":0.029045274},{"class":"n02804610","description":"bassoon","probability":0.026370155}]
[{"class":"n04350905","description":"suit","probability":0.7987358},{"class":"n02883205","description":"bow_tie","probability":0.053425536},{"class":"n04591157","description":"Windsor_tie","probability":0.011151478},{"class":"n02992529","description":"cellular_telephone","probability":0.0053684525},{"class":"n03763968","description":"military_uniform","probability":0.0039382246}]
[{"class":"n03595614","description":"jersey","probability":0.15574361},{"class":"n03942813","description":"ping-pong_ball","probability":0.05970348},{"class":"n04370456","description":"sweatshirt","probability":0.048369024},{"class":"n02804610","description":"bassoon","probability":0.034532476},{"class":"n03838899","description":"oboe","probability":0.03400313}]
[{"class":"n03763968","description":"military_uniform","probability":0.117212564},{"class":"n04350905","description":"suit","probability":0.035018962},{"class":"n02787622","description":"banjo","probability":0.033046678},{"class":"n04584207","description":"wig","probability":0.032433487},{"class":"n04317175","description":"stethoscope","probability":0.028688557}]
[{"class":"n03630383","description":"lab_coat","probability":0.36856785},{"class":"n04317175","description":"stethoscope","probability":0.037452906},{"class":"n03832673","description":"notebook","probability":0.03503557},{"class":"n04350905","description":"suit","probability":0.028838113},{"class":"n03787032","description":"mortarboard","probability":0.020943912}]
[{"class":"n03763968","description":"military_uniform","probability":0.9151895},{"class":"n03787032","description":"mortarboard","probability":0.012946242},{"class":"n04350905","description":"suit","probability":0.007009363},{"class":"n02669723","description":"academic_gown","probability":0.0068341326},{"class":"n02883205","description":"bow_tie","probability":0.0064667594}]

 


Part 2: Transfer Learning with Deep Learning Pipelines (DLP)

Deep Learning Pipelines provides utilities to perform transfer learning on images, which is one of the fastest ways, both in lines of code and in run time, to start using deep learning. With Deep Learning Pipelines it takes just a few lines of code.

The idea behind transfer learning is to take knowledge from one model doing some task, and transfer it to build another model doing a similar task.
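
To make the idea concrete, here is a minimal Keras-only sketch of the same approach (hypothetical, not part of the demo): a frozen InceptionV3 base acts purely as a feature extractor, and a simple classifier is then trained on those features. DeepImageFeaturizer below does essentially this, but distributed over a Spark DataFrame.

import numpy as np
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.preprocessing import image

# Frozen base network: include_top=False drops the ImageNet classifier head,
# pooling='avg' yields one 2048-dimensional feature vector per image.
base_model = InceptionV3(weights='imagenet', include_top=False, pooling='avg')

def featurize(path):
  # hypothetical helper: turn one image file into a 2048-dim feature vector
  img = image.load_img(path, target_size=(299, 299))  # InceptionV3 input size
  x = np.expand_dims(image.img_to_array(img), axis=0)
  return base_model.predict(preprocess_input(x))[0]

# A new, small classifier (e.g. logistic regression) is then fit on these
# feature vectors instead of on raw pixels -- that reuse is the "transfer".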

from pyspark.ml.image import ImageSchema
from pyspark.sql.functions import lit
from sparkdl.image import imageIO

img_dir = 'dbfs:/brooke/spark-summit-sf'
cats_df = ImageSchema.readImages(img_dir + "/cats").withColumn("label", lit(1))
dogs_df = ImageSchema.readImages(img_dir + "/dogs").withColumn("label", lit(0))

cats_train, cats_test = cats_df.randomSplit([.8, .2], seed=42)
dogs_train, dogs_test = dogs_df.randomSplit([.8, .2], seed=42)

train_df = cats_train.unionAll(dogs_train).cache()
test_df = cats_test.unionAll(dogs_test).cache()
display(train_df.select("image", "label"))
image label
1
1
1
1
1
0
0
0
0
0

 

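Before building the pipeline, a quick sanity check (a sketch, not in the original notebook) shows how many images of each class ended up in the train and test splits:

# Sketch: class balance of the train/test splits
train_df.groupBy("label").count().show()
test_df.groupBy("label").count().show()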

Build the MLlib Pipeline

Use DeepImageFeaturizer and LogisticRegression

from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline
from sparkdl import DeepImageFeaturizer 

featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3")
lr = LogisticRegression(maxIter=20, regParam=0.05, elasticNetParam=0.3, labelCol="label")
p = Pipeline(stages=[featurizer, lr])

p_model = p.fit(train_df)

Evaluate the Accuracy

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

pred_df = p_model.transform(test_df).cache()
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
print("Test set accuracy = " + str(evaluator.evaluate(pred_df.select("prediction", "label"))*100) + "%")

Test set accuracy = 100.0%
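
Accuracy on such a small held-out set saturates easily, so as an additional check (a sketch, not part of the demo) you could also report the area under the ROC curve, which uses the full probability ranking rather than only the hard predictions:

from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Sketch: AUC on the same cached predictions (uses the rawPrediction column by default)
binary_evaluator = BinaryClassificationEvaluator(labelCol="label", metricName="areaUnderROC")
print("Test set AUC = " + str(binary_evaluator.evaluate(pred_df)))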

display(pred_df.select("image", "label", "probability"))
image label probability
1 [1,2,[],[0.07983768538504338,0.9201623146149567]]
1 [1,2,[],[0.0735124824751803,0.9264875175248197]]
1 [1,2,[],[0.0688419453818859,0.9311580546181142]]
0 [1,2,[],[0.9475188514834973,0.0524811485165027]]
0 [1,2,[],[0.9026450467442289,0.0973549532557711]]
0 [1,2,[],[0.7731177886783923,0.2268822113216076]]

 

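The probability column above is printed in MLlib's internal vector encoding (type, size, indices, values). As a final sketch (not in the original notebook), a small UDF extracts P(label = 1), i.e. the probability of the cat class, into a plain double column for easier reading:

from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# Sketch: on Spark 2.4 a UDF is the simplest way to index into an ML vector column
p_cat = udf(lambda v: float(v[1]), DoubleType())
display(pred_df.select("label", p_cat("probability").alias("p_cat")))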

 

 

 
