Comparison of Deep Learning Framework Support on NXP and STM32

Source: IMX_LINUX_USERS_GUIDE

NXP eIQ Machine Learning


The NXP® eIQ™ for i.MX toolkit provides a set of libraries and development tools for machine learning applications
targeting NXP microcontrollers and application processors. The toolkit is contained in the meta-imx/meta-ml layer.
For details about Machine Learning Security, see Security for Machine Learning Package (AN12867).


OpenCV machine learning: OpenCV DNN

Image classification demo
This demo performs image classification using a pretrained SqueezeNet network. Demo dependencies are located in
../opencv_extra-4.2.0/testdata/dnn
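
A minimal Python sketch of what such an OpenCV DNN classification demo does (this is not the shipped demo source; the prototxt, caffemodel, and image file names are placeholders):

import cv2
import numpy as np

# Load the Caffe network definition and weights (placeholder file names).
net = cv2.dnn.readNetFromCaffe("squeezenet_v1.1.prototxt", "squeezenet_v1.1.caffemodel")
img = cv2.imread("space_shuttle.jpg")
# Pack the image into the 4-D NCHW blob the network expects (227x227 for SqueezeNet).
blob = cv2.dnn.blobFromImage(img, scalefactor=1.0, size=(227, 227), mean=(104, 117, 123))
net.setInput(blob)
prob = net.forward().flatten()
print("class id:", int(np.argmax(prob)), "confidence:", float(prob.max()))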


YOLO object detection example
The YOLO object detection demo performs object detection using the You Only Look Once (YOLO) detector. It detects objects
in camera, video, or image input. Find more information about this demo on the OpenCV YOLO DNNs page. Demo dependencies are located in
../opencv_extra-4.2.0/testdata/dnn

 

 

Image segmentation demo
Image segmentation means dividing the image into groups of pixels based on some criterion, such as color,
texture, or another property. Demo dependencies are located in
../opencv_extra-4.2.0/testdata/dnn
 

 

Image colorization demo
This sample demonstrates recoloring grayscale images with a DNN. The demo supports input images only, not live camera
input. Demo dependencies are located in
../opencv_extra-4.2.0/testdata/dnn
 

Human pose detection demo
This application demonstrates human or hand pose detection with a pretrained OpenPose DNN. The demo supports input
images only, not live camera input. Demo dependencies are located in
../opencv_extra-4.2.0/testdata/dnn
 

 

Object Detection Example
This demo performs object detection using a pretrained SqueezeDet network. The demo supports input images only, not
live camera input. Demo dependencies are the following:
• SqueezeDet.caffemodel model weight file
• SqueezeDet_deploy.prototxt model definition file
• Input image aeroplane.jpg
Running the C++ example with image input from the default location:
./example_dnn_objdetect_obj_detect SqueezeDet_deploy.prototxt SqueezeDet.caffemodel aeroplane.jpg
 

 

CNN image classification example
This demo performs image classification using a pretrained SqueezeNet network. The demo supports input images only, not
live camera input. Demo dependencies are the following:
• SqueezeNet.caffemodel model weight file
 

 

Text detection
This demo performs text detection in an image using the EAST algorithm. Demo dependencies are located in
../opencv_extra-4.2.0/testdata/dnn
• frozen_east_text_detection.pb
Another demo dependency is imageTextN.png from
/usr/share/OpenCV/samples/data
 

 

SVM Introduction
This example demonstrates how to create and train an SVM model using training data. Once the model is trained, labels for
test data are predicted. The full description of the example can be found in the OpenCV tutorial (tutorial_introduction_to_svm). Displaying
the result requires an image built with Qt5 enabled.
After running the demo, the graphical result is shown on the screen:
./example_tutorial_introduction_to_svm
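
As an illustration, a minimal Python equivalent of the same workflow (create, train, predict) with the OpenCV ml module; the four training points mirror the tutorial, everything else is a simplified sketch:

import cv2
import numpy as np

# Four 2-D training points with labels +1 / -1, as in the SVM tutorial.
train = np.array([[501, 10], [255, 10], [501, 255], [10, 501]], dtype=np.float32)
labels = np.array([1, -1, -1, -1], dtype=np.int32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.setTermCriteria((cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-6))
svm.train(train, cv2.ml.ROW_SAMPLE, labels)

# Predict labels for two test points.
test = np.array([[400, 50], [100, 400]], dtype=np.float32)
print(svm.predict(test)[1].ravel())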
 

Principal Component Analysis (PCA) introduction
Principal Component Analysis (PCA) is a statistical method that extracts the most important features of a dataset. In this
tutorial you will learn how to use PCA to calculate the orientation of an object. For more details, check the OpenCV tutorial
Introduction_to_PCA.
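
A minimal Python sketch of the PCA orientation idea (assumption: a binary image containing one object; the input file name is a placeholder):

import cv2
import math
import numpy as np

img = cv2.imread("pca_test1.jpg")                     # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 50, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
contours, _ = cv2.findContours(bw, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)

# Run PCA on the contour points; the first eigenvector is the object's main axis.
pts = contours[0].reshape(-1, 2).astype(np.float64)
mean = np.empty((0))
mean, eigenvectors = cv2.PCACompute(pts, mean)
angle = math.atan2(eigenvectors[0, 1], eigenvectors[0, 0])
print("orientation (radians):", angle)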
 

 

Logistic regression
In this sample, logistic regression is used to predict one of two characters (0 or 1) from an image. First, every image matrix is
reshaped from its original size of 28x28 to 1x784. A logistic regression model is created and trained on 20 images. After
training, the model can predict the labels of test images. The source code is located at the logistic_regression link and can be
run by typing the following command.
Demo dependencies (preparing the training data files):
wget https://raw.githubusercontent.com/opencv/opencv/4.2.0/samples/data/data01.xml
After running the demo, the graphical result is shown on the screen (it requires Qt5 support):
./example_cpp_logistic_regression
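
A minimal Python sketch of the same workflow with the OpenCV ml module (random stand-in data instead of the digit images, so the numbers below are placeholders):

import cv2
import numpy as np

# 20 stand-in "images" already flattened to 1x784 rows, labels alternating 0/1.
train = np.random.rand(20, 784).astype(np.float32)
labels = np.tile(np.array([0.0, 1.0], dtype=np.float32), 10).reshape(-1, 1)

lr = cv2.ml.LogisticRegression_create()
lr.setLearningRate(0.001)
lr.setIterations(10)
lr.setRegularization(cv2.ml.LogisticRegression_REG_L2)
lr.setTrainMethod(cv2.ml.LogisticRegression_BATCH)
lr.setMiniBatchSize(1)
lr.train(train, cv2.ml.ROW_SAMPLE, labels)

# Predict labels (0 or 1) for new flattened images.
test = np.random.rand(5, 784).astype(np.float32)
_, preds = lr.predict(test)
print(preds.ravel())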
 

Arm Compute Library


Arm Compute Library is a collection of low-level functions optimized for Arm CPU and GPU architectures, targeted at image
processing, computer vision, and machine learning.
 

TensorFlow Lite


TensorFlow Lite is a lightweight version of, and the next step from, TensorFlow. TensorFlow Lite is an open-source software
library focused on running machine learning models on mobile and embedded devices (available at www.tensorflow.org/lite).
It enables on-device machine learning inference with low latency and a small binary size. TensorFlow Lite also supports
hardware acceleration using the Android OS Neural Networks API.
Features:
• Multithreaded computation with acceleration using Arm Neon SIMD instructions on Cortex-A cores
• Parallel computation using GPU/ML hardware acceleration (on shader or convolution units)
• C++ and Python API (supported Python version: 3)
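
A minimal Python sketch of on-device inference with the TensorFlow Lite interpreter (assumptions: the tflite_runtime package or full TensorFlow is installed on the target, and the quantized model shipped with the examples is used):

import numpy as np
from tflite_runtime.interpreter import Interpreter   # with full TensorFlow: tf.lite.Interpreter

interpreter = Interpreter(model_path="mobilenet_v1_1.0_224_quant.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy uint8 input of the expected shape; replace with a real preprocessed image.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]
print("top class:", int(scores.argmax()))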
 

Arm NN


Arm NN is an open-source inference engine framework developed by the Linaro Artificial Intelligence Initiative, of which NXP is a
part. It supports a wide range of neural-network model formats, such as Caffe, TensorFlow, TensorFlow Lite, and
ONNX. For i.MX8, Arm NN can run on the CPU, accelerated using Arm NEON (the SIMD architecture extension for Arm
Cortex-A/R processors), and on GPUs/NPUs, accelerated using the VSI NPU backend distributed exclusively as a component
of NXP® eIQ™. For more details about Arm NN, check the Arm NN SDK webpage.
Source code for developing a custom application or building Arm NN is available at https://source.codeaurora.org/external/imx/armnn-imx.
 

 

Caffe tests


Arm NN SDK provides the following set of tests for Caffe models:
/usr/bin/CaffeAlexNet-Armnn
/usr/bin/CaffeCifar10AcrossChannels-Armnn
/usr/bin/CaffeInception_BN-Armnn
/usr/bin/CaffeMnist-Armnn
/usr/bin/CaffeResNet-Armnn
/usr/bin/CaffeVGG-Armnn
/usr/bin/CaffeYolo-Armnn
Two important limitations might require preprocessing of the Caffe model file prior to running an Arm NN Caffe test. First,
the Arm NN tests require the batch size to be set to 1. Second, Arm NN does not support all Caffe syntax, so some older
neural network model files require updating to the latest Caffe syntax.
Details about how to perform these preprocessing steps are described on the Arm NN GitHub page. Install Caffe on the host.
Also check the Arm NN documentation for Caffe support.
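
A minimal Python sketch of the two preprocessing steps (assumptions: Caffe with its Python bindings is installed on the host, the deploy prototxt declares its input via input_shape, and the file names are placeholders):

import caffe
from caffe.proto import caffe_pb2
from google.protobuf import text_format

# 1. Force the batch size to 1 in the network definition.
net_def = caffe_pb2.NetParameter()
with open("model_deploy.prototxt") as f:
    text_format.Merge(f.read(), net_def)
if net_def.input_shape:
    net_def.input_shape[0].dim[0] = 1              # first dimension is the batch size
with open("model_deploy_batch1.prototxt", "w") as f:
    f.write(text_format.MessageToString(net_def))

# 2. Load and re-save the weights; Caffe rewrites deprecated syntax on load.
net = caffe.Net("model_deploy_batch1.prototxt", "model.caffemodel", caffe.TEST)
net.save("model_batch1.caffemodel")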
 

 

ONNX Runtime


ONNX Runtime version 1.1.2 with NXP improvements supports both the ArmNN and the default CPU execution providers with
optimization level 2. In addition, the ACL execution provider and optimization level 99 are provided as a preview for a subset of
models (MobileNet v2, ResNet50 v1 and v2).
NOTE
For the full list of CPU-supported operators, see the 'operator kernels' documentation
section: https://source.codeaurora.org/external/imx/onnxruntime-imx/tree/docs/OperatorKernels.md
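
A minimal Python sketch of selecting an execution provider and graph optimization level with the onnxruntime API (the provider names are only present in builds that enable them, and the model file name is a placeholder):

import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
# ORT_ENABLE_EXTENDED corresponds to optimization level 2, ORT_ENABLE_ALL to level 99.
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

session = ort.InferenceSession("mobilenet_v2.onnx", sess_options=opts)
# Fall back to the CPU provider for operators the accelerated provider does not handle.
session.set_providers(["ArmNNExecutionProvider", "CPUExecutionProvider"])

inputs = {session.get_inputs()[0].name: np.zeros((1, 3, 224, 224), np.float32)}
print(session.run(None, inputs)[0].shape)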
 

Profiling NN execution on GPU/NPU
 

This section describes the steps to enable the profiler and capture its logs.
1. Stop the EVK board in U-Boot by pressing Enter.
2. Update mmcargs by adding galcore.showArgs=1 and galcore.gpuProfiler=1.
u-boot=> editenv mmcargs
edit: setenv bootargs ${jh_clk} console=${console} root=${mmcroot} galcore.showArgs=1 galcore.gpuProfiler=1
u-boot=> boot
3. Boot the board and wait for the Linux OS prompt.
4. The following environment flags should be enabled before executing the application. The VIV_VX_DEBUG_LEVEL and
VIV_VX_PROFILE flags should always be 1 during profiling. The CNN_PERF flag enables the driver's
ability to generate a per-layer profile log. NN_EXT_SHOW_PERF shows the details of how the compiler estimates performance
and determines tiling based on it.
export CNN_PERF=1 NN_EXT_SHOW_PERF=1 VIV_VX_DEBUG_LEVEL=1 VIV_VX_PROFILE=1
5. Capture the profiler log. The sample ML examples that are part of the standard NXP Linux release are used to illustrate the following steps.
• TensorFlow Lite profiling
Run the TFLite application with the NPU backend as follows:
cd /usr/bin/tensorflow-lite-2.1.0/examples
./label_image -m mobilenet_v1_1.0_224_quant.tflite -t 1 -i grace_hopper.bmp -l labels.txt -a 1 -v 0 > viv_test_app_profile.log 2>&1
• Arm NN profiling
Run the Arm NN application (here TfMobileNet is taken as an example) with the NPU backend as follows:
/usr/bin/TfMobileNet-Armnn --data-dir=data --model-dir=models --compute=VsiNpu > viv_test_app_profile.log 2>&1
The log captures detailed information about the execution clock cycles and DDR data transmission in each layer.
 

 

Key features of STM32Cube.AI:

  • Key features

    • Generation of an STM32-optimized library from pre-trained neural network models
    • Native support for various deep learning frameworks such as Keras, TensorFlow™ Lite, Caffe, ConvNetJS, and Lasagne, and support for all frameworks that can export to the ONNX standard format, such as PyTorch™
    • Support for 8-bit quantization of Keras networks and TensorFlow™ Lite quantized networks
    • Support for larger networks by storing weights in external flash and activation buffers in external RAM
    • Easy portability across different STM32 microcontroller series through STM32Cube integration
    • Free, user-friendly license terms
  • STM32Cube.AI is an extension pack of the widely used STM32CubeMX configuration and code generation tool, enabling AI on STM32 Arm® Cortex®-M-based microcontrollers. To access it, download and install STM32CubeMX (version 5.0.1 onwards).

STM32Cube.AI is fully integrated into the STM32 software development ecosystem as an extension of the widely used STM32CubeMX tool.

It allows fast, automatic conversion of pre-trained ANNs into optimized code that can run on an MCU. The tool guides users through the selection of the right MCU and provides rapid feedback on the performance of the neural network on the chosen MCU, with validation running both on the PC and on the target STM32 MCU. Check out the Getting Started video.

 

X-CUBE-AI is an STM32Cube Expansion Package, part of the STM32Cube.AI ecosystem, that extends STM32CubeMX capabilities with automatic conversion of pre-trained Neural Networks and integration of the generated optimized library into the user's project. The easiest way to use it is to download it inside the STM32CubeMX tool (version 5.0.1 or newer) as described in the user manual Getting started with X-CUBE-AI Expansion Package for Artificial Intelligence (AI) (UM2526).

 

 

STM32MP1

Linux... to be continued


The first example is handwritten character recognition: the A7 controls the M4, which runs a model generated by Cube.AI (Keras model model-ABC123-112.h5). The readme below describes the application and the CA7/CM4 communication protocol; a minimal framing sketch in Python follows the readme.

/**
  *************************************************************************************************
  * @file    readme.txt
  * @author  MCD Application Team
  * @brief   Description of the Artificial Intelligence Hand Writing Character Recognition example.
  *************************************************************************************************
  *
  * Copyright (c) 2019 STMicroelectronics. All rights reserved.
  *
  * This software component is licensed by ST under BSD 3-Clause license,
  * the "License"; You may not use this file except in compliance with the
  * License. You may obtain a copy of the License at:
  *                       opensource.org/licenses/BSD-3-Clause
  *
  *************************************************************************************************
  */

This project demonstrates a complex application that runs on both CPU1(CA7) and CPU2(CM4).
The application is a launcher that recognizes handwritten characters drawn on the touch screen in order
to execute specific actions.

CPU1 (CA7) controls the touch events and the Graphical User Interface.
CPU2 (CM4) is used to offload the processing of a Cube.AI pre-built Neural Network.

The communication between the CPU1(CA7) and the CPU2(CM4) is done through a Virtual UART to create an
Inter-Processor Communication channel seen as a TTY device in Linux.
The implementation is based on:
    * RPMSG framework on CPU1(CA7) side
    * and OpenAMP MW on the CPU2(CM4) side

OpenAMP MW uses the following HW resources:
    * IPCC peripheral for event signaling (mailbox) between CPU1(CA7) and CPU2(CM4)
    * MCUSRAM peripheral for buffer communications (virtio buffers) between CPU1(CA7) and CPU2(CM4)
            Reserved shared memory region for this example: SHM_ADDR=0x10040000 and SHM_SIZE=128k.
            It is defined in the platform_info.c file

A communication protocol has been defined between CPU1(CA7) and CPU2(CM4).
The data frames exchanged have the following structure:
    ----------------------------------------------------------------
    | msg ID | data Length | data Byte 1 | ... | data Byte n | CRC |
    ----------------------------------------------------------------

    - 3 types of messages can be received by CPU2(CM4):
        * Set the Neural Network input type (0x20, 0x01, data, CRC)
            * data = 0 => NN input is letter or digit
            * data = 1 => NN input is letter only
            * data = 2 => NN input is digit only

        * Provide the touch screen coordinates (0x20, n, data_x1, data_y1, ... , data_xn, data_yn, CRC)
            * n       => the number of coordinate points
            * data_xn => x coordinate of point n
            * data_yn => y coordinate of point n

        * Start AI NN processing (0x22, 0x00, CRC)

    - 4 types of acknowledges can be received on the CPU1(CA7) side:
        * Bad acknowledge (0xFF, 0x00, CRC)

        * Good acknowledge (0xF0, 0x00, CRC)

        * Touch screen acknowledge (0xF0, 0x01, n, CRC)
            * n => number of screen coordinate points acknowledged

        * AI processing result acknowledge (0xF0, 0x04, char, accuracy, time_1, time_2, CRC)
            * char     => this is the recognized letter (or digit)
            * accuracy => this is the confidence expressed in percentage
            * time_1   => upper Bytes of the time (word) expressed in ms
            * time_2   => lower Bytes of the time (word) expressed in ms

On CPU2(CM4) side:
    - CPU2(CM4) initializes OpenAMP MW, which initializes/configures the IPCC peripheral through the HAL
      and sets up the openamp-rpmsg framework infrastructure
    - CPU2(CM4) creates 1 rpmsg channel for 1 virtual UART instance UART0
    - CPU2(CM4) initializes the Character Recognition Neural Network
    - CPU2(CM4) waits for messages from CPU1(CA7) on this channel
    - When CPU2(CM4) receives a message on the Virtual UART instance/rpmsg channel, it processes the message
      to execute the associated action:
        * set the NN input type to the desired value
        * or register the touch event coordinates to generate the picture that will be processed by the NN
        * or start the NN processing and wait for the results
    - For every previous action, CPU2(CM4) sends back to CPU1(CA7) an acknowledge as already defined
      above.

On CPU1(CA7) side:
    - CPU1(CA7) opens the input event device to register the touch events generated by the user's finger drawing
    - CPU1(CA7) configures the input type (Letter only) of the Neural Network running on CPU2(CM4) by
      sending a message through the virtual TTY communication channel
    - when the drawing is finished, CPU1(CA7) processes the touch event data and sends it to CPU2(CM4)
    - CPU1(CA7) starts the Neural Network processing, waits for the result, and displays the recognized character on
      the display

Some information about the Character Recognition Neural Network:
    The Character Recognition Neural Network used is a Keras model processed by Cube.AI to generate the executable
    that can run on CPU2(CM4).
    The Keras model used is located in the root directory of this project:
      model-ABC123-112.h5
    This model has been used in Cube.AI to generate the Neural Network binary.
    The model accepts as input a 28x28 picture encoded as floats in black and white (black = 0.0, white = 1.0).
    The output layer of the Neural Network contains 36 neurons (A -> Z and 0 -> 9).

Notes:
    - A Linux console is required to run the application.
    - CM4 logging is redirected to shared memory in MCUSRAM and can be displayed using the following command:
          cat /sys/kernel/debug/remoteproc/remoteproc0/trace0

    The following command should be run in the Linux console on the CA7 to run the example:
    > /usr/local/demo/bin/ai_char_reco_launcher /usr/local/demo/bin/apps_launcher_example.sh

    You are then ready to draw letters on the touch screen

Hardware and Software environment:
    - This example runs on STM32MP157CACx devices.
    - This example has been tested with the STM32MP157C-DK2 and STM32MP157C-EVAL boards and can be
      easily tailored to any other supported device and development board.

Where to find the M4 firmware source code:
    The M4 firmware source code is delivered as a demonstration inside the STM32CubeMP1 package.
    For the DK2 board:
    /Firmware/Projects/STM32MP157C-DK2/Demonstrations/AI_Character_Recognition
    For the EV1 board:
    /Firmware/Projects/STM32MP157C-EV1/Demonstrations/AI_Character_Recognition
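
To make the framing concrete, here is a minimal Python sketch of building two of the request frames described in the readme above on the CA7 side. The checksum is an assumption (a simple modulo-256 sum used as a placeholder, not necessarily the CRC computed by the firmware), and the virtual UART is assumed to appear as /dev/ttyRPMSG0 in Linux.

def build_frame(msg_id, payload):
    # Frame layout: | msg ID | data Length | data bytes ... | CRC |
    body = bytes([msg_id, len(payload)]) + bytes(payload)
    crc = sum(body) & 0xFF            # placeholder checksum, not the firmware's real CRC
    return body + bytes([crc])

set_input_letters = build_frame(0x20, [1])    # set NN input type: letters only
start_processing = build_frame(0x22, [])      # start AI NN processing

# The RPMsg virtual UART is typically exposed as a TTY device such as /dev/ttyRPMSG0.
with open("/dev/ttyRPMSG0", "wb", buffering=0) as tty:
    tty.write(set_input_letters)
    tty.write(start_processing)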

 

 

 

STM32 MPU OpenSTLinux Expansion Pack for AI computer vision application

 

https://www.st.com/content/st_com/zh/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32-mpu-openstlinux-expansion-packages/x-linux-ai.html

Version v2.0.0

  This version has been validated against the OpenSTLinux ecosystem release v2.0.0 and validated on STM32MP157x-DKx and STM32MP157x-EV1 boards.

Contents

  • TensorFlow Lite[1] 2.2.0
  • Coral Edge TPU[2] accelerator support
  • armNN[3] 20.05
  • OpenCV[4] 4.1.x
  • Python[5] 3.8.x (enabling Pillow module)
  • Support for STM32MP15xF[6] devices operating at up to 800 MHz
  • Python and C++ application samples
    • Image classification using TensorFlow Lite based on MobileNet v1 quantized model
    • Object detection using TensorFlow Lite based on COCO SSD MobileNet v1 quantized model
    • Image classification using Coral Edge TPU based on MobileNet v1 quantized model and compiled for the Coral Edge TPU
    • Object detection using Coral Edge TPU based on COCO SSD MobileNet v1 quantized model and compiled for the Coral Edge TPU
    • Image classification using armNN TensorFlow Lite parser based on MobileNet v1 float model
    • Object detection using armNN TensorFlow Lite parser based on COCO SSD MobileNet v1 quantized model

 

X-LINUX-AI OpenSTLinux Expansion Package: 

https://wiki.dh-electronics.com/index.php/Avenger96#Downloads

Description: Expansion Package that targets artificial intelligence for STM32MP1 Series devices.

  • NEW X-LINUX-AI OpenSTLinux Expansion Package
  • NEW How to install X-LINUX-AI v2.0.0 on Avenger96 board

X-LINUX-AI application samples

This page lists all the X-LINUX-AI application samples.

1. TensorFlow Lite application samples

  • Image classification using the TensorFlow Lite C++ API
  • Object detection using the TensorFlow Lite C++ API

  • Image classification using the TensorFlow Lite Python runtime
  • Object detection using the TensorFlow Lite Python runtime

2. Coral Edge TPU application samples

  • Image classification using the Coral Edge TPU TensorFlow Lite C++ API
  • Object detection using the Coral Edge TPU TensorFlow Lite C++ API

  • Image classification using the Coral Edge TPU TensorFlow Lite Python runtime
  • Object detection using the Coral Edge TPU TensorFlow Lite Python runtime

3. armNN application samples

  • Image classification using the armNN TensorFlow Lite parser
  • Object detection using the armNN TensorFlow Lite parser

 

 

meta-st-stm32mpu-ai

https://github.com/STMicroelectronics/meta-st-stm32mpu-ai

OpenEmbedded meta layer to install AI frameworks and tools for the STM32MP1. It also provides application samples.

Compatibility

This version has been validated against the OpenSTLinux ecosystem release v2.0.0 and validated on STM32MP157x-DKx and STM32MP157x-EV1 boards.

Available frameworks and tools within the meta-layer

X-LINUX-AI v2.0.0 expansion package:

  • TensorFlow Lite 2.2.0
  • Coral Edge TPU accelerator support
  • armNN 20.05
  • OpenCV 4.1.x
  • Python 3.8.x (enabling Pillow module)
  • Support for STM32MP15xF devices operating at up to 800 MHz
  • Python and C++ application samples
    • Image classification using TensorFlow Lite based on MobileNet v1 quantized model
    • Object detection using TensorFlow Lite based on COCO SSD MobileNet v1 quantized model
    • Image classification using Coral Edge TPU based on MobileNet v1 quantized model and compiled for the Coral Edge TPU
    • Object detection using Coral Edge TPU based on COCO SSD MobileNet v1 quantized model and compiled for the Coral Edge TPU
    • Image classification using armNN TensorFlow Lite parser based on MobileNet v1 float model
    • Object detection using armNN TensorFlow Lite parser based on COCO SSD MobileNet v1 quantized model

Further information on how to install and how to use

https://wiki.st.com/stm32mpu/wiki/X-LINUX-AI_OpenSTLinux_Expansion_Package

Application samples

https://wiki.st.com/stm32mpu/wiki/X-LINUX-AI_application_samples_zoo

 

100ask wiki:

http://wiki.100ask.org/index.php?title=STM32MP1_artificial_intelligence_expansio&variant=zh-mo

 

PanGu board:


Experiencing the TensorFlow-enabled Weston system


The artificial intelligence expansion package contains Linux AI frameworks and supports AI application examples that can run on STM32MP1 series devices.

The system is built by adding an OpenEmbedded layer named meta-st-stm32mpu-ai, which brings a complete, coherent, and easy-to-build/install environment for taking advantage of AI on the STM32MP1 series.

The system contains the frameworks, tools, and applications needed to run the AI examples. Different image routines are provided for different use cases, such as computer vision (CV).

 

The prepared SD card can be used to boot and run the Weston system; all data is stored on the SD card.

 

Put the micro SD card into a card reader and plug it into a USB port of a PC running Linux. The dd command below writes the Weston system image to the micro SD card; /dev/sdb is the corresponding micro SD card device (device names vary, so choose carefully).

sudo dd if=st-image-ai-cv_sdcard_stm32mp157a-panguboard-basic.raw of=/dev/sdb conv=fdatasync bs=20M status=progress

After the write completes, insert the micro SD card into the SD card slot (J7) on the board and set the boot switches to SD card boot. Connect the serial cable and the power supply, and the Weston system boot messages appear.

 

The AI demo launcher is a derivative GTK launcher application.

It is written in Python 3 and uses GTK for the display user interface. It makes it easy to launch the AI application samples.

A single tap on the touch screen, or a single click with a mouse connected to the board, is enough to launch an AI application.

你可能感兴趣的:(深度学习,单片机ARM)