The main goal of learning TensorRT is to deploy it on an embedded board. The previous post covered using TensorRT on a host PC; now let's try running it on a Jetson device.
JetPack | TensorFlow
---|---
4.3 | 2.0

Installing TensorFlow 2.0 requires JetPack 4.3, so update JetPack first, then install the system dependencies:
```shell
$ sudo apt-get update
$ sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev
$ sudo apt-get install python3-pip
$ sudo pip3 install -U pip testresources setuptools
```
Install the Python package dependencies:

```shell
$ sudo pip3 install -U numpy==1.16.1 future==0.17.1 mock==3.0.5 h5py==2.9.0 keras_preprocessing==1.0.5 keras_applications==1.0.8 gast==0.2.2 enum34 futures protobuf
```
If these packages are already installed, add the `--upgrade` flag to update them.
```shell
$ sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v43 tensorflow-gpu
```
This command installs the latest TensorFlow build compatible with JetPack 4.3, so the exact version it pulls may change over time. After installation, verify it by running `import tensorflow` in a Python shell.
Create a new project with the following three scripts.
```python
# train.py
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import RMSprop

batch_size = 128
num_classes = 10
epochs = 5

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))
model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

model.save('tf_savedmodel', save_format='tf')
```
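The `keras.utils.to_categorical` call above turns integer labels into one-hot matrices. As a quick illustration of what it produces (a pure-Python stand-in, not the Keras implementation):

```python
def to_one_hot(labels, num_classes):
    """Minimal pure-Python stand-in for keras.utils.to_categorical."""
    # Label k maps to a row with a 1.0 at index k and 0.0 elsewhere.
    return [[1.0 if i == k else 0.0 for i in range(num_classes)]
            for k in labels]

print(to_one_hot([5, 0], 10)[0])  # digit 5 -> 1.0 at index 5, 0.0 elsewhere
```

This is why the model ends in a 10-unit softmax layer trained with `categorical_crossentropy`: each target row is a probability distribution concentrated on one class.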
```python
# convert.py
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# _replace returns a new namedtuple rather than modifying in place,
# so the result must be assigned back.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP32,
    is_dynamic_op=True,
    max_workspace_size_bytes=10000000000)
converter = trt.TrtGraphConverterV2(input_saved_model_dir='tf_savedmodel',
                                    conversion_params=params)
converter.convert()

from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_test = x_test.astype('float32')
x_test = x_test.reshape(10000, 784)
x_test /= 255

# converter._converted_func(tf.constant(x_test[:1]))
def input_fn():
    yield (x_test[:1],)
converter.build(input_fn)  # these three lines are equivalent to the commented-out call above
converter.save('trt_savedmodel')
```
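A pitfall worth highlighting: `DEFAULT_TRT_CONVERSION_PARAMS` is a namedtuple, and `namedtuple._replace` never mutates its receiver; it returns a new tuple. Calling it and discarding the result silently leaves the defaults in effect. The behavior can be demonstrated with a plain namedtuple from the standard library (the field names below are illustrative):

```python
from collections import namedtuple

Params = namedtuple('Params', ['precision_mode', 'max_workspace_size_bytes'])
defaults = Params(precision_mode='FP16', max_workspace_size_bytes=1 << 30)

# Discarding the result leaves the original unchanged:
defaults._replace(precision_mode='FP32')
print(defaults.precision_mode)  # FP16

# Assigning the result back is what actually takes effect:
params = defaults._replace(precision_mode='FP32')
print(params.precision_mode)  # FP32
```

This is why `convert.py` assigns the `_replace` result back to `params` before passing it to the converter.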
```python
# runtrt.py
import time

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_test = x_test.astype('float32')
x_test = x_test.reshape(10000, 784)
x_test /= 255

saved_model_loaded = tf.saved_model.load("trt_savedmodel")
graph_func = saved_model_loaded.signatures[
    trt.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
# Freezing variables into constants is optional here;
# inference below calls graph_func directly.
frozen_func = trt.convert_to_constants.convert_variables_to_constants_v2(
    graph_func)

t = time.time()
graph_func(tf.constant(x_test))
print(time.time() - t)
```
For a detailed explanation of the code, refer to here.
A model optimized on the host cannot be loaded on the board, so the only workable flow is to train on the host, copy the model to the board, and run the optimization there.
Run train.py to train the model.
As is well known, PyCharm has convenient remote-development tools, so let's use them to connect to the board. First connect over USB, or put the board and the host on the same LAN. Then change the Python interpreter by adding an SSH interpreter: with a USB connection the board's IP is 192.168.55.1, and the interpreter on the board is at /usr/bin/python3. You can also edit the path mappings, i.e. the path the project is synced to on the board.
Run convert.py to optimize the model and save the result.
Run runtrt.py to run inference with the optimized graph.
As an experiment, comment out the converter.build() call in convert.py, run it again, and compare the inference speed reported by runtrt.py.
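Without converter.build(), the TensorRT engines are built lazily on the first call, so a single time.time() measurement mixes one-time engine-build cost into the inference number. A small timing helper (a generic sketch, not tied to TensorFlow) that reports the first call separately from the steady state makes the comparison fairer:

```python
import time

def benchmark(fn, *args, runs=10):
    """Time fn: report the first call (which may include one-time setup,
    e.g. a lazy TensorRT engine build) separately from later calls."""
    t0 = time.time()
    fn(*args)
    first = time.time() - t0

    t0 = time.time()
    for _ in range(runs):
        fn(*args)
    steady = (time.time() - t0) / runs
    return first, steady

# Usage with the loaded TF-TRT function would look like:
#   first, steady = benchmark(graph_func, tf.constant(x_test))
```

If the first-call time drops sharply once converter.build() is restored, that difference is the engine-build cost that build() moved from inference time to conversion time.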