TensorRT Development Notes
Published: 2019-05-25


Deploying Keras models to TensorRT

Overview

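  1. What TensorRT is: TensorRT is a high-performance deep-learning inference optimizer (inference here is the equivalent of Keras's predict) that provides low-latency, high-throughput inference for deep-learning applications. You can think of TensorRT as a deep-learning framework that only runs the forward pass: it parses a network model from a framework such as TensorFlow, maps each layer one-to-one onto the corresponding TensorRT layer, converts the whole network into TensorRT, applies optimizations specific to NVIDIA's own GPUs, and deploys the accelerated result.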

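  2. Difficulties in learning TensorRT: When we first started with TensorRT, we found that the major blogs only gave brief introductions to the framework and no detailed development tutorials. We had to write C++ against it, and since we had no experience setting up the environment or testing the code, we proceeded by constant trial and error. For most of the errors we hit, there were almost no workable solutions online to borrow from. In the end we studied NVIDIA's official samples and, modeled on them, built a C++ project that parses the model structure and weights and runs prediction on the GPU through CUDA.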

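  3. Progress and workflow optimization: After a lot of trial and error we built a TensorRT C++ project that meets our needs, with modules for OpenCV image processing, a custom HWC->CHW channel conversion function, and visualization of semantic-segmentation results. We also found that a Keras .h5 model can be converted to ONNX and fed into TensorRT, but parsing the network structure is slow. We found no ready-made way online to speed up model parsing, but by studying NVIDIA's official samples we tried a less-documented route: converting the ONNX model into a serialized engine ahead of time. This skips the parsing step, lets the model be loaded straight into GPU memory, and greatly reduces program load time.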

Format conversion

Notes:

to_channel_first indicates whether the NHWC input layout should be converted to NCHW.
TensorRT 5.1.5.0 GA works well with ONNX models exported with target_opset=7.
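To illustrate what to_channel_first changes: once the network input is NCHW, the host buffer handed to TensorRT has to be planar (all values of channel 0, then channel 1, and so on) rather than interleaved. The sketch below shows only the index mapping for a contiguous float batch; `nhwcToNchw` is a hypothetical helper, not part of keras2onnx or TensorRT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reorders a batch from NHWC (Keras/TensorFlow default)
// to NCHW. Element (n, h, w, c) moves from index ((n*H + h)*W + w)*C + c
// to index ((n*C + c)*H + h)*W + w.
std::vector<float> nhwcToNchw(const std::vector<float>& src,
                              std::size_t N, std::size_t H, std::size_t W, std::size_t C) {
    assert(src.size() == N * H * W * C);
    std::vector<float> dst(src.size());
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t h = 0; h < H; ++h)
            for (std::size_t w = 0; w < W; ++w)
                for (std::size_t c = 0; c < C; ++c)
                    dst[((n * C + c) * H + h) * W + w] = src[((n * H + h) * W + w) * C + c];
    return dst;
}
```

For a 1×1×2×3 input {0, 1, 2, 3, 4, 5} (two pixels with three channels each), the result is {0, 3, 1, 4, 2, 5}. The C++ preprocessing code in this post performs the same reordering per image and fuses it with the BGR->RGB swap.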

```python
import os
import warnings

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
warnings.filterwarnings("ignore")

# Tested with tf 2.1.0
# pip install keras2onnx
import tensorflow as tf
import keras2onnx
import onnx
from tensorflow.keras.optimizers import Adam

'''
Notes:
to_channel_first indicates whether the NHWC input layout should be converted to NCHW.
TensorRT 5.0 GA works well with ONNX target_opset=7.
'''


def h52onnx(h5_model_path, onnx_model_path, to_channel_first):
    # Only needed if the model was compiled with a custom 'lr' metric:
    # then pass {'lr': lr_metric} as custom_objects to load_model below.
    opt = Adam(lr=1e-4)

    def get_lr_metric(optimizer):  # reports the current learning rate
        def lr(y_true, y_pred):
            return optimizer.lr
        return lr

    lr_metric = get_lr_metric(opt)
    model = tf.keras.models.load_model(h5_model_path)  # , {'lr': lr_metric})
    # model.summary()
    inputLayerName = model.get_layer(index=0).name
    print('First Layer is', inputLayerName)
    # keras2onnx expects a list of the input names to transpose from NHWC to NCHW
    channel_first_inputs = [inputLayerName] if to_channel_first else None
    onnx_model = keras2onnx.convert_keras(model, '', target_opset=7,
                                          channel_first_inputs=channel_first_inputs,
                                          doc_string='Straka\'s model.')
    onnx.save_model(onnx_model, onnx_model_path)
    print('onnx saved.')


def onnx2engine(onnxPath, enginePath):
    cmd = r'D:\TensorRT\TensorRT-5.1.5.0\bin\trtexec --verbose --onnx='
    cmd += onnxPath
    cmd += r' --saveEngine='
    cmd += enginePath
    result = os.popen(cmd)
    context = result.read()
    result.close()
    # trtexec ends with a status line such as "&&&& PASSED ..." or "&&&& FAILED ..."
    lines = context.splitlines()
    status = lines[-1][5:11] if lines else ''
    if status == 'FAILED':
        print('Engine generation failed.\n')
        print(context)
    elif status == 'PASSED':
        print('Engine generated successfully.\n')
    return context
```
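`onnx2engine` decides success by slicing characters 5 to 11 of the last output line, which relies on trtexec finishing with a status line such as `&&&& PASSED ...`. If the conversion is driven from C++ instead (for example by capturing trtexec's output with `_popen` on Windows), a slightly more tolerant check scans every line for the status prefix. This is a sketch; `parseTrtexecOutput` and `TrtexecResult` are hypothetical names, and the `&&&& PASSED` / `&&&& FAILED` format is the same one the Python slice already assumes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Possible outcomes of a trtexec run, as reported by its status lines.
enum class TrtexecResult { Passed, Failed, Unknown };

// Hypothetical helper: scans captured trtexec output for lines starting with
// "&&&& PASSED" or "&&&& FAILED". Unlike slicing fixed positions of the last
// line, this tolerates trailing blank lines. The last status line wins.
TrtexecResult parseTrtexecOutput(const std::string& output) {
    std::istringstream lines(output);
    std::string line;
    TrtexecResult result = TrtexecResult::Unknown;
    while (std::getline(lines, line)) {
        if (line.rfind("&&&& PASSED", 0) == 0) {  // starts-with check
            result = TrtexecResult::Passed;
        } else if (line.rfind("&&&& FAILED", 0) == 0) {
            result = TrtexecResult::Failed;
        }
    }
    return result;
}
```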

Modifying the C++ project

Using NVIDIA's official sampleOnnxMNIST.cpp as the starting point, we added OpenCV and loading of our own serialized engine.

  1. Image input processing (HWC->CHW layout conversion, BGR->RGB conversion, and BGR->GRAY)
```cpp
#include <algorithm>
#include <thread>
#include <vector>
#include <opencv2/opencv.hpp>

// ============== Pre Process =============>
// Converts rows [start_h, start_h + length) of one interleaved BGR image (in_file)
// into out_file starting at offset `start`: swaps BGR->RGB, applies pFun to each
// value, and writes either HWC (interleaved) or CHW (planar) layout.
void idxTransformParall(std::vector<unsigned char>* in_file, std::vector<float>* out_file,
                        unsigned long start_h, unsigned long length,
                        unsigned long image_h, unsigned long image_w, unsigned long start,
                        float (*pFun)(const unsigned char&), bool HWC) {
    if (HWC) {
        // HWC and BGR=>RGB
        for (unsigned long h = start_h; h < start_h + length; ++h) {
            for (unsigned long w = 0; w < image_w; ++w) {
                (*out_file)[start + h * image_w * 3 + w * 3 + 0] = (*pFun)((*in_file)[h * image_w * 3 + w * 3 + 2]);
                (*out_file)[start + h * image_w * 3 + w * 3 + 1] = (*pFun)((*in_file)[h * image_w * 3 + w * 3 + 1]);
                (*out_file)[start + h * image_w * 3 + w * 3 + 2] = (*pFun)((*in_file)[h * image_w * 3 + w * 3 + 0]);
            }
        }
    } else {
        // CHW and BGR=>RGB
        for (unsigned long h = start_h; h < start_h + length; ++h) {
            for (unsigned long w = 0; w < image_w; ++w) {
                (*out_file)[start + 0 * image_h * image_w + h * image_w + w] = (*pFun)((*in_file)[h * image_w * 3 + w * 3 + 2]);
                (*out_file)[start + 1 * image_h * image_w + h * image_w + w] = (*pFun)((*in_file)[h * image_w * 3 + w * 3 + 1]);
                (*out_file)[start + 2 * image_h * image_w + h * image_w + w] = (*pFun)((*in_file)[h * image_w * 3 + w * 3 + 0]);
            }
        }
    }
}

// Arguments: images, target height, target width, keep aspect ratio with padding,
// per-value transform, channel layout (HWC or CHW), number of threads
std::vector<float> imagePreprocess(const std::vector<cv::Mat>& images, const int& image_h, const int& image_w,
                                   bool is_padding, float (*pFun)(const unsigned char&), bool HWC, int worker) {
    // image_path ===> cv::Mat ===> resize(padding) ===> CHW/HWC (BGR=>RGB)
    // Profiling showed that cv::cvtColor (BGR->RGB) and the HWC->CHW conversion were
    // both very slow, so they are merged into a single pass.
    const unsigned long image_length = image_h * image_w * 3;
    std::vector<float> fileData(images.size() * image_length);
    for (unsigned long img_count = 0; img_count < images.size(); ++img_count) {
        cv::Mat image = images[img_count].clone();
        cv::Mat processed_image(image_h, image_w, CV_8UC3);
        if (is_padding) {
            // Letterbox: scale to fit while keeping the aspect ratio, then pad with gray (128)
            int ih = image.rows;
            int iw = image.cols;
            float scale = std::min(static_cast<float>(image_w) / static_cast<float>(iw),
                                   static_cast<float>(image_h) / static_cast<float>(ih));
            int nh = static_cast<int>(scale * static_cast<float>(ih));
            int nw = static_cast<int>(scale * static_cast<float>(iw));
            int dh = (image_h - nh) / 2;
            int dw = (image_w - nw) / 2;
            cv::Mat resized_image(nh, nw, CV_8UC3);
            cv::resize(image, resized_image, cv::Size(nw, nh));
            cv::copyMakeBorder(resized_image, processed_image, dh, image_h - nh - dh, dw, image_w - nw - dw,
                               cv::BORDER_CONSTANT, cv::Scalar(128, 128, 128));
        } else {
            cv::resize(image, processed_image, cv::Size(image_w, image_h));
        }
        // Flatten the (continuous) 8-bit image into a byte vector
        std::vector<unsigned char> file_data = processed_image.reshape(1, 1);
        // Split the rows across threads
        unsigned long min_threads;
        if (worker < 0) {
            const unsigned long min_length = 64;
            min_threads = (image_h - 1) / min_length + 1;
        } else if (worker == 0) {
            min_threads = 1;
        } else {
            min_threads = worker;
        }
        const unsigned long cpu_max_threads = std::thread::hardware_concurrency();
        const unsigned long num_threads = std::min(cpu_max_threads != 0 ? cpu_max_threads : 1, min_threads);
        const unsigned long block_size = image_h / num_threads;
        std::vector<std::thread> threads(num_threads - 1);
        unsigned long block_start = 0;
        for (auto& t : threads) {
            t = std::thread(idxTransformParall, &file_data, &fileData, block_start, block_size,
                            image_h, image_w, img_count * image_length, pFun, HWC);
            block_start += block_size;
        }
        // The calling thread handles the remaining rows
        idxTransformParall(&file_data, &fileData, block_start, image_h - block_start,
                           image_h, image_w, img_count * image_length, pFun, HWC);
        for (auto& t : threads) {
            t.join();
        }
    }
    return fileData;
}

//========================= main ==========================
startTime = clock();  // start timing
float data[INPUT_H * INPUT_W * INPUT_C];  // on the stack; see the stack overflow note below
// Read as single-channel grayscale by default
cv::ImreadModes mode = cv::ImreadModes::IMREAD_GRAYSCALE;
if (INPUT_C == 3)  // read as 3-channel color
    mode = cv::ImreadModes::IMREAD_COLOR;
cv::Mat img = cv::imread(locateFile(imgPath, gArgs.dataDirs), mode);
gLogInfo << imgPath << " imgRawSize = " << img.size << std::endl;
if (INPUT_C == 3) {
    // A capture-less lambda converts to a plain function pointer
    float (*pFunc)(const unsigned char&) = [](const unsigned char& x) { return static_cast<float>(x) / 255; };
    std::vector<float> imgData = imagePreprocess(std::vector<cv::Mat>{img}, INPUT_H, INPUT_W, doPadding, pFunc, isHWC, 16);
    for (int i = 0; i < INPUT_H * INPUT_W * INPUT_C; i++)
        data[i] = imgData[i];
    // Previous version, replaced by imagePreprocess:
    //cv::cvtColor(img, img, cv::COLOR_BGR2RGB);  // change channel order
    //cv::resize(img, img, cv::Size(INPUT_W, INPUT_H), 0, 0, cv::INTER_CUBIC);  // bicubic resize
    //for (int i = 0; i < INPUT_H * INPUT_W * INPUT_C; i++)
    //    data[i] = float(img.data[i]) / 255.0f;
} else {
    // Interpolation is the 6th argument of cv::resize; the 4th and 5th are fx and fy
    cv::resize(img, img, cv::Size(INPUT_W, INPUT_H), 0, 0, cv::INTER_CUBIC);  // bicubic resize
    for (int i = 0; i < INPUT_H * INPUT_W * INPUT_C; i++)
        data[i] = float(img.data[i]) / 255.0f;
}
endTime = clock();  // stop timing
gLogInfo << "img dealt, using " << (double)(endTime - startTime) / CLOCKS_PER_SEC << "s." << std::endl;
```
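The padding branch of `imagePreprocess` is a letterbox resize: scale by the smaller of the two ratios so the whole image fits, then split the leftover rows and columns between the two borders, with the odd extra pixel going to the bottom or right. The geometry can be checked on its own without OpenCV. `Letterbox` and `computeLetterbox` are hypothetical names; the arithmetic follows the original code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: resized size plus the border widths that would be
// passed to cv::copyMakeBorder (top, bottom, left, right).
struct Letterbox {
    int nw, nh;
    int top, bottom, left, right;
};

// iw/ih: source width/height; image_w/image_h: network input width/height.
Letterbox computeLetterbox(int iw, int ih, int image_w, int image_h) {
    assert(iw > 0 && ih > 0);
    const float scale = std::min(static_cast<float>(image_w) / static_cast<float>(iw),
                                 static_cast<float>(image_h) / static_cast<float>(ih));
    Letterbox lb{};
    lb.nw = static_cast<int>(scale * static_cast<float>(iw));
    lb.nh = static_cast<int>(scale * static_cast<float>(ih));
    const int dh = (image_h - lb.nh) / 2;  // integer division: extra row goes to the bottom
    const int dw = (image_w - lb.nw) / 2;  // extra column goes to the right
    lb.top = dh;
    lb.bottom = image_h - lb.nh - dh;
    lb.left = dw;
    lb.right = image_w - lb.nw - dw;
    return lb;
}
```

For example, a 100×400 image with a 201×200 target gives scale 0.5, a 50×200 resized image, no top or bottom border, and left/right borders of 75 and 76.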
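  2. Serialized engine
    Purpose: when loading the ONNX model is slow, convert the ONNX model into an engine file first, then load the engine directly for inference.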
```cpp
#include <fstream>
#include <string>
#include <vector>
#include "NvInfer.h"
// Assumes `using namespace nvinfer1;` as in the sample.

// Loads a serialized engine file and deserializes it. No builder, network or ONNX
// parser is needed here (parsing already happened when trtexec generated the
// engine), and calling serialize() on the loaded engine is not needed either.
ICudaEngine* loadEngine(const std::string& engine, int DLACore, std::ostream& err, IRuntime* runtime) {
    std::ifstream engineFile(engine, std::ios::binary);
    if (!engineFile) {
        err << "Error opening engine file: " << engine << std::endl;
        return nullptr;
    }
    engineFile.seekg(0, engineFile.end);
    long int fsize = engineFile.tellg();
    engineFile.seekg(0, engineFile.beg);
    std::vector<char> engineData(fsize);
    engineFile.read(engineData.data(), fsize);
    if (!engineFile) {
        err << "Error loading engine file: " << engine << std::endl;
        return nullptr;
    }
    if (DLACore != -1) {
        runtime->setDLACore(DLACore);
    }
    // Deserialize straight from the file contents (TensorRT 5: blob, size, plugin factory)
    ICudaEngine* Engine = runtime->deserializeCudaEngine(engineData.data(), fsize, nullptr);
    if (Engine == nullptr) {
        err << "Error deserializing engine file: " << engine << std::endl;
    }
    return Engine;
}

// Serializes an engine and writes it to disk, e.g. after building it once from ONNX.
bool saveEngine(const ICudaEngine& engine, const std::string& fileName, std::ostream& err) {
    std::ofstream engineFile(fileName, std::ios::binary);
    if (!engineFile) {
        err << "Cannot open engine file: " << fileName << std::endl;
        return false;
    }
    IHostMemory* serializedEngine{engine.serialize()};
    if (serializedEngine == nullptr) {
        err << "Engine serialization failed" << std::endl;
        return false;
    }
    engineFile.write(static_cast<const char*>(serializedEngine->data()), serializedEngine->size());
    serializedEngine->destroy();  // release the host memory returned by serialize()
    return !engineFile.fail();
}

// in main
engine = loadEngine(locateFile(enginePath, gArgs.dataDirs), gArgs.useDLACore, gLogError, runtime);
```
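The file-reading half of `loadEngine` can be isolated and tested without TensorRT. The sketch below, using a hypothetical `readBinaryFile` helper, opens the file positioned at its end to get the size, reads everything, and reports failure instead of returning a partially filled buffer. The resulting buffer would then go to `runtime->deserializeCudaEngine(buffer.data(), buffer.size(), nullptr)` as in `loadEngine`.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole binary file (e.g. a serialized .engine)
// into `out`. Returns false and leaves `out` empty on any open/read failure.
bool readBinaryFile(const std::string& path, std::vector<char>& out) {
    out.clear();
    std::ifstream file(path, std::ios::binary | std::ios::ate);  // opened at the end
    if (!file) return false;
    const std::streamsize size = file.tellg();
    if (size < 0) return false;
    file.seekg(0, std::ios::beg);
    out.resize(static_cast<std::size_t>(size));
    if (size > 0 && !file.read(out.data(), size)) {
        out.clear();
        return false;
    }
    assert(out.size() == static_cast<std::size_t>(size));
    return true;
}
```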
  3. Result output
```cpp
if (saveOutputImg) {
    // Write the semantic-segmentation probability map as an image
    cv::Mat imageOut = cv::Mat::zeros(INPUT_H, INPUT_W, CV_8UC3);
    for (int i = 0; i < INPUT_H; ++i) {
        for (int j = 0; j < INPUT_W; ++j) {
            // Same gray level in all three channels; saturate_cast rounds and clamps to [0, 255]
            const uchar v = cv::saturate_cast<uchar>(prob[i * INPUT_W + j] * 255);
            imageOut.at<cv::Vec3b>(i, j)[0] = v;
            imageOut.at<cv::Vec3b>(i, j)[1] = v;
            imageOut.at<cv::Vec3b>(i, j)[2] = v;
        }
    }
    // Save result.jpg in the data directory where the input image was found
    const std::string fullImgPath = locateFile(imgPath, gArgs.dataDirs);
    cv::imwrite(fullImgPath.substr(0, fullImgPath.length() - imgPath.length()) + "result.jpg", imageOut);
    gLogInfo << "result saved." << std::endl;
}
```
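Writing `int(prob[...] * 255)` into an 8-bit channel wraps around if a value ever exceeds 1 (for example 257 becomes 1). A dependency-free sketch of the same probability-to-gray mapping clamps before narrowing and rounds to the nearest level; `probToGray` is a hypothetical helper, and rounding instead of truncating is a design choice, not what the original code does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: maps probabilities (expected in [0, 1]) to 8-bit gray
// levels, clamping out-of-range values instead of letting them wrap.
std::vector<std::uint8_t> probToGray(const std::vector<float>& prob) {
    std::vector<std::uint8_t> gray(prob.size());
    for (std::size_t i = 0; i < prob.size(); ++i) {
        const float p = std::min(1.0f, std::max(0.0f, prob[i]));
        gray[i] = static_cast<std::uint8_t>(std::lround(p * 255.0f));
    }
    assert(gray.size() == prob.size());
    return gray;
}
```

Since all three channels receive the same value, the result could also be stored in a single-channel CV_8UC1 image.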
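  4. Stack overflow error
In Visual Studio, go to Project Properties -> Linker -> System -> Stack Reserve Size and set it to 160000000 (about 160 MB); this is enough for a 2000*2000*3 input.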
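The overflow most likely comes from large fixed-size local arrays such as `float data[INPUT_H * INPUT_W * INPUT_C]` in `main`, which live on the stack (MSVC reserves 1 MB by default). A 2000×2000×3 float input alone is 48 MB. Instead of enlarging the stack, the buffer can be allocated on the heap; the sketch uses a hypothetical `makeInputBuffer` helper, and `buffer.data()` can then be passed wherever `data` was used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: allocates the network input buffer on the heap, so its
// size is not limited by the thread's stack reserve.
std::vector<float> makeInputBuffer(std::size_t h, std::size_t w, std::size_t c) {
    assert(h > 0 && w > 0 && c > 0);
    return std::vector<float>(h * w * c, 0.0f);  // zero-initialized
}
```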
  5. Inspecting model information
```cpp
// Print binding and dimension information of the loaded engine (debug helper)
nvinfer1::Dims mInputDims, mOutputDims;
// Lookup by name requires knowing the input/output binding names:
//mInputDims = Engine->getBindingDimensions(Engine->getBindingIndex(inputLayerName));
//mOutputDims = Engine->getBindingDimensions(Engine->getBindingIndex(outputLayerName));
//gLogInfo << "getBindingIndex " << Engine->getBindingIndex(inputLayerName) << " " << Engine->getBindingIndex(outputLayerName) << std::endl;
gLogInfo << "getNbLayers " << Engine->getNbLayers() << std::endl;
gLogInfo << "getNbBindings " << Engine->getNbBindings() << std::endl;
for (int i = 0; i < Engine->getNbBindings(); i++) {
    if (Engine->bindingIsInput(i))
        mInputDims = Engine->getBindingDimensions(i);
    else
        mOutputDims = Engine->getBindingDimensions(i);
}
gLogInfo << "Input Dims = " << mInputDims.nbDims << std::endl;
for (int i = 0; i < mInputDims.nbDims; i++) {
    gLogInfo << "Dim " << i << " includes " << mInputDims.d[i] << std::endl;
}
gLogInfo << "Output Dims = " << mOutputDims.nbDims << std::endl;
for (int i = 0; i < mOutputDims.nbDims; i++) {
    gLogInfo << "Dim " << i << " includes " << mOutputDims.d[i] << std::endl;
}
```
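Once the binding dimensions are known, their product gives the number of elements to allocate for that binding (multiply by `sizeof(float)` for a float buffer). The sketch uses a hypothetical `SimpleDims` struct that mirrors the `nbDims` and `d` fields of `nvinfer1::Dims` (up to 8 dimensions) so it compiles without TensorRT; with a real engine those fields would come from `Engine->getBindingDimensions(i)`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for nvinfer1::Dims: number of dimensions plus sizes.
struct SimpleDims {
    int nbDims;
    int d[8];
};

// Element count of a binding with these dimensions (1 for zero dimensions).
std::int64_t volume(const SimpleDims& dims) {
    assert(dims.nbDims >= 0 && dims.nbDims <= 8);
    std::int64_t v = 1;
    for (int i = 0; i < dims.nbDims; ++i) v *= dims.d[i];
    return v;
}
```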

Reposted from: http://srzti.baihongyu.com/
