林楚海

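CRNN Text Recognition and Its TensorFlow Implementation

1. Introduction

Text recognition takes an image of text and converts the characters in it into text data that a computer can process. We previously covered two text detection methods; see "CTPN Text Detection and Its TensorFlow Implementation" and "EAST Text Detection and Its Keras Implementation". After text detection we know where each piece of text sits in the image, so we can crop each text region and apply an affine transform to obtain text images like those in Figure 1. At that point the computer still has no idea which characters the image contains, so a recognition step is needed to convert the text in the image into plain text. The CAPTCHA recognition we often run into is also a text recognition scenario.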

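Figure 1: Text fragments cropped from natural scene images

Earlier text recognition models typically used a sliding window and recognized the text under each window step by step. This approach performs poorly across different fonts, especially for Chinese characters. Other models rely on alignment: every frame of the image is annotated with its text, and an encoder-decoder style structure performs the recognition. This requires a great deal of manual labor for alignment annotation, and the annotation becomes especially tedious when the text has leading or trailing blank characters. This article therefore introduces CRNN, a model that performs comparatively well on text recognition. It needs no alignment annotation: it takes a text image as input and directly outputs the recognized text, with high accuracy.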

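2. Model Overview

2.1 Model Architecture

CRNN consists of three parts: the convolutional layers, the recurrent (RNN) layers, and the transcription layer, as shown in Figure 2.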

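Figure 2: CRNN model architecture

In the convolutional part, every image is first resized to a fixed height and then passed through the convolutional layers. The resulting feature maps are then turned into the input feature sequence for the RNN layers: scanning the feature maps from left to right, we take one column at a time and concatenate the vectors that each feature map holds for that column. The concatenated vector is the feature input for the corresponding RNN time step. Because each column of the feature maps corresponds to a rectangular region of the original image, each vector in the feature sequence also corresponds to a rectangular region of the original image, and these regions are ordered from left to right. The feature vectors therefore carry a sequential relationship, as shown in Figure 3.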

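Figure 3: Correspondence between the feature sequence produced by the convolutional layers and regions of the original image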
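The column-by-column construction described above can be sketched in a few lines of plain Python. This is only an illustration on toy nested lists, not the project's code; the function name `map_to_sequence` and the channel-then-row concatenation order are assumptions made for the example.

```python
def map_to_sequence(feature_maps):
    """Turn C feature maps of shape H x W into W vectors of length C * H.

    feature_maps[c][row][col] is the activation of channel c at (row, col).
    """
    channels = len(feature_maps)
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    sequence = []
    for col in range(width):              # scan columns from left to right
        vector = []
        for c in range(channels):         # same column from every feature map
            for row in range(height):
                vector.append(feature_maps[c][row][col])
        sequence.append(vector)           # one vector per RNN time step
    return sequence


# Two toy feature maps with H = 2 and W = 3.
maps = [
    [[1, 2, 3],
     [4, 5, 6]],
    [[10, 20, 30],
     [40, 50, 60]],
]
seq = map_to_sequence(maps)
print(len(seq), seq[0])
```

The number of time steps equals the width of the feature maps, which is why wider input images yield longer sequences.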

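Next come the RNN layers. As noted above, the vectors in the feature sequence produced by the convolutional layers are sequentially related rather than independent, so using an RNN is a natural choice. The paper uses a deep bidirectional recurrent neural network with LSTM units, as shown in Figure 4. Introducing the RNN brings three main benefits: (1) Some wide characters span several columns, and an RNN can remember information from earlier in the sequence. In addition, when characters appear side by side, their relative heights can be compared, which makes ambiguous characters such as 'i' and 'l' easier to tell apart. (2) The RNN can back-propagate errors to the CNN layers, so the RNN and CNN parameters can be trained jointly. (3) The RNN can handle text sequences of variable length.

Figure 4: LSTM unit and deep bidirectional RNN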

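Let the feature sequence from the convolutional layers be $\mathbf{x} = x_1, \dots, x_T$. For the input $x_t$ at each time step, the RNN outputs the class distribution $y_t$ for that step, whose length equals the number of character classes. Denote the output sequence of the RNN layers by $\mathbf{y} = y_1, \dots, y_T$, where $T$ is the sequence length, $y_t \in \Re^{|\mathcal{L}'|}$ is the probability distribution over character classes at time step $t$, and $\mathcal{L}' = \mathcal{L} \cup \{\text{blank}\}$ is the set of all character classes plus the blank character. One might ask: since there is already an output for every time step, why not do as in machine translation, mark the start and end of the output sequence with start and end tokens, and cut the predicted label sequence out of the output? That would work, but it requires manually labeling the receptive field of every time step across the whole image in advance, which means a lot of tedious annotation. The authors therefore did not do this and used a transcription method instead, namely the model's transcription layer.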

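In the transcription layer the authors introduce a mapping $\mathcal{B}$. For a character sequence $\pi \in \mathcal{L}'^{T}$, $\mathcal{B}$ removes repeated characters and blanks to produce the final character sequence. For example, for the predicted sequence "--hh-e-l-ll-oo--", where "-" denotes the blank, $\mathcal{B}$ outputs "hello". Note that when two identical characters are separated by "-", they are not merged. The conditional probability of a label sequence $l$ is therefore the sum of the probabilities of all character sequences $\pi$ that $\mathcal{B}$ maps to $l$: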

$p ( l | \mathbf { y } ) = \sum _ { \boldsymbol { \pi } : \mathcal { B } ( \boldsymbol { \pi } ) = l } p ( \pi | \mathbf { y } )$

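Here $p(\pi | \mathbf{y}) = \prod_{t=1}^{T} y_{\pi_t}^{t}$ is the product of the per-step character probabilities along the sequence, and $y_{\pi_t}^{t}$ is the probability of character $\pi_t$ at time step $t$. Evaluating this sum directly is very expensive, because the number of candidate sequences grows exponentially with $T$, so the authors borrow the forward-backward algorithm from CTC to compute it efficiently.

For the principles behind the forward-backward algorithm in CTC, see my other post "An Introduction to CTC"; I will not go into detail here.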
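To make the $\mathcal{B}$ mapping and the sum over paths concrete, here is a minimal sketch in plain Python. It enumerates every path by brute force, so it is only usable for toy sizes; the names `collapse` and `brute_force_prob` and the three-symbol alphabet are assumptions made for the illustration, not part of the project.

```python
from itertools import product

BLANK = "-"


def collapse(path):
    """The B mapping: merge adjacent repeats, then drop blanks."""
    out = []
    prev = None
    for ch in path:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)


def brute_force_prob(label, probs, alphabet):
    """Sum p(pi | y) over every path pi with collapse(pi) == label."""
    total = 0.0
    for path in product(range(len(alphabet)), repeat=len(probs)):
        if collapse("".join(alphabet[k] for k in path)) == label:
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]   # product of per-step probabilities
            total += p
    return total


alphabet = [BLANK, "a", "b"]
# probs[t][k]: probability of alphabet[k] at time step t (each row sums to 1).
probs = [[0.1, 0.6, 0.3],
         [0.2, 0.5, 0.3],
         [0.7, 0.2, 0.1]]
p_ab = brute_force_prob("ab", probs, alphabet)
print(collapse("--hh-e-l-ll-oo--"), p_ab)
```

With 3 symbols and 3 time steps there are only 27 paths, five of which collapse to "ab". With a realistic alphabet and sequence length the enumeration is infeasible, which is the motivation for the forward-backward algorithm.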

Transcription can be done in two modes: lexicon-free and lexicon-based.

In lexicon-free mode, the prediction is computed as:

$l ^ { * } \approx \mathcal { B } \left( \arg \max _ { \pi } p ( \pi | \mathbf { y } ) \right)$

In other words, we pick the most probable character at each time step and then apply $\mathcal{B}$ to the resulting character sequence to obtain $l^{*}$.
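A minimal sketch of this lexicon-free (best-path) decoding, in plain Python. The function name `greedy_decode` is hypothetical, and reserving index 0 for the blank is an assumption of this example; a given framework may place the blank elsewhere.

```python
BLANK_INDEX = 0  # assumption: index 0 is the blank class


def greedy_decode(probs, charset):
    """Take the arg max class at every time step, then apply the B mapping."""
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]
    decoded = []
    prev = None
    for k in best_path:
        # keep a class only if it is not a repeat of the previous step
        # and is not the blank
        if k != prev and k != BLANK_INDEX:
            decoded.append(charset[k])
        prev = k
    return "".join(decoded)


charset = ["-", "h", "i"]
probs = [
    [0.1, 0.8, 0.1],  # h
    [0.2, 0.7, 0.1],  # h again, merged with the previous step
    [0.6, 0.3, 0.1],  # blank
    [0.1, 0.2, 0.7],  # i
    [0.1, 0.1, 0.8],  # i again, merged
]
text = greedy_decode(probs, charset)
print(text)
```

This is an approximation: the single most probable path does not always collapse to the most probable label sequence, which is why the formula above uses an approximate equality.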

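In lexicon-based mode, the idea is to build a lexicon, compute the probability of every character sequence in it, and pick the most probable one as the final transcription: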

$l ^ { * } =\arg \max _ { \mathrm { l } \in \mathcal { D } } p ( \mathrm { l } | \mathrm { y } )$

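Here $\mathcal{D}$ is the lexicon. The drawback of this approach is that the computation becomes expensive when the lexicon is large, so the authors propose an improvement. They observed that the lexicon-free transcription is usually already close to the true label. They therefore first obtain a transcription $l'$ in lexicon-free mode, then use a BK-tree to search the lexicon for entries whose edit distance to $l'$ is less than $\delta$ (for the concept of edit distance, see the article "Edit Distance"). This neighborhood is denoted $\mathcal{N}_{\delta}(l')$. Finally, they compute the probability of each character sequence in the neighborhood and choose the most probable one as the final transcription: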

$\mathrm { l} ^ { * } = \arg \max _ { \mathrm {l} \in \mathcal { N } _ { \delta } \left( \mathrm { l} ^ { \prime } \right) } p ( \mathrm { l } | \mathrm { y } )$
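The candidate search can be illustrated with a plain Levenshtein distance and a linear scan over the lexicon. Note the simplifications: the paper uses a BK-tree to avoid scanning the whole lexicon, whereas this sketch scans every entry, and it treats $\delta$ as an inclusive bound, which is an assumption. The names `edit_distance` and `neighbours` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def neighbours(word, lexicon, delta):
    """Lexicon entries within edit distance delta of word (linear scan)."""
    return [w for w in lexicon if edit_distance(word, w) <= delta]


lexicon = ["hello", "help", "hallo", "world"]
candidates = neighbours("hcllo", lexicon, delta=1)
print(candidates)
```

Only the returned candidates would then be scored with $p(l | \mathbf{y})$, so the expensive probability computation runs on a handful of entries instead of the whole lexicon.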

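2.2 Loss Function

CRNN uses the negative log-likelihood as its loss function. Let the training set be $\mathcal{X} = \{I_i, l_i\}_i$, where $I_i$ is an input image, $l_i$ is its ground-truth character sequence, and $\mathbf{y}_i$ is the sequence produced from $I_i$ by the convolutional and recurrent layers. The loss function is: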

$\mathcal { O } = - \sum _ { I _ { i } , \mathbf { l } _ { i } \in \mathcal { X } } \log p \left( \mathbf { l } _ { i } | \mathbf { y } _ { i } \right)$
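For a single sample, the term $-\log p(l | \mathbf{y})$ can be computed with the CTC forward recursion instead of enumerating paths. The sketch below is a plain-Python illustration working directly with probabilities; the function name and the choice of index 0 for the blank are assumptions, and a practical implementation would work in log space (or use a framework's built-in CTC loss, such as TensorFlow's `tf.nn.ctc_loss`) to avoid underflow on long sequences.

```python
import math


def ctc_neg_log_likelihood(probs, label, blank=0):
    """-log p(label | y) via the CTC forward recursion.

    probs[t][k]: probability of class k at time step t.
    label: list of class indices, without blanks.
    Raises ValueError if the label cannot be produced in len(probs) steps.
    """
    # Extended label: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for k in label:
        ext += [k, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all path prefixes ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance by one position
            # skipping the blank is allowed only between different characters
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # valid paths end on the last character or on the trailing blank
    p = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(p)


# Classes: 0 = blank, 1 = 'a', 2 = 'b'
probs = [[0.1, 0.6, 0.3],
         [0.2, 0.5, 0.3],
         [0.7, 0.2, 0.1]]
loss = ctc_neg_log_likelihood(probs, [1, 2])
print(loss)
```

The cost is proportional to $T$ times the label length, compared with the exponential number of paths in the direct sum. Summing this value over all training pairs gives the objective $\mathcal{O}$.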

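3. TensorFlow Implementation

This article reproduces CRNN in TensorFlow. The project structure is shown in Figure 5, and each module is described below.

Figure 5: Project structure

First is the data directory, which holds the training and test sets: train_images holds the training data and test_images holds the test data. The data in this article come from two sources: the ICPR competition dataset and synthetic data.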

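Figure 6: Structure of the data directory

The dict directory holds the character set files. There are three options: chinese.txt contains 3,000 commonly used Chinese characters, english.txt contains English letters and some punctuation marks, and english_chinese.txt is the union of the two. With english_chinese.txt, both Chinese and English text can be recognized; it is the default used for training in this article.

Figure 7: Character set files

The fonts directory holds the font files used to generate synthetic data. On Windows they can usually be found under C:\Windows\Fonts; you can choose whichever font files you like.

Figure 8: Font files

images_base holds the background images for the synthetic data, and the models folder holds the trained model files. Next is an overview of what each .py script does. charset_generate.py contains the function that generates the character set file: it extracts the set of characters from the image labels and saves the result as charset.txt under the data directory. Its code is as follows: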

import tqdm
from crnn import config as crnn_config


def generate_charset(labels_path, charset_path):
    """
    Build the character set from the text labels.
    :param labels_path: path to the text label file
    :param charset_path: path where the character set is saved
    :return:
    """
    with open(labels_path, 'r', encoding='utf-8') as fr:
        lines = fr.read().split('\n')
    dic = str()
    for label in tqdm.tqdm(lines[:-1]):
        for char in label:
            if char in dic:
                continue
            else:
                dic += char
    with open(charset_path, 'w', encoding='utf-8')as fw:
        fw.write(dic)


if __name__ == '__main__':
    label_path = crnn_config.train_label_path
    char_dict_path = crnn_config.charset_path
    generate_charset(label_path, char_dict_path)
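The core of the script is an order-preserving de-duplication of characters. As a side note, the same result can be sketched more compactly with `dict.fromkeys`, which keeps first-seen order and avoids the repeated substring search. The helper name `build_charset` and the in-memory list of labels are assumptions made for this illustration.

```python
def build_charset(labels):
    """Return every distinct character in labels, in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return "".join(dict.fromkeys(ch for label in labels for ch in label))


charset = build_charset(["hello", "world"])
print(charset)
```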

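Next is data_provider.py. On the one hand, it crops text regions out of natural scene images, applies an affine transform to them, and saves the results under the training and test set paths in data for use during training and testing. On the other hand, it generates synthetic data, which is likewise stored under the training set path.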

import os
import cv2
import math
import random
import shutil
import numpy as np
from tqdm import trange
from collections import Counter
from crnn import charset_generate
from multiprocessing import Process
from crnn import config as crnn_config
from PIL import Image, ImageDraw, ImageFont


class TextCut(object):
    def __init__(self,
                 org_images_path,
                 org_labels_path,
                 cut_train_images_path,
                 cut_train_labels_path,
                 cut_test_images_path,
                 cut_test_labels_path,
                 train_test_ratio=0.8,
                 filter_ratio=1.5,
                 filter_height=25,
                 is_transform=True,
                 angle_range=[-15.0, 15.0],
                 write_mode='w',
                 use_blank=False,
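                 num_process=1):
        """
            Crop text regions out of the original ICPR images.
            :param org_images_path: path to the original ICPR images, [str]
            :param org_labels_path: path to the original ICPR labels, [str]
            :param cut_train_images_path: where the cropped training images are saved, [str]
            :param cut_train_labels_path: where the labels of the cropped training images are saved, [str]
            :param cut_test_images_path: where the cropped test images are saved, [str]
            :param cut_test_labels_path: where the labels of the cropped test images are saved, [str]
            :param train_test_ratio: fraction of the data used for training, [float]
            :param filter_ratio: height/width ratio filter; crops above this ratio are discarded, default: 1.5, [float]
            :param filter_height: height filter; crops lower than this value are discarded, [int]
            :param is_transform: whether to apply the affine transform, default: True, [boolean]
            :param angle_range: angle range in which the affine transform is skipped, default: [-15.0, 15.0], [list]
            :param write_mode: write mode, 'w': write, 'a': add, [str]
            :param use_blank: whether to keep spaces, [boolean]
            :param num_process: number of parallel processes, [int]
            :return:
        """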
        self.org_images_path = org_images_path
        self.org_labels_path = org_labels_path
        self.cut_train_images_path = cut_train_images_path
        self.cut_train_labels_path = cut_train_labels_path
        self.cut_test_images_path = cut_test_images_path
        self.cut_test_labels_path = cut_test_labels_path
        self.train_test_ratio = train_test_ratio
        self.filter_ratio = filter_ratio
        self.filter_height = filter_height
        self.is_transform = is_transform
        self.angle_range = angle_range
        assert write_mode in ['w', 'a'], "write mode should be 'w'(write) or 'a'(add)"
        self.write_mode = write_mode
        self.use_blank = use_blank
        self.num_process = num_process
        self.org_labels_list = None
        super().__init__()

    def data_load(self, org_images_list):
        """
        Crop the text regions out of a list of ICPR images.
        :param org_images_list: file names of the original images
        :return:
        """
        data_len = len(org_images_list)
        train_test_offset = data_len * self.train_test_ratio
        for data_i in range(len(org_images_list)):
            org_image_path = org_images_list[data_i]
            org_image_name = os.path.basename(org_image_path)[:-4]
            org_label_path = org_image_name + ".txt"
            if org_label_path not in self.org_labels_list:
                continue
            org_image = Image.open(os.path.join(self.org_images_path, org_image_path))
            with open(os.path.join(self.org_labels_path, org_label_path), 'r', encoding='utf-8') as fr:
                org_label = fr.read().split('\n')
            cut_images_list, cut_labels_list = self.cut_text(org_image, org_label,
                                                             self.filter_ratio,
                                                             self.is_transform,
                                                             self.angle_range)
            if data_i < train_test_offset:
                img_save_path = self.cut_train_images_path
                label_save_path = self.cut_train_labels_path
            else:
                img_save_path = self.cut_test_images_path
                label_save_path = self.cut_test_labels_path
            for i in range(len(cut_images_list)):
                cut_img = cut_images_list[i]
                if cut_img.shape[0] >= self.filter_height:
                    cut_img = Image.fromarray(cut_img)
                    cut_img = cut_img.convert('RGB')
                    cut_label = cut_labels_list[i]
                    cut_img_name = org_image_name + '_' + str(i) + '.jpg'
                    cut_img.save(os.path.join(img_save_path, cut_img_name))
                    with open(label_save_path, 'a', encoding='utf-8') as fa:
                        fa.write(cut_img_name + '\t' + cut_label + '\n')

    def data_load_multi_process(self, num_process=None):
        """
        Crop the text regions out of the ICPR images using multiple processes.
        :param num_process: number of processes, defaults to self.num_process, [int]
        :return:
        """
        if num_process is None:
            num_process = self.num_process
        org_images_list = os.listdir(self.org_images_path)
        self.org_labels_list = os.listdir(self.org_labels_path)
        # clear label.txt at first step
        check_path([self.cut_train_images_path,
                    self.cut_train_labels_path,
                    self.cut_test_images_path,
                    self.cut_test_labels_path])
        if self.write_mode == 'w':
            clear_content([self.cut_train_images_path,
                           self.cut_train_labels_path,
                           self.cut_test_images_path,
                           self.cut_test_labels_path])
        all_data_len = len(org_images_list)
        # guard against a zero step when there are fewer images than processes
        data_offset = max(1, all_data_len // num_process)
        processes = list()
        for data_i in trange(0, all_data_len, data_offset):
            if data_i + data_offset >= all_data_len:
                processes.append(Process(target=self.data_load, args=(org_images_list[data_i:],)))
            else:
                processes.append(Process(target=self.data_load, args=(org_images_list[data_i:data_i + data_offset],)))
        for process in processes:
            process.start()
        for process in processes:
            process.join()

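    def cut_text(self, image, labels, filter_ratio, is_transform, angle_range):
        """
        Crop the text regions from one image.
        :param image: original image, [PIL image]
        :param labels: annotation lines of the image, [list of str]
        :param filter_ratio: height/width ratio filter; crops above this ratio are discarded, e.g. 1.5, [float]
        :param is_transform: whether to apply the affine transform, [boolean]
        :param angle_range: angle range in which the affine transform is skipped, e.g. [-15.0, 15.0], [list]
        :return: list of cropped images and list of their text labels
        """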
        cut_images = list()
        cut_labels = list()
        w, h = image.size
        for label in labels:
            if label == '':
                continue
            label_text = label.split(',')
            text = label_text[-1]
            if not self.use_blank:
                text = text.replace(' ', '')
            if text == '###' or text == '★' or text == '':
                continue
            position = self.reorder_vertexes(
                np.array([[round(float(label_text[i])), round(float(label_text[i + 1]))] for i in range(0, 8, 2)]))
            position = np.reshape(position, 8).tolist()
            left = max(min([position[i] for i in range(0, 8, 2)]), 0)
            right = min(max([position[i] for i in range(0, 8, 2)]), w)
            top = max(min([position[i] for i in range(1, 8, 2)]), 0)
            bottom = min(max([position[i] for i in range(1, 8, 2)]), h)
            if (bottom - top) / (right - left + 1e-3) > filter_ratio:
                continue
            image = np.asarray(image)
            cut_image = image[top:bottom, left:right]
            if is_transform:
                trans_img = self.transform(image, position, angle_range)
                if trans_img is not None:
                    cut_image = trans_img
            cut_images.append(cut_image)
            cut_labels.append(text)
        return cut_images, cut_labels

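    def transform(self, image, position, angle_range):
        """
        Affine transform.
        :param image: original image, [array]
        :param position: vertex coordinates of the text region, e.g. [x0,y0,x1,y1,x2,y2,x3,y3], [list]
        :param angle_range: angle range in which the affine transform is skipped, e.g. [-15.0, 15.0], [list]
        :return: the transformed image, or None if the angle lies inside angle_range
        """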
        from_points = [position[2:4], position[4:6]]
        width = round(float(self.calc_dis(position[2:4], position[4:6])))
        height = round(float(self.calc_dis(position[2:4], position[0:2])))
        to_points = [[0, 0], [width, 0]]
        from_mat = self.list2col_matrix(from_points)
        to_mat = self.list2col_matrix(to_points)
        tran_m, tran_b = self.get_transform(from_mat, to_mat)
        probe_vec = np.matrix([1.0, 0.0]).transpose()
        probe_vec = tran_m * probe_vec
        scale = np.linalg.norm(probe_vec)
        angle = 180.0 / np.pi * math.atan2(probe_vec[1, 0], probe_vec[0, 0])
        if (angle > angle_range[0]) and (angle < angle_range[1]):
            return None
        else:
            from_center = position[2:4]
            to_center = [0, 0]
            dx = to_center[0] - from_center[0]
            dy = to_center[1] - from_center[1]
            trans_m = cv2.getRotationMatrix2D((from_center[0], from_center[1]), -1 * angle, scale)
            trans_m[0][2] += dx
            trans_m[1][2] += dy
            dst = cv2.warpAffine(image, trans_m, (int(width), int(height)))
            return dst

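    def get_transform(self, from_shape, to_shape):
        """
        Compute the transformation matrix A such that y = A * x.
        :param from_shape: shape x before the transform, as a column matrix
        :param to_shape: shape y after the transform, as a column matrix
        :return: A (scaled rotation matrix) and the translation vector
        """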
        assert from_shape.shape[0] == to_shape.shape[0] and from_shape.shape[0] % 2 == 0
        sigma_from = 0.0
        sigma_to = 0.0
        cov = np.matrix([[0.0, 0.0], [0.0, 0.0]])
        # compute the mean and cov
        from_shape_points = from_shape.reshape(from_shape.shape[0] // 2, 2)
        to_shape_points = to_shape.reshape(to_shape.shape[0] // 2, 2)
        mean_from = from_shape_points.mean(axis=0)
        mean_to = to_shape_points.mean(axis=0)
        for i in range(from_shape_points.shape[0]):
            temp_dis = np.linalg.norm(from_shape_points[i] - mean_from)
            sigma_from += temp_dis * temp_dis
            temp_dis = np.linalg.norm(to_shape_points[i] - mean_to)
            sigma_to += temp_dis * temp_dis
            cov += (to_shape_points[i].transpose() - mean_to.transpose()) * (from_shape_points[i] - mean_from)
        sigma_from = sigma_from / to_shape_points.shape[0]
        sigma_to = sigma_to / to_shape_points.shape[0]
        cov = cov / to_shape_points.shape[0]
        # compute the affine matrix
        s = np.matrix([[1.0, 0.0], [0.0, 1.0]])
        u, d, vt = np.linalg.svd(cov)
        if np.linalg.det(cov) < 0:
            if d[1] < d[0]:
                s[1, 1] = -1
            else:
                s[0, 0] = -1
        r = u * s * vt
        c = 1.0
        if sigma_from != 0:
            c = 1.0 / sigma_from * np.trace(np.diag(d) * s)
        tran_b = mean_to.transpose() - c * r * mean_from.transpose()
        tran_m = c * r
        return tran_m, tran_b

    def list2col_matrix(self, pts_list):
        """
        Convert a list of points into a column matrix.
        :param pts_list: list of points, e.g. [[x0, y0], [x1, y1]], [list]
        :return: column matrix [x0, y0, x1, y1, ...]^T
        """
        assert len(pts_list) > 0
        col_mat = []
        for i in range(len(pts_list)):
            col_mat.append(pts_list[i][0])
            col_mat.append(pts_list[i][1])
        col_mat = np.matrix(col_mat).transpose()
        return col_mat

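    def calc_dis(self, point1, point2):
        """
        Compute the Euclidean distance between two points.
        :param point1: 2-D coordinates, e.g. [12.3, 34.1], [list]
        :param point2: 2-D coordinates, e.g. [12.3, 34.1], [list]
        :return: Euclidean distance between the two points
        """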
        return np.sqrt((point2[1] - point1[1]) ** 2 + (point2[0] - point1[0]) ** 2)

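    def reorder_vertexes(self, xy_list):
        """
        Reorder the four vertices of a text box counterclockwise.
        :param xy_list: coordinates of the four vertices of the text box, [array]
        :return:
        """
        reorder_xy_list = np.zeros_like(xy_list)

        # pick the first vertex: the one with the smallest x coordinate
        # (on a tie, the one with the smaller y coordinate)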
        ordered = np.argsort(xy_list, axis=0)
        xmin1_index = ordered[0, 0]
        xmin2_index = ordered[1, 0]
        if xy_list[xmin1_index, 0] == xy_list[xmin2_index, 0]:
            if xy_list[xmin1_index, 1] <= xy_list[xmin2_index, 1]:
                reorder_xy_list[0] = xy_list[xmin1_index]
                first_v = xmin1_index
            else:
                reorder_xy_list[0] = xy_list[xmin2_index]
                first_v = xmin2_index
        else:
            reorder_xy_list[0] = xy_list[xmin1_index]
            first_v = xmin1_index

        # compute the slope from the first vertex to each of the other three;
        # the vertex with the median slope becomes the third vertex
        others = list(range(4))
        others.remove(first_v)
        k = np.zeros((len(others),))
        for index, i in zip(others, range(len(others))):
            k[i] = (xy_list[index, 1] - xy_list[first_v, 1]) \
                   / (xy_list[index, 0] - xy_list[first_v, 0] + crnn_config.epsilon)
        k_mid = np.argsort(k)[1]
        third_v = others[k_mid]
        reorder_xy_list[2] = xy_list[third_v]

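        # compare the two remaining vertices against the line through the first and third vertices:
        # the one above that line (delta_y > 0) becomes the second vertex, the other the fourth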
        others.remove(third_v)
        b_mid = xy_list[first_v, 1] - k[k_mid] * xy_list[first_v, 0]
        second_v, fourth_v = 0, 0
        for index, i in zip(others, range(len(others))):
            # delta = y - (k * x + b)
            delta_y = xy_list[index, 1] - (k[k_mid] * xy_list[index, 0] + b_mid)
            if delta_y > 0:
                second_v = index
            else:
                fourth_v = index
        reorder_xy_list[1] = xy_list[second_v]
        reorder_xy_list[3] = xy_list[fourth_v]

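        # decide whether the vertices need rotating: when the first vertex is the
        # bottom-left corner of the quadrilateral, rotate the order by one position counterclockwise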
        k13 = k[k_mid]
        k24 = (xy_list[second_v, 1] - xy_list[fourth_v, 1]) / (
                xy_list[second_v, 0] - xy_list[fourth_v, 0] + crnn_config.epsilon)
        if k13 < k24:
            tmp_x, tmp_y = reorder_xy_list[3, 0], reorder_xy_list[3, 1]
            for i in range(2, -1, -1):
                reorder_xy_list[i + 1] = reorder_xy_list[i]
            reorder_xy_list[0, 0], reorder_xy_list[0, 1] = tmp_x, tmp_y
        return [reorder_xy_list[1], reorder_xy_list[0], reorder_xy_list[3], reorder_xy_list[2]]


class ImageGenerate(object):
    def __init__(self,
                 img_base_path,
                 font_style_path,
                 text_size_limit,
                 font_size,
                 font_color,
                 train_images_path,
                 train_labels_path,
                 test_images_path,
                 test_labels_path,
                 train_test_ratio,
                 num_samples,
                 dictionary_file,
                 margin=20,
                 write_mode='w',
                 use_blank=False,
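                 num_process=1):
        """
        Generate synthetic text images.
        :param img_base_path: path to the folder of background images, [str]
        :param font_style_path: paths to the font folders, with separate Chinese ('ch') and English ('en') folders, [dict]
        :param text_size_limit: range of the number of characters in a text, e.g. [1,8], [list]
        :param font_size: list of font sizes, e.g. [24,32,36], [list]
        :param font_color: list of font colors, e.g. [[0, 0, 0], [255, 36, 36]], [list]
        :param train_images_path: where the training images are saved, [str]
        :param train_labels_path: where the training labels are saved, [str]
        :param test_images_path: where the test images are saved, [str]
        :param test_labels_path: where the test labels are saved, [str]
        :param train_test_ratio: fraction of the data used for training, [float]
        :param num_samples: total number of samples to generate, [int]
        :param dictionary_file: path to the dictionary file, [str]
        :param margin: margin between the text and the edge of the background image
        :param write_mode: write mode, 'w': write, 'a': add, [str]
        :param use_blank: whether to use spaces, [boolean]
        :param num_process: number of processes that generate samples in parallel
        """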
        self.img_base_path = img_base_path
        self.font_style_path = font_style_path
        self.text_size_limit = text_size_limit
        self.font_size = font_size
        self.font_color = font_color
        self.train_images_path = train_images_path
        self.train_labels_path = train_labels_path
        self.test_images_path = test_images_path
        self.test_labels_path = test_labels_path
        self.train_test_ratio = train_test_ratio
        self.num_samples = num_samples
        self.dictionary_file = dictionary_file
        assert write_mode in ['w', 'a'], "write mode should be 'w'(write) or 'a'(add)"
        self.write_mode = write_mode
        self.use_blank = use_blank
        self.num_process = num_process
        self.margin = margin
        self.base_image_paths = None
        self.list_words = None
        self.used_ch_word = list()
        self.ch_fonts_list = os.listdir(self.font_style_path['ch'])
        self.en_fonts_list = os.listdir(self.font_style_path['en'])
        super().__init__()

    def generate_image(self, start_end):
        """
        Generate sample images and save them.
        :param start_end: list holding the start ID and the end ID, [list]
        :return:
        """
        # check dir and files
        train_test_offset = start_end[0] + (start_end[1] - start_end[0]) * self.train_test_ratio
        for i in range(start_end[0], start_end[1]):
            # get base image by order
            base_img_path = self.base_image_paths[
                (i - start_end[0]) * len(self.base_image_paths) // (start_end[1] - start_end[0])]

            # choose the font color according to the background image; the background
            # type is the second '_'-separated field of the file name
            base_type = os.path.basename(base_img_path).split('_')[1]
            if base_type == '0':
                font_color = random.choice(self.font_color[3:])
            elif base_type == '1':
                font_color = random.choice(self.font_color[0:6] + self.font_color[12:])
            elif base_type == '2':
                font_color = random.choice(self.font_color[0:12] + self.font_color[15:])
            elif base_type == '3':
                font_color = random.choice(self.font_color[0:16])
            else:
                # fall back to any color so that font_color is always defined;
                # otherwise the retry loop below would never terminate
                font_color = random.choice(self.font_color)

            # create image draw
            base_img = Image.open(base_img_path)
            base_img_width, base_img_height = base_img.size
            draw = ImageDraw.Draw(base_img)
            while 1:
                try:
                    # randomly choice font size
                    font_size = random.choice(self.font_size)
                    # randomly choice words str
                    words_str_len = random.randint(self.text_size_limit[0], self.text_size_limit[1])
                    only_latin, words_str = self.get_word_str(words_str_len)
                    # randomly choice font style
                    if only_latin:
                        font_style_path = random.choice(self.en_fonts_list)
                        font_style_path = os.path.join(self.font_style_path['en'], font_style_path)
                    else:
                        font_style_path = random.choice(self.ch_fonts_list)
                        font_style_path = os.path.join(self.font_style_path['ch'], font_style_path)

                    font = ImageFont.truetype(font_style_path, font_size)
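                    # note: ImageDraw.textsize was removed in Pillow 10; on newer versions
                    # the text size has to be derived from draw.textbbox instead
                    words_str_width, words_str_height = draw.textsize(words_str, font)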
                    x0 = random.randint(self.margin, base_img_width - self.margin - words_str_width)
                    y0 = random.randint(self.margin, base_img_height - self.margin - words_str_height)
                    draw.text((x0, y0), words_str, tuple(font_color), font=font)
                    # save Image
                    x_left = x0 - random.randint(0, self.margin)
                    y_top = y0 - random.randint(0, self.margin)
                    x_right = x0 + words_str_width + random.randint(0, self.margin)
                    y_bottom = y0 + words_str_height + random.randint(0, self.margin)
                    base_img = np.asarray(base_img)[:, :, 0:3]
                    image = base_img[y_top:y_bottom, x_left:x_right]
                    image = Image.fromarray(image)
                    if i < train_test_offset:
                        image_dir = self.train_images_path
                        labels_path = self.train_labels_path
                    else:
                        image_dir = self.test_images_path
                        labels_path = self.test_labels_path
                    image_name = 'img_' + str(i).zfill(len(str(self.num_samples))) + '.jpg'
                    image_save_path = os.path.join(image_dir, image_name)
                    image.save(image_save_path)
                    # save labels
                    with open(labels_path, 'a', encoding='utf-8')as fa:
                        fa.write(image_name + '\t' + words_str + '\n')
                    break
                except Exception as e:
                    continue

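    def generate_image_multi_process(self, num_process=None):
        """
        Generate sample images with multiple processes and save them.
        :param num_process: number of processes, defaults to self.num_process, [int]
        :return:
        """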
        if num_process is None:
            num_process = self.num_process
        self.base_image_paths = [os.path.join(self.img_base_path, img) for img in
                                 os.listdir(self.img_base_path)]
        # read the dictionary file inside a context manager so the handle is closed
        with open(self.dictionary_file, encoding="utf-8") as fr:
            words = [Counter(extract_words_i) for extract_words_i in
                     self.extract_words(fr.read())]
        self.list_words = [list(words_i.keys()) for words_i in words]
        # check dir and files
        check_path([self.train_images_path,
                    self.train_labels_path,
                    self.test_images_path,
                    self.test_labels_path])
        if self.write_mode == 'w':
            clear_content([self.train_images_path,
                           self.train_labels_path,
                           self.test_images_path,
                           self.test_labels_path])
        # guard against a zero step when there are fewer samples than processes
        data_offset = max(1, self.num_samples // num_process)
        processes = list()
        for i in trange(0, self.num_samples, data_offset):
            if i + data_offset >= self.num_samples:
                processes.append(Process(target=self.generate_image, args=([i, self.num_samples],)))
            else:
                processes.append(Process(target=self.generate_image, args=([i, i + data_offset],)))
        for process in processes:
            process.start()
        for process in processes:
            process.join()

    def extract_words(self, text):
        """
        Split the dictionary text into groups of characters.
        :param text: all English and Chinese characters, one group per line
        :return: word_list, e.g. [['1','2',..],['a','b',...,'A','B',...],[',','!',...],['甲','风',...]]
        """
        words_list = text.split('\n')
        words_list = [i.replace(' ', '') for i in words_list]
        words_list = [[j for j in i] for i in words_list]
        if self.use_blank:
            words_list.append([' '])
        return words_list

    def get_word_str(self, length):
        """
        generate word str randomly
        :param length: length of word str
        :return:
        """
        word_str = ''
        self.used_ch_word = list()
        only_latin = False
        # only latin char
        if random.random() < 0.2:
            for i in range(length):
                if self.use_blank and (i == 0 or i == length - 1):
                    words_list_i = random.choice(self.list_words[:3])
                else:
                    if self.use_blank and random.random() < 0.2:
                        words_list_i = random.choice(self.list_words[:3] + self.list_words[-1])
                    else:
                        words_list_i = random.choice(self.list_words[:3])
                word_str += random.choice(words_list_i)
            only_latin = True
        else:
            for i in range(length):
                if self.use_blank and (i == 0 or i == length - 1):
                    words_list_i = random.choice(self.list_words[:-1])
                else:
                    if self.use_blank and random.random() < 0.2:
                        words_list_i = random.choice(self.list_words)
                    else:
                        words_list_i = random.choice(self.list_words[:-1])
                word_str += random.choice(words_list_i)
        return only_latin, word_str


def check_path(path_list):
    """
    Check whether each path in the list exists; create the folder or file if it does not.
    :param path_list: path list, [list]
    :return:
    """
    for path in path_list:
        if not os.path.exists(path) and '.' not in path[2:]:
            os.mkdir(path)
        elif not os.path.exists(path) and '.' in path[2:]:
            with open(path, 'w', encoding='utf-8') as fw:
                fw.write('')


def clear_content(path_list):
    """
    Empty the given folders and files.
    :param path_list: path list, [list]
    :return:
    """
    for path in path_list:
        if os.path.isdir(path):
            shutil.rmtree(path)
            os.mkdir(path)
        elif os.path.isfile(path):
            os.remove(path)
            with open(path, 'w', encoding='utf-8') as fw:
                fw.write('')


def do_text_cut(write_mode):
    print("{0}".format('text cutting...').center(100, '='))
    print('train_test_ratio={0}\nfilter_ratio={1}\nfilter_height={2}'
          '\nis_transform={3}\nangle_range={4}\nwrite_mode={5}\nuse_blank={6}\nnum_process={7}'.format(
        crnn_config.train_test_ratio,
        crnn_config.filter_ratio,
        crnn_config.filter_height,
        crnn_config.is_transform,
        crnn_config.angle_range,
        write_mode,
        crnn_config.use_blank,
        crnn_config.num_process))
    print('=' * 100)
    text_cut = TextCut(org_images_path=crnn_config.org_images_path,
                       org_labels_path=crnn_config.org_labels_path,
                       cut_train_images_path=crnn_config.cut_train_images_path,
                       cut_train_labels_path=crnn_config.cut_train_labels_path,
                       cut_test_images_path=crnn_config.cut_test_images_path,
                       cut_test_labels_path=crnn_config.cut_test_labels_path,
                       train_test_ratio=crnn_config.train_test_ratio,
                       filter_ratio=crnn_config.filter_ratio,
                       filter_height=crnn_config.filter_height,
                       is_transform=crnn_config.is_transform,
                       angle_range=crnn_config.angle_range,
                       write_mode=write_mode,
                       use_blank=crnn_config.use_blank,
                       num_process=crnn_config.num_process
                       )
    text_cut.data_load_multi_process()


def do_image_generate(write_mode):
    print("{0}".format('image generating...').center(100, '='))
    print('train_test_ratio={0}\nnum_samples={1}\nmargin={2}\nwrite_mode={3}\nuse_blank={4}\nnum_process={5}'
          .format(crnn_config.train_test_ratio, crnn_config.num_samples, crnn_config.margin, write_mode, crnn_config.use_blank,
                  crnn_config.num_process))
    image_generate = ImageGenerate(img_base_path=crnn_config.base_img_dir,
                                   font_style_path=crnn_config.font_style_path,
                                   text_size_limit=crnn_config.text_size_limit,
                                   font_size=crnn_config.font_size,
                                   font_color=crnn_config.font_color,
                                   train_images_path=crnn_config.train_images_path,
                                   train_labels_path=crnn_config.train_label_path,
                                   test_images_path=crnn_config.test_images_path,
                                   test_labels_path=crnn_config.test_label_path,
                                   train_test_ratio=crnn_config.train_test_ratio,
                                   num_samples=crnn_config.num_samples,
                                   dictionary_file=crnn_config.dictionary_file,
                                   margin=crnn_config.margin,
                                   write_mode=write_mode,
                                   use_blank=crnn_config.use_blank,
                                   num_process=crnn_config.num_process)
    image_generate.generate_image_multi_process()


def do_generate_charset(label_path, charset_path):
    """
    Generate the character-set file.
    :param label_path: path to the training label file
    :param charset_path: path where the character-set file is written
    :return:
    """
    print("{0}".format('charset generating...').center(100, '='))
    print('label_path={0}\ncharset_path={1}'.format(label_path, charset_path))
    print('=' * 100)
    charset_generate.generate_charset(label_path, charset_path)


if __name__ == '__main__':
    do_text_cut(write_mode='w')
    do_image_generate(write_mode='a')
    # do_generate_charset(crnn_config.train_label_path, crnn_config.charset_path)

data_generator.py holds the data preprocessing functions that are called during training and testing.

import re
import os
import PIL
import math
import numpy as np
from PIL import Image
from crnn.config import seed
from captcha.image import ImageCaptcha


def get_img_label(label_path, images_path):
    """
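    Get the list of image paths and the list of image labels.
    :param label_path: path of the file that stores image paths and labels, one tab-separated pair per line. [str]
    :param images_path: directory that contains the images. [str]
    :return: img_path_list, img_label_list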
    """
    with open(label_path, 'r', encoding='utf-8') as f:
        lines = f.read()
    lines = lines.split('\n')
    img_path_list = []
    img_label_list = []
    for line in lines[:-1]:
        this_img_path, this_img_label = line.split('\t')
        this_img_path = os.path.join(images_path, this_img_path)
        img_path_list.append(this_img_path)
        img_label_list.append(this_img_label)
    return img_path_list, img_label_list


def get_charsets(dict=None, mode=1, charset_path=None):
    """
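    Build the character set.
    :param mode: when mode=1, captchas are generated on the fly for training, and the character set
                 used to generate them is read from the file given by dict;
                 when mode=2, real scene images are used for training, and the character set is the
                 deduplicated collection of all characters found in the text labels of label.txt
    :param dict: path to the character-set file
    :param charset_path: path where the character-set file is stored, only used with mode=2
    :return: the sorted character set as a string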
    """
    if mode == 1:
        with open(dict, 'r', encoding='utf-8') as f:
            lines = f.readlines()
        charsets = ''.join(lines)
    else:
        with open(charset_path, 'r', encoding='utf-8') as fr:
            charsets = fr.read()
    # strip newlines and tabs, then deduplicate and sort the characters
    charsets = re.sub('\n|\t', '', charsets)
    charsets = ''.join(sorted(set(charsets)))
    return charsets


def gen_random_text(charsets, min_len, max_len):
    """
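    Generate a random text whose length lies between min_len and max_len (inclusive).
    :param charsets: character set. [str]
    :param min_len: minimum text length. [int]
    :param max_len: maximum text length. [int]
    :return: the index sequence of the generated text and the text string
    """
    # randint excludes `high`, so add 1 to keep max_len reachable
    # (this replaces the deprecated random_integers, which was inclusive)
    length = seed.randint(low=min_len, high=max_len + 1)
    idxs = seed.randint(low=0, high=len(charsets), size=length)
    text = ''.join([charsets[i] for i in idxs])  # avoid shadowing the built-in str
    return idxs, text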


def captcha_gen_img(text, image_shape, fonts):
    """
    Render a text as a captcha image.
    :param text: input text. [str]
    :param image_shape: image shape. [list]
    :param fonts: list of font file paths. [list]
    :return: the image as a uint8 array of shape image_shape
    """
    image = ImageCaptcha(height=image_shape[0], width=image_shape[1], fonts=fonts)
    data = image.generate_image(text)
    data = np.reshape(np.frombuffer(data.tobytes(), dtype=np.uint8), image_shape)
    return data


def captcha_batch_gen(batch_size, charsets, min_len, max_len, image_shape, blank_symbol, fonts):
    """
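    Generate one batch of captcha data. Each batch has three parts: the images, the width of each image, and the image labels.
    :param batch_size: batch_size
    :param charsets: character set
    :param min_len: minimum text length
    :param max_len: maximum text length
    :param image_shape: shape of the generated images
    :param blank_symbol: value used to pad the tail of a label when the text is shorter than max_len
    :param fonts: list of font file paths
    :return: batch_images, batch_image_widths, batch_labels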
    """
    batch_labels = []
    batch_images = []
    batch_image_widths = []

    for _ in range(batch_size):
        idxs, text = gen_random_text(charsets, min_len, max_len)
        image = captcha_gen_img(text, image_shape, fonts)
        image = image / 255

        pad_size = max_len - len(idxs)
        if pad_size > 0:
            idxs = np.pad(idxs, pad_width=(0, pad_size), mode='constant', constant_values=blank_symbol)
        batch_image_widths.append(image.shape[1])
        batch_labels.append(idxs)
        batch_images.append(image)

    batch_labels = np.array(batch_labels, dtype=np.int32)
    batch_images = np.array(batch_images, dtype=np.float32)
    batch_image_widths = np.array(batch_image_widths, dtype=np.int32)

    return batch_images, batch_image_widths, batch_labels


def scence_batch_gen(batch_img_list, batch_img_label_list,
                     charsets, image_shape, max_len, blank_symbol):
    """
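    Generate one batch of real-scene data. Each batch has three parts: the images, the width of each image, and the image labels.
    :param batch_img_list: list of image paths
    :param batch_img_label_list: list of image labels
    :param charsets: character set as a string
    :param image_shape: shape of the generated images
    :param max_len: maximum length of a text sequence
    :param blank_symbol: value used to pad the tail of a label when the text is shorter than max_len
    :return: batch_images, batch_image_widths, batch_labels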
    """
    batch_labels = []
    batch_image_widths = []
    batch_size = len(batch_img_label_list)
    batch_images = np.zeros(shape=(batch_size, image_shape[0], image_shape[1], image_shape[2]), dtype=np.float32)

    for i, path, label in zip(range(batch_size), batch_img_list, batch_img_label_list):
        # scale the image to the target height, keeping the aspect ratio
        image = Image.open(path)
        img_size = image.size
        height_ratio = image_shape[0] / img_size[1]
        # images that would exceed the target width are squeezed to it
        new_width = min(int(img_size[0] * height_ratio), image_shape[1])
        # Image.LANCZOS is the same filter as the older Image.ANTIALIAS alias
        image = image.resize((new_width, image_shape[0]), Image.LANCZOS).convert('RGB')
        image = np.array(image, np.float32) / 255
        # narrower images are left-aligned; the remaining columns stay zero
        batch_images[i, :image.shape[0], :image.shape[1], :] = image

        # encode the label as character indices
        if len(label) > max_len:
            label = label[:max_len]
        idxs = [charsets.index(i) for i in label]

        # pad the label to max_len
        pad_size = max_len - len(idxs)
        if pad_size > 0:
            idxs = np.pad(idxs, pad_width=(0, pad_size), mode='constant', constant_values=blank_symbol)

        batch_image_widths.append(image_shape[1])
        batch_labels.append(idxs)

    batch_labels = np.array(batch_labels, dtype=np.int32)
    batch_image_widths = np.array(batch_image_widths, dtype=np.int32)

    return batch_images, batch_image_widths, batch_labels


def load_images(batch_img_list, image_shape):
    """
    Load a batch of images for prediction and return the images together with their widths.
    :param batch_img_list: list of image paths or list of PIL images. [list]
    :param image_shape: target image shape [height, width, channels]
    :return: batch_images, batch_image_widths
    """
    batch_size = len(batch_img_list)
    batch_image_widths = []
    batch_images = np.zeros(shape=(batch_size, image_shape[0], image_shape[1], image_shape[2]), dtype=np.float32)

    for i, item in enumerate(batch_img_list):
        # accept either an image path or an already opened PIL image
        if isinstance(item, str):
            image = Image.open(item)
        elif isinstance(item, PIL.Image.Image):
            image = item
        else:
            raise TypeError('expected an image path or a PIL image, got {0}'.format(type(item)))
        # scale the image to the target height, keeping the aspect ratio
        img_size = image.size
        height_ratio = image_shape[0] / img_size[1]
        # images that would exceed the target width are squeezed to it
        new_width = min(int(img_size[0] * height_ratio), image_shape[1])
        # Image.LANCZOS is the same filter as the older Image.ANTIALIAS alias
        image = image.resize((new_width, image_shape[0]), Image.LANCZOS).convert('RGB')
        image = np.array(image, np.float32) / 255
        # narrower images are left-aligned; the remaining columns stay zero
        batch_images[i, :image.shape[0], :image.shape[1], :] = image
        batch_image_widths.append(image_shape[1])

    return batch_images, batch_image_widths
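
The label handling in scence_batch_gen (truncate, map characters to indices, pad to max_len) can be tried in isolation. The following is a minimal pure-Python sketch of that step, not the author's code: the helper names encode_label and decode_label and the toy character set are assumptions made for illustration. The padding value len(charsets) + 1 mirrors what the training code passes as blank_symbol.

```python
def encode_label(label, charsets, max_len):
    """Truncate to max_len, map characters to indices, pad the tail."""
    pad_value = len(charsets) + 1  # same value the training code passes as blank_symbol
    idxs = [charsets.index(ch) for ch in label[:max_len]]
    return idxs + [pad_value] * (max_len - len(idxs))


def decode_label(idxs, charsets):
    """Drop the padding entries and map indices back to characters."""
    pad_value = len(charsets) + 1
    return ''.join(charsets[i] for i in idxs if i != pad_value)


charsets = 'abcdehlo'  # toy character set, assumed for illustration only
encoded = encode_label('hello', charsets, max_len=8)
print(encoded)
print(decode_label(encoded, charsets))
```

Note that charsets.index raises ValueError for a character that is not in the character set, so the character set has to cover every label in the training data.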

Last comes the model class file, which defines the model structure, the loss function, and the training routine. The code is as follows:

import os
import random
import numpy as np
import tensorflow as tf
from tensorflow.contrib import slim
from tensorflow.contrib.rnn import BasicLSTMCell
from crnn.data_generator import get_charsets, captcha_batch_gen, scence_batch_gen, get_img_label


class CRNN(object):
    def __init__(self,
                 image_shape,
                 min_len,
                 max_len,
                 lstm_hidden,
                 pool_size,
                 learning_decay_rate,
                 learning_rate,
                 learning_decay_steps,
                 mode,
                 dict,
                 is_training,
                 train_label_path,
                 train_images_path,
                 charset_path):
        self.min_len = min_len
        self.max_len = max_len
        self.lstm_hidden = lstm_hidden
        self.pool_size = pool_size
        self.learning_decay_rate = learning_decay_rate
        self.learning_rate = learning_rate
        self.learning_decay_steps = learning_decay_steps
        self.mode = mode
        self.dict = dict
        self.is_training = is_training
        self.train_label_path = train_label_path
        self.train_images_path = train_images_path
        self.charset_path = charset_path
        self.charsets = get_charsets(self.dict, self.mode, self.charset_path)
        self.image_shape = image_shape
        self.images = tf.placeholder(dtype=tf.float32,
                                     shape=[None, self.image_shape[0], self.image_shape[1], self.image_shape[2]])
        self.image_widths = tf.placeholder(dtype=tf.int32, shape=[None])
        self.labels = tf.placeholder(dtype=tf.int32, shape=[None, self.max_len])
        self.seq_len_inputs = tf.divide(self.image_widths, self.pool_size, name='seq_len_input_op') - 1
        self.logprob = self.forward(self.is_training)
        self.train_op, self.loss_ctc = self.create_train_op(self.logprob)
        self.dense_predicts = self.decode_predict(self.logprob)

    def vgg_net(self, inputs, is_training, scope='vgg'):
        batch_norm_params = {
            'is_training': is_training
        }
        with tf.variable_scope(scope):
            with slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm, normalizer_params=batch_norm_params):
                with slim.arg_scope([slim.max_pool2d], padding='SAME'):
                    with slim.arg_scope([slim.batch_norm], **batch_norm_params):
                        net = slim.repeat(inputs, 1, slim.conv2d, 64, [3, 3], scope='conv1')
                        net = slim.max_pool2d(net, [2, 2], scope='pool1')
                        net = slim.repeat(net, 1, slim.conv2d, 128, [3, 3], scope='conv2')
                        net = slim.max_pool2d(net, [2, 2], scope='pool2')
                        net = slim.repeat(net, 2, slim.conv2d, 256, [3, 3], scope='conv3')
                        net = slim.max_pool2d(net, [2, 2], stride=[2, 1], scope='pool3')
                        net = slim.repeat(net, 2, slim.conv2d, 512, [3, 3], scope='conv4')
                        net = slim.max_pool2d(net, [2, 2], stride=[2, 1], scope='pool4')
                        net = slim.repeat(net, 1, slim.conv2d, 512, [3, 3], scope='conv5')
                        return net

    def forward(self, is_training):
        dropout_keep_prob = 0.7 if is_training else 1.0
        cnn_net = self.vgg_net(self.images, is_training)

        with tf.variable_scope('Reshaping_cnn'):
            shape = cnn_net.get_shape().as_list()  # [batch, height, width, features]
            transposed = tf.transpose(cnn_net, perm=[0, 2, 1, 3],
                                      name='transposed')  # [batch, width, height, features]
            conv_reshaped = tf.reshape(transposed, [-1, shape[2], shape[1] * shape[3]],
                                       name='reshaped')  # [batch, width, height x features]

        list_n_hidden = [self.lstm_hidden, self.lstm_hidden]

        with tf.name_scope('deep_bidirectional_lstm'):
            # Forward direction cells
            fw_cell_list = [BasicLSTMCell(nh, forget_bias=1.0) for nh in list_n_hidden]
            # Backward direction cells
            bw_cell_list = [BasicLSTMCell(nh, forget_bias=1.0) for nh in list_n_hidden]

            lstm_net, _, _ = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(fw_cell_list,
                                                                            bw_cell_list,
                                                                            conv_reshaped,
                                                                            dtype=tf.float32
                                                                            )
            # Dropout layer
            lstm_net = tf.nn.dropout(lstm_net, keep_prob=dropout_keep_prob)

        with tf.variable_scope('fully_connected'):
            shape = lstm_net.get_shape().as_list()  # [batch, width, 2*n_hidden]
            fc_out = slim.layers.linear(lstm_net, len(self.charsets) + 1)  # [batch x width, n_class]

            lstm_out = tf.reshape(fc_out, [-1, shape[1], len(self.charsets) + 1],
                                  name='lstm_out')  # [batch, width, n_classes]

            # Swap batch and time axis
            logprob = tf.transpose(lstm_out, [1, 0, 2], name='transpose_time_major')  # [width(time), batch, n_classes]

        return logprob

    def create_loss(self, logprob):
        sparse_code_target = self.dense_to_sparse(self.labels, blank_symbol=len(self.charsets) + 1)
        with tf.control_dependencies(
                [tf.less_equal(sparse_code_target.dense_shape[1],
                               tf.reduce_max(tf.cast(self.seq_len_inputs, tf.int64)))]):
            loss_ctc = tf.nn.ctc_loss(labels=sparse_code_target,
                                      inputs=logprob,
                                      sequence_length=tf.cast(self.seq_len_inputs, tf.int32),
                                      preprocess_collapse_repeated=False,
                                      ctc_merge_repeated=True,
                                      ignore_longer_outputs_than_inputs=True,
                                      # returns zero gradient in case it happens -> ema loss = NaN
                                      time_major=True)
            loss_ctc = tf.reduce_mean(loss_ctc)
        return loss_ctc

    def create_train_op(self, logprob):
        loss_ctc = self.create_loss(logprob)
        tf.losses.add_loss(loss_ctc)

        self.global_step = tf.train.get_or_create_global_step()

        learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step,
                                                   self.learning_decay_steps, self.learning_decay_rate,
                                                   staircase=True)

        optimizer = tf.train.AdamOptimizer(learning_rate, beta1=0.5)

        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

        train_op = slim.learning.create_train_op(total_loss=tf.losses.get_total_loss(), optimizer=optimizer,
                                                 update_ops=update_ops)
        return train_op, loss_ctc

    def decode_predict(self, logprob):
        with tf.name_scope('decode_conversion'):
            sparse_code_pred, log_probability = tf.nn.ctc_greedy_decoder(logprob,
                                                                         sequence_length=tf.cast(
                                                                             self.seq_len_inputs,
                                                                             tf.int32
                                                                         ))
            sparse_code_pred = sparse_code_pred[0]
            dense_predicts = tf.sparse_to_dense(sparse_code_pred.indices,
                                                sparse_code_pred.dense_shape,
                                                sparse_code_pred.values, default_value=-1)

        return dense_predicts

    def dense_to_sparse(self, dense_tensor, blank_symbol):
        """
        Convert dense labels to a sparse representation.
        :param dense_tensor: the original dense labels
        :param blank_symbol: the padding symbol
        :return: the labels as a tf.SparseTensor
        """
        indices = tf.where(tf.not_equal(dense_tensor, blank_symbol))
        values = tf.gather_nd(dense_tensor, indices)
        sparse_target = tf.SparseTensor(indices, values, [-1, self.image_shape[1]])
        return sparse_target

    def train(self,
              epoch=100,
              batch_size=32,
              train_images_path=None,
              train_label_path=None,
              restore=False,
              fonts=None,
              logs_path=None,
              models_path=None,
              ):
        # create the output directories
        if not os.path.exists(models_path):
            os.mkdir(models_path)
        if not os.path.exists(logs_path):
            os.mkdir(logs_path)

        # summary
        tf.summary.scalar('loss_ctc', self.loss_ctc)
        merged = tf.summary.merge_all()

        # sess and writer
        sess = tf.Session()
        writer = tf.summary.FileWriter(logs_path, sess.graph)
        saver = tf.train.Saver(max_to_keep=10)
        sess.run(tf.global_variables_initializer())

        # restore model
        last_epoch = 0
        if restore:
            ckpt = tf.train.latest_checkpoint(models_path)
            if ckpt:
                # take the step after the last '-', so a '-' elsewhere in the path does not break parsing
                last_epoch = int(ckpt.split('-')[-1]) + 1
                saver.restore(sess, ckpt)

        # compute the number of batches per epoch
        if self.mode == 1:
            batch_nums = 1000
        else:
            train_img_list, train_label_list = get_img_label(train_label_path, train_images_path)
            batch_nums = int(np.ceil(len(train_img_list) / batch_size))

        if self.mode == 1:
            for i in range(last_epoch, epoch):
                for j in range(batch_nums):
                    batch_images, batch_image_widths, batch_labels = captcha_batch_gen(
                        batch_size,
                        self.charsets,
                        self.min_len,
                        self.max_len,
                        self.image_shape,
                        len(self.charsets) + 1,
                        fonts
                    )
                    _, loss, predict_label = sess.run(
                        [self.train_op, self.loss_ctc, self.dense_predicts],
                        feed_dict={self.images: batch_images,
                                   self.image_widths: batch_image_widths,
                                   self.labels: batch_labels}
                    )
                    if j % 1 == 0:
                        print('epoch:%d/%d, batch:%d/%d, loss:%.4f, truth:%s, predict:%s' % (
                            i, epoch,
                            j, batch_nums,
                            loss,
                            ''.join([self.charsets[k] for k in batch_labels[0] if k != (len(self.charsets) + 1)]),
                            ''.join([self.charsets[v] for v in predict_label[0] if v != -1])
                        ))

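                # save_path is a file prefix: save inside models_path so that
                # tf.train.latest_checkpoint(models_path) can find the checkpoint
                saver.save(sess, save_path=os.path.join(models_path, 'crnn'), global_step=i)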
                summary = sess.run(merged,
                                   feed_dict={
                                       self.images: batch_images,
                                       self.image_widths: batch_image_widths,
                                       self.labels: batch_labels
                                   })
                writer.add_summary(summary, global_step=i)
        else:
            for i in range(last_epoch, epoch):
                random_index = random.sample(range(len(train_img_list)), len(train_img_list))
                batch_index = np.array_split(np.array(random_index), batch_nums)
                for j in range(batch_nums):
                    this_batch_index = list(batch_index[j])
                    this_train_img_list = [train_img_list[index] for index in this_batch_index]
                    this_train_label_list = [train_label_list[index] for index in this_batch_index]
                    batch_images, batch_image_widths, batch_labels = scence_batch_gen(
                        this_train_img_list,
                        this_train_label_list,
                        self.charsets,
                        self.image_shape,
                        self.max_len,
                        len(self.charsets) + 1
                    )
                    _, loss, predict_label = sess.run(
                        [self.train_op, self.loss_ctc, self.dense_predicts],
                        feed_dict={self.images: batch_images,
                                   self.image_widths: batch_image_widths,
                                   self.labels: batch_labels}
                    )
                    if j % 1 == 0:
                        print('epoch:%d/%d, batch:%d/%d, loss:%.4f, truth:%s, predict:%s' % (
                            i, epoch,
                            j, batch_nums,
                            loss,
                            ''.join([self.charsets[k] for k in batch_labels[0] if k != (len(self.charsets) + 1)]),
                            ''.join([self.charsets[v] for v in predict_label[0] if v != -1])
                        ))

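                # save_path is a file prefix: save inside models_path so that
                # tf.train.latest_checkpoint(models_path) can find the checkpoint
                saver.save(sess, save_path=os.path.join(models_path, 'crnn'), global_step=i)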
                summary = sess.run(merged,
                                   feed_dict={
                                       self.images: batch_images,
                                       self.image_widths: batch_image_widths,
                                       self.labels: batch_labels
                                   })
                writer.add_summary(summary, global_step=i)
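
The decode_predict method above delegates to tf.nn.ctc_greedy_decoder. As a rough illustration of what greedy CTC decoding does (take the best class at each time step, merge repeats, drop blanks), here is a pure-Python sketch. The function names are assumptions for illustration, and the blank class is taken to be the last index, len(charsets), which matches the len(self.charsets) + 1 output classes used in forward.

```python
def ctc_collapse(path, blank='-'):
    """Apply the B transform: merge consecutive repeats, then remove blanks."""
    out = []
    prev = None
    for ch in path:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return ''.join(out)


def greedy_decode(scores, charsets):
    """Pick the arg-max class per time step and collapse the resulting path.

    scores: list of per-time-step score lists of length len(charsets) + 1,
            where the last index is assumed to be the blank class.
    """
    blank = len(charsets)
    out = []
    prev = None
    for row in scores:
        k = max(range(len(row)), key=row.__getitem__)
        if k != prev and k != blank:
            out.append(charsets[k])
        prev = k
    return ''.join(out)


print(ctc_collapse('--hh-e-l-ll-oo--'))

toy_scores = [
    [0.1, 0.2, 0.7],  # blank
    [0.8, 0.1, 0.1],  # 'a'
    [0.7, 0.2, 0.1],  # 'a' again, merged with the previous step
    [0.1, 0.1, 0.8],  # blank separates the next 'a'
    [0.9, 0.05, 0.05],  # 'a'
    [0.1, 0.8, 0.1],  # 'b'
]
print(greedy_decode(toy_scores, 'ab'))
```

The blank between the two runs of 'a' is what lets the decoder output a doubled character, which is the same reason "l-ll" collapses to "ll" in the "hello" example.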

Once the model is defined, training can begin. The training script below simply instantiates the model class and calls its train method.

import os
import tensorflow as tf
from crnn.modules import CRNN
from crnn import config as crnn_config
os.environ["CUDA_VISIBLE_DEVICES"] = "2"


def main(_):
    crnn = CRNN(image_shape=crnn_config.image_shape,
                min_len=crnn_config.min_len,
                max_len=crnn_config.max_len,
                lstm_hidden=crnn_config.lstm_hidden,
                pool_size=crnn_config.pool_size,
                learning_decay_rate=crnn_config.learning_decay_rate,
                learning_rate=crnn_config.learning_rate,
                learning_decay_steps=crnn_config.learning_decay_steps,
                mode=crnn_config.mode,
                dict=crnn_config.dict,
                is_training=True,
                train_label_path=crnn_config.train_label_path,
                train_images_path=crnn_config.train_images_path,
                charset_path=crnn_config.charset_path)
    crnn.train(epoch=crnn_config.epoch,
               batch_size=crnn_config.batch_size,
               train_images_path=crnn_config.train_images_path,
               train_label_path=crnn_config.train_label_path,
               restore=True,
               fonts=crnn_config.fonts,
               logs_path=crnn_config.logs_path,
               models_path=crnn_config.models_path)


if __name__ == '__main__':
    tf.app.run()
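
The training run uses tf.train.exponential_decay with staircase=True, so the learning rate is multiplied by the decay rate once every learning_decay_steps steps. A small pure-Python sketch of that schedule follows; the function name staircase_lr is an assumption, and the default values are the ones from config.py.

```python
def staircase_lr(step, base_lr=1e-3, decay_steps=3000, decay_rate=0.95):
    """Learning rate after `step` training steps with staircase exponential decay."""
    return base_lr * decay_rate ** (step // decay_steps)


for step in (0, 2999, 3000, 6000, 30000):
    print(step, staircase_lr(step))
```

In mode 1 an epoch is fixed at 1000 batches, so with these settings the rate drops by 5% every three epochs.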

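After training, the model is stored under the models path. Running the predict.py script then produces predictions for the images under the configured image path.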

# -*- coding: utf-8 -*-
"""
    @describe: text recognition with images path or images ndarray list
    @author: xushen
    @date: 2018-12-25
"""
import os
import tensorflow as tf
from crnn.modules import CRNN
from multiprocessing import Pool
from crnn import config as crnn_config
from crnn.data_generator import load_images

crnn_graph = tf.Graph()
with crnn_graph.as_default():
    crnn = CRNN(image_shape=crnn_config.image_shape,
                min_len=crnn_config.min_len,
                max_len=crnn_config.max_len,
                lstm_hidden=crnn_config.lstm_hidden,
                pool_size=crnn_config.pool_size,
                learning_decay_rate=crnn_config.learning_decay_rate,
                learning_rate=crnn_config.learning_rate,
                learning_decay_steps=crnn_config.learning_decay_steps,
                mode=crnn_config.mode,
                dict=crnn_config.dict,
                is_training=False,
                train_label_path=crnn_config.train_label_path,
                train_images_path=crnn_config.train_images_path,
                charset_path=crnn_config.charset_path)

crnn_sess = tf.Session(graph=crnn_graph)
with crnn_sess.as_default():
    with crnn_graph.as_default():
        tf.global_variables_initializer().run()
        crnn_saver = tf.train.Saver(tf.global_variables())
        crnn_ckpt = tf.train.get_checkpoint_state(crnn_config.models_path)
        crnn_saver.restore(crnn_sess, crnn_ckpt.model_checkpoint_path)


def predict(images, batch_size=crnn_config.predict_batch_size):
    """
    predict images
    :param images:images path or list of images ,[list/str]
    :param batch_size: batch size
    :return:
    """
    if isinstance(images, str):
        assert os.path.exists(images), 'path of image or images dir is not exist'
        if os.path.isdir(images):
            test_img_list = os.listdir(images)
            batch_size = len(test_img_list) if len(test_img_list) <= batch_size else batch_size
            test_img_list = [os.path.join(images, i) for i in test_img_list]
            batch_images, batch_image_widths = load_images(
                test_img_list,
                crnn.image_shape
            )
        elif os.path.isfile(images):
            test_img_list = [images]
            batch_size = len(test_img_list) if len(test_img_list) <= batch_size else batch_size
            batch_images, batch_image_widths = load_images(
                test_img_list,
                crnn.image_shape
            )

    elif isinstance(images, list):
        assert len(images) > 0, 'the image list must not be empty'
        batch_size = len(images) if len(images) <= batch_size else batch_size
        batch_images, batch_image_widths = load_images(
            images,
            crnn.image_shape
        )
    # run prediction batch by batch
    predict_label_list = list()
    for i in range(0, len(batch_images), batch_size):
        if i + batch_size >= len(batch_images):
            batch_size = len(batch_images) - i
        predict_label_list.append(crnn_sess.run(crnn.dense_predicts,
                                                feed_dict={crnn.images: batch_images[i:i + batch_size],
                                                           crnn.image_widths: batch_image_widths[i:i + batch_size]}))
    result = list()
    for predict_label in predict_label_list:
        for j in range(len(predict_label)):
            text_i = ''.join([crnn.charsets[v] for v in predict_label[j] if v != -1])
            if text_i.replace(' ', '') != '':
                result.append(text_i)
    return result


if __name__ == '__main__':
    # accepts a local image directory, a single local image path, or a list of PIL images
    predict(crnn_config.predict_images_path, crnn_config.predict_batch_size)
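
The batching loop inside predict slices the loaded images into chunks of at most batch_size items, with a shorter final chunk. A minimal sketch of the same idea as a generator (the name iter_batches is an assumption, not part of the project):

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


batches = list(iter_batches(list(range(5)), 2))
print(batches)
```

Python slicing already clips at the end of the sequence, so the last chunk needs no special handling.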

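config.py defines the hyperparameters. The main one to watch is mode: when it is set to 1, training runs on simulated captcha data generated on the fly; when it is set to 2, a real dataset must be provided.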

import time
import numpy as np

# data
mode = 2  # mode=1 trains on captchas, mode=2 trains on real scene images
image_shape = [32, 1024, 3]  # image shape
seed = np.random.RandomState(int(round(time.time())))  # random state used when generating synthetic data
min_len = 1  # minimum text length
max_len = 256  # maximum text length
fonts = ['./crnn/fonts/ch_font/STSONG.TTF']  # list of font file paths used when generating synthetic data
train_images_path = './crnn/data/train_images'  # directory of the training images
train_label_path = './crnn/data/train_label.txt'  # path of the training labels
test_images_path = './crnn/data/test_images'  # directory of the test images
test_label_path = './crnn/data/test_label.txt'  # path of the test labels
dict = './crnn/dict/english.txt'
logs_path = './crnn/logs'  # directory for training logs
models_path = './crnn/models'  # directory for saved models

# data icpr
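org_images_path = './crnn/data/origin_images'  # path of the original ICPR images
org_labels_path = './crnn/data/txt'  # path of the original ICPR labels
cut_train_images_path = './crnn/data/train_images'  # where the cropped training images are saved
cut_train_labels_path = './crnn/data/train_label.txt'  # where the labels of the cropped training images are saved
cut_test_images_path = './crnn/data/test_images'  # where the cropped test images are saved
cut_test_labels_path = './crnn/data/test_label.txt'  # where the labels of the cropped test images are saved
train_test_ratio = 0.9  # train/test split ratio
is_transform = True  # whether to apply the affine transform
angle_range = [-15.0, 15.0]  # range of tilt angles for which no affine transform is applied
epsilon = 1e-4  # clockwise transform parameter for the original images
filter_ratio = 1.3  # height/width ratio threshold; images above it are filtered out
filter_height = 16  # height filter; cropped images lower than this value are filtered out, [int]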

# data generate (with base images)
num_samples = 100  # total number of generated samples
base_img_dir = './crnn/images_base'  # directory of background images
font_style_path = {'ch': './crnn/fonts/ch_fonts', 'en': './crnn/fonts/en_fonts'}  # font directories
font_size = [12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 40]  # list of font sizes
# list of font colors; index ranges: black:0-3 gray:3-6 blue:6-12 green:12-15 brown:15-16 white:16-17
font_color = [[0, 0, 0], [36, 36, 36], [83, 72, 53], [109, 129, 139], [139, 139, 139], [143, 161, 143],
              [106, 160, 194], [97, 174, 238], [191, 234, 255], [118, 103, 221], [198, 120, 221], [64, 148, 216],
              [147, 178, 139], [76, 136, 107], [62, 144, 135], [209, 125, 72], [255, 255, 255]]
dictionary_file = './crnn/dict/en_ch.txt'  # path of the dictionary file
text_size_limit = [1, 256]  # range of the number of characters in a generated text
margin = 10  # maximum margin between the generated text and the background image border
use_blank = True  # the original comment reads "whether to use multithreading, default False", which does not match the flag name
num_process = 1  # number of processes used for parallel data processing, default 1 (single process)

# charset generate
charset_path = './crnn/data/charset.txt'

# model
lstm_hidden = 256

# train
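pool_size = 2 * 2  # total factor by which the pooling layers shrink the image width
batch_size = 32  # batch_size
learning_rate = 1e-3  # learning rate
learning_decay_steps = 3000  # number of steps between learning-rate decays
learning_decay_rate = 0.95  # factor the learning rate is multiplied by at each decay
epoch = 100  # number of epochs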

# predict
predict_batch_size = 64
predict_images_path = './crnn/data/predict_images'
predict_label_path = './crnn/data/predict_label.txt'
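
The values image_shape, pool_size and max_len interact through the network's downsampling. The sketch below recomputes the shapes by hand, assuming the pooling strides from vgg_net (SAME padding, stride-1 convolutions) and the 512 output channels of conv5. The helper names are hypothetical, and this is only a plausibility check, not part of the project.

```python
import math


def same_pool(size, stride):
    """Output size of a pooling layer with padding='SAME'."""
    return math.ceil(size / stride)


def cnn_output_shape(height, width):
    """Feature-map height and width after pool1..pool4 of vgg_net."""
    # (stride_h, stride_w) per pooling layer; pool3 and pool4 keep the width
    for stride_h, stride_w in [(2, 2), (2, 2), (2, 1), (2, 1)]:
        height = same_pool(height, stride_h)
        width = same_pool(width, stride_w)
    return height, width


def min_ctc_steps(label):
    """Fewest time steps CTC needs: one per character plus a blank between equal neighbours."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


image_height, image_width, pool_size = 32, 1024, 2 * 2  # values from config.py
feat_h, time_steps = cnn_output_shape(image_height, image_width)
feature_dim = feat_h * 512                # per-step input size of the LSTM (conv5 has 512 channels)
seq_len = image_width // pool_size - 1    # what seq_len_inputs evaluates to in the model

print(feat_h, time_steps, feature_dim, seq_len)
print(min_ctc_steps('hello'))
```

With these settings the sequence length passed to CTC is slightly below max_len, so a label close to the maximum length cannot be aligned; the loss is built with ignore_longer_outputs_than_inputs=True, which skips such samples instead of failing.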

After about 19 epochs of training on the ICPR dataset, the model had largely stabilized. The results are shown below:

After training on the captcha dataset, text recognition also reached close to 100% accuracy. The results are shown below:

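4. Summary

To close, here is a brief summary of the advantages of the CRNN model:

CRNN recognizes text end to end.
It can recognize text sequences of arbitrary length, without character segmentation or horizontal scale normalization.
It is not confined to any predefined lexicon, and it performs strongly on both lexicon-free and lexicon-based scene text recognition tasks.
The model is more lightweight than earlier approaches, which makes it more practical to deploy.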
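
The first two advantages come from the transcription layer: the per-time-step predictions are collapsed by the $\mathcal{B}$ mapping described in section 2.1, so no character-level segmentation or alignment is needed. The sketch below shows that mapping together with greedy decoding in plain Python. The function names `b_transform` and `greedy_decode` and the toy probabilities are hypothetical, and greedy decoding only approximates the most probable label sequence.

```python
BLANK = "-"  # blank symbol, written as "-" in the example from section 2.1


def b_transform(path, blank=BLANK):
    """Collapse consecutive repeated symbols, then drop blanks."""
    out = []
    prev = None
    for ch in path:
        # Keep a symbol only when it differs from the previous time step
        # and is not the blank; a blank between two equal characters
        # therefore preserves both of them.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


def greedy_decode(probs, charset, blank=BLANK):
    """Pick the most likely symbol at each time step, then apply B.

    probs: one list of probabilities per time step, aligned with charset.
    """
    path = [charset[max(range(len(row)), key=row.__getitem__)] for row in probs]
    return b_transform(path, blank)


if __name__ == "__main__":
    print(b_transform("--hh-e-l-ll-oo--"))
    toy_charset = ["-", "a", "b"]
    toy_probs = [
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],
        [0.9, 0.05, 0.05],
        [0.2, 0.7, 0.1],
        [0.1, 0.2, 0.7],
    ]
    print(greedy_decode(toy_probs, toy_charset))
```

Because the output length is determined only by how the path collapses, the same code handles labels of any length up to the number of time steps.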
