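When reading electronic documents, you often run into text that exists only as an image. Copying it into your notes is a pain: you end up retyping it by hand while looking back and forth, which turns you into a reluctant "keyboard warrior"…
With no better option, why not build our own wheel and develop a text recognition tool of our own? That is how we ended up with Matlab App Designer.
Anyone who has played with Matlab knows that it offers two tools for building graphical user interfaces. The first is guide, commonly called GUI, which will be removed in a future release. The second is App Designer, commonly called App, which is the officially recommended tool and the mainstream framework going forward.
Today we use a simple example to show how to build an image text recognition tool as an App.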
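There are two main ways to build an App. The one used here is the programmatic approach based on uifigure: it is flexible and easy to refactor, which suits complex, large-scale graphical user interfaces, even if it is the "stone-age" way of doing things.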
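Text recognition relies on Optical Character Recognition (OCR). Building that low-level wheel ourselves and still reaching a high recognition rate would be exhausting.
Fortunately, mature services already exist: Baidu AI Cloud (百度智能云), Alibaba Cloud, iFLYTEK and others all provide API endpoints that we can simply borrow. Here we use the text recognition API provided by Baidu AI Cloud.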
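After applying for the text recognition service (free of charge), you can find the API Key and Secret Key in the console. These two values are used to obtain an access_token, which is a required parameter for calling the API (see the red box in the figure below).
From the text recognition documentation we get the request interface for General Text Recognition, Standard Edition (通用文字识别(标准版)), as follows: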
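HTTP method: POST
Request URL: https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic
URL parameter: name access_token; value: the access_token obtained with the API Key and Secret Key (see "Access Token获取" in the documentation)
Header: name Content-Type; value application/x-www-form-urlencoded
Request parameter: name image; value: the image data, Base64-encoded and then urlencoded. The size after Base64 encoding and urlencoding must not exceed 4M, the shortest side must be at least 15px, the longest side at most 4096px; jpg/jpeg/png/bmp formats are supported
Response parameter: name words_result; value: array of recognition results
The HTTP request itself is covered in detail later.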
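The limits in the request description can be checked before any network call is made. The following is only a sketch under stated assumptions: the variable names are illustrative, "4M" is interpreted as 4 MiB, and the size bound is deliberately pessimistic (it assumes every Base64 character could grow to three characters after urlencoding, which only happens for +, / and =).

```matlab
% Hypothetical pre-flight check for the documented image limits.
imgHeight = 500;                          % image height in pixels (example value)
imgWidth  = 500;                          % image width in pixels (example value)
fileBytes = 120000;                       % size of the image file in bytes (example value)

% Base64 turns every 3 input bytes into 4 output characters.
base64Chars = 4 * ceil(fileBytes / 3);

% Pessimistic bound: '+', '/' and '=' become '%2B', '%2F', '%3D' after urlencode.
worstCaseBytes = 3 * base64Chars;
maxBytes = 4 * 1024 * 1024;               % "4M" read as 4 MiB (assumption)

sizeOk  = worstCaseBytes <= maxBytes;
shapeOk = min(imgHeight, imgWidth) >= 15 && max(imgHeight, imgWidth) <= 4096;
canSend = sizeOk && shapeOk;
fprintf('size ok: %d, shape ok: %d\n', sizeOk, shapeOk);
```

An image that fails the shape check could be resized before encoding; one that only fails the pessimistic size bound may still be accepted by the service.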
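Base64 is one of the most common encodings on the web for transmitting 8-bit byte data. Its character set has 64 characters: the lowercase letters a-z, the uppercase letters A-Z, the digits 0-9, and the symbols + and /; the equals sign = is used as a padding suffix. Any byte sequence can be converted into characters from this set, and that conversion is called Base64 encoding. Base64-encoded data is not human-readable and has to be decoded before it can be read.
Many programming languages ship a ready-made Base64 library function, and Matlab is no exception: try help matlab.net.base64encode for the details.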
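To make the encoding concrete, here is a small worked example that encodes the three bytes of 'Man' by hand and compares the result with matlab.net.base64encode. It is an illustration of the scheme only, not part of the tool; the variable names are made up for the example.

```matlab
% The 64-symbol alphabet in index order 0..63.
alphabet = ['A':'Z', 'a':'z', '0':'9', '+', '/'];
bytes = uint8('Man');                     % [77 97 110]

% Pack the 3 bytes (24 bits) into one integer ...
packed = bitshift(uint32(bytes(1)), 16) + bitshift(uint32(bytes(2)), 8) + uint32(bytes(3));

% ... and cut it into four 6-bit groups, most significant group first.
idx = zeros(1, 4);
for k = 1:4
    shift = -6 * (4 - k);                 % -18, -12, -6, 0 (negative = shift right)
    idx(k) = double(bitand(bitshift(packed, shift), uint32(63)));
end
manual = alphabet(idx + 1);               % +1 because MATLAB indices start at 1

builtin = matlab.net.base64encode(bytes);
disp(manual);                             % TWFu
disp(builtin);                            % TWFu
```

When the input length is not a multiple of 3, the last group is padded with =, which is why encoded images often end in one or two equals signs.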
Three implementations in Matlab follow. The first two share one function:
function base64string = img2base64(fileName)
%IMG2BASE64 Encode an image file as a Base64 string
%   INPUTS:
%       fileName        string, an image file name
%   OUTPUTS:
%       base64string    string, the Base64 code of the input image
%   USAGE:
%       >>base64string = img2base64('1.jpg')
%       >>base64string = 'xxx'
%
try
    % Read the raw bytes; fread fails (and we land in catch) if fopen returned -1
    fid = fopen(fileName, 'rb');
    bytes = fread(fid);
    fclose(fid);
    % -------------------------------------------
    % First method: Java class
    % -------------------------------------------
    encoder = org.apache.commons.codec.binary.Base64;
    base64string = char(encoder.encode(bytes))';
    % -------------------------------------------
    % Second method: Matlab built-in
    % -------------------------------------------
    % base64string = matlab.net.base64encode(bytes);
catch
    disp('The file does not exist!');
    base64string = '';
end % end try
end % end function
The third method uses Python's base64 module. Matlab can call Python directly, so the base64 module that ships with Python can be used as is. The source code:
function base64string = img2base64_(fileName)
%IMG2BASE64_ Encode an image file as a Base64 string via Python
%   INPUTS:
%       fileName        string, an image file name
%   OUTPUTS:
%       base64string    string, the Base64 code of the input image
%   USAGE:
%       >>base64string = img2base64_('1.jpg')
%       >>base64string = 'xxx'
%
try
    f = py.open(fileName, 'rb');
    bytes = f.read();
    f.close();
    % char() of a Python bytes object looks like b'....'; strip the b'' wrapper
    temp = char(py.base64.b64encode(bytes));
    temp = regexp(temp, '(?<=b'').+(?='')', 'match');
    base64string = temp{1};
catch
    disp('The file does not exist!');
    base64string = '';
end % end try
end % end function
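We can Base64-encode the same image (500 x 500), shown below, with each method and compare the encoding speed:
Result: '/9j/4AAQSkZ…AAAAAAD/9k='
- Java class org.apache.commons.codec.binary.Base64 ⏲ 0.000783 s
- matlab.net.base64encode ⏲ 0.017589 s
- Python base64 module ⏲ 0.000709 s
The Java class and the Python base64 module are about equally fast, while matlab.net.base64encode is more than 20 times slower. Even so, roughly 0.02 s to encode a 500 x 500 image is still very fast.
All things considered, we recommend the org.apache.commons.codec.binary.Base64 class for Base64 encoding.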
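A comparison like this can be reproduced with timeit. The sketch below is not the author's benchmark: it uses random bytes as a stand-in for an image file, leaves out the Python variant, and converts the bytes to int8 explicitly before handing them to the Java class (an extra precaution added here, not something the article does). Timings will differ from machine to machine.

```matlab
% Random bytes stand in for the contents of an image file (assumption).
bytes = uint8(randi([0 255], 1, 200000));

encoder = org.apache.commons.codec.binary.Base64;
% Java bytes are signed, so reinterpret the uint8 data as int8 first.
viaJava   = @() char(encoder.encode(typecast(bytes, 'int8')))';
viaMatlab = @() matlab.net.base64encode(bytes);

% Both routes should produce the same Base64 text.
sameOutput = strcmp(viaJava(), viaMatlab());

tJava   = timeit(viaJava);
tMatlab = timeit(viaMatlab);
fprintf('Java class: %.6f s, matlab.net.base64encode: %.6f s\n', tJava, tMatlab);
```

timeit runs each handle several times and reports a robust estimate, which is usually more reliable than a single tic/toc measurement.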
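To recognize text in scanned PDF documents, video tutorials and the like, we need to take a screenshot of the region that contains the text and save it as an image for the recognition step. The first thing to do is capture the screen, which Matlab does with the help of the java.awt.Robot Java class. The screen capture source code: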
function imgData = screenSnipping
%screenSnipping Capture the full screen as an image
%   Output:
%       imgData, uint8, image data.
%   Source code from: https://www.mathworks.com/support/search.html/answers/362358-how-do-i-take-a-screenshot-using-matlab.html?fq=asset_type_name:answer%20category:matlab/audio-and-video&page=1
%   Modified: Qingpinwangzi
%   Date: Apr 14, 2021.
% Take screen capture
robo = java.awt.Robot;
tk = java.awt.Toolkit.getDefaultToolkit();
rectSize = java.awt.Rectangle(tk.getScreenSize());
cap = robo.createScreenCapture(rectSize);
% Convert to an RGB image: each pixel is a packed int32, unpacked here into
% 4 bytes per pixel, of which bytes 3, 2 and 1 are taken as R, G and B
rgb = typecast(cap.getRGB(0, 0, cap.getWidth, cap.getHeight, [], 0, cap.getWidth), 'uint8');
imgData = zeros(cap.getHeight, cap.getWidth, 3, 'uint8');
imgData(:, :, 1) = reshape(rgb(3:4:end), cap.getWidth, [])';
imgData(:, :, 2) = reshape(rgb(2:4:end), cap.getWidth, [])';
imgData(:, :, 3) = reshape(rgb(1:4:end), cap.getWidth, [])';
end
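Once the full screen is available as an array, picking out the region with the text is plain array indexing. The sketch below is a hypothetical illustration: a blank array stands in for the output of screenSnipping, the rectangle is hard-coded instead of being selected with the mouse, and the file is written to the temporary folder. It is not the cropImg helper used later in the article.

```matlab
% Placeholder for the screenshot returned by screenSnipping (assumption).
imgData = zeros(1080, 1920, 3, 'uint8');

% Hypothetical selection rectangle: [x, y, width, height] in pixels.
rect = [100, 50, 400, 200];

rows = rect(2) : rect(2) + rect(4) - 1;
cols = rect(1) : rect(1) + rect(3) - 1;

% Clamp the selection to the image so an oversized rectangle cannot fail.
rows = rows(rows >= 1 & rows <= size(imgData, 1));
cols = cols(cols >= 1 & cols <= size(imgData, 2));

cropped = imgData(rows, cols, :);

% Save the region as an image file for the recognition step.
outFileName = fullfile(tempdir, 'temp_crop_demo.png');
imwrite(cropped, outFileName);
```

In an interactive tool the rectangle would come from the user dragging over the captured image rather than from a constant.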
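As mentioned earlier, access_token is a required parameter for calling the API. According to the documentation, it is obtained through an HTTP request that carries the API Key and Secret Key. The core code: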
options = weboptions('RequestMethod', 'post');
url = ['https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=', apiKey, '&client_secret=', secretKey];
res = webread(url, options);
access_token = res.access_token;
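Requesting a new token for every recognition call adds an extra round trip, so it can make sense to keep the token and only refresh it when it is about to expire. The sketch below assumes that the token response also carries an expires_in field giving the lifetime in seconds; that field, the obtainedAt field and the one-hour safety margin are assumptions of this example, not something shown in the article.

```matlab
% Hypothetical cached token record (dummy values, no network access).
tokenInfo.access_token = 'dummy-token';
tokenInfo.expires_in   = 2592000;                    % assumed lifetime in seconds
tokenInfo.obtainedAt   = datetime('now') - days(1);  % when the token was requested

% A token counts as stale once it is within one hour of its expiry time.
isStale = @(info, tNow) tNow > info.obtainedAt + seconds(info.expires_in) - hours(1);

if isStale(tokenInfo, datetime('now'))
    disp('Token expired or about to expire: request a new one.');
else
    disp('Cached token is still usable.');
end
```

The record could be stored next to the saved keys so that the token survives a restart of the tool.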
With the access_token in hand we can call the text recognition API. Here is the source code of the recognition function:
function result = getWordsByBaiduOCR(fileName, apiKey, secretKey, accessToken, apiURL, outType)
%GETWORDSBYBAIDUOCR Return the words recognized in an image
%   INPUTS:
%       fileName        string, an image file name
%       apiKey          string, the API Key of the application
%       secretKey       string, the Secret Key of the application
%       accessToken     string, default is ''; if empty, the Access Token
%                       is requested with the API Key and Secret Key
%       apiURL          string, such as:
%                           'https://aip.baidubce.com/rest/2.0/ocr/v1/accurate'
%                           'https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic'
%                           'https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic'
%       outType         'MultiLine' (default) | 'SingleLine'
%   OUTPUTS:
%       result          '' if the image cannot be read, a char vector for
%                       'SingleLine', a cell array of char for 'MultiLine'
%   USAGE:
%       >>result = getWordsByBaiduOCR(fileName, apiKey, secretKey, accessToken, apiURL)
%   Date: Mar 18, 2021.
%   Author: 清贫王子
%
options = weboptions('RequestMethod', 'post');
% outType is optional, so guard against it being omitted as in USAGE
if nargin < 6 || isempty(outType)
    outType = 'MultiLine';
end
if isempty(accessToken)
    url = ['https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=', apiKey, '&client_secret=', secretKey];
    res = webread(url, options);
    access_token = res.access_token;
else
    access_token = accessToken;
end % end if
url = [apiURL, '?access_token=', access_token];
options.HeaderFields = {'Content-Type', 'application/x-www-form-urlencoded'};
imgBase64String = img2base64(fileName);
if isempty(imgBase64String)
    result = '';
    return
end % end if
% webwrite form-encodes the name-value pair 'image' = Base64 string
res = webwrite(url, 'image', imgBase64String, options);
wordsResult = res.words_result;
result = '';
if strcmp(outType, 'SingleLine')
    % Concatenate all recognized lines into one char vector
    for ii = 1 : numel(wordsResult)
        result = [result, wordsResult(ii).words]; %#ok<AGROW>
    end % end for
elseif strcmp(outType, 'MultiLine')
    % One cell per recognized line
    result = cell(1, numel(wordsResult));
    for ii = 1 : numel(wordsResult)
        result{ii} = wordsResult(ii).words;
    end % end for
end
end % end function
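A quick test of this function: we feed it the image shown below (screenshot source: https://ww2.mathworks.cn/products/matlab/app-designer.html) and recognize the text in it.
>> result =
  1×7 cell array
  Columns 1 through 4
    {'App设计工具帮助您…'}    {'开发专业背景。您只…'}    {'面(GUI)设计布局,…'}    {'编程。'}
  Columns 5 through 7
    {'要共享App,您可以使…'}    {' MATLAB Compile…'}    {'桌面App或 Web App'}
>> result{1}
ans =
    'App设计工具帮助您创建专业的App,同时并不要求软件'
The result contains 7 cells, one for each of the 7 lines of text recognized in the image. In other words, one cell corresponds to one recognized line, as result{1} shows.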
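To see how the response turns into that cell array without calling the service, the JSON handling can be tried on a hand-written sample. The sample text below is made up for illustration and contains only the words_result field named in the request description plus two other illustrative fields; the error_code check reflects an assumption about how failures are reported.

```matlab
% Hypothetical response body (illustrative values, not real service output).
sample = ['{"log_id": 1, "words_result_num": 2, ', ...
          '"words_result": [{"words": "first line"}, {"words": "second line"}]}'];
res = jsondecode(sample);

% Assumed failure shape: an error_code field instead of words_result.
failed = isfield(res, 'error_code') || ~isfield(res, 'words_result');

% words_result decodes to a struct array; collect the words of every entry.
lines = {res.words_result.words};

asSingleLine = strjoin(lines, '');        % like outType = 'SingleLine'
asText       = strjoin(lines, newline);   % one recognized line per text line
disp(asText);
```

A cell array of char vectors like lines can be assigned directly to the Value of a text area, which is how the tool displays multi-line results.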
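When building the App programmatically on top of uifigure, we recommend object-oriented programming (OOP). To keep things simple, we wrap everything in a single class here. A more standard approach would use a design pattern such as MVC to separate the interface from the logic, in line with the software design principle of being open for extension and closed for modification.
Our requirements are very simple. There are two features: recognizing the text in an image loaded from disk, and recognizing the text in a region captured from the screen.
For the first feature we only need to load the image, call the recognition function, and show the result in the text area. For the second, we first take a screenshot, select the region that contains the text, and save it as an image; everything after that is the same as for the first feature.
Based on this description, the controls we need are: a load-image button, a screenshot button, an image display, and a text area for the recognition result. We also need a clean button that clears the displayed image and the result, and a setup button for configuring the API Key and Secret Key.
To make the explanation easier, here is the final design first, as shown in the figure below:
The settings view needs two labels and two text fields, plus two buttons. That settles all the controls we need, so let's create them!
We wrap everything in one class, which we name ReadWords. It inherits from matlab.apps.AppBase, and its properties are all the controls in the interface, so the class looks like this: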
classdef ReadWords < matlab.apps.AppBase
%%
properties
UIFig matlab.ui.Figure
ContainerForMain matlab.ui.container.GridLayout
ThisTB matlab.ui.container.Toolbar
SnippingToolBtn matlab.ui.container.toolbar.PushTool
ImgLoadToolBtn matlab.ui.container.toolbar.PushTool
SetupToolBtn matlab.ui.container.toolbar.PushTool
CleanToolBtn matlab.ui.container.toolbar.PushTool
ImgShow matlab.ui.control.Image
WordsShowTA matlab.ui.control.TextArea
ContainerForSetup matlab.ui.container.GridLayout
APIKeyText matlab.ui.control.EditField
SecrectKeyText matlab.ui.control.EditField
ResetBtn matlab.ui.control.Button
SaveBtn matlab.ui.control.Button
end % end properties
%%
properties(Hidden, Dependent)
APIKeyVal
SecrectKeyVal
end % end properties
%%
properties(Access = protected)
HasSetup = false
end % end properties
end % end classdef
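A few notes on the important properties.
Public properties:
- UIFig: a matlab.ui.Figure, created with uifigure; the main window of the tool
- ContainerForMain: a matlab.ui.container.GridLayout, created with uigridlayout; the layout container of the main window
- ThisTB: a matlab.ui.container.Toolbar, created with uitoolbar; the toolbar container that holds the four tool buttons SnippingToolBtn, ImgLoadToolBtn, SetupToolBtn and CleanToolBtn
- ImgShow: a matlab.ui.control.Image, created with uiimage; shows the loaded or captured image
- WordsShowTA: a matlab.ui.control.TextArea, created with uitextarea; shows the recognition result
- ContainerForSetup, APIKeyText, SecrectKeyText, ResetBtn and SaveBtn: the controls of the settings view, used to enter and manage APIKey and SecrectKey
Dependent, hidden properties:
- APIKeyVal: the value of APIKey
- SecrectKeyVal: the value of SecrectKey
Protected property:
- HasSetup: the setup state for APIKey and SecrectKey; defaults to false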
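All the properties are now in place. Next come the constructor, the destructor, and the class methods.
With the constructor, the destructor, and the get methods of the dependent properties APIKeyVal and SecrectKeyVal added, the code looks like this: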
classdef ReadWords < matlab.apps.AppBase
%%
properties
UIFig matlab.ui.Figure
ContainerForMain matlab.ui.container.GridLayout
ThisTB matlab.ui.container.Toolbar
SnippingToolBtn matlab.ui.container.toolbar.PushTool
ImgLoadToolBtn matlab.ui.container.toolbar.PushTool
SetupToolBtn matlab.ui.container.toolbar.PushTool
CleanToolBtn matlab.ui.container.toolbar.PushTool
ImgShow matlab.ui.control.Image
WordsShowTA matlab.ui.control.TextArea
ContainerForSetup matlab.ui.container.GridLayout
APIKeyText matlab.ui.control.EditField
SecrectKeyText matlab.ui.control.EditField
ResetBtn matlab.ui.control.Button
SaveBtn matlab.ui.control.Button
end % end properties
%%
properties(Hidden, Dependent)
APIKeyVal
SecrectKeyVal
end % end properties
%%
properties(Access = protected)
HasSetup = false
end % end properties
%%
methods
% --------------------------------------
% % Constructor
% --------------------------------------
function app = ReadWords
% Create UIFigure and components
app.buildApp();
% Register the app with App Designer
registerApp(app, app.UIFig)
if nargout == 0
clear app
end
end % end Constructor
% --------------------------------------
% % Destructor
% --------------------------------------
% Code that executes before app deletion
function delete(app)
% Delete UIFigure when app is deleted
delete(app.UIFig)
end % end Destructor
% --------------------------------------
% % Get/Set methods
% --------------------------------------
% get.APIKeyVal
function apiKeyVal = get.APIKeyVal(app)
apiKeyVal = app.APIKeyText.Value;
end
% get.SecrectKeyVal
function secrectKeyVal = get.SecrectKeyVal(app)
secrectKeyVal = app.SecrectKeyText.Value;
end
end % end methods
end % end classdef
The destructor is boilerplate, and so is registerApp(app, app.UIFig) in the constructor. The remaining buildApp() method creates the interface and sets up the individual controls.
We create all subsequent methods as private methods. With buildApp() added, the ReadWords class looks like this (the properties and public methods are identical to the previous listing and are abbreviated here):
classdef ReadWords < matlab.apps.AppBase
    % ... properties, constructor, destructor and get methods as in the previous listing ...
    %%
    methods(Access = private)
        % buildApp
        function buildApp(app)
            %
            % --------------------------------------
            % % Main Figure
            % --------------------------------------
            app.UIFig = uifigure();
            app.UIFig.Icon = 'icons/img2text.png';
            app.UIFig.Name = 'ReadWords';
            app.UIFig.Visible = 'off';
            app.UIFig.Position = [app.UIFig.Position(1), app.UIFig.Position(2), 745, 420];
            app.UIFig.AutoResizeChildren = 'on';
            app.UIFig.Units = 'Normalized';
            app.setAutoResize(app.UIFig, true);
            % --------------------------------------
            % % Toolbar
            % --------------------------------------
            app.ThisTB = uitoolbar(app.UIFig);
            % SetupToolBtn
            app.SetupToolBtn = uipushtool(app.ThisTB);
            app.SetupToolBtn.Icon = 'icons/setup.png';
            app.SetupToolBtn.Tooltip = 'Setup';
            % SnippingToolBtn
            app.SnippingToolBtn = uipushtool(app.ThisTB);
            app.SnippingToolBtn.Icon = 'icons/snip.png';
            app.SnippingToolBtn.Tooltip = 'Screenshot';
            % ImgLoadToolBtn
            app.ImgLoadToolBtn = uipushtool(app.ThisTB);
            app.ImgLoadToolBtn.Icon = 'icons/load.png';
            app.ImgLoadToolBtn.Tooltip = 'Load image';
            % CleanToolBtn
            app.CleanToolBtn = uipushtool(app.ThisTB);
            app.CleanToolBtn.Icon = 'icons/clean.png';
            app.CleanToolBtn.Tooltip = 'Clean';
            % --------------------------------------
            % % ContainerForMain
            % --------------------------------------
            app.ContainerForMain = uigridlayout(app.UIFig, [1, 2]);
            % Panels for the original image and the result
            imgShowPanel = uipanel(app.ContainerForMain, 'Title', 'Original');
            resultShowPanel = uipanel(app.ContainerForMain, 'Title', 'Result');
            % ImgShow
            imgShowPanelLay = uigridlayout(imgShowPanel, [1, 1]);
            imgShowPanelLay.RowSpacing = 0;
            imgShowPanelLay.ColumnSpacing = 0;
            app.ImgShow = uiimage(imgShowPanelLay);
            % WordsShowTA
            resultShowPanelLay = uigridlayout(resultShowPanel, [1, 1]);
            resultShowPanelLay.RowSpacing = 0;
            resultShowPanelLay.ColumnSpacing = 0;
            app.WordsShowTA = uitextarea(resultShowPanelLay);
            app.WordsShowTA.FontSize = 22;
            % --------------------------------------
            % % ContainerForSetup
            % --------------------------------------
            app.ContainerForSetup = uigridlayout(app.UIFig, [4, 3]);
            app.ContainerForSetup.RowHeight = {22, 22, 22, '1x'};
            app.ContainerForSetup.ColumnWidth = {'1x', '1x', '2.5x'};
            app.ContainerForSetup.Visible = 'off';
            apiKeyLabel = uilabel(app.ContainerForSetup, 'Text', 'API Key');
            apiKeyLabel.HorizontalAlignment = 'right';
            apiKeyLabel.Layout.Row = 1;
            apiKeyLabel.Layout.Column = 1;
            % APIKeyText
            app.APIKeyText = uieditfield(app.ContainerForSetup);
            app.APIKeyText.Layout.Row = 1;
            app.APIKeyText.Layout.Column = 2;
            secrectKeyLabel = uilabel(app.ContainerForSetup, 'Text', 'Secret Key');
            secrectKeyLabel.HorizontalAlignment = 'right';
            secrectKeyLabel.Layout.Row = 2;
            secrectKeyLabel.Layout.Column = 1;
            % SecrectKeyText
            app.SecrectKeyText = uieditfield(app.ContainerForSetup);
            app.SecrectKeyText.Layout.Row = 2;
            app.SecrectKeyText.Layout.Column = 2;
            % ResetBtn
            app.ResetBtn = uibutton(app.ContainerForSetup, 'Text', 'Reset');
            app.ResetBtn.Layout.Row = 3;
            app.ResetBtn.Layout.Column = 1;
            % SaveBtn
            app.SaveBtn = uibutton(app.ContainerForSetup, 'Text', 'Save');
            app.SaveBtn.Layout.Row = 3;
            app.SaveBtn.Layout.Column = 2;
            % Set visibility for UIFig
            movegui(app.UIFig, 'center');
            app.UIFig.Visible = 'on';
            % --------------------------------------
            % % RunstartupFcn
            % --------------------------------------
            app.runStartupFcn(@startupFcn);
        end % end buildApp
    end % end methods
end % end classdef
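Note that the toolbar button icons and the window icon come from https://www.easyicon.cc/, where many common icon assets can be downloaded for free. We have already downloaded the icons; if you need them, use the link below:
Link: https://pan.baidu.com/s/11kIvt4SX-MhQ2ltEeC18ZA
Extraction code: 5i3k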
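Also, the statement app.runStartupFcn(@startupFcn); calls a method of the parent class matlab.apps.AppBase; we do the callback registration for all controls inside the startupFcn method. For now, comment this statement out and run ReadWords.m directly to display the interface we just built in buildApp, as the animation shows: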
[Animation: https://i.loli.net/2021/09/26/hae3bNmIAcowu8Z.gif]
As you can see, clicking the toolbar buttons does nothing, because so far no callback methods have been registered for any control. We do that next in the startupFcn method. The code:
classdef ReadWords < matlab.apps.AppBase
    % ... properties, constructor, destructor and get methods as in the previous listing ...
    %%
    methods(Access = private)
        % buildApp (unchanged from the previous listing, omitted here)

        % startupFcn
        function startupFcn(app, ~, ~)
            % Setup APIKeyText and SecrectKeyText
            if exist('apikey.mat', 'file')
                temp = load('apikey.mat');
                app.APIKeyText.Value = temp.key.apiKeyVal;
                app.APIKeyText.Editable = 'off';
                app.SecrectKeyText.Value = temp.key.secrectKeyVal;
                app.SecrectKeyText.Editable = 'off';
            end
            % Register callback
            app.SnippingToolBtn.ClickedCallback = @app.clickedSnippingToolBtn;
            app.ImgLoadToolBtn.ClickedCallback = @app.clickedImgLoadToolBtn;
            app.SetupToolBtn.ClickedCallback = @app.clickedSetupToolBtn;
            app.CleanToolBtn.ClickedCallback = @app.clickedCleanToolBtn;
            app.ResetBtn.ButtonPushedFcn = @app.callbackResetBtn;
            app.SaveBtn.ButtonPushedFcn = @app.callbackSaveBtn;
        end % end function
    end % end methods
end % end classdef
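In total we registered 6 callback methods for 6 buttons, and every one of them has to be implemented; otherwise the corresponding button cannot respond when it is triggered. To keep it simple, we take the callback of the SaveBtn in the settings view, callbackSaveBtn, as the example.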
Before APIKey or SecrectKey has been set, triggering SnippingToolBtn or ImgLoadToolBtn shows a prompt asking you to set them up first:
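The logic around callbackSaveBtn is as follows. The HasSetup property (false by default) records the setup state. If APIKey and SecrectKey have not been set, the tool reports that APIKey or SecrectKey is missing; you then enter both values and click the save button, and the values are stored in a .mat file so that they do not have to be entered again. To change them later, click the reset button and configure them again. If both values are already present, saving simply stores them. Note that in the listings below the check itself is made on whether the two edit fields are empty, while HasSetup is switched in clickedSetupToolBtn.
The code: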
% --------------------------------------
% % Callback functions
% --------------------------------------
% callbackSaveBtn
function callbackSaveBtn(app, ~, ~)
    if ~isempty(app.SecrectKeyText.Value) && ~isempty(app.APIKeyText.Value)
        key.apiKeyVal = app.APIKeyText.Value;
        key.secrectKeyVal = app.SecrectKeyText.Value;
        if exist('apikey.mat', 'file')
            delete('apikey.mat');
        end
        save('apikey.mat', 'key');
        % Windows shell command: mark the file as system + hidden
        !attrib +s +h apikey.mat
        uialert(app.UIFig, 'Save successfully!', 'Confirm', 'Icon', 'success');
        app.APIKeyText.Editable = 'off';
        app.SecrectKeyText.Editable = 'off';
    else
        uialert(app.UIFig, 'API Key or Secret Key is empty!', 'Confirm', 'Icon', 'warning');
    end % end if
end % callbackSaveBtn
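The save and restore steps can be tried in isolation, without any UI. The sketch below mirrors the struct layout used by callbackSaveBtn and startupFcn (a variable key with the fields apiKeyVal and secrectKeyVal) but writes to a demo file in the temporary folder; the file name and the key values are placeholders made up for this example.

```matlab
% Hypothetical demo file, kept away from the real apikey.mat.
keyFile = fullfile(tempdir, 'apikey_demo.mat');

% What the save callback stores.
key.apiKeyVal     = 'my-api-key';
key.secrectKeyVal = 'my-secret-key';
save(keyFile, 'key');

% What the startup function reads back on the next launch.
temp = load(keyFile);
restored = temp.key;
hasKeys = ~isempty(restored.apiKeyVal) && ~isempty(restored.secrectKeyVal);
fprintf('keys restored: %d\n', hasKeys);

delete(keyFile);                          % clean up the demo file
```

Because the keys are stored as plain text inside the .mat file, hiding the file only keeps it out of sight; it does not protect the values.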
With the save button implemented, we get the effect shown in the animation below.
Source code of the other callbacks:
% clickedSnippingToolBtn
function clickedSnippingToolBtn(app, ~, ~)
    if ~isempty(app.SecrectKeyText.Value) && ~isempty(app.APIKeyText.Value)
        % Hide the window so that it does not cover the screen being captured
        app.UIFig.Visible = 'off';
        pause(0.1);
        outFileName = 'temp.png';
        cropImg(outFileName);
        !attrib +s +h temp.png
        %
        app.ImgShow.ImageSource = imread(outFileName);
        app.UIFig.Visible = 'on';
        %
        apiURL = 'https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic';
        words = getWordsByBaiduOCR(outFileName, app.APIKeyVal, app.SecrectKeyVal, '', apiURL, 'MultiLine');
        app.WordsShowTA.Value = words;
    else
        msg = {'API Key or Secret Key is empty!'; 'Please set it up first!'};
        uialert(app.UIFig, msg, 'Confirm', 'Icon', 'warning');
    end
end % end clickedSnippingToolBtn
% clickedImgLoadToolBtn
function clickedImgLoadToolBtn(app, ~, ~)
    if ~isempty(app.SecrectKeyText.Value) && ~isempty(app.APIKeyText.Value)
        [fName, fPath] = uigetfile({'*.png'; '*.jpg'; '*.bmp'; '*.tif'}, 'Open image');
        % uigetfile returns 0 for both outputs when the dialog is cancelled
        if ~isequal(any([fName, fPath]), 0)
            img = imread(strcat(fPath, fName));
            outFileName = 'temp.png';
            if exist(outFileName, 'file')
                delete(outFileName)
            end
            imwrite(img, outFileName);
            !attrib +s +h temp.png
            %
            app.ImgShow.ImageSource = imread(outFileName);
            app.UIFig.Visible = 'on';
            %
            apiURL = 'https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic';
            words = getWordsByBaiduOCR(outFileName, app.APIKeyVal, app.SecrectKeyVal, '', apiURL, 'MultiLine');
            app.WordsShowTA.Value = words;
        else
            return
        end % end if
    else
        msg = {'API Key or Secret Key is empty!'; 'Please set it up first!'};
        uialert(app.UIFig, msg, 'Confirm', 'Icon', 'warning');
    end
end % end clickedImgLoadToolBtn
% clickedSetupToolBtn
function clickedSetupToolBtn(app, ~, ~)
    % Toggle between the main view and the settings view
    if ~app.HasSetup
        app.ContainerForMain.Visible = 'off';
        app.ContainerForSetup.Visible = 'on';
        app.HasSetup = true;
    else
        app.ContainerForMain.Visible = 'on';
        app.ContainerForSetup.Visible = 'off';
        app.HasSetup = false;
    end
end % end clickedSetupToolBtn
% clickedCleanToolBtn
function clickedCleanToolBtn(app, ~, ~)
    app.WordsShowTA.Value = '';
    app.ImgShow.ImageSource = '';
end % end clickedCleanToolBtn
% callbackResetBtn
function callbackResetBtn(app, ~, ~)
    app.APIKeyText.Value = '';
    app.APIKeyText.Editable = 'on';
    app.SecrectKeyText.Value = '';
    app.SecrectKeyText.Editable = 'on';
end % callbackResetBtn
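Now let's test the tool we have built. Say a graduate student, whom we will call Mazi, is reading a scanned PDF paper and wants to copy a passage from it for notes or for slides. This is where our tool comes in handy:
In an instant, Mazi had the desired result and broke into a long-lost happy smile!
That gives us a fairly complete text recognition tool! We hope you like it and find something useful in it.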
To download the complete code for this article, reply "文字识别工具" in the official account (gh).