BRIEF HISTORY
From time to time during my journey through the IT realm, I have needed to write a program or module that parses some source text and produces an array of its tokens, operators and delimiters, with the ultimate goal of performing some kind of analysis on the original sentence.
It all started in the dinosaur era, when I was tasked with coding a program to translate one flavor of “Business Basic” into another. Naturally, this required a parsing routine that produced a structure which could be analyzed and cross-referenced in order to generate the equivalent statements in the target syntax automatically.
Some years later, during a project to design a relational database, we needed to create a metadata catalog from the record definitions, working-storage areas, CICS screens, programs and other related structures of a COBOL-based legacy application. Consequently, a parsing module was required.
More years passed, until recently I needed to analyze arithmetic formulas using the Shunting Yard algorithm, and so yet another parsing program was written.
OVERVIEW
According to Wikipedia, the term “parsing” refers to the formal analysis by a computer of a sentence or string of words, breaking it down into its constituents and resulting in a parse tree that shows their syntactic relation to each other.
In the market today you may find many “parser” programs, but they are all integrated into some kind of application, compiler or program whose primary focus is the lexical analysis of source text according to some syntax and grammar rules.
The Text2Token PL/SQL package I am sharing is the result of an attempt to code a simple program that takes text input and builds a data structure useful for any type of analysis. Its functionality, driven by user-defined operators and delimiters, is limited to the task of splitting the input sentence into its constituents.
MODULE COMPONENTS
Variables, Constants and Arrays
Name             Type               Description
---------------  -----------------  ----------------------------------------
Operator         VARCHAR2(30)       Default operators: '!^*/=+-&|'
Op_Association   VARCHAR2(30)       Operator associations: 'RRLLLLLLL'
Op_Precedence    VARCHAR2(30)       Operator precedence: '443322210'
Quote            VARCHAR2(5)        Quote characters: '"'''
Blank            VARCHAR2(5)        Space and tab characters: ' '||CHR(9)
Comma            CHAR(1)            Comma character: ','
Tok#             PLS_INTEGER        Count of Tokens
Ops#             PLS_INTEGER        Count of Operators
Res#             PLS_INTEGER        Count of Result elements
Text_Array       TABLE OF VARCHAR2  Main type definition for arrays
Ops              Text_Array         Array of operators
Tokens           Text_Array         Tokenized source
Results          Text_Array         Result array
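The three operator strings in the table have the same length, which suggests they are read in parallel: the character at a given position in Operator pairs with the character at the same position in Op_Association and Op_Precedence. That pairing is an assumption drawn from the default values, not something confirmed by the package source. Under that assumption, a lookup could be sketched like this (the constant names are illustrative only):

```plsql
DECLARE
   -- Default values copied from the table above.
   c_operator       CONSTANT VARCHAR2(30) := '!^*/=+-&|';
   c_op_association CONSTANT VARCHAR2(30) := 'RRLLLLLLL';
   c_op_precedence  CONSTANT VARCHAR2(30) := '443322210';
   v_pos            PLS_INTEGER;
BEGIN
   -- Assumption: the strings are parallel, so one position indexes all three.
   FOR i IN 1 .. LENGTH(c_operator)
   LOOP
      DBMS_OUTPUT.Put_Line(
            SUBSTR(c_operator, i, 1)
         || '  association=' || SUBSTR(c_op_association, i, 1)
         || '  precedence='  || SUBSTR(c_op_precedence, i, 1));
   END LOOP;

   -- Looking up one operator: INSTR returns 0 when the character is not listed.
   v_pos := INSTR(c_operator, '^');
   IF v_pos > 0 THEN
      DBMS_OUTPUT.Put_Line('^ is ' || SUBSTR(c_op_association, v_pos, 1) || '-associative');
   END IF;
END;
/
```

Read this way, '!' and '^' are right-associative with the highest precedence (4), while '|' has the lowest (0).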
SUBPROGRAMS
Subprogram      Description
--------------  ----------------------------------------------
Initialize      Set up initial options
Tokenize        Return array of string constituents
Parse_Csv       Return field values of delimited string
Shunting_Yard   Return Shunting Yard array
INITIALIZE Procedure
This procedure initializes the following processing options:
Option             Description
-----------------  ---------------------------------------------------------
Operator           Change/Set the default list of operators
Op_Association     Change/Set the corresponding association of operators
Op_Precedence      Change/Set the operator precedence
Use_Quotes         Enable quoted strings to be tokenized regardless of
                   embedded delimiters
Discard_Blanks     Ignore spaces and tabs (exclude blanks from result array)
Include_Operators  Include operators in the result array
Debug_On           Enable debugging messages
Syntax
Text2Token.INITIALIZE (
     P_Operator           VARCHAR2 DEFAULT NULL
   , P_Op_Association     VARCHAR2 DEFAULT NULL
   , P_Op_Precedence      VARCHAR2 DEFAULT NULL
   , P_Include_Operators  BOOLEAN  DEFAULT TRUE
   , P_Use_Quotes         BOOLEAN  DEFAULT TRUE
   , P_Discard_Blanks     BOOLEAN  DEFAULT TRUE
   , P_Debug              BOOLEAN  DEFAULT FALSE );
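Because every parameter has a default, a call only needs to name the options it wants to change. The following is a hedged usage sketch based solely on the signature above; it assumes the Text2Token package is installed in the current schema, and the effect of each option is taken from the options table rather than from the package source.

```plsql
BEGIN
   -- Assumes the Text2Token package from this article is installed.
   -- Named notation overrides two options and leaves the rest at their defaults.
   Text2Token.Initialize (
        P_Include_Operators => FALSE    -- leave operators out of the result array
      , P_Debug             => TRUE );  -- enable debugging messages
END;
/
```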
TOKENIZE Function
This is the main engine: it splits the source text into its components, using the configured operators and any optional delimiters passed in.
Syntax
Text2Token.Tokenize (
     P_Source_Text  VARCHAR2
   , P_Delimiters   VARCHAR2 DEFAULT NULL );
Returns
Text2Token.Tokens%TYPE;
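To illustrate the general idea of operator-driven tokenizing, here is a minimal, self-contained sketch. It is not the package's implementation: simple_tokenize is a hypothetical helper, parentheses are treated as single-character tokens alongside the default operators, blanks are discarded, and quoted strings are not handled.

```plsql
DECLARE
   TYPE text_array IS TABLE OF VARCHAR2(4000) INDEX BY PLS_INTEGER;

   v_tokens text_array;

   -- Hypothetical helper: split p_text on single-character operators and
   -- parentheses, discarding blanks. Quoted strings are not handled here.
   FUNCTION simple_tokenize (p_text VARCHAR2) RETURN text_array
   IS
      c_operators CONSTANT VARCHAR2(30) := '!^*/=+-&|()';
      c_blanks    CONSTANT VARCHAR2(5)  := ' ' || CHR(9);
      v_result    text_array;
      v_current   VARCHAR2(4000);
      v_char      VARCHAR2(1 CHAR);

      -- Append the operand collected so far (if any) to the result.
      PROCEDURE flush_current
      IS
      BEGIN
         IF v_current IS NOT NULL THEN
            v_result(v_result.COUNT + 1) := v_current;
            v_current := NULL;
         END IF;
      END;
   BEGIN
      FOR i IN 1 .. NVL(LENGTH(p_text), 0)
      LOOP
         v_char := SUBSTR(p_text, i, 1);
         IF INSTR(c_blanks, v_char) > 0 THEN
            flush_current;                            -- a blank ends the current token
         ELSIF INSTR(c_operators, v_char) > 0 THEN
            flush_current;                            -- an operator ends the current token
            v_result(v_result.COUNT + 1) := v_char;   -- and is a token in its own right
         ELSE
            v_current := v_current || v_char;         -- keep accumulating an operand
         END IF;
      END LOOP;
      flush_current;                                  -- emit any trailing operand
      RETURN v_result;
   END;
BEGIN
   v_tokens := simple_tokenize('3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3');
   FOR i IN 1 .. v_tokens.COUNT
   LOOP
      DBMS_OUTPUT.Put_Line(v_tokens(i));
   END LOOP;
END;
/
```

For this input the sketch emits the same 15 tokens, in the same order, as the Tokenize results in the EXAMPLE section. A real tokenizer would also need to deal with quoted text and with signs such as a leading minus.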
PARSE_CSV Function
Calls the Tokenize function and returns an array containing the fields of the delimited source string.
Syntax
Text2Token.Parse_Csv (
     P_Source_Text  VARCHAR2
   , P_Delimiters   VARCHAR2 DEFAULT Comma );
Returns
Text2Token.Results%TYPE;
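The interesting part of delimited-field parsing is that a delimiter inside a quoted section must not end the field. The sketch below shows one way to do that; it is an illustration under stated assumptions, not the package's code. split_delimited is a hypothetical helper, the delimiter is assumed to be a single character, and a quoted section is assumed to end at the next occurrence of the same quote character that opened it.

```plsql
DECLARE
   TYPE text_array IS TABLE OF VARCHAR2(4000) INDEX BY PLS_INTEGER;

   v_fields text_array;

   -- Hypothetical helper: split p_text on p_delimiter, treating delimiters
   -- inside a quoted section as ordinary text. Enclosing quotes are dropped.
   FUNCTION split_delimited (p_text VARCHAR2, p_delimiter VARCHAR2 DEFAULT ',')
      RETURN text_array
   IS
      c_quotes CONSTANT VARCHAR2(5) := '"''';   -- double quote and single quote
      v_result text_array;
      v_field  VARCHAR2(4000);
      v_open   VARCHAR2(1 CHAR);                -- quote character currently open, if any
      v_char   VARCHAR2(1 CHAR);
   BEGIN
      FOR i IN 1 .. NVL(LENGTH(p_text), 0)
      LOOP
         v_char := SUBSTR(p_text, i, 1);
         IF v_open IS NOT NULL THEN
            IF v_char = v_open THEN
               v_open := NULL;                  -- the matching quote closes the section
            ELSE
               v_field := v_field || v_char;    -- anything else is literal, delimiters included
            END IF;
         ELSIF INSTR(c_quotes, v_char) > 0 THEN
            v_open := v_char;                   -- remember which quote opened the section
         ELSIF v_char = p_delimiter THEN
            v_result(v_result.COUNT + 1) := v_field;   -- end of field (may be empty)
            v_field := NULL;
         ELSE
            v_field := v_field || v_char;
         END IF;
      END LOOP;
      v_result(v_result.COUNT + 1) := v_field;  -- the last field has no trailing delimiter
      RETURN v_result;
   END;
BEGIN
   v_fields := split_delimited(
      'We are,the people,out fishing,with,"O''Brien, Elka",at the lake.');
   FOR i IN 1 .. v_fields.COUNT
   LOOP
      DBMS_OUTPUT.Put_Line(i || ': ' || v_fields(i));
   END LOOP;
END;
/
```

With this input the sketch yields six fields, the fifth being O'Brien, Elka: the apostrophe and the comma are both kept because they sit inside the double-quoted section.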
SHUNTING_YARD Function
Calls the Tokenize function and returns an array containing the tokens rearranged into postfix (Reverse Polish) order by the Shunting Yard algorithm.
Syntax
Text2Token.Shunting_Yard (
     P_Source_Text  VARCHAR2
   , P_Delimiters   VARCHAR2 DEFAULT Comma );
Returns
Text2Token.Results%TYPE;
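For readers unfamiliar with the Shunting Yard algorithm, the following self-contained sketch converts an infix expression to postfix using an operator stack. It is a textbook version written for illustration, not the package's code. The helper names (is_operator, precedence, to_postfix) are hypothetical, the precedence and association strings are the defaults listed earlier and are assumed to be positionally parallel, and tokens are assumed to be separated by single spaces.

```plsql
DECLARE
   TYPE text_array IS TABLE OF VARCHAR2(4000) INDEX BY PLS_INTEGER;

   c_operator       CONSTANT VARCHAR2(30) := '!^*/=+-&|';
   c_op_association CONSTANT VARCHAR2(30) := 'RRLLLLLLL';
   c_op_precedence  CONSTANT VARCHAR2(30) := '443322210';

   v_output text_array;
   v_line   VARCHAR2(4000);

   FUNCTION is_operator (p_token VARCHAR2) RETURN BOOLEAN
   IS
   BEGIN
      RETURN LENGTH(p_token) = 1 AND INSTR(c_operator, p_token) > 0;
   END;

   FUNCTION precedence (p_op VARCHAR2) RETURN PLS_INTEGER
   IS
   BEGIN
      RETURN TO_NUMBER(SUBSTR(c_op_precedence, INSTR(c_operator, p_op), 1));
   END;

   FUNCTION to_postfix (p_text VARCHAR2) RETURN text_array
   IS
      v_out   text_array;
      v_stack text_array;          -- operator stack; v_top is the index of the top element
      v_top   PLS_INTEGER := 0;
      v_n     PLS_INTEGER := 1;
      v_token VARCHAR2(4000);

      PROCEDURE pop_to_output
      IS
      BEGIN
         v_out(v_out.COUNT + 1) := v_stack(v_top);
         v_top := v_top - 1;
      END;
   BEGIN
      LOOP
         v_token := REGEXP_SUBSTR(p_text, '[^ ]+', 1, v_n);   -- n-th space-separated token
         EXIT WHEN v_token IS NULL;
         v_n := v_n + 1;

         IF is_operator(v_token) THEN
            -- Pop operators that bind tighter, or equally tight when the
            -- incoming operator is left-associative.
            WHILE v_top > 0
                  AND v_stack(v_top) <> '('
                  AND (   precedence(v_stack(v_top)) > precedence(v_token)
                       OR (    precedence(v_stack(v_top)) = precedence(v_token)
                           AND SUBSTR(c_op_association, INSTR(c_operator, v_token), 1) = 'L'))
            LOOP
               pop_to_output;
            END LOOP;
            v_top := v_top + 1;
            v_stack(v_top) := v_token;
         ELSIF v_token = '(' THEN
            v_top := v_top + 1;
            v_stack(v_top) := v_token;
         ELSIF v_token = ')' THEN
            WHILE v_top > 0 AND v_stack(v_top) <> '(' LOOP
               pop_to_output;
            END LOOP;
            IF v_top > 0 THEN
               v_top := v_top - 1;              -- discard the matching '('
            END IF;
         ELSE
            v_out(v_out.COUNT + 1) := v_token;  -- operands go straight to the output
         END IF;
      END LOOP;

      WHILE v_top > 0 LOOP                      -- flush the remaining operators
         pop_to_output;
      END LOOP;
      RETURN v_out;
   END;
BEGIN
   v_output := to_postfix('3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3');
   FOR i IN 1 .. v_output.COUNT
   LOOP
      v_line := v_line || v_output(i) || ' ';
   END LOOP;
   DBMS_OUTPUT.Put_Line(RTRIM(v_line));
END;
/
```

For this input the sketch prints 3 4 2 * 1 5 - 2 3 ^ ^ / +, which matches the 13 elements of the Shunting Yard results in the EXAMPLE section. The two consecutive '^' operators stay on the stack together because '^' is right-associative.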
Here is the code:
text2token-pkg.sql
EXAMPLE
SET SERVEROUTPUT ON

DECLARE
   V_Text     VARCHAR2 ( 1000 );
   V_Results  Text2token.Results%TYPE;

   -- Print a title, the source string and each element of V_Results.
   PROCEDURE Print_Result ( P_Ttl VARCHAR2 )
   IS
   BEGIN
      DBMS_OUTPUT.Put_Line ( '***** ' || P_Ttl || ' Results *****'||CHR(10)||'String ['||V_Text||']' );

      FOR I IN 1 .. V_Results.COUNT
      LOOP
         DBMS_OUTPUT.Put_Line ( TO_CHAR ( I, '000.' ) ||' '|| V_Results ( I ) );
      END LOOP;
   END;
BEGIN
   -- Split an arithmetic expression into tokens.
   Text2token.Initialize ( );
   V_Text    := '3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3';
   V_Results := Text2token.Tokenize ( V_Text );
   Print_Result ( 'Tokenize' );

   -- Convert the same expression to postfix order.
   Text2token.Initialize ( );
   V_Results := Text2token.Shunting_Yard ( V_Text );
   Print_Result ( 'Shunting_Yard' );

   -- Split a comma-delimited string that contains a quoted field.
   Text2token.Initialize ( );
   V_Text    := 'We are,the people,out fishing,with,"O''Brien, Elka",at the lake.';
   V_Results := Text2token.Parse_Csv ( V_Text );
   Print_Result ( 'Parse_Csv' );
END;
/
***** Tokenize Results *****
String [3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3]
001. 3
002. +
003. 4
004. *
005. 2
006. /
007. (
008. 1
009. -
010. 5
011. )
012. ^
013. 2
014. ^
015. 3
***** Shunting_Yard Results *****
String [3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3]
001. 3
002. 4
003. 2
004. *
005. 1
006. 5
007. -
008. 2
009. 3
010. ^
011. ^
012. /
013. +
***** Parse_Csv Results *****
String [We are,the people,out fishing,with,"O'Brien, Elka",at the lake.]
001. We are
002. the people
003. out fishing
004. with
005. O'Brien, Elka
006. at the lake.
PL/SQL procedure successfully completed.
Translated from: https://www.experts-exchange.com/articles/13941/Ye-Olde-Generic-Parsing-Module.html