Ye Olde Generic Parsing Module

BRIEF HISTORY

From time to time during my journey through the "IT" realm, I have had to code a program or module that parses some source text and produces an array containing its tokens, operators and delimiters, the ultimate goal being some type of analysis of the original sentence.

It all started during the dinosaur era, when I was tasked with coding a program to translate one flavor of “Business Basic” to another flavor of “Business Basic”. Naturally, that called for a parsing routine to produce a structure that could be analyzed and cross-referenced in order to automatically generate the equivalent statements in the target syntax.

Some years passed by and, during a project to design a relational database, we needed to create a metadata catalog from the record definitions, working-storage areas, CICS screens, programs and other related structures of the source COBOL-based legacy application. Consequently, a parsing module was required.

More years passed by until, recently, I needed to analyze arithmetic formulas using the Shunting Yard algorithm, and yet another parsing program was coded.

OVERVIEW

According to Wikipedia, the term “parsing” refers to the formal analysis by a computer of a sentence or string of words, breaking it down into its constituents and resulting in a parse tree that shows their syntactic relation to each other.

In the market today you may find multiple “parser” programs, but all are integrated into some kind of application, compiler or program whose primary focus is the lexical analysis of the source text based on some syntax and grammatical rules.

The Text2Token PL/SQL package I am sharing is the result of an attempt to code a simple program that takes text input and builds a data structure useful for any type of analysis (or whatever). Its functionality, driven by user-defined operators and delimiters, is limited to the task of splitting the input sentence into its constituents.

MODULE COMPONENTS

Variables, Constants and Arrays

Name		Type		Description
--------------- --------------- ---------------------------------
Operator	VARCHAR2(30)	Default operators: '!^*/=+-&|'
Op_Association	VARCHAR2(30)	Operator associations: 'RRLLLLLLL'
Op_Precedence	VARCHAR2(30)	Operator precedence: '443322210'
Quote		VARCHAR2(5)	Quote characters: '"'''
Blank		VARCHAR2(5)	Space and tab characters: ' '||CHR(9)
Comma		CHAR(1)		Comma character: ','
Tok#		PLS_INTEGER	Count of Tokens
Ops#		PLS_INTEGER	Count of Operators
Res#		PLS_INTEGER	Count of Result elements

Text_Array 	TABLE OF VARCHAR2	Main type definition for arrays

Ops		Text_Array	Array of operators
Tokens		Text_Array	Tokenized source
Results		Text_Array	Result array
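
The three operator strings appear to be positional: the Nth character of Op_Association and Op_Precedence describes the Nth character of Operator. The block below is a minimal sketch of that lookup, assuming this reading is right. The constant names are illustrative and are not taken from the package.

```sql
DECLARE
    -- Default values copied from the table above.
    C_Operator       CONSTANT VARCHAR2 ( 30 ) := '!^*/=+-&|';
    C_Op_Association CONSTANT VARCHAR2 ( 30 ) := 'RRLLLLLLL';
    C_Op_Precedence  CONSTANT VARCHAR2 ( 30 ) := '443322210';
BEGIN
    -- Walk the operator list and read the matching position
    -- of the two companion strings.
    FOR I IN 1 .. LENGTH ( C_Operator )
    LOOP
        DBMS_OUTPUT.Put_Line (    SUBSTR ( C_Operator, I, 1 )
                               || '  association=' || SUBSTR ( C_Op_Association, I, 1 )
                               || '  precedence='  || SUBSTR ( C_Op_Precedence, I, 1 ) );
    END LOOP;
END;
/
```

Under this reading, '^' is right-associative with precedence 4, while '+' and '-' are left-associative with precedence 2. A higher digit seems to bind tighter, which is consistent with the output shown in the EXAMPLE section.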

SUBPROGRAMS

Subprogram	Description
--------------  ----------------------------------------------
Initialize	Setup initial options
Tokenize 	Return array of string constituents
Parse_Csv	Return field values of delimited string
Shunting_Yard	Return the tokens in Shunting Yard (postfix) order

INITIALIZE Procedure

This procedure initializes the following processing options:

Option		   Description
---------------    ---------------------------------------------------
Operator	   Change/Set the default list of operators
Op_Association	   Change/Set the corresponding association of operators
Op_Precedence	   Change/Set the operator precedence
Use_Quotes	   Enable quoted strings to be tokenized regardless of
		   embedded delimiters
Discard_Blanks	   Ignore spaces and tabs (exclude blanks from result array) 
Include_Operators  Include operators in the result array
Debug_On	   Enable debugging messages

Syntax

Text2Token.INITIALIZE (
		  P_Operator		VARCHAR2 DEFAULT NULL
		, P_Op_Association	VARCHAR2 DEFAULT NULL
		, P_Op_Precedence	VARCHAR2 DEFAULT NULL
		, P_Include_Operators	BOOLEAN DEFAULT TRUE
		, P_Use_Quotes		BOOLEAN DEFAULT TRUE
		, P_Discard_Blanks	BOOLEAN DEFAULT TRUE
		, P_Debug		BOOLEAN DEFAULT FALSE);
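
As an illustration of the signature above, the call below restricts the operator set to five arithmetic operators using named notation. It assumes the package is installed and that the three strings line up character by character, as the defaults suggest. The specific values are made up for this sketch.

```sql
BEGIN
    -- Assumption: position N of each string describes the same operator.
    -- Parameters not named here keep their declared defaults.
    Text2Token.Initialize ( P_Operator       => '^*/+-'
                          , P_Op_Association => 'RLLLL'
                          , P_Op_Precedence  => '43322' );
END;
/
```

Calling Initialize with no arguments, as the EXAMPLE section does, presumably restores the default operator set.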

TOKENIZE Function

This is the main engine that separates the source text into its various components according to the optional operators or delimiters.

Syntax

Text2Token.Tokenize (
	 P_Source_Text	VARCHAR2
	, P_Delimiters	VARCHAR2 DEFAULT NULL );

Returns 
	Text2Token.Tokens%TYPE;
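
The package source is attached rather than listed, so the following is not the author's implementation. It is a minimal, self-contained sketch of one way a character-scanning tokenizer of this kind can work: blanks end the pending token and are discarded, and operators end the pending token and become tokens themselves. All names are illustrative.

```sql
DECLARE
    TYPE Text_Array IS TABLE OF VARCHAR2 ( 4000 ) INDEX BY PLS_INTEGER;

    C_Operators CONSTANT VARCHAR2 ( 30 ) := '!^*/=+-&|()';  -- parentheses added for this sketch
    C_Blanks    CONSTANT VARCHAR2 ( 5 )  := ' ' || CHR ( 9 );
    V_Source    VARCHAR2 ( 1000 ) := '3 + 4*2';
    V_Tokens    Text_Array;
    V_Current   VARCHAR2 ( 4000 );
    V_Char      VARCHAR2 ( 1 CHAR );

    -- Append the pending token, if any, to the result array.
    PROCEDURE Flush_Token
    IS
    BEGIN
        IF V_Current IS NOT NULL
        THEN
            V_Tokens ( V_Tokens.COUNT + 1 ) := V_Current;
            V_Current := NULL;
        END IF;
    END;
BEGIN
    FOR I IN 1 .. NVL ( LENGTH ( V_Source ), 0 )
    LOOP
        V_Char := SUBSTR ( V_Source, I, 1 );

        IF INSTR ( C_Blanks, V_Char ) > 0
        THEN
            Flush_Token;                                 -- blank: close the token, drop the blank
        ELSIF INSTR ( C_Operators, V_Char ) > 0
        THEN
            Flush_Token;                                 -- operator: close the token ...
            V_Tokens ( V_Tokens.COUNT + 1 ) := V_Char;   -- ... and keep the operator as a token
        ELSE
            V_Current := V_Current || V_Char;            -- ordinary character: extend the token
        END IF;
    END LOOP;

    Flush_Token;                                         -- close the last pending token

    FOR I IN 1 .. V_Tokens.COUNT
    LOOP
        DBMS_OUTPUT.Put_Line ( V_Tokens ( I ) );
    END LOOP;
END;
/
```

For the sample string '3 + 4*2' this prints the five tokens 3, +, 4, * and 2, one per line. The sketch ignores quoted strings, which the package handles through the Use_Quotes option.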

PARSE_CSV Function

Calls the Tokenize function and returns an array containing the fields from the delimited source string.

Syntax

Text2Token.Parse_Csv (
	  P_Source_Text	VARCHAR2
	, P_Delimiters	VARCHAR2 DEFAULT Comma );

Returns 
	Text2Token.Results%TYPE;
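
To show why quote handling matters when splitting delimited text, here is a minimal, self-contained sketch of quote-aware field splitting. It is not the package's code: it recognizes only the double quote and a comma delimiter, whereas the package's Quote setting also lists the single quote. All names are illustrative.

```sql
DECLARE
    TYPE Text_Array IS TABLE OF VARCHAR2 ( 4000 ) INDEX BY PLS_INTEGER;

    V_Source   VARCHAR2 ( 1000 ) := 'with,"O''Brien, Elka",at the lake.';
    V_Fields   Text_Array;
    V_Field    VARCHAR2 ( 4000 );
    V_Char     VARCHAR2 ( 1 CHAR );
    V_In_Quote BOOLEAN := FALSE;
BEGIN
    FOR I IN 1 .. NVL ( LENGTH ( V_Source ), 0 )
    LOOP
        V_Char := SUBSTR ( V_Source, I, 1 );

        IF V_Char = '"'
        THEN
            V_In_Quote := NOT V_In_Quote;                -- toggle quoted mode, drop the quote itself
        ELSIF V_Char = ',' AND NOT V_In_Quote
        THEN
            V_Fields ( V_Fields.COUNT + 1 ) := V_Field;  -- comma outside quotes closes the field
            V_Field := NULL;
        ELSE
            V_Field := V_Field || V_Char;                -- anything else belongs to the field
        END IF;
    END LOOP;

    V_Fields ( V_Fields.COUNT + 1 ) := V_Field;          -- the last field has no trailing comma

    FOR I IN 1 .. V_Fields.COUNT
    LOOP
        DBMS_OUTPUT.Put_Line ( V_Fields ( I ) );
    END LOOP;
END;
/
```

The sample string yields three fields: with, then O'Brien, Elka (comma preserved, quotes removed), then at the lake.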

SHUNTING_YARD Function

Calls the Tokenize function and returns an array containing the tokens rearranged by the Shunting Yard algorithm, that is, in postfix order.

Syntax

Text2Token.Shunting_Yard (
	  P_Source_Text	VARCHAR2
	, P_Delimiters	VARCHAR2 DEFAULT Comma );

Returns 
	Text2Token.Results%TYPE;
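
For readers unfamiliar with the algorithm, the block below is a compact, self-contained sketch of the textbook Shunting Yard procedure driven by the default operator strings from the variables table. It is not the package's code. It assumes blank-separated tokens, positional operator strings and well-formed input, and all names are illustrative.

```sql
DECLARE
    TYPE Text_Array IS TABLE OF VARCHAR2 ( 100 ) INDEX BY PLS_INTEGER;

    C_Operator       CONSTANT VARCHAR2 ( 30 ) := '!^*/=+-&|';
    C_Op_Association CONSTANT VARCHAR2 ( 30 ) := 'RRLLLLLLL';
    C_Op_Precedence  CONSTANT VARCHAR2 ( 30 ) := '443322210';

    V_Source VARCHAR2 ( 1000 ) := '3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3';
    V_Output Text_Array;            -- tokens in postfix order
    V_Stack  Text_Array;            -- operator stack, top at V_Stack ( V_Top )
    V_Top    PLS_INTEGER := 0;
    V_Token  VARCHAR2 ( 100 );
    V_N      PLS_INTEGER := 1;

    FUNCTION Prec ( P_Op VARCHAR2 ) RETURN PLS_INTEGER
    IS
    BEGIN
        RETURN TO_NUMBER ( SUBSTR ( C_Op_Precedence, INSTR ( C_Operator, P_Op ), 1 ) );
    END;

    FUNCTION Is_Left ( P_Op VARCHAR2 ) RETURN BOOLEAN
    IS
    BEGIN
        RETURN SUBSTR ( C_Op_Association, INSTR ( C_Operator, P_Op ), 1 ) = 'L';
    END;

    PROCEDURE Pop_To_Output
    IS
    BEGIN
        V_Output ( V_Output.COUNT + 1 ) := V_Stack ( V_Top );
        V_Top := V_Top - 1;
    END;
BEGIN
    LOOP
        V_Token := REGEXP_SUBSTR ( V_Source, '[^ ]+', 1, V_N );   -- next blank-separated token
        EXIT WHEN V_Token IS NULL;
        V_N := V_N + 1;

        IF V_Token = '('
        THEN
            V_Top := V_Top + 1;
            V_Stack ( V_Top ) := V_Token;
        ELSIF V_Token = ')'
        THEN
            -- Unwind to the matching '(' and then discard it.
            WHILE V_Top > 0 AND V_Stack ( V_Top ) <> '('
            LOOP
                Pop_To_Output;
            END LOOP;
            V_Top := GREATEST ( V_Top - 1, 0 );
        ELSIF LENGTH ( V_Token ) = 1 AND INSTR ( C_Operator, V_Token ) > 0
        THEN
            -- Pop operators that bind tighter, or equally tight when the
            -- incoming operator is left-associative.
            WHILE V_Top > 0
              AND V_Stack ( V_Top ) <> '('
              AND (    Prec ( V_Stack ( V_Top ) ) > Prec ( V_Token )
                    OR ( Prec ( V_Stack ( V_Top ) ) = Prec ( V_Token ) AND Is_Left ( V_Token ) ) )
            LOOP
                Pop_To_Output;
            END LOOP;
            V_Top := V_Top + 1;
            V_Stack ( V_Top ) := V_Token;
        ELSE
            V_Output ( V_Output.COUNT + 1 ) := V_Token;           -- operand goes straight out
        END IF;
    END LOOP;

    WHILE V_Top > 0                                               -- flush remaining operators
    LOOP
        Pop_To_Output;
    END LOOP;

    FOR I IN 1 .. V_Output.COUNT
    LOOP
        DBMS_OUTPUT.Put_Line ( V_Output ( I ) );
    END LOOP;
END;
/
```

For the sample formula the sketch emits 3 4 2 * 1 5 - 2 3 ^ ^ / +, one token per line, which is the same order as the Shunting Yard listing in the EXAMPLE section. Error handling for unbalanced parentheses is left out.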

Here is the code:

text2token-pkg.sql

EXAMPLE

DECLARE
    V_Text                  VARCHAR2 ( 1000 );
    V_Results               Text2token.Results%TYPE;

    -- Print a title, the source string and every element of V_Results.
    PROCEDURE Print_Result ( P_Ttl VARCHAR2 )
    IS
    BEGIN
        DBMS_OUTPUT.Put_Line (    '***** ' || P_Ttl || ' Results *****'
                               || CHR(10) || 'String [' || V_Text || ']' );

        FOR I IN 1 .. V_Results.COUNT
        LOOP
            DBMS_OUTPUT.Put_Line ( TO_CHAR ( I, '000.' ) ||'  '|| V_Results ( I ) );
        END LOOP;
    END;
BEGIN
    -- Split an arithmetic formula into tokens.
    Text2token.Initialize ( );
    V_Text           := '3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3';
    V_Results        := Text2token.Tokenize ( V_Text );
    Print_Result ( 'Tokenize' );

    -- Rearrange the same formula with the Shunting Yard algorithm.
    Text2token.Initialize ( );
    V_Results        := Text2token.Shunting_Yard ( V_Text );
    Print_Result ( 'Shunting_Yard' );

    -- Split a comma-delimited string that contains a quoted field.
    Text2token.Initialize ( );
    V_Text           := 'We are,the people,out fishing,with,"O''Brien, Elka",at the lake.';
    V_Results        := Text2token.Parse_Csv ( V_Text );
    Print_Result ( 'Parse_Csv' );

END;
/
***** Tokenize Results *****
String [3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3]
001.  3
002.  +
003.  4
004.  *
005.  2
006.  /
007.  (
008.  1
009.  -
010.  5
011.  )
012.  ^
013.  2
014.  ^
015.  3
***** Shunting_Yard Results *****
String [3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3]
001.  3
002.  4
003.  2
004.  *
005.  1
006.  5
007.  -
008.  2
009.  3
010.  ^
011.  ^
012.  /
013.  +
***** Parse_Csv Results *****
String [We are,the people,out fishing,with,"O'Brien, Elka",at the lake.]
001.   We are
002.   the people
003.   out fishing
004.   with
005.   O'Brien, Elka
006.   at the lake.

PL/SQL procedure successfully completed.

Translated from: https://www.experts-exchange.com/articles/13941/Ye-Olde-Generic-Parsing-Module.html
