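This article shows how to turn a mathematical expression into a syntax tree and then compute its value by walking that tree.
The main goal is to learn how a syntax tree is built; mathematical expressions are only a starting point. The same approach extends to parsing other strings that follow fixed rules, such as HTTP request messages or SQL statements.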
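Consider the expression 1 + 2 * 3. Its value is 7. But if all you have is a string, how do you parse it in a language such as C++? First, note that the expression is written in so-called "infix" notation; there are also prefix and postfix notations. The terms "infix", "prefix" and "postfix" refer to the position of the operator relative to its operands: infix puts the operator between them (1 + 2 * 3), prefix puts it before them (+ 1 * 2 3), and postfix puts it after them (1 2 3 * +).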
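The three notations correspond to the three depth-first traversal orders of a single expression tree. The sketch below is only an illustration of that idea: NotationDemo, Node and the method names are hypothetical and are not part of the parser developed later in this article.

```java
// Hypothetical helper types for illustration only.
class NotationDemo {
    // A leaf holds a number; an inner node holds an operator and two operands.
    static final class Node {
        final String label;
        final Node left;
        final Node right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Pre-order: operator first, then both operands.
    static String prefix(Node n) {
        return n.isLeaf() ? n.label : n.label + " " + prefix(n.left) + " " + prefix(n.right);
    }

    // In-order: left operand, operator, right operand (no parentheses are printed).
    static String infix(Node n) {
        return n.isLeaf() ? n.label : infix(n.left) + " " + n.label + " " + infix(n.right);
    }

    // Post-order: both operands first, then the operator.
    static String postfix(Node n) {
        return n.isLeaf() ? n.label : postfix(n.left) + " " + postfix(n.right) + " " + n.label;
    }

    // Tree for 1 + 2 * 3, with the multiplication nested under the addition.
    static Node sample() {
        Node mul = new Node("*", new Node("2", null, null), new Node("3", null, null));
        return new Node("+", new Node("1", null, null), mul);
    }
}
```

For the sample tree the three methods return "+ 1 * 2 3", "1 + 2 * 3" and "1 2 3 * +". The in-order output drops the tree structure: a tree for (1 + 2) * 3 would print the same infix string, which is why infix input needs precedence rules or parentheses.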
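However, a program cannot evaluate the expression simply by reading it in infix order (an in-order traversal). Expressions usually contain operators of different precedence, so an operator cannot be applied as soon as it is read: in 1 + 2 * 3, the addition has to wait until the value of 2 * 3 is known.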
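We therefore need some additional machinery. Two approaches are common: converting the expression to reverse Polish (postfix) notation, or building an abstract syntax tree.
Both work for mathematical expressions, but reverse Polish notation is less convenient when parsing other kinds of input. So here we parse mathematical expressions by building an abstract syntax tree.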
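For comparison, evaluating an expression that is already in reverse Polish notation needs only a stack. The following is a minimal sketch, assuming tokens separated by spaces and only the four binary operators; RpnDemo and its methods are made-up names for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class for illustration only.
class RpnDemo {
    // Evaluates a space-separated postfix expression such as "1 2 3 * +".
    static double evaluate(String postfix) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String tok : postfix.trim().split("\\s+")) {
            switch (tok) {
                case "+": case "-": case "*": case "/": {
                    if (stack.size() < 2)
                        throw new IllegalArgumentException("Missing operand before '" + tok + "'");
                    // The right operand was pushed last, so it is popped first.
                    double right = stack.pop();
                    double left = stack.pop();
                    stack.push(apply(tok.charAt(0), left, right));
                    break;
                }
                default:
                    // Anything else must be a number; parseDouble throws otherwise.
                    stack.push(Double.parseDouble(tok));
            }
        }
        if (stack.size() != 1)
            throw new IllegalArgumentException("Malformed postfix expression");
        return stack.pop();
    }

    private static double apply(char op, double left, double right) {
        switch (op) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            default:  return left / right;
        }
    }
}
```

With this sketch, RpnDemo.evaluate("1 2 3 * +") returns 7.0. No precedence handling is needed because the postfix order already encodes it.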
First, we define a recursive grammar:
EXP -> EXP + EXP | EXP - EXP | EXP * EXP | EXP / EXP |
- EXP | ( EXP ) | number | sin( EXP ) | cos( EXP )
Clearly, though, this grammar does not capture operator precedence, so we refine it:
EXP -> EXP + TERM |
EXP - TERM |
TERM
TERM -> TERM * FACTOR |
TERM / FACTOR |
FACTOR
FACTOR -> ( EXP ) | - FACTOR | number |
sin( EXP ) | cos( EXP )
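This grammar does express precedence, but one problem remains: it is left-recursive. A parser written as a set of mutually recursive methods would call the method for EXP from itself without consuming any input and would never terminate. So we rewrite the grammar once more to remove the left recursion: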
EXP -> TERM EXP1
EXP1 -> + TERM EXP1 |
- TERM EXP1 |
null
TERM -> FACTOR TERM1
TERM1 -> * FACTOR TERM1 |
/ FACTOR TERM1 |
null
FACTOR -> ( EXP ) | - FACTOR | number |
sin( EXP ) | cos( EXP )
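To see how such a grammar maps to code, here is a compact sketch that evaluates directly instead of building a tree. It rests on several assumptions: MiniCalc and its method names are hypothetical, sin and cos are left out, the tail rules EXP1 and TERM1 are unrolled into loops (which is equivalent to their right recursion), and unary minus is applied to a single factor.

```java
// Hypothetical class for illustration; not the parser developed in this article.
class MiniCalc {
    private final String text;
    private int pos = 0;

    private MiniCalc(String text) {
        this.text = text;
    }

    static double eval(String expr) {
        MiniCalc c = new MiniCalc(expr);
        double v = c.exp();
        c.skipSpaces();
        if (c.pos != c.text.length())
            throw new IllegalArgumentException("Unexpected input at " + c.pos);
        return v;
    }

    // EXP -> TERM EXP1, with EXP1 unrolled into a loop
    private double exp() {
        double v = term();
        while (true) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }

    // TERM -> FACTOR TERM1, with TERM1 unrolled into a loop
    private double term() {
        double v = factor();
        while (true) {
            if (accept('*')) v *= factor();
            else if (accept('/')) v /= factor();
            else return v;
        }
    }

    // FACTOR -> ( EXP ) | - FACTOR | number
    private double factor() {
        if (accept('(')) {
            double v = exp();
            if (!accept(')'))
                throw new IllegalArgumentException("Expected ')' at " + pos);
            return v;
        }
        if (accept('-')) return -factor();
        int start = pos;
        while (pos < text.length()
                && (Character.isDigit(text.charAt(pos)) || text.charAt(pos) == '.')) pos++;
        if (start == pos)
            throw new IllegalArgumentException("Number expected at " + pos);
        return Double.parseDouble(text.substring(start, pos));
    }

    // Skips spaces, then consumes c if it is the next character.
    private boolean accept(char c) {
        skipSpaces();
        if (pos < text.length() && text.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++;
    }
}
```

MiniCalc.eval("1 + 2 * 3") returns 7.0: term() consumes 2 * 3 as a unit before exp() performs the addition, which is exactly the precedence that the layered grammar encodes.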
The example here is written in Java, but you can translate it into any language.
Definition of the Parser class:
public class Parser {
    private Token m_crtToken;
    private final String m_Text;
    private int m_Index;
    private Parser(String str) {...}
    public static ASTNode parse(String expr) {...}
    private ASTNode Expression() {...}
    private ASTNode Expression1() {...}
    private ASTNode Term() {...}
    private ASTNode Term1() {...}
    private ASTNode Factor() {...}
    private void Match(char expected) {...}
    private void SkipWhitespaces() {...}
    private void GetNextToken() {...}
    private double GetNumber() {...}
    private boolean isSpace(char ch) {...}
    private boolean isDigit(char ch) {...}
}
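Here, m_Text holds the input, m_Index is the current read position, and m_crtToken is the token most recently read by GetNextToken(). Expression(), Expression1(), Term(), Term1() and Factor() each implement the grammar rule of the same name, and Match() checks that the current token is the expected character before moving on to the next token.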
Parser.java
// Parser.java
package expr_parser;
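public class Parser {
    private Token m_crtToken;      // token most recently read by GetNextToken()
    private final String m_Text;   // input text followed by the '#' end marker
    private int m_Index;           // index of the next unread character in m_Text
    // Instances are only created through parse(), as in the class outline.
    private Parser(String str) {
        m_Text = str + "#";        // '#' marks the end of the text
        m_Index = 0;
        m_crtToken = new Token();
    }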
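    public static ASTNode parse(String expr) throws ParserException {
        Parser parser = new Parser(expr);
        parser.GetNextToken();
        ASTNode root = parser.Expression();
        // Reject trailing input such as "1 2" or "1)" instead of silently ignoring it.
        if (parser.m_crtToken.type != TokenType.EOT)
            throw new ParserException("Unexpected token at position " + parser.m_Index, parser.m_Index);
        return root;
    }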
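    // EXP -> TERM EXP1
    private ASTNode Expression() throws ParserException {
        ASTNode t_node = Term();
        ASTNode e1_node = Expression1();
        // e1_node already carries the sign of the remaining terms, so the two parts are added.
        return new ASTNode(ASTNodeType.OPERATOR_PLUS, 0, t_node, e1_node);
    }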
    // EXP1 -> + TERM EXP1 | - TERM EXP1 | null
    private ASTNode Expression1() throws ParserException {
        ASTNode t_node;
        ASTNode e1_node;
        switch (m_crtToken.type) {
            case PLUS:
                GetNextToken();
                t_node = Term();
                e1_node = Expression1();
                return new ASTNode(ASTNodeType.OPERATOR_PLUS, 0, t_node, e1_node);
            case MINUS:
                GetNextToken();
                t_node = Term();
                e1_node = Expression1();
                return new ASTNode(ASTNodeType.OPERATOR_MINUS, 0, t_node, e1_node);
            default:
                // Empty production: 0 is the neutral element of addition.
                return new ASTNode(ASTNodeType.NUMBER_VALUE, 0, null, null);
        }
    }
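    // TERM -> FACTOR TERM1
    private ASTNode Term() throws ParserException {
        ASTNode f_node = Factor();
        ASTNode t1_node = Term1();
        // t1_node is the factor contributed by the rest of the chain, so the two parts are multiplied.
        return new ASTNode(ASTNodeType.OPERATOR_MUL, 0, f_node, t1_node);
    }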
    // TERM1 -> * FACTOR TERM1 | / FACTOR TERM1 | null
    private ASTNode Term1() throws ParserException {
        ASTNode f_node;
        ASTNode t1_node;
        switch (m_crtToken.type) {
            case MUL:
                GetNextToken();
                f_node = Factor();
                t1_node = Term1();
                return new ASTNode(ASTNodeType.OPERATOR_MUL, 0, f_node, t1_node);
            case DIV:
                GetNextToken();
                f_node = Factor();
                t1_node = Term1();
                return new ASTNode(ASTNodeType.OPERATOR_DIV, 0, f_node, t1_node);
            default:
                // Empty production: 1 is the neutral element of multiplication.
                return new ASTNode(ASTNodeType.NUMBER_VALUE, 1, null, null);
        }
    }
    // FACTOR -> ( EXP ) | - FACTOR | number | sin( EXP ) | cos( EXP )
    private ASTNode Factor() throws ParserException {
        ASTNode node;
        switch (m_crtToken.type) {
            case OPEN_PARENTHESIS:
                GetNextToken();
                node = Expression();
                Match(')');
                return node;
            case MINUS:
                // Unary minus applies to the factor that follows it.
                GetNextToken();
                node = Factor();
                return new ASTNode(ASTNodeType.UNARY_MINUS, 0, node, null);
            case NUMBER:
                double number = m_crtToken.value;
                GetNextToken();
                return new ASTNode(ASTNodeType.NUMBER_VALUE, number, null, null);
            case SIN:
                // The SIN token already includes the opening '('.
                GetNextToken();
                node = Expression();
                Match(')');
                return new ASTNode(ASTNodeType.OPERATOR_SIN, 0, node, null);
            case COS:
                // The COS token already includes the opening '('.
                GetNextToken();
                node = Expression();
                Match(')');
                return new ASTNode(ASTNodeType.OPERATOR_COS, 0, node, null);
            default:
                String err_msg = "Unexpected token '" + m_Text.charAt(m_Index) + "' at position " + m_Index;
                throw new ParserException(err_msg, m_Index);
        }
    }
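    // Consumes the current token if its symbol is the expected character.
    private void Match(char expected) throws ParserException {
        // The token's own symbol is compared, so a missing ')' at the end of
        // the input (for example in "((1)") is reported as an error.
        if (m_crtToken.symbol == expected)
            GetNextToken();
        else {
            String err_msg = "Expected '" + expected + "' at position " + m_Index;
            throw new ParserException(err_msg, m_Index);
        }
    }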
    private void SkipWhitespaces() {
        while (isSpace(m_Text.charAt(m_Index))) m_Index++;
    }
    private void GetNextToken() throws ParserException {
        // Ignore white spaces
        SkipWhitespaces();
        m_crtToken.value = 0;
        m_crtToken.symbol = 0;
        // Test for the end of text
        if (m_Text.charAt(m_Index) == '#') {
            m_crtToken.type = TokenType.EOT;
            return;
        }
        if (isDigit(m_Text.charAt(m_Index))) {
            m_crtToken.type = TokenType.NUMBER;
            m_crtToken.value = GetNumber();
            return;
        }
        m_crtToken.type = TokenType.ERROR;
        switch (m_Text.charAt(m_Index)) {
            case '+': m_crtToken.type = TokenType.PLUS; break;
            case '-': m_crtToken.type = TokenType.MINUS; break;
            case '*': m_crtToken.type = TokenType.MUL; break;
            case '/': m_crtToken.type = TokenType.DIV; break;
            case '(': m_crtToken.type = TokenType.OPEN_PARENTHESIS; break;
            case ')': m_crtToken.type = TokenType.CLOSE_PARENTHESIS; break;
            case 's':
                // startsWith is safe near the end of the text, where substring would throw.
                if (m_Text.startsWith("sin(", m_Index)) {
                    m_crtToken.type = TokenType.SIN;
                    m_Index += 3;   // now on '(', which is consumed below
                }
                break;
            case 'c':
                if (m_Text.startsWith("cos(", m_Index)) {
                    m_crtToken.type = TokenType.COS;
                    m_Index += 3;   // now on '(', which is consumed below
                }
                break;
        }
        if (m_crtToken.type != TokenType.ERROR) {
            m_crtToken.symbol = m_Text.charAt(m_Index);
            m_Index++;
        } else {
            String err_msg = "Unexpected token '" + m_Text.charAt(m_Index) + "' at position " + m_Index;
            throw new ParserException(err_msg, m_Index);
        }
    }
    // Reads digits with an optional fractional part, e.g. "12" or "3.14".
    private double GetNumber() throws ParserException {
        SkipWhitespaces();
        int index = m_Index;
        while (isDigit(m_Text.charAt(m_Index))) m_Index++;
        if (m_Text.charAt(m_Index) == '.') m_Index++;
        while (isDigit(m_Text.charAt(m_Index))) m_Index++;
        if (m_Index - index == 0)
            throw new ParserException("Number expected but not found!", m_Index);
        String buffer = m_Text.substring(index, m_Index);
        // parseDouble returns a primitive double, avoiding the boxing done by valueOf.
        return Double.parseDouble(buffer);
    }
    private boolean isSpace(char ch) {
        return ch == ' ';
    }
    private boolean isDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }
}
enum TokenType {
    ERROR,
    PLUS,
    MINUS,
    MUL,
    DIV,
    SIN,
    COS,
    EOT,
    OPEN_PARENTHESIS,
    CLOSE_PARENTHESIS,
    NUMBER
}
class Token {
    TokenType type;
    double value;
    char symbol;
    Token() {
        type = TokenType.ERROR;
        value = 0;
    }
}
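The listing above throws ParserException, but that class is not shown in this article. The following is a minimal sketch of what it might look like, assuming only what the calls in Parser require: a checked exception whose constructor takes a message and a position. The getPosition() accessor is a hypothetical addition; in the project the class would presumably be declared public in its own ParserException.java file inside package expr_parser.

```java
// Assumed shape of ParserException; the original class is not shown in the article.
class ParserException extends Exception {
    private final int position;

    ParserException(String err_msg, int position) {
        super(err_msg);
        this.position = position;
    }

    // Hypothetical accessor for the index in the input where parsing failed.
    int getPosition() {
        return position;
    }
}
```

Keeping the position in a field, rather than only inside the message text, lets a caller highlight the offending character without parsing the message.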
ASTNode.java
// ASTNode.java
package expr_parser;
public class ASTNode {
    private ASTNodeType type;
    private double value;
    private ASTNode leftChild;
    private ASTNode rightChild;
    public ASTNode() {
        type = ASTNodeType.UNDEFINED;
        value = 0;
        leftChild = null;
        rightChild = null;
    }
    public ASTNode(ASTNodeType type, double value, ASTNode leftChild, ASTNode rightChild) {
        this.type = type;
        this.value = value;
        this.leftChild = leftChild;
        this.rightChild = rightChild;
    }
    // Dumps this node and, recursively, both subtrees for debugging.
    @Override
    public String toString() {
        String str = "--------------------------------------------\n";
        switch (type) {
            case NUMBER_VALUE:
                str += "node_type: NUMBER_VALUE\n";
                str += "value: " + value + "\n";
                break;
            case OPERATOR_PLUS:
                str += "node_type: OPERATOR_PLUS\n";
                break;
            case OPERATOR_MINUS:
                str += "node_type: OPERATOR_MINUS\n";
                break;
            case OPERATOR_MUL:
                str += "node_type: OPERATOR_MUL\n";
                break;
            case OPERATOR_DIV:
                str += "node_type: OPERATOR_DIV\n";
                break;
            case OPERATOR_SIN:
                str += "node_type: OPERATOR_SIN\n";
                break;
            case OPERATOR_COS:
                str += "node_type: OPERATOR_COS\n";
                break;
            case UNARY_MINUS:
                // Produced by Parser.Factor(), so it is a valid node type here.
                str += "node_type: UNARY_MINUS\n";
                break;
            default:
                str += "ERROR!!!!!!!!!!!!!!\n";
                break;
        }
        if (leftChild != null) {
            str += "left_child";
            str += leftChild.toString();
        } else
            str += "left_child is null\n";
        if (rightChild != null) {
            str += "right_child";
            str += rightChild.toString();
        } else
            str += "right_child is null\n";
        str += "--------------------------------------------\n";
        return str;
    }
    public ASTNodeType getType() {
        return type;
    }
    public void setType(ASTNodeType type) {
        this.type = type;
    }
    public double getValue() {
        return value;
    }
    public void setValue(double value) {
        this.value = value;
    }
    public ASTNode getLeftChild() {
        return leftChild;
    }
    public void setLeftChild(ASTNode leftChild) {
        this.leftChild = leftChild;
    }
    public ASTNode getRightChild() {
        return rightChild;
    }
    public void setRightChild(ASTNode rightChild) {
        this.rightChild = rightChild;
    }
}
ASTNodeType.java
// ASTNodeType.java
package expr_parser;
public enum ASTNodeType {
    UNDEFINED,
    OPERATOR_PLUS,
    OPERATOR_MINUS,
    OPERATOR_MUL,
    OPERATOR_DIV,
    OPERATOR_SIN,
    OPERATOR_COS,
    NUMBER_VALUE,
    UNARY_MINUS
}
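Calling Parser.parse(String) returns a binary tree of type ASTNode. A post-order traversal evaluates the left and right children of each node, applies the operation that the node represents to those two values, and returns the result to the parent. A single recursive pass yields the value of the whole expression.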
Evaluator.java
// Evaluator.java
package expr_parser;
public class Evaluator {
    public static double evaluate(ASTNode ast) throws EvaluatorException {
        if (null == ast)
            throw new EvaluatorException("Incorrect abstract syntax tree");
        switch (ast.getType()) {
            case NUMBER_VALUE:
                return ast.getValue();
            case UNARY_MINUS:
                return -Evaluator.evaluate(ast.getLeftChild());
            case OPERATOR_SIN:
                return Math.sin(Evaluator.evaluate(ast.getLeftChild()));
            case OPERATOR_COS:
                return Math.cos(Evaluator.evaluate(ast.getLeftChild()));
            case UNDEFINED:
                throw new EvaluatorException("Incorrect abstract syntax tree");
            default:
                // Binary operators. In the trees built by Parser, a MINUS or DIV node has
                // the operand that follows the operator as its left child (v1) and the
                // rest of the chain as its right child (v2), so it stands for
                // "rest - operand" or "rest / operand". Hence v2 comes first below.
                double v1 = Evaluator.evaluate(ast.getLeftChild());
                double v2 = Evaluator.evaluate(ast.getRightChild());
                switch (ast.getType()) {
                    case OPERATOR_PLUS: return v1 + v2;
                    case OPERATOR_MINUS: return v2 - v1;
                    case OPERATOR_MUL: return v1 * v2;
                    case OPERATOR_DIV: return v2 / v1;
                }
        }
        throw new EvaluatorException("Incorrect abstract syntax tree");
    }
}
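The operand order in the MINUS and DIV cases can look like a mistake, so it may help to trace a chain by hand. The sketch below rebuilds, with simplified hypothetical types (TailTreeDemo, Kind, Node), the shape of tree that Expression() and Expression1() produce for 5 - 3 - 1. Terms are shown as plain numbers for brevity, whereas the real parser wraps each one in a multiplication node.

```java
// Hypothetical, simplified stand-ins for ASTNode and Evaluator, for illustration only.
class TailTreeDemo {
    enum Kind { NUM, PLUS, MINUS }

    static final class Node {
        final Kind kind;
        final double value;
        final Node left;
        final Node right;

        Node(Kind kind, double value, Node left, Node right) {
            this.kind = kind;
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    static Node num(double v) {
        return new Node(Kind.NUM, v, null, null);
    }

    // Same convention as Evaluator: a MINUS node means "rest (right) - operand (left)".
    static double eval(Node n) {
        switch (n.kind) {
            case NUM:  return n.value;
            case PLUS: return eval(n.left) + eval(n.right);
            default:   return eval(n.right) - eval(n.left);
        }
    }

    // Tree shape for "5 - 3 - 1": each tail node holds its operand and the rest of the chain.
    static Node fiveMinusThreeMinusOne() {
        Node tail2 = new Node(Kind.MINUS, 0, num(1), num(0)); // "- 1", then the empty tail (0)
        Node tail1 = new Node(Kind.MINUS, 0, num(3), tail2);  // "- 3", then tail2
        return new Node(Kind.PLUS, 0, num(5), tail1);         // 5 + tail1
    }
}
```

Evaluating bottom-up gives tail2 = 0 - 1 = -1, tail1 = -1 - 3 = -4, and finally 5 + (-4) = 1, which matches the left-associative reading (5 - 3) - 1 even though the tree is nested to the right.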
EvaluatorException.java
// EvaluatorException.java
package expr_parser;
public class EvaluatorException extends Exception {
    public EvaluatorException(String err_msg) {
        super(err_msg);
    }
}