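As mentioned earlier, a parser generator that recognizes ABNF grammars and automatically builds parsers for them must first be able to read an ABNF grammar, that is, load it into memory and turn it into structured data. Only then can it move on to generating the parser. I call the module that reads the ABNF grammar the AbnfParser class. Let's start with its basic structure: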
/*
    This file is one of the component a Context-free Grammar Parser Generator,
    which accept a piece of text as the input, and generates a parser
    for the inputted context-free grammar.
    Copyright (C) 2013, Junbiao Pan (Email: [email protected])

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

import java.io.InputStream;
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Map;
import java.util.HashMap;

// ABNF grammar parser
public class AbnfParser {
    // Input stream of the ABNF parser. It supports both peek and read.
    // peek is needed because this is a predictive parser: it looks
    // ahead 1~2 characters to decide which ABNF production (or element)
    // to match next.
    protected PeekableInputStream is;
    public PeekableInputStream getInputStream() { return is; }

    protected String prefix;

    // Checks whether two characters are identical
    // (e.g. whether the input character equals the expected one).
    public boolean match(int value, int expected) {
        return value == expected;
    }

    // Checks whether a character lies within a range
    // (e.g. whether the input character is a letter or a digit).
    public boolean match(int value, int lower, int upper) {
        return value >= lower && value <= upper;
    }

    // Checks whether a character equals the expected one, ignoring case.
    public boolean match(int value, char expected) {
        return Character.toUpperCase(value) == Character.toUpperCase(expected);
    }

    // Checks whether a character equals any of several characters
    // (e.g. whether the input character is '-', '+' or '%').
    public boolean match(int value, int[] expected) {
        for (int index = 0; index < expected.length; index++) {
            if (value == expected[index]) return true;
        }
        return false;
    }

    // Throws MatchException if the characters do not match.
    // The exception carries the line and column position in the input
    // stream where the mismatch occurred, plus the expected character.
    public void assertMatch(int value, int expected) throws MatchException {
        if (!match(value, expected)) {
            throw new MatchException(
                    "'" + (char)expected + "' [" + String.format("%02X", expected) + "]",
                    value, is.getPos(), is.getLine());
        }
    }

    // Throws MatchException if the character is outside the given range.
    // The exception carries the line and column position in the input
    // stream where the mismatch occurred, plus the expected range.
    public void assertMatch(int value, int lower, int upper) throws MatchException {
        if (!match(value, lower, upper)) {
            // Fixed: the upper bound of the hex range used to print `lower` twice.
            throw new MatchException(
                    "'" + (char)lower + "'~'" + (char)upper + "' "
                    + "[" + String.format("%02X", lower) + "~" + String.format("%02X", upper) + "]",
                    value, is.getPos(), is.getLine());
        }
    }

    // Throws MatchException if the characters do not match (ignoring case).
    // The exception carries the line and column position in the input
    // stream where the mismatch occurred, plus the expected character.
    public void assertMatch(int value, char expected) throws IOException, MatchException {
        if (!match(value, expected)) {
            throw new MatchException(
                    "'" + expected + "' [" + String.format("%02X", (byte)expected) + "]",
                    value, is.getPos(), is.getLine());
        }
    }

    ...

    // Call parse() to start parsing the input source. It returns the list
    // of ABNF rules defined in the input.
    public List<Rule> parse() throws IOException, MatchException, CollisionException {
        return rulelist();
    }

    // Constructor: sets the rule-name prefix and the input source, and wraps
    // the plain input source in one that supports peek.
    public AbnfParser(String prefix, InputStream inputStream) {
        this.prefix = prefix;
        this.is = new PeekableInputStream(inputStream);
    }

    // The rest is omitted for now.
}
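One detail of the match overloads is easy to miss: with two arguments, Java picks the case-insensitive version whenever the second argument is a char, and the exact version when it is an int. The following is a minimal standalone sketch of that behavior. The class name MatchOverloadSketch and the static methods are only for illustration; they mirror the signatures above but are not part of the author's code.

```java
// Standalone illustration of how the match(...) overloads are selected.
// MatchOverloadSketch is a hypothetical name used only for this example.
class MatchOverloadSketch {
    // Exact comparison: both arguments are plain int character codes.
    static boolean match(int value, int expected) {
        return value == expected;
    }

    // Case-insensitive comparison: selected when the second argument is a char.
    static boolean match(int value, char expected) {
        return Character.toUpperCase(value) == Character.toUpperCase(expected);
    }

    // Range check, e.g. '0'..'9' for a digit.
    static boolean match(int value, int lower, int upper) {
        return value >= lower && value <= upper;
    }

    public static void main(String[] args) {
        int input = 'A';
        System.out.println(match(input, 'a'));        // char overload: true
        System.out.println(match(input, (int) 'a'));  // int overload: false
        System.out.println(match('7', '0', '9'));     // range overload: true
    }
}
```

So a call site that needs an exact, case-sensitive comparison against a character literal has to cast the literal to int, otherwise the case-insensitive overload is chosen.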
With this in place, parsing an ABNF grammar from an input takes only the following call:
AbnfParser abnfParser = new AbnfParser(prefix, System.in);
List<Rule> ruleList = abnfParser.parse();
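The PeekableInputStream class was copied from the web; it is Jeff Heaton's Peekable InputStream, credited in the header of the listing. On top of it I added a few position-related functions so that a match error can report where it occurred. I did not touch anything else. Here is the definition of PeekableInputStream: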
/* (GPL license header omitted; it is identical to the one in the AbnfParser listing.) */

import java.io.*;

/**
 * The Heaton Research Spider Copyright 2007 by Heaton
 * Research, Inc.
 *
 * HTTP Programming Recipes for Java ISBN: 0-9773206-6-9
 * http://www.heatonresearch.com/articles/series/16/
 *
 * PeekableInputStream: This is a special input stream that
 * allows the program to peek one or more characters ahead
 * in the file.
 *
 * This class is released under the:
 * GNU Lesser General Public License (LGPL)
 * http://www.gnu.org/copyleft/lesser.html
 *
 * @author Jeff Heaton
 * @version 1.1
 */
public class PeekableInputStream extends InputStream {

    protected int pos = 1;
    public int getPos() { return pos; }

    protected int line = 1;
    public int getLine() { return line; }

    // When a carriage return is read, reset the position pos to 1.
    // When a line feed is read, increment the line number.
    protected void updatePosition(int value) {
        if (value == (byte)0x0D) pos = 1;
        else if (value == (byte)0x0A) line++;
        else pos++;
    }

    /**
     * The underlying stream.
     */
    private InputStream stream;

    /**
     * Bytes that have been peeked at.
     */
    private byte peekBytes[];

    /**
     * How many bytes have been peeked at.
     */
    private int peekLength;

    /**
     * The constructor accepts an InputStream to setup the
     * object.
     *
     * @param is
     *            The InputStream to parse.
     */
    public PeekableInputStream(InputStream is) {
        this.stream = is;
        this.peekBytes = new byte[10];
        this.peekLength = 0;
    }

    /**
     * Peek at the next character from the stream.
     *
     * @return The next character.
     * @throws IOException
     *             If an I/O exception occurs.
     */
    public int peek() throws IOException {
        return peek(0);
    }

    /**
     * Peek at a specified depth.
     *
     * @param depth
     *            The depth to check.
     * @return The character peeked at.
     * @throws IOException
     *             If an I/O exception occurs.
     */
    public int peek(int depth) throws IOException {
        // does the size of the peek buffer need to be extended?
        if (this.peekBytes.length <= depth) {
            byte temp[] = new byte[depth + 10];
            for (int i = 0; i < this.peekBytes.length; i++) {
                temp[i] = this.peekBytes[i];
            }
            this.peekBytes = temp;
        }

        // does more data need to be read?
        if (depth >= this.peekLength) {
            int offset = this.peekLength;
            int length = (depth - this.peekLength) + 1;
            int lengthRead = this.stream.read(this.peekBytes, offset, length);

            if (lengthRead == -1) {
                return -1;
            }

            this.peekLength = depth + 1;
        }

        return this.peekBytes[depth];
    }

    /*
     * Read a single byte from the stream.
     * @throws IOException If an I/O exception occurs.
     * @return The character that was read from the stream.
     */
    @Override
    public int read() throws IOException {
        if (this.peekLength == 0) {
            int value = this.stream.read();
            updatePosition(value);
            return value;
        }

        int result = this.peekBytes[0];
        this.updatePosition(result);
        this.peekLength--;
        for (int i = 0; i < this.peekLength; i++) {
            this.peekBytes[i] = this.peekBytes[i + 1];
        }

        return result;
    }
}
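To see why a lookahead of up to two characters is useful, consider the ABNF operators "=" and "=/": both start with '=', so the parser has to look at the character after it before deciding how much to consume. The sketch below illustrates the idea under some assumptions. It does not use PeekableInputStream; instead it relies on the mark/reset support of the standard ByteArrayInputStream, and the names LookaheadSketch, peek, definedAs and stream are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of predictive lookahead, built on mark/reset
// rather than on the article's PeekableInputStream.
class LookaheadSketch {
    // Returns the byte `depth` positions ahead (0 = the next byte)
    // without consuming it, or -1 if the input ends before that.
    static int peek(ByteArrayInputStream in, int depth) {
        in.mark(depth + 1);
        int value = -1;
        for (int i = 0; i <= depth; i++) {
            value = in.read();
            if (value == -1) break;
        }
        in.reset();  // rewind to where we started
        return value;
    }

    // Decides between "=" and "=/" by looking two bytes ahead,
    // then consumes exactly the operator that was recognized.
    static String definedAs(ByteArrayInputStream in) {
        if (peek(in, 0) != '=') {
            throw new IllegalStateException("expected '='");
        }
        boolean incremental = peek(in, 1) == '/';
        in.read();                   // consume '='
        if (incremental) in.read();  // consume '/'
        return incremental ? "=/" : "=";
    }

    // Convenience helper: wraps ASCII text in a stream.
    static ByteArrayInputStream stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        ByteArrayInputStream in = stream("=/ DIGIT");
        System.out.println(definedAs(in));  // prints "=/"
        System.out.println(in.available()); // bytes left after the operator
    }
}
```

The decision is made entirely from peeked bytes, so nothing is consumed until the parser knows which operator it is looking at. That is the same role peek(0) and peek(1) play in the class above.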
Next, let's look at the two exceptions that may be thrown while parsing a grammar.
First, MatchException, the match exception:
package org.sip4x.abnf;

/* (GPL license header omitted; it is identical to the one in the AbnfParser listing.) */

public class MatchException extends Exception {
    private int actual;
    private int pos;
    private int line;
    private String expected;

    public MatchException(String expected, int actual, int pos, int line) {
        this.expected = expected;
        this.actual = actual;
        this.pos = pos;
        this.line = line;
    }

    public MatchException(String expected, char value, int pos, int line) {
        this.expected = expected;
        this.actual = (int)value;
        this.pos = pos;
        this.line = line;
    }

    public String toString() {
        return "Input stream does not match with '" + (char)actual + "' ["
                + String.format("%02X", actual) + "] at position "
                + pos + ":" + line + ". Expected value is " + expected;
    }
}
Nothing special here. Next, the collision exception:
/* (GPL license header omitted; it is identical to the one in the AbnfParser listing.) */

public class CollisionException extends Exception {
    private String collision;
    private int pos;
    private int line;

    public CollisionException(String collision, int pos, int line) {
        this.collision = collision;
        this.pos = pos;
        this.line = line;
    }

    public String toString() {
        return "Collision happened at position " + pos + ":" + line
                + ". Description: " + collision;
    }
}
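CollisionException is thrown when two rules with the same name are found in the input stream and the second one is not an incremental definition. In other words, rule names must be unique, unless "=/" is used to add alternatives to an existing rule.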
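The rule above can be sketched as a small rule table. This is only an illustration under stated assumptions, not the author's implementation: the class RuleTableSketch and its methods are hypothetical, an IllegalStateException stands in for CollisionException, names are compared case-insensitively (ABNF rule names are case-insensitive), and "=/" on a name that has not been defined yet is simply treated as a first definition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of duplicate-rule detection; not the article's code.
class RuleTableSketch {
    // Maps a lower-cased rule name to the alternatives defined for it.
    private final Map<String, List<String>> rules = new HashMap<>();

    // Registers one rule line: name, the operator ("=" or "=/"), and its elements.
    void define(String name, String definedAs, String elements) {
        String key = name.toLowerCase(Locale.ROOT);
        if (definedAs.equals("=")) {
            if (rules.containsKey(key)) {
                // Stand-in for CollisionException in this sketch.
                throw new IllegalStateException("Rule '" + name + "' is already defined");
            }
        } else if (!definedAs.equals("=/")) {
            throw new IllegalArgumentException("Unknown operator: " + definedAs);
        }
        // "=" creates the entry; "=/" appends to it (assumption: or creates it).
        rules.computeIfAbsent(key, k -> new ArrayList<>()).add(elements);
    }

    // Returns the alternatives recorded for a rule, or null if it is unknown.
    List<String> alternatives(String name) {
        return rules.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        RuleTableSketch table = new RuleTableSketch();
        table.define("digit", "=", "%x30-34");
        table.define("DIGIT", "=/", "%x35-39");   // incremental: allowed
        System.out.println(table.alternatives("digit").size());
        try {
            table.define("Digit", "=", "%x30");   // plain redefinition: rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```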
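At this point the basic architecture of our ABNF parser is in place. In the next post we will add some unit tests, and then start writing the actual ABNF parsing code.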