An ABNF Grammar Parser Based on Predictive Parsing (3): The Basic Framework of the ABNF Grammar Parser

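As mentioned earlier, a parser generator that accepts an ABNF grammar and automatically builds a parser for it must first be able to recognize ABNF itself. Only after the grammar has been read into memory and turned into a structured representation can the later steps of generating the parser proceed. I call the module that reads in the ABNF grammar the AbnfParser class. Let's start with the basic structure of this class: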

 

/*
    This file is one of the components of a Context-free Grammar Parser Generator,
    which accepts a piece of text as input and generates a parser
    for the input context-free grammar.
    Copyright (C) 2013, Junbiao Pan (Email: [email protected])

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

import java.io.InputStream;
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Map;
import java.util.HashMap;

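// ABNF grammar parser
public class AbnfParser {
    // Input stream of the ABNF grammar parser. It supports both peek and read:
    // peek is needed because this is a predictive parser, which must look
    // 1 to 2 characters ahead to decide which ABNF production (or element)
    // to match next.
    protected PeekableInputStream is;
    public PeekableInputStream getInputStream() { return is; }

    // Prefix for rule names (set by the constructor)
    protected String prefix;
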
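    // match: checks whether two characters are equal
    // (e.g. whether the input character is the expected one)
    public boolean match(int value, int expected) {
        return value == expected;
    }

    // match: checks whether a character lies within a range
    // (e.g. whether the input character is a letter or a digit)
    public boolean match(int value, int lower, int upper) {
        return value >= lower && value <= upper;
    }

    // match: checks whether a character equals the expected character,
    // ignoring case. Note that passing a char literal selects this overload,
    // while passing an int selects the exact comparison above.
    public boolean match(int value, char expected) {
        return Character.toUpperCase(value) == Character.toUpperCase(expected);
    }

    // match: checks whether a character equals any of several characters
    // (e.g. whether the input character is '-', '+' or '%')
    public boolean match(int value, int[] expected) {
        for (int index = 0; index < expected.length; index++) {
            if (value == expected[index]) return true;
        }
        return false;
    }
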
    // Throws MatchException if the character does not match.
    // The MatchException carries the line and column in the input stream
    // where the mismatch occurred, as well as the expected character.
    public void assertMatch(int value, int expected) throws MatchException {
        if (!match(value, expected)) {
            throw new MatchException("'" + (char) expected + "' [" + String.format("%02X", expected) + "]",
                    value, is.getPos(), is.getLine());
        }
    }

    // Throws MatchException if the character is outside the given range.
    // The MatchException carries the line and column in the input stream
    // where the mismatch occurred, as well as the expected range.
    public void assertMatch(int value, int lower, int upper) throws MatchException {
        if (!match(value, lower, upper)) {
            throw new MatchException(
                    "'" + (char) lower + "'~'" + (char) upper + "' " +
                    "[" + String.format("%02X", lower) + "~" + String.format("%02X", upper) + "]",
                    value, is.getPos(), is.getLine());
        }
    }

    // Throws MatchException if the character does not match (ignoring case).
    // The MatchException carries the line and column in the input stream
    // where the mismatch occurred, as well as the expected character.
    public void assertMatch(int value, char expected) throws IOException, MatchException {
        if (!match(value, expected)) {
            throw new MatchException("'" + expected + "' [" + String.format("%02X", (int) expected) + "]",
                    value, is.getPos(), is.getLine());
        }
    }

...

    // Call parse() to start parsing the input source; it returns the list
    // of ABNF rules defined in the input.
    public List<Rule> parse() throws IOException, MatchException, CollisionException {
        return rulelist();
    }

    // Constructor: sets the rule-name prefix and the input source, and wraps
    // the ordinary input stream in one that supports peek.
    public AbnfParser(String prefix, InputStream inputStream) {
        this.prefix = prefix;
        this.is = new PeekableInputStream(inputStream);
    }

    // Other members are omitted for now
}

With this class in place, parsing an input ABNF grammar takes only the following two calls:

 

        AbnfParser abnfParser = new AbnfParser(prefix, System.in);

        List<Rule> ruleList = abnfParser.parse();
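The field comment in AbnfParser says the parser looks one or two characters ahead to decide which production to match next. As an illustration only (not the author's code), the sketch below applies that idea to the RFC 5234 rule `num-val = "%" (bin-val / dec-val / hex-val)`: after consuming `%`, it peeks at the next character, case-insensitively, to choose binary, decimal or hexadecimal digits. The class name `NumValDemo`, the String-based cursor standing in for PeekableInputStream, and the simplification to a single value (no `.` concatenation or `-` ranges) are assumptions made to keep it self-contained.

```java
// Hypothetical demo class: a String cursor stands in for PeekableInputStream.
public class NumValDemo {
    private final String input;
    private int index = 0;

    public NumValDemo(String input) { this.input = input; }

    // Returns the next character without consuming it, or -1 at end of input.
    private int peek() { return index < input.length() ? input.charAt(index) : -1; }

    // Consumes and returns the next character, or -1 at end of input.
    private int read() { return index < input.length() ? input.charAt(index++) : -1; }

    // Case-insensitive comparison, mirroring AbnfParser.match(int, char).
    private static boolean match(int value, char expected) {
        return Character.toUpperCase(value) == Character.toUpperCase(expected);
    }

    // Range check, mirroring AbnfParser.match(int, int, int).
    private static boolean match(int value, int lower, int upper) {
        return value >= lower && value <= upper;
    }

    // Value of an ASCII digit in the given radix, or -1 if c is not such a digit.
    private static int digitValue(int c, int radix) {
        int v;
        if (match(c, '0', '9')) v = c - '0';
        else if (match(c, 'A', 'F')) v = c - 'A' + 10;
        else if (match(c, 'a', 'f')) v = c - 'a' + 10;
        else return -1;
        return v < radix ? v : -1;
    }

    // Parses a single "%b...", "%d..." or "%x..." value and returns it.
    public int numVal() {
        if (read() != '%') throw new IllegalArgumentException("expected '%'");
        int lookahead = peek();          // one character of lookahead picks the branch
        int radix;
        if (match(lookahead, 'b')) radix = 2;
        else if (match(lookahead, 'd')) radix = 10;
        else if (match(lookahead, 'x')) radix = 16;
        else throw new IllegalArgumentException("expected 'b', 'd' or 'x'");
        read();                          // consume the prefix letter

        int value = 0, digits = 0;
        // Keep consuming while the peeked character is a digit of the chosen radix.
        while (digitValue(peek(), radix) >= 0) {
            value = value * radix + digitValue(read(), radix);
            digits++;
        }
        if (digits == 0) throw new IllegalArgumentException("expected at least one digit");
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new NumValDemo("%x41").numVal());   // 65
        System.out.println(new NumValDemo("%D65").numVal());   // 65
        System.out.println(new NumValDemo("%b1010").numVal()); // 10
    }
}
```

Accepting `%D` as well as `%d` follows RFC 5234, where quoted ABNF strings such as `"d"` are case-insensitive; this is the situation the case-insensitive `match(int, char)` overload is meant for.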

 

 

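The PeekableInputStream class was copied from the web; interested readers can follow the Peekable InputStream link to see the original. On top of it I added a few position-related functions, which are used to report where the error occurred when a match fails; nothing else was changed. Here is the definition of PeekableInputStream: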

 

import java.io.*;

/**
 * The Heaton Research Spider Copyright 2007 by Heaton
 * Research, Inc.
 *
 * HTTP Programming Recipes for Java ISBN: 0-9773206-6-9
 * http://www.heatonresearch.com/articles/series/16/
 *
 * PeekableInputStream: This is a special input stream that
 * allows the program to peek one or more characters ahead
 * in the file.
 *
 * This class is released under the:
 * GNU Lesser General Public License (LGPL)
 * http://www.gnu.org/copyleft/lesser.html
 *
 * @author Jeff Heaton
 * @version 1.1
 */
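public class PeekableInputStream extends InputStream
{
    // Column position and line number, both starting at 1. These were added
    // so that MatchException can report where a mismatch occurred.
    protected int pos = 1;
    public int getPos() { return pos; }
    protected int line = 1;
    public int getLine() { return line; }

    // Reading a carriage return (CR) resets pos to 1;
    // reading a line feed (LF) increments line.
    // Any other character advances pos; end of stream (-1) changes nothing.
    protected void updatePosition(int value) {
        if (value == -1) return;
        if (value == 0x0D) pos = 1;
        else if (value == 0x0A) line++;
        else pos++;
    }
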
    /**
     * The underlying stream.
     */
    private InputStream stream;

    /**
     * Bytes that have been peeked at.
     */
    private byte peekBytes[];

    /**
     * How many bytes have been peeked at.
     */
    private int peekLength;

    /**
     * The constructor accepts an InputStream to set up the
     * object.
     *
     * @param is
     *          The InputStream to parse.
     */
    public PeekableInputStream(InputStream is)
    {
        this.stream = is;
        this.peekBytes = new byte[10];
        this.peekLength = 0;
    }

    /**
     * Peek at the next character from the stream.
     *
     * @return The next character.
     * @throws IOException
     *           If an I/O exception occurs.
     */
    public int peek() throws IOException
    {
        return peek(0);
    }

    /**
     * Peek at a specified depth.
     *
     * @param depth
     *          The depth to check (0 is the next byte).
     * @return The byte peeked at (0-255), or -1 if the stream ends first.
     * @throws IOException
     *           If an I/O exception occurs.
     */
    public int peek(int depth) throws IOException
    {
        // does the size of the peek buffer need to be extended?
        if (this.peekBytes.length <= depth)
        {
            byte temp[] = new byte[depth + 10];
            System.arraycopy(this.peekBytes, 0, temp, 0, this.peekLength);
            this.peekBytes = temp;
        }

        // does more data need to be read?
        // read(byte[], int, int) may return fewer bytes than requested,
        // so keep reading until bytes 0..depth are buffered.
        while (depth >= this.peekLength)
        {
            int lengthRead = this.stream.read(this.peekBytes, this.peekLength,
                    (depth - this.peekLength) + 1);
            if (lengthRead == -1)
            {
                return -1;
            }
            this.peekLength += lengthRead;
        }

        // Mask off sign extension so bytes >= 0x80 are not returned as
        // negative values (which could be confused with -1, end of stream).
        return this.peekBytes[depth] & 0xFF;
    }

    /**
     * Read a single byte from the stream.
     *
     * @return The byte that was read (0-255), or -1 at end of stream.
     * @throws IOException
     *           If an I/O exception occurs.
     */
    @Override
    public int read() throws IOException
    {
        if (this.peekLength == 0)
        {
            int value = this.stream.read();
            updatePosition(value);
            return value;
        }

        // Return the oldest peeked byte (masked to 0-255, like InputStream.read())
        // and shift the remaining peeked bytes down by one.
        int result = this.peekBytes[0] & 0xFF;
        this.updatePosition(result);
        this.peekLength--;
        for (int i = 0; i < this.peekLength; i++)
        {
            this.peekBytes[i] = this.peekBytes[i + 1];
        }

        return result;
    }

}
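For comparison only: the JDK already ships `java.io.PushbackInputStream`, which provides bounded lookahead by reading bytes and then pushing them back with `unread`. The sketch below peeks two bytes ahead this way to tell `=/` apart from `=`. The class `PushbackPeekDemo` and the helper `peek2` are hypothetical names, and unlike the PeekableInputStream above, this approach does not track line and column numbers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class PushbackPeekDemo {
    // Hypothetical helper: returns the next two bytes (-1 past end of stream)
    // without consuming them. Requires a pushback buffer of at least 2 bytes.
    static int[] peek2(PushbackInputStream in) throws IOException {
        int first = in.read();
        int second = in.read();
        // Push back in reverse order so that `first` comes out first again.
        if (second != -1) in.unread(second);
        if (first != -1) in.unread(first);
        return new int[] { first, second };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "=/ alt3".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), 2);

        int[] ahead = peek2(in);
        // Two characters of lookahead tell "=/" apart from a plain "=".
        boolean incremental = ahead[0] == '=' && ahead[1] == '/';
        System.out.println(incremental);       // prints true
        System.out.println((char) in.read());  // prints = (peek2 consumed nothing)
    }
}
```

The trade-off is that the lookahead depth is fixed by the buffer size passed to the constructor (`unread` throws IOException once that buffer is full), whereas PeekableInputStream grows its peek buffer on demand.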


Next, let's look at the two exceptions that may be thrown while parsing a grammar.

First, the match exception, MatchException:

 

package org.sip4x.abnf;



public class MatchException extends Exception {
    private int actual;
    private int pos;
    private int line;
    private String expected;

    public MatchException(String expected, int actual, int pos, int line) {
        this.expected = expected;
        this.actual = actual;
        this.pos = pos;
        this.line = line;
    }

    public MatchException(String expected, char value, int pos, int line) {
        this.expected = expected;
        this.actual = (int) value;
        this.pos = pos;
        this.line = line;
    }

    public String toString() {
        return "Input stream does not match with '" + (char) actual + "' [" + String.format("%02X", actual)
                + "] at position " + pos + ":" + line + ". Expected value is " + expected;
    }
}

Nothing special there. Now for the collision exception:

 

 

public class CollisionException extends Exception {
    private String collision;
    private int pos;
    private int line;

    public CollisionException(String collision, int pos, int line) {
        this.collision = collision;
        this.pos = pos;
        this.line = line;
    }

    public String toString() {
        return "Collision happened at position " + pos + ":" + line + ". Description: " + collision;
    }
}

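The collision exception is thrown when two rules with the same name are found in the input stream and the later one is not an incremental definition. In other words, rule names must be unique, except when "=/" is used to add alternatives to an existing rule.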
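The collision rule above can be sketched with a map from rule name to its list of alternatives. This is an illustration under stated assumptions, not the author's implementation: `RuleTable` and `define` are hypothetical names, alternatives are plain strings rather than the parser's Rule objects, `IllegalStateException` stands in for CollisionException, and names are compared case-insensitively because RFC 5234 states that rule names are case insensitive. Letting `=/` on a not-yet-defined rule start a new definition is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical table of rule definitions; not part of the author's code.
public class RuleTable {
    private final Map<String, List<String>> rules = new HashMap<>();

    // operator is "=" for a normal definition or "=/" for incremental alternatives.
    public void define(String name, String operator, String alternative) {
        String key = name.toLowerCase(Locale.ROOT);   // ABNF rule names are case insensitive
        List<String> alternatives = rules.get(key);
        if (operator.equals("=")) {
            if (alternatives != null) {
                // Same name defined twice with "=": this is the collision case.
                throw new IllegalStateException("rule '" + name + "' is already defined");
            }
            alternatives = new ArrayList<>();
            rules.put(key, alternatives);
        } else if (operator.equals("=/")) {
            // Assumption: "=/" on an unknown rule simply starts a new definition.
            if (alternatives == null) {
                alternatives = new ArrayList<>();
                rules.put(key, alternatives);
            }
        } else {
            throw new IllegalArgumentException("unknown operator " + operator);
        }
        alternatives.add(alternative);
    }

    public List<String> alternatives(String name) {
        return rules.getOrDefault(name.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    public static void main(String[] args) {
        RuleTable table = new RuleTable();
        table.define("ruleset", "=", "alt1 / alt2");
        table.define("ruleset", "=/", "alt3");
        System.out.println(table.alternatives("RuleSet")); // [alt1 / alt2, alt3]
        try {
            table.define("RULESET", "=", "alt4");
        } catch (IllegalStateException e) {
            System.out.println("collision: " + e.getMessage());
        }
    }
}
```

The `ruleset = alt1 / alt2` / `ruleset =/ alt3` pair is the incremental-alternatives example from RFC 5234; in the real parser the throw would presumably carry the stream position, as CollisionException does.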

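At this point the basic architecture of our ABNF parser is in place. In the next post we will add some unit tests, and then start writing the actual ABNF parsing code.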
