A Predictive-Parsing-Based ABNF Parser (Part 3): Basic Framework of the ABNF Grammar Parser

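As mentioned earlier, a parser generator that recognizes an ABNF grammar and automatically builds a parser for it must first be able to read the ABNF grammar itself: only after the ABNF text has been loaded into memory and given structure can the later parser-generation steps run. I call the module that reads the ABNF grammar the AbnfParser class. Let's first look at its basic structure: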

/*
    This file is one of the components of a Context-free Grammar Parser Generator,
    which accepts a piece of text as the input, and generates a parser
    for the input context-free grammar.
    Copyright (C) 2013, Junbiao Pan (Email: [email protected])

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

import java.io.InputStream;
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Map;
import java.util.HashMap;

//    ABNF grammar parser
public class AbnfParser {
//    Input stream of the ABNF grammar parser. It supports both peek and read.
//    Peek is needed because this is a predictive parser: it looks ahead 1 or 2
//    characters to decide which ABNF production (or element) to match next.
    protected PeekableInputStream is;
    public PeekableInputStream getInputStream() { return is; }
    protected String prefix;

//    match checks whether two characters are identical
//    (for example, whether the input character equals the expected one)
    public boolean match(int value, int expected) {
        return value == expected;
    }

//    match checks whether a character lies within a range
//    (for example, whether the input character is a letter or a digit)
    public boolean match(int value, int lower, int upper) {
        return value >= lower && value <= upper;
    }

//    match checks whether a character equals a given character,
//    ignoring case
    public boolean match(int value, char expected) {
        return Character.toUpperCase(value) == Character.toUpperCase(expected);
    }

//    match checks whether a character equals any one of several characters
//    (for example, whether the input character is '-', '+' or '%')
    public boolean match(int value, int[] expected) {
        for(int index = 0; index < expected.length; index ++) {
            if (value == expected[index]) return true;
        }
        return false;
    }

//    Throws MatchException if the characters do not match.
//    The exception carries the line and column of the offending symbol in the input stream, plus the expected character.
    public void assertMatch(int value, int expected) throws MatchException {
        if (!match(value, expected)) {
            throw(new MatchException("'" + (char)expected +"' [" + String.format("%02X", expected) + "]", value, is.getPos(), is.getLine()));
        }
    }

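//    Throws MatchException if the character is outside the given range.
//    The exception carries the line and column of the offending symbol in the input stream, plus the expected range.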
    public void assertMatch(int value, int lower, int upper) throws MatchException {
        if (!match(value, lower, upper)) {
            throw(new MatchException(
                    "'" + (char)lower +"'~'" + (char)upper + "' " +
                            "[" + String.format("%02X", lower) + "~" + String.format("%02X", upper) + "]",
                    value, is.getPos(), is.getLine()));
        }
    }

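//    Throws MatchException if the characters do not match (ignoring case).
//    The exception carries the line and column of the offending symbol in the input stream, plus the expected character.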
    public void assertMatch(int value, char expected) throws IOException, MatchException {
        if (!match(value, expected)) {
            throw(new MatchException("'" + expected +"' [" + String.format("%02X", (byte)expected) + "]", value, is.getPos(), is.getLine()));
        }
    }
...

//    Call parse() to start parsing the input source; it returns the list of ABNF rules defined there.
    public List<Rule> parse() throws IOException, MatchException, CollisionException {
        return rulelist();
    }

//    Constructor: stores the rule-name prefix and wraps the plain input source in one that supports peek.
    public AbnfParser(String prefix, InputStream inputStream) {
        this.prefix = prefix;
        this.is = new PeekableInputStream(inputStream);
    }

//    The rest is omitted for now

}

With this in place, parsing an ABNF grammar from the input only takes the following call:

        AbnfParser abnfParser = new AbnfParser(prefix, System.in);
        List<Rule> ruleList = abnfParser.parse();

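One detail of the match overloads in the class above is worth spelling out: which overload runs depends on the static type of the second argument. The sketch below is a hypothetical stand-alone copy of two of those overloads (the class name MatchOverloadDemo is made up for illustration); it shows that a char argument selects the case-insensitive version while an int argument selects the exact one.

```java
// Hypothetical stand-alone copy of two match overloads, for illustration only.
final class MatchOverloadDemo {
    // Exact comparison of two character codes.
    static boolean match(int value, int expected) {
        return value == expected;
    }

    // Case-insensitive comparison; selected when the second argument is a char.
    static boolean match(int value, char expected) {
        return Character.toUpperCase(value) == Character.toUpperCase(expected);
    }

    public static void main(String[] args) {
        int input = 'a';
        System.out.println(match(input, 'A'));       // char argument: case-insensitive, prints true
        System.out.println(match(input, (int) 'A')); // int argument: exact, prints false
        System.out.println(match(input, 0x61));      // int literal: exact, prints true
    }
}
```

So when a case-sensitive comparison against a character literal is intended, the literal has to be cast to int or written as a numeric code.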
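The PeekableInputStream class was copied from the web (see the original Peekable InputStream article if you are interested). On top of it I added a few position-related functions, used to report where a match error occurred; the rest is essentially unchanged. Now let's look at the definition of the PeekableInputStream class.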

import java.io.*;

/**
 * The Heaton Research Spider Copyright 2007 by Heaton
 * Research, Inc.
 *
 * HTTP Programming Recipes for Java ISBN: 0-9773206-6-9
 * http://www.heatonresearch.com/articles/series/16/
 *
 * PeekableInputStream: This is a special input stream that
 * allows the program to peek one or more characters ahead
 * in the file.
 *
 * This class is released under the:
 * GNU Lesser General Public License (LGPL)
 * http://www.gnu.org/copyleft/lesser.html
 *
 * @author Jeff Heaton
 * @version 1.1
 */
public class PeekableInputStream extends InputStream
{
    protected int pos = 1;
    public int getPos() { return pos; }
    protected int line = 1;
    public int getLine() { return line; }

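//    On carriage return (CR), reset the column pos to 1.
//    On line feed (LF), increment line and also reset pos, so that
//    LF-only input is tracked correctly as well as CRLF input.
    protected void updatePosition(int value) {
        if (value == 0x0D) pos = 1;
        else if (value == 0x0A) { line++; pos = 1; }
        else pos++;
    }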

    /**
     * The underlying stream.
     */
    private InputStream stream;

    /**
     * Bytes that have been peeked at.
     */
    private byte peekBytes[];

    /**
     * How many bytes have been peeked at.
     */
    private int peekLength;

    /**
     * The constructor accepts an InputStream to setup the
     * object.
     *
     * @param is
     *          The InputStream to parse.
     */
    public PeekableInputStream(InputStream is)
    {
        this.stream = is;
        this.peekBytes = new byte[10];
        this.peekLength = 0;
    }

    /**
     * Peek at the next character from the stream.
     *
     * @return The next character.
     * @throws IOException
     *           If an I/O exception occurs.
     */
    public int peek() throws IOException
    {
        return peek(0);
    }

    /**
     * Peek at a specified depth.
     *
     * @param depth
     *          The depth to check.
     * @return The character peeked at.
     * @throws IOException
     *           If an I/O exception occurs.
     */
    public int peek(int depth) throws IOException
    {
        // does the size of the peek buffer need to be extended?
        if (this.peekBytes.length <= depth)
        {
            byte temp[] = new byte[depth + 10];
            for (int i = 0; i < this.peekBytes.length; i++)
            {
                temp[i] = this.peekBytes[i];
            }
            this.peekBytes = temp;
        }

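        // does more data need to be read?
        while (depth >= this.peekLength)
        {
            int offset = this.peekLength;
            int length = (depth - this.peekLength) + 1;
            int lengthRead = this.stream.read(this.peekBytes, offset, length);

            // end of stream reached before the requested depth
            if (lengthRead == -1)
            {
                return -1;
            }

            // a read may return fewer bytes than requested, so count only what arrived
            this.peekLength += lengthRead;
        }

        // mask to 0..255 so that a byte >= 0x80 is not mistaken for -1 (end of stream)
        return this.peekBytes[depth] & 0xFF;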
    }

    /*
     * Read a single byte from the stream. @throws IOException
     * If an I/O exception occurs. @return The character that
     * was read from the stream.
     */
    @Override
    public int read() throws IOException
    {
        if (this.peekLength == 0)
        {
            int value = this.stream.read();
            updatePosition(value);
            return value;
        }

        // mask to 0..255, matching what InputStream.read() returns
        int result = this.peekBytes[0] & 0xFF;
        this.updatePosition(result);
        this.peekLength--;
        for (int i = 0; i < this.peekLength; i++)
        {
            this.peekBytes[i] = this.peekBytes[i + 1];
        }

        return result;
    }

}
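
To make the role of lookahead concrete, here is a minimal sketch of a one-or-two character decision of the kind a predictive ABNF parser has to make: telling the basic definition operator "=" apart from the incremental one "=/". It deliberately uses the standard java.io.PushbackInputStream rather than the PeekableInputStream above so that it runs on its own; the class and method names (LookaheadDemo, peekDefinedAs, definedAs) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: choose between "=" and "=/" using two characters of lookahead.
final class LookaheadDemo {
    // Returns "=/" for an incremental definition, "=" for a basic one,
    // and null when the input does not start with '='. Nothing is consumed.
    static String peekDefinedAs(PushbackInputStream in) throws IOException {
        int first = in.read();
        if (first == -1) return null;
        int second = in.read();
        // Push the bytes back in reverse order so the stream is left unchanged.
        if (second != -1) in.unread(second);
        in.unread(first);
        if (first != '=') return null;
        return second == '/' ? "=/" : "=";
    }

    // Convenience wrapper that runs the check on a string.
    static String definedAs(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        // A pushback buffer of 2 bytes is enough for two characters of lookahead.
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(bytes), 2);
        try {
            return peekDefinedAs(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(definedAs("=/ %x41")); // prints =/
        System.out.println(definedAs("= %x41"));  // prints =
    }
}
```

With a peek(depth) method like the one in PeekableInputStream, the same decision needs no push-back at all: peek(0) and peek(1) inspect the two characters, and read() consumes them only once the production has been chosen.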


Next, let's look at the two exceptions that may be thrown while parsing a grammar.

First, MatchException, the match exception:

package org.sip4x.abnf;

public class MatchException extends Exception {
    private int actual;
    private int pos;
    private int line;
    private String expected;
    public MatchException(String expected, int actual, int pos, int line) {
        this.expected = expected;
        this.actual = actual;
        this.pos = pos;
        this.line = line;
    }
    public MatchException(String expected, char value, int pos, int line) {
        this.expected = expected;
        this.actual = (int)value;
        this.pos = pos;
        this.line = line;
    }

    @Override
    public String toString() {
        return "Input stream does not match: got '" + (char)actual + "' [" + String.format("%02X", actual) + "] at line " + line + ", position " + pos + ". Expected value is " + expected;
    }

}

Nothing special here. Next, the collision exception:

public class CollisionException extends Exception {
    private String collision;
    private int pos;
    private int line;
    public CollisionException(String collision, int pos, int line) {
        this.collision = collision;
        this.pos = pos;
        this.line = line;
    }

    @Override
    public String toString() {
        return "Collision happened at line " + line + ", position " + pos + ". Description: " + collision;
    }

}
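
CollisionException is thrown when two rules with the same name are found in the input stream and the second one is not an incremental definition. In other words, rule names must be unique, unless "=/" is used to add alternatives to an existing rule.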
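
The collision rule can be sketched as a small rule table. This is only an illustration under stated assumptions: RuleTableDemo, define and alternatives are hypothetical names, the standard IllegalStateException stands in for CollisionException so the example runs on its own, and the sketch simply accepts "=/" for a name that has not been defined yet, which may differ from what the real parser does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical rule table; names and structure are illustrative only.
final class RuleTableDemo {
    private final Map<String, List<String>> rules = new LinkedHashMap<>();

    // incremental == true models "=/", incremental == false models "=".
    void define(String name, String elements, boolean incremental) {
        // ABNF rule names are case-insensitive, so normalize the key.
        String key = name.toLowerCase(Locale.ROOT);
        if (rules.containsKey(key) && !incremental) {
            // Stand-in for CollisionException in this sketch.
            throw new IllegalStateException("Rule '" + name + "' is already defined");
        }
        rules.computeIfAbsent(key, k -> new ArrayList<>()).add(elements);
    }

    // Returns the alternatives collected for a rule, or null if it is unknown.
    List<String> alternatives(String name) {
        return rules.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        RuleTableDemo table = new RuleTableDemo();
        table.define("digit", "\"0\"", false); // digit =  "0"
        table.define("digit", "\"1\"", true);  // digit =/ "1"
        System.out.println(table.alternatives("digit").size()); // prints 2
        try {
            table.define("DIGIT", "\"2\"", false); // same name, plain "=": collision
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```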

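At this point the basic architecture of our ABNF parser is in place. The next post adds some unit tests, and after that we start writing the actual ABNF parsing code.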

Index of this series: A Prediction-Based ABNF Grammar Parser
