An ABNF Parser Based on Predictive Parsing (2): Class Definitions for ABNF Grammar Elements

Following the ABNF grammar definition, we now define a class for each ABNF grammar element, one production at a time:

(1) First, rulelist:

rulelist       =  1*( rule / (*c-wsp c-nl) )
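rulelist (the rule list) is the top-level symbol of the ABNF grammar. In other words, any grammar written according to ABNF is itself a rulelist. A rulelist consists of at least one rule, so in Java we can simply model it with a List, for example: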
List<Rule> ruleList;

(2) Definition of rule

A rule represents one context-free grammar rule. It is defined as:

rule           =  rulename defined-as elements c-nl
                               ; continues if next line starts
                               ;  with white space
That is, a rule is made up of a rulename (the rule's name), defined-as (the "is defined as" symbol), elements, and c-nl (the line end). For example, Rule1 = "this is a rule" is a rule.

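For the Rule class, rulename, defined-as, and elements carry real content, so we make them members of Rule. c-nl, on the other hand, is only a terminator: its exact content (for example, whether it is a bare line break or a comment followed by one) makes no difference to the Rule. We therefore define the Rule class as: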

/*
    This file is one of the component a Context-free Grammar Parser Generator,
    which accept a piece of text as the input, and generates a parser
    for the inputted context-free grammar.
    Copyright (C) 2013, Junbiao Pan (Email: [email protected])

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

public class Rule {
	private RuleName ruleName;
	public RuleName getRuleName() { return ruleName; }

    private String definedAs;
    public String getDefinedAs() { return definedAs; }
    public void setDefinedAs(String definedAs) { this.definedAs = definedAs; }
	
	private Elements elements;
	public Elements getElements() { return elements; }
	
	public Rule(RuleName ruleName, String definedAs, Elements elements) {
		this.ruleName = ruleName;
        this.definedAs = definedAs;
		this.elements = elements;
	}
}
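The defined-as information has to be kept because a rule can be defined in two ways, as a basic definition ("=") or as an incremental definition ("=/"), and the Rule class needs to record which one was used.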

        defined-as     =  *c-wsp ("=" / "=/") *c-wsp
                               ; basic rules definition and
                               ;  incremental alternatives
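
To make the role of defined-as concrete, here is a minimal sketch of how "=/" rules might later be folded into the earlier "=" rule of the same name. It is not code from this project: SimpleRule and IncrementalMerge are hypothetical names, and elements are flattened to plain strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified stand-in for the Rule class:
// the elements are reduced to a list of alternative strings.
final class SimpleRule {
    final String name;
    final String definedAs;          // "=" or "=/"
    final List<String> alternatives;

    SimpleRule(String name, String definedAs, List<String> alternatives) {
        this.name = name;
        this.definedAs = definedAs;
        this.alternatives = new ArrayList<>(alternatives);
    }
}

final class IncrementalMerge {
    // Folds every "=/" rule into the earlier "=" rule with the same name.
    static Map<String, List<String>> merge(List<SimpleRule> ruleList) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (SimpleRule rule : ruleList) {
            // ABNF rule names are case-insensitive, so normalize the key.
            String key = rule.name.toLowerCase(Locale.ROOT);
            if ("=/".equals(rule.definedAs)) {
                List<String> existing = merged.get(key);
                if (existing == null) {
                    throw new IllegalStateException(
                            "'=/' used before '=' for rule " + rule.name);
                }
                existing.addAll(rule.alternatives);
            } else {
                merged.put(key, new ArrayList<>(rule.alternatives));
            }
        }
        return merged;
    }
}
```

With the rules ruleset = alt1 / alt2 and ruleset =/ alt3, the merged entry for ruleset holds alt1, alt2, and alt3, which is why the Rule class cannot throw the defined-as token away.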


(3) Definition of rulename

Without further ado, here is the code:

public class RuleName implements Element {
	private String prefix;
	private String rulename;
	public String toString() { return prefix + rulename; }
	
    public RuleName(String rulename) {
        this.prefix = "";
        this.rulename = rulename;
    }

	public RuleName(String prefix, String rulename) {
		this.prefix = prefix;
		this.rulename = rulename;
	}
}
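Please forgive the copyright notice being longer than the code itself; it will look less glaring once the code fills out. RuleName also defines a prefix. Why? Because when we process ABNF grammars later on, there will be many dependencies between them. The SIP specification (RFC3261), for example, depends on RFC1035, RFC2234, RFC2396, RFC2616, RFC2806, and several other documents. The prefix works like a namespace: it qualifies each rule name with the name of the specification it comes from.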
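
As a rough illustration of the namespace idea, the sketch below shows a hypothetical variant of RuleName (QualifiedRuleName is my name for it, not a class from this project) that also compares names the way ABNF does, ignoring case. The "RFC2396-" style prefix is only an assumed format.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical variant of RuleName that can be used as a map key.
final class QualifiedRuleName {
    private final String prefix;    // e.g. "RFC2396-" (assumed format)
    private final String rulename;

    QualifiedRuleName(String prefix, String rulename) {
        this.prefix = prefix;
        this.rulename = rulename;
    }

    @Override
    public String toString() { return prefix + rulename; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof QualifiedRuleName)) return false;
        QualifiedRuleName that = (QualifiedRuleName) other;
        // Rule names are case-insensitive in ABNF; the prefix is compared exactly.
        return prefix.equals(that.prefix)
                && rulename.equalsIgnoreCase(that.rulename);
    }

    @Override
    public int hashCode() {
        // Rule names are ASCII, so lower-casing agrees with equalsIgnoreCase.
        return Objects.hash(prefix, rulename.toLowerCase(Locale.ROOT));
    }
}
```

Two rules called host from different specifications then stay distinct, while host and HOST from the same specification are treated as the same rule.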

(4) Definition of elements

// elements       =  alternation *c-wsp
public class Elements {
	private Alternation alternation;
	public Alternation getAlternation() { return alternation; }
	public Elements(Alternation alternation) {
		this.alternation = alternation;
	}
}

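elements is essentially just an alternation. I made the alternation a member of Elements (rather than making Elements a subclass) mainly because elements has a trailing *c-wsp that alternation does not.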

(5) Definition of Alternation

import java.util.HashSet;
import java.util.Set;

//  alternation    =  concatenation
//                          *(*c-wsp "/" *c-wsp concatenation)
public class Alternation {
    private Set<Concatenation> concatenations = new HashSet<Concatenation>();
	public void addConcatenation(Concatenation concatenation) {
		concatenations.add(concatenation);
	}
	public Set<Concatenation> getConcatenations() {
		return concatenations;
	}
}
An alternation derives one or more concatenations, separated by "/".

The concatenations are stored in a Set rather than a List because the alternatives of an alternation have no inherent order.
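
One thing worth keeping in mind with a HashSet: duplicates are only collapsed if the element type defines equals and hashCode. The Concatenation class shown later does not override them, so two structurally identical alternatives would both be kept. The sketch below, using a hypothetical ConcatenationKey with repetitions reduced to strings, shows what value-based equality might look like.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Concatenation: repetitions reduced to strings.
final class ConcatenationKey {
    private final List<String> repetitions;

    ConcatenationKey(List<String> repetitions) {
        this.repetitions = new ArrayList<>(repetitions);
    }

    @Override
    public boolean equals(Object o) {
        // Order matters inside a concatenation, so List.equals is the right test.
        return o instanceof ConcatenationKey
                && repetitions.equals(((ConcatenationKey) o).repetitions);
    }

    @Override
    public int hashCode() { return repetitions.hashCode(); }
}

final class AlternationSetDemo {
    // Counts the distinct alternatives once they are placed in a HashSet.
    static int distinctAlternatives(List<List<String>> alternatives) {
        Set<ConcatenationKey> set = new HashSet<>();
        for (List<String> alternative : alternatives) {
            set.add(new ConcatenationKey(alternative));
        }
        return set.size();
    }
}
```

If a reproducible iteration order is needed later (for instance when generating parser code), a LinkedHashSet would be a drop-in alternative; that is a design option, not something this series prescribes.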

(6) Definition of Concatenation

import java.util.ArrayList;
import java.util.List;

// concatenation  =  repetition *(1*c-wsp repetition)
public class Concatenation {
	private List<Repetition> repetitions = new ArrayList<Repetition>();
	public void addRepetition(Repetition repetition) {
		repetitions.add(repetition);
	}
	public List<Repetition> getRepetitions() { return repetitions; }
}

A concatenation consists of one or more repetitions, and their order is significant.

(7) Definition of Repetition

// repetition     =  [repeat] element
public class Repetition {
	private Repeat repeat;
	private Element element;
	
	public Repetition(Repeat repeat, Element element) {
		this.repeat = repeat;
		this.element = element;
	}
	public Repetition(Element element) {
		this(null, element);
	}
}

(8) Definitions of repeat and element

// repeat         =  1*DIGIT / (*DIGIT "*" *DIGIT)
public class Repeat { 
    private int min = 0, max = 0; 
    public int getMin() { return this.min; } 
    public int getMax() { return this.max; } 
    public Repeat(int min, int max) {this.min = min;this.max = max;}
}
A repeat is either a single number, or an optional minimum and an optional maximum separated by an asterisk "*".

Next, the definition of element:
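
As a sketch of how a repeat string could be turned into the min and max values that Repeat stores: in ABNF a missing minimum defaults to 0, a missing maximum means "no upper limit", and a bare number n means exactly n. RepeatParser and the UNBOUNDED sentinel are hypothetical names of mine; the Repeat class above does not define how an unbounded maximum is represented.

```java
final class RepeatParser {
    // Assumption: Integer.MAX_VALUE stands in for "no upper limit".
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // Returns {min, max} for repeat text such as "3", "1*", "*5", "2*4" or "*".
    static int[] parse(String text) {
        int star = text.indexOf('*');
        if (star < 0) {
            // A bare number n is shorthand for n*n.
            int n = Integer.parseInt(text);
            return new int[] { n, n };
        }
        String minPart = text.substring(0, star);
        String maxPart = text.substring(star + 1);
        int min = minPart.isEmpty() ? 0 : Integer.parseInt(minPart);
        int max = maxPart.isEmpty() ? UNBOUNDED : Integer.parseInt(maxPart);
        return new int[] { min, max };
    }
}
```

The resulting pair could then be passed straight to new Repeat(min, max).
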

//  element        =  rulename / group / option /
//                            char-val / num-val / prose-val
public interface Element  {
}
element can derive nonterminals such as rulename, group, and option, so it can be defined as an interface (I have not come up with a better approach for now).
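
Because Element is an empty marker interface, code that consumes it has to find out which concrete kind it is holding. One straightforward way is an instanceof chain, sketched below. The classes here are trimmed stand-ins with public fields for brevity, not the full definitions from this article, and ElementKinds is a hypothetical helper.

```java
// Trimmed stand-ins for the article's classes, for illustration only.
interface Element { }

final class RuleName implements Element {
    final String name;
    RuleName(String name) { this.name = name; }
}

final class CharVal implements Element {
    final String value;
    CharVal(String value) { this.value = value; }
}

final class ProseVal implements Element {
    final String value;
    ProseVal(String value) { this.value = value; }
}

final class ElementKinds {
    // Dispatches on the concrete type behind the marker interface.
    static String describe(Element element) {
        if (element instanceof RuleName) {
            return "rulename " + ((RuleName) element).name;
        }
        if (element instanceof CharVal) {
            return "char-val \"" + ((CharVal) element).value + "\"";
        }
        if (element instanceof ProseVal) {
            return "prose-val <" + ((ProseVal) element).value + ">";
        }
        throw new IllegalArgumentException(
                "unknown element type: " + element.getClass().getName());
    }
}
```

An alternative design would be to put an abstract method on Element (or use the visitor pattern) so that each implementation handles itself; which is preferable depends on how the parser generator ends up using these classes.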

(9) Definition of group

//  group          =  "(" *c-wsp alternation *c-wsp ")"
public class Group implements Element {
    private Alternation alternation;
    public Group(Alternation alternation) {this.alternation = alternation;}
}
A group is an alternation enclosed in a pair of parentheses.

(10) Definition of option

//  option         =  "[" *c-wsp alternation *c-wsp "]"

public class Option implements Element {
    private Alternation alternation; 
    public Alternation getAlternation() { return alternation; } 
    public Option(Alternation alternation) {this.alternation = alternation;}
}
An option is an alternation enclosed in a pair of square brackets.
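
Group and Option differ only in their brackets and in what they mean: per the ABNF specification, [x] is equivalent to *1(x). The small sketch below renders both forms back to ABNF text and shows that equivalence. AbnfText is a hypothetical helper, and each concatenation is represented as an already-rendered string.

```java
import java.util.List;

final class AbnfText {
    // Joins already-rendered concatenations with the alternation separator.
    static String alternation(List<String> concatenations) {
        return String.join(" / ", concatenations);
    }

    static String group(List<String> concatenations) {
        return "(" + alternation(concatenations) + ")";
    }

    static String option(List<String> concatenations) {
        return "[" + alternation(concatenations) + "]";
    }

    // An option is the same as repeating the group zero or one time.
    static String optionAsRepetition(List<String> concatenations) {
        return "*1" + group(concatenations);
    }
}
```

A parser generator could use this equivalence to treat Option as a Repetition of a Group with min 0 and max 1, although keeping a separate Option class, as done here, preserves the original notation.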

(11) Definition of num-val

num-val comes in binary, decimal, and hexadecimal forms.

/*
        num-val        =  "%" (bin-val / dec-val / hex-val)

        bin-val        =  "b" 1*BIT
                          [ 1*("." 1*BIT) / ("-" 1*BIT) ]
                               ; series of concatenated bit values
                               ; or single ONEOF range

        dec-val        =  "d" 1*DIGIT
                          [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]

        hex-val        =  "x" 1*HEXDIG
                          [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]
*/
import java.util.ArrayList;
import java.util.List;

public class NumVal implements Element {//, Terminal {
	private String base;
	private List<String> values = new ArrayList<String>();
	public List<String> getValues() { return values; }
	public NumVal(String base) {
		this.base = base;
	}
}

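Whether binary, decimal, or hexadecimal, a num-val has two forms: a list and a range. For example, %d11.22.33.44 denotes the sequence of four decimal values 11, 22, 33, and 44, while %x00-ff denotes the hexadecimal range from 0x00 to 0xff.

The range form gets a class of its own: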
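
The NumVal class keeps its values as strings. As a sketch of how those strings relate to actual numbers, the hypothetical NumValParser below decodes both forms into integers: a list yields every value, a range yields its two bounds. It assumes well-formed input and does no further validation.

```java
import java.util.ArrayList;
import java.util.List;

final class NumValParser {
    // Maps the base letter to a radix; ABNF literals are case-insensitive.
    static int radixOf(char base) {
        switch (Character.toLowerCase(base)) {
            case 'b': return 2;
            case 'd': return 10;
            case 'x': return 16;
            default: throw new IllegalArgumentException("unknown base: " + base);
        }
    }

    // True for the range form, e.g. "%x00-ff".
    static boolean isRange(String numVal) {
        return numVal.indexOf('-') >= 0;
    }

    // "%d11.22.33.44" gives [11, 22, 33, 44]; "%x00-ff" gives the bounds [0, 255].
    static List<Integer> decode(String numVal) {
        if (numVal.length() < 3 || numVal.charAt(0) != '%') {
            throw new IllegalArgumentException("not a num-val: " + numVal);
        }
        int radix = radixOf(numVal.charAt(1));
        String digits = numVal.substring(2);
        // String.split takes a regex, so the dot has to be escaped.
        String separator = isRange(numVal) ? "-" : "\\.";
        List<Integer> values = new ArrayList<>();
        for (String part : digits.split(separator)) {
            values.add(Integer.parseInt(part, radix));
        }
        return values;
    }
}
```
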

public class RangedNumVal implements Element {
	private String base, from, to;
	
	public RangedNumVal(String base, String from, String to) {
		this.base = base;
		this.from = from;
		this.to = to;
	}
}

(12) char-val and prose-val

The ABNF definitions of char-val and prose-val:

        char-val       =  DQUOTE *(%x20-21 / %x23-7E) DQUOTE
                               ; quoted string of SP and VCHAR
                                  without DQUOTE

        prose-val      =  "<" *(%x20-3D / %x3F-7E) ">"
                               ; bracketed string of SP and VCHAR
                                  without angles
                               ; prose description, to be used as
                                  last resort
The Java definition of char-val:

public class CharVal implements Element {
	private String value;
	public CharVal(String value) {
		this.value = value;
	}

}
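
One detail that matters once CharVal is actually matched against input: quoted strings in ABNF are case-insensitive, so "abc" also matches "ABC" or "aBc". A minimal sketch of such a match, using a hypothetical CharValMatcher helper that is not part of this project's code:

```java
final class CharValMatcher {
    // Returns the index just past the match, or -1 if the literal does not
    // occur at the given offset. The comparison ignores case, as ABNF requires.
    static int match(String literal, String input, int offset) {
        boolean matched =
                input.regionMatches(true, offset, literal, 0, literal.length());
        return matched ? offset + literal.length() : -1;
    }
}
```
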
The definition of prose-val:

public class ProseVal implements Element {
	private String value;
	public ProseVal(String value) {
		this.value = value;
	}

}


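Other ABNF grammar elements, such as ALPHA, are simple enough to be handled directly as strings, so they do not get classes of their own.

With classes defined for all of these elements, the next step is to start writing the predictive parser itself.

The copyright notice above is rather bulky, so please bear with it; more code will be added as the project matures.

Off for a lunch break :)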
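
To show how the pieces fit together, here is a small sketch that builds the object tree for the example rule Rule1 = "this is a rule" by hand and prints it back out. The classes are heavily trimmed stand-ins (public fields, a List instead of a Set for the alternatives, a plain String for the rule name, no Elements wrapper), and RuleTreeDemo is a hypothetical name, so treat it as an illustration of the nesting rather than as this project's code.

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed stand-ins for the classes defined in this article.
interface Element { }

final class CharVal implements Element {
    final String value;
    CharVal(String value) { this.value = value; }
}

final class Repetition {
    final Element element;
    Repetition(Element element) { this.element = element; }
}

final class Concatenation {
    final List<Repetition> repetitions = new ArrayList<>();
}

final class Alternation {
    final List<Concatenation> concatenations = new ArrayList<>();
}

final class Rule {
    final String ruleName;
    final String definedAs;
    final Alternation alternation;

    Rule(String ruleName, String definedAs, Alternation alternation) {
        this.ruleName = ruleName;
        this.definedAs = definedAs;
        this.alternation = alternation;
    }
}

final class RuleTreeDemo {
    // Builds rule -> alternation -> concatenation -> repetition -> char-val.
    static Rule build() {
        Concatenation concatenation = new Concatenation();
        concatenation.repetitions.add(new Repetition(new CharVal("this is a rule")));
        Alternation alternation = new Alternation();
        alternation.concatenations.add(concatenation);
        return new Rule("Rule1", "=", alternation);
    }

    // Renders the tree back to ABNF text; only CharVal is modelled here.
    static String render(Rule rule) {
        List<String> alternatives = new ArrayList<>();
        for (Concatenation concatenation : rule.alternation.concatenations) {
            List<String> parts = new ArrayList<>();
            for (Repetition repetition : concatenation.repetitions) {
                parts.add("\"" + ((CharVal) repetition.element).value + "\"");
            }
            alternatives.add(String.join(" ", parts));
        }
        return rule.ruleName + " " + rule.definedAs + " "
                + String.join(" / ", alternatives);
    }
}
```

The predictive parser's job will be to produce exactly this kind of tree from the grammar text instead of having it assembled by hand.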

Index of this series: An ABNF Grammar Parser Based on Predictive Parsing
