ANTLR4 in practice: using grammar actions to embed processing logic directly in the G4 file (XML parsing)

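Lexers and parsers are a practical application of compiler theory. They are well suited to parsing text and rewriting it.

This post uses a simple XML parsing example to show how grammar actions work.

Note: this is aimed at readers who already have a basic grasp of lexers and parsers and are reasonably familiar with Java.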

1. Get the original G4 files by downloading them directly from GitHub.

2. Modify the G4 file, starting from the original version.

/*
 [The "BSD licence"]
 Copyright (c) 2013 Terence Parr
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions
 are met:
 1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.
 3. The name of the author may not be used to endorse or promote products
    derived from this software without specific prior written permission.

 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
 INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

/** XML parser derived from ANTLR v4 ref guide book example */
parser grammar XMLParser;

options { tokenVocab=XMLLexer; }

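@parser::header{
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Stack;
import org.apache.commons.lang3.StringUtils;
}

@parser::members{
// Names of the elements/attributes currently open, outermost first
Stack<String> nodeStack = new Stack<>();

// Lower-cased dotted path (e.g. "a.b.c") -> every value collected under that path
private Map<String, List<String>> nodeValueMap = new HashMap<>();

// Builds the key for the current position, e.g. "root.child.attr"
private String createKey() {
	return StringUtils.join(nodeStack, ".");
}

// Records a value under the current path, creating the list on first use
private void add(String value){
	String key = createKey().toLowerCase();
	if(!nodeValueMap.containsKey(key)) {
		nodeValueMap.put(key, new ArrayList<>());
	}
	nodeValueMap.get(key).add(value);
}
}

document returns[Map<String, List<String>> resultMap]    :   prolog? misc* element misc* {$resultMap = nodeValueMap;};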

prolog      :   XMLDeclOpen attribute* SPECIAL_CLOSE ;

content     :   (chardata {add($chardata.text);})?
                ((element | reference {add($reference.text);} | CDATA {add($CDATA.text);} | PI {add($PI.text);} | COMMENT {add($COMMENT.text);}) (chardata {add($chardata.text);})?)* ;

element     :   '<' Name {nodeStack.push($Name.text);} attribute* '>' content '<' '/' Name '>' {nodeStack.pop();}
            |   '<' Name {nodeStack.push($Name.text);} attribute* '/>' {nodeStack.pop();}
            ;

reference   :   EntityRef | CharRef ;

attribute   :  Name{nodeStack.push($Name.text);}   '=' STRING {add($STRING.text == null ? $STRING.text : $STRING.text.replaceAll("^\"|\"$", "").replaceAll("^'|'$", ""));nodeStack.pop();}; // Our STRING is AttValue in spec

/** ``All text that is not markup constitutes the character data of
 *  the document.''
 */
chardata    :   TEXT | SEA_WS ;

misc        :   COMMENT | PI | SEA_WS ;

3. Compile the G4 files to generate the Java classes. The code below calls the generated XML lexer and parser.

	// FileUtil and KnwhwCfgUtils are helper classes that are not shown in this post
	public static final Map<String, List<String>> getXmlInfo(String xmlFile, String encoding) throws IOException {
		CharStream stream = null;
		if(!FileUtil.fileExist(xmlFile)) {
			// Not found on disk: load it as a classpath resource instead
			stream = CharStreams.fromStream(KnwhwCfgUtils.class.getClassLoader().getResource(xmlFile).openStream(), Charset.forName(encoding));
		}else {
			stream = CharStreams.fromFileName(xmlFile, Charset.forName(encoding));
		}
		Lexer lexer = new XMLLexer(stream);
		CommonTokenStream commonTokenStream = new CommonTokenStream(lexer);
		XMLParser parser = new XMLParser(commonTokenStream);
		// Parsing runs the grammar actions; the document rule returns the collected map
		return parser.document().resultMap;
	}
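
The map returned by getXmlInfo is keyed by the lower-cased, dot-joined node path, so reading a value back is an ordinary map lookup. The following is only a minimal sketch: the class XmlInfoLookup and the method firstValue are hypothetical names that do not appear in the original code, and the hand-built map in main stands in for a real parse result. Because the chardata rule also accepts SEA_WS, whitespace-only entries may show up in a list, so the sketch skips them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the original post) for reading the parse result.
public class XmlInfoLookup {

    // Returns the first non-blank value stored under the given path,
    // or defaultValue when the path is missing or holds only whitespace.
    public static String firstValue(Map<String, List<String>> xmlInfo,
                                    String path, String defaultValue) {
        // Keys are stored lower-cased by the grammar's add() action
        List<String> values = xmlInfo.get(path.toLowerCase());
        if (values == null) {
            return defaultValue;
        }
        for (String value : values) {
            if (value != null && !value.trim().isEmpty()) {
                return value.trim();
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        // Stand-in for the map that getXmlInfo would return
        Map<String, List<String>> xmlInfo = new HashMap<>();
        xmlInfo.put("config.db", Arrays.asList("\n  ", "main"));
        System.out.println(firstValue(xmlInfo, "Config.DB", "none"));
    }
}
```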

4. A brief explanation of the G4 file: see the diagram below. It is not very detailed, so leave a comment if you have questions.

[Figure 1: illustrated explanation of the G4 file]
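
To make the push / add / pop actions in the grammar easier to follow, here is a small stand-alone sketch that replays them by hand, without ANTLR, for the input <config><db host="localhost">main</db></config>. The class name NodePathDemo and the sample XML are assumptions made for illustration. String.join replaces StringUtils.join so that the sketch needs no extra dependency, and a LinkedHashMap is used only to keep the printed order stable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Stack;

// Hypothetical demo class: replays the grammar actions by hand.
public class NodePathDemo {
    private final Stack<String> nodeStack = new Stack<>();
    private final Map<String, List<String>> nodeValueMap = new LinkedHashMap<>();

    // Same idea as add() in the grammar: key = lower-cased dotted path
    private void add(String value) {
        String key = String.join(".", nodeStack).toLowerCase();
        nodeValueMap.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Replays the actions fired while parsing
    // <config><db host="localhost">main</db></config>
    public static Map<String, List<String>> simulate() {
        NodePathDemo demo = new NodePathDemo();
        demo.nodeStack.push("config");   // element: '<' Name
        demo.nodeStack.push("db");       // nested element: '<' Name
        demo.nodeStack.push("host");     // attribute: Name
        demo.add("localhost");           // attribute: STRING with quotes stripped
        demo.nodeStack.pop();            // end of attribute
        demo.add("main");                // content: chardata
        demo.nodeStack.pop();            // closing tag of db
        demo.nodeStack.pop();            // closing tag of config
        return demo.nodeValueMap;
    }

    public static void main(String[] args) {
        // prints {config.db.host=[localhost], config.db=[main]}
        System.out.println(simulate());
    }
}
```

An attribute value and the text of its element therefore land under different keys: the attribute name is pushed as one extra path segment while its value is recorded, then popped again.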
