Lexer Generator lexertl [4]: Adding File Name and Line Number Tracking

Goal: store the file name and line number in each token, so that both lexing and parsing can report more detailed information. This is a huge help when debugging your parser.

Approach: I remembered that Boost.Spirit provides a file_iterator class and a position_iterator class. A closer look confirms that they satisfy the iterator requirements of lexertl's match_results class. Good, then let's write a few lines of code to verify it.

 

 

```cpp
#include "lexertl/generator.hpp"
#include "lexertl/lookup.hpp"
#include "lexertl/rules.hpp"
#include "lexertl/state_machine.hpp"
#include <boost/spirit/home/classic/iterator/file_iterator.hpp>
#include <boost/spirit/home/classic/iterator/position_iterator.hpp>
#include <iostream>
#include <string>

// Reuse the file_iterator and position_iterator defined in Boost.Spirit.
namespace SPIRIT_CLASSIC = boost::spirit::classic;
typedef SPIRIT_CLASSIC::file_iterator<char> file_iterator_type;
typedef SPIRIT_CLASSIC::position_iterator2<file_iterator_type> position_iterator_type;

int main()
{
    try
    {
        lexertl::rules rules_;
        lexertl::state_machine state_machine_;

        rules_.add("[0-9]+", 1);
        rules_.add("[a-zA-Z]+", 2);
        lexertl::generator::build(rules_, state_machine_);

        // Pass the file name to file_iterator.
        file_iterator_type iterFile("test.txt");
        if (!iterFile)
        {
            std::cout << "Failed to open file test.txt!" << std::endl;
            return -1;
        }

        // lexertl::lookup expects two iterators marking the start and end of
        // the input. The two iterators we construct carry not only the file
        // contents but also line/column information.
        position_iterator_type iterBegin(iterFile, iterFile.make_end()); // start of input
        position_iterator_type iterEnd;                                  // end of input

        // Leave the rest to lexertl.
        lexertl::match_results<position_iterator_type> results_(iterBegin, iterEnd);

        std::cout << "Start parsing file test.txt" << std::endl;
        do
        {
            // Print the token's information.
            lexertl::lookup(state_machine_, results_);
            SPIRIT_CLASSIC::file_position posStart = results_.start.get_position();
            SPIRIT_CLASSIC::file_position posEnd = results_.end.get_position();
            std::cout << "Token Id       : " << results_.id << std::endl
                      << "Token String   : " << std::string(results_.start, results_.end) << std::endl
                      << "Token Position : (" << posStart.line << "." << posStart.column
                      << " -> " << posEnd.line << "." << posEnd.column << ")\n"
                      << std::endl;
        } while (results_.id != rules_.eoi());
    }
    catch (const std::exception& e)
    {
        std::cout << "<Error> Exception: " << e.what() << std::endl;
    }
    return 0;
}
```

 

The contents of test.txt are: abcd1234TTTT

Running it produces the output below.

As you can see, the three tokens are parsed correctly, and the start and end line/column of each token is printed.

 

lexertl's author, Ben Hanson, seems to be planning to define his own file_iterator for lexertl to replace the one in Boost.Spirit. I have copied Ben Hanson's blog post below. If a separate file_iterator really is developed, we can hope it beats the Boost.Spirit file_iterator in both compile speed and runtime performance…

 

 

The lexertl Blog

29.09.2009

As I have recently started a revamp of lexertl I have decided to start a blog to keep everybody up to date. As this version is not feature complete yet, I have added a separate zip file which you can find here.

So far I have implemented the following improvements:

  • Auto compression of wchar_t based state machines (overridable).
  • A generic lookup mechanism based around iterators.
  • Added the lexertl::skip token constant.
  • Removed regex macro length limitation.
  • Made the BOL (^) link a singleton (as it can only occur at the beginning of a token).
  • debug::dump() now compresses ranges.

This dramatically reduces the list of (easier) features I wanted to add and just leaves the following for the immediate future:

  • file_iterator (this will also replace the one in Boost.Spirit)
  • Turn size_t into a templated type for state machine creation.
  • Re-write the code generator.
  • Redo serialisation.
