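I spent an afternoon implementing table-level SQL parsing for Hive. I later learned that Taobao already has a complete column-level SQL parsing tool, so there is no point in taking this any further. Still, it is worth writing up the table-level parser, since it did take a few hundred lines of code.
Here are the final parse results first: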
hive> ?select * from sunwg;
0 TOK_QUERY sunwg QueryFromClause
Time taken: 0.0010 seconds
hive> ?create table sunwg (id int);
0 TOK_CREATETABLE sunwg CreateTableClause
Time taken: 0.0080 seconds
hive> ?create table sunwg like sunwg00;
0 TOK_CREATETABLE sunwg CreateTableClause
1 TOK_CREATETABLE sunwg00 CreateTableLikeClause
Time taken: 0.0020 seconds
hive> ?insert overwrite table sunwg select * from sunwg01;
0 TOK_QUERY sunwg01 QueryFromClause
1 TOK_QUERY sunwg QueryInsertClause
Time taken: 0.0050 seconds
hive> ?from sunwg
> insert overwrite table sunwg01 select *
> insert overwrite table sunwg02 select *;
0 TOK_QUERY sunwg QueryFromClause
1 TOK_QUERY sunwg01 QueryInsertClause
2 TOK_QUERY sunwg02 QueryInsertClause
Time taken: 0.0020 seconds
hive> ?select * from sunwg01 join sunwg02 join sunwg03;
0 TOK_QUERY sunwg01 QueryFromClause
1 TOK_QUERY sunwg02 QueryFromClause
2 TOK_QUERY sunwg03 QueryFromClause
Time taken: 0.0010 seconds
hive> ?select * from (select * from sunwg) r1;
0 TOK_QUERY sunwg QueryFromClause
Time taken: 0.0020 seconds
hive> ?select * from (select 1 from sunwg union all select 2 from sunwg01 union all select 3 from sunwg02) r1;
0 TOK_QUERY sunwg QueryFromClause
1 TOK_QUERY sunwg01 QueryFromClause
2 TOK_QUERY sunwg02 QueryFromClause
Time taken: 0.193 seconds
hive> ?select * from (select 1 from sunwg union all select 2 from sunwg01 union all select 3 from sunwg02) r1 join sunwg04;
0 TOK_QUERY sunwg QueryFromClause
1 TOK_QUERY sunwg01 QueryFromClause
2 TOK_QUERY sunwg02 QueryFromClause
3 TOK_QUERY sunwg04 QueryFromClause
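How it is implemented:
1. A leading "?" marks a SQL statement as parse-only: it is analyzed but never actually executed.
2. The SQL is compiled without being executed, and the compilation result is passed to the SQL parsing class, HiveParseResult.java.
3. HiveParseResult parses the statement according to its SQL type.
4. HiveParseResult returns its result to the Hive front end as a string.
The code is fairly long, so only the representative parts are shown here: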
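Step 1 above can be sketched roughly as follows. This is only an illustration of the "?" convention, not the code from the actual patch: the class name ParseOnlyCommand and both method names are hypothetical, and the sketch assumes the marker is simply the first non-blank character of the command.

```java
// Hypothetical helper: the names are illustrative and not taken from the Hive CLI source.
final class ParseOnlyCommand {
    private ParseOnlyCommand() {}

    /** Returns true when the command starts with the "?" parse-only marker. */
    static boolean isParseOnly(String command) {
        return command != null && command.trim().startsWith("?");
    }

    /** Strips the "?" marker and a trailing semicolon, leaving the SQL to compile. */
    static String extractSql(String command) {
        String sql = command.trim();
        if (sql.startsWith("?")) {
            sql = sql.substring(1).trim();
        }
        if (sql.endsWith(";")) {
            sql = sql.substring(0, sql.length() - 1).trim();
        }
        return sql;
    }
}
```

A command for which isParseOnly returns true would be compiled and handed to the parsing class instead of being run; any other command would follow the normal execution path.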
switch (tree.getToken().getType()) {
    case HiveParser.TOK_DROPTABLE:
        ParseDropTable(tree);
        break;
    case HiveParser.TOK_DROPVIEW:
        ParseDropView(tree);
        break;
    case HiveParser.TOK_CREATETABLE:
        ParseCreateTable(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_ADDPARTS:
        ParseAlterTableAddParts(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_DROPPARTS:
        ParseAlterTableDropParts(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_RENAME:
        ParseAlterTableRename(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_RENAMECOL:
        ParseAlterTableRenameCol(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_ADDCOLS:
        ParseAlterTableAddCols(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_REPLACECOLS:
        ParseAlterTableReplaceCols(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_PROPERTIES:
        ParseAlterTableProperties(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_SERIALIZER:
        ParseAlterTableSerializer(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_SERDEPROPERTIES:
        ParseAlterTableSerdeProperties(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_FILEFORMAT:
        ParseAlterTableFileFormat(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_CLUSTER_SORT:
        ParseAlterTableClusterSort(tree);
        break;
    case HiveParser.TOK_ALTERTABLE_TOUCH:
        ParseAlterTableTouch(tree);
        break;
    case HiveParser.TOK_ALTERVIEW_PROPERTIES:
        ParseAlterViewProperties(tree);
        break;
    case HiveParser.TOK_QUERY:
        ParseQuery(tree);
        break;
    case HiveParser.TOK_CREATEVIEW:
        ParseCreateView(tree);
        break;
}
A different parsing method is called depending on the SQL statement type.
public void ParseQuery(ASTNode ASTNodeParseQuery) {
    int childcount = ASTNodeParseQuery.getChildCount();
    for (int childpos = 0; childpos < childcount; ++childpos) {
        ASTNode ASTNodetmp01 = (ASTNode)ASTNodeParseQuery.getChild(childpos);
        switch (ASTNodetmp01.getToken().getType()) {
            case HiveParser.TOK_FROM:
                ASTNode ASTNodetmp02 = (ASTNode)ASTNodetmp01.getChild(0);
                ParseQueryFrom(tree, ASTNodetmp02);
                break;
            case HiveParser.TOK_INSERT:
                ParseQueryInsert(tree, ASTNodetmp01);
                break;
        }
    }
}
When parsing a QUERY, the parts that matter are the TOK_FROM and TOK_INSERT children.
public void SubParseQuery(ASTNode ASTNodeParseQuery, ASTNode ASTNodeParseSubQuery) {
    int childcount = 0;
    int childpos = 0;
    switch (ASTNodeParseSubQuery.getToken().getType()) {
        case HiveParser.TOK_QUERY:
            childcount = ASTNodeParseSubQuery.getChildCount();
            for (childpos = 0; childpos < childcount; ++childpos) {
                ASTNode ASTNodetmp01 = (ASTNode)ASTNodeParseSubQuery.getChild(childpos);
                switch (ASTNodetmp01.getToken().getType()) {
                    case HiveParser.TOK_FROM:
                        ASTNode ASTNodetmp02 = (ASTNode)ASTNodetmp01.getChild(0);
                        ParseQueryFrom(tree, ASTNodetmp02);
                        break;
                }
            }
            break;
        case HiveParser.TOK_UNION:
            childcount = ASTNodeParseSubQuery.getChildCount();
            for (childpos = 0; childpos < childcount; ++childpos) {
                ASTNode ASTNodetmp02 = (ASTNode)ASTNodeParseSubQuery.getChild(childpos);
                SubParseQuery(ASTNodeParseQuery, ASTNodetmp02);
            }
            break;
    }
}
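When analyzing a subquery, the parts that matter are TOK_FROM, TOK_UNION, and TOK_QUERY: a TOK_UNION node is handled by recursing into each of its branches.
The other SQL types are all fairly simple. DROP TABLE, for example, looks like this: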
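The recursion over subqueries and unions can be tried out without a Hive installation using a small stand-in tree. The sketch below is a simplification, not the real code: Node replaces Hive's ASTNode, the tree shape built in example() is an assumption about what the compiler produces, and the handling of TOK_TABREF, TOK_SUBQUERY and TOK_JOIN is a guess at what ParseQueryFrom does, since that method is not shown in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for walking a Hive AST; tree shapes here are assumptions.
final class TableWalkSketch {
    static final class Node {
        final String type;         // token name, e.g. "TOK_QUERY"
        final String text;         // table name for TOK_TABREF, otherwise null
        final List<Node> children;

        Node(String type, String text, Node... children) {
            this.type = type;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    static Node tab(String name) { return new Node("TOK_TABREF", name); }
    static Node query(Node source) {
        return new Node("TOK_QUERY", null, new Node("TOK_FROM", null, source));
    }
    static Node union(Node a, Node b) { return new Node("TOK_UNION", null, a, b); }
    static Node subquery(Node q) { return new Node("TOK_SUBQUERY", null, q); }
    static Node join(Node a, Node b) { return new Node("TOK_JOIN", null, a, b); }

    /** Collects source table names in the order they are visited. */
    static void collect(Node node, List<String> tables) {
        switch (node.type) {
            case "TOK_TABREF":
                tables.add(node.text);
                break;
            case "TOK_QUERY":
                // Only the FROM child is of interest; its first child is the source.
                for (Node child : node.children) {
                    if ("TOK_FROM".equals(child.type)) {
                        collect(child.children.get(0), tables);
                    }
                }
                break;
            case "TOK_UNION":
            case "TOK_SUBQUERY":
            case "TOK_JOIN":
                // Recurse into every branch.
                for (Node child : node.children) {
                    collect(child, tables);
                }
                break;
            default:
                break;
        }
    }

    /** Assumed tree for: select * from (... sunwg union all ... sunwg01 union all ... sunwg02) r1 join sunwg04 */
    static Node example() {
        Node unions = union(union(query(tab("sunwg")), query(tab("sunwg01"))),
                            query(tab("sunwg02")));
        return query(join(subquery(unions), tab("sunwg04")));
    }

    public static void main(String[] args) {
        List<String> tables = new ArrayList<>();
        collect(example(), tables);
        System.out.println(tables); // prints [sunwg, sunwg01, sunwg02, sunwg04]
    }
}
```

The visiting order matches the order of the QueryFromClause lines in the last console example, which is what a depth-first walk over nested TOK_UNION nodes would be expected to give.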
public void ParseDropTable(ASTNode ASTNodeDropTable) {
    ASTNode ASTNodetmp01 = (ASTNode)ASTNodeDropTable.getChild(0);
    ParseResultAppend(ASTNodeDropTable.toString(), ASTNodetmp01.toString(), "DropTableClause");
}
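ParseResultAppend itself is not shown in this post. Judging only from the console output earlier (a running index, the statement type, the table name, and the clause name on each line), a collector along the following lines would produce the same format. The class ParseResultSketch and its methods are hypothetical names, and the real method may well work differently.

```java
// Hypothetical result collector; the output format is inferred from the console examples.
final class ParseResultSketch {
    private final StringBuilder out = new StringBuilder();
    private int count = 0;

    /** Appends one "<index> <statementType> <tableName> <clause>" line. */
    void append(String statementType, String tableName, String clause) {
        out.append(count++).append(' ')
           .append(statementType).append(' ')
           .append(tableName).append(' ')
           .append(clause).append('\n');
    }

    /** Returns everything collected so far as one string for the front end. */
    String render() {
        return out.toString();
    }
}
```

Returning a single string keeps the hand-off to the Hive front end as simple as step 4 of the outline describes.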
Original post: http://www.oratea.net/?p=666