Using control characters as delimiters in Hive: a summary

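Hive's default field delimiter is the ASCII control character \001 (Ctrl-A). When creating a table, specify it with fields terminated by '\001'. To build test data, open the file in vi and press Ctrl+V followed by Ctrl+A to enter the \001 control character. Likewise, \002 is entered with Ctrl+V followed by Ctrl+B, and so on for the other control characters.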
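As an alternative to typing the control character in vi, a test file could also be generated from Java. The following is a minimal sketch, not a definitive tool: the class name HiveTestData, the file name test_data.txt, and the sample rows are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class HiveTestData {
    // Hive's default field delimiter: Ctrl-A, octal \001, code point 1.
    static final char FIELD_DELIM = '\u0001';

    // Joins the column values of one row with the \001 delimiter.
    static String toRow(String... columns) {
        return String.join(String.valueOf(FIELD_DELIM), columns);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output file and sample values, for illustration only.
        Path out = Paths.get("test_data.txt");
        List<String> rows = Arrays.asList(
                toRow("1", "alice", "beijing"),
                toRow("2", "bob", "shanghai"));
        // Writes one row per line, terminated by the platform line separator.
        Files.write(out, rows, StandardCharsets.UTF_8);
    }
}
```

Opening the resulting file in vi should show each delimiter as ^A.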


How can a control character be entered in Java code? Use the following approach:

 byte[] bytes = new byte[] {5};
 String sendString = new String(bytes, java.nio.charset.StandardCharsets.US_ASCII);

This produces the control character \005.
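Java string literals also accept escape sequences for control characters, so the byte array is not the only option. The sketch below compares a few equivalent spellings of \005. The class name ControlChars and the helper fromByte are hypothetical names introduced for this example.

```java
import java.nio.charset.StandardCharsets;

public class ControlChars {
    // Decodes a single byte as US-ASCII into a one-character string.
    // Only meaningful for codes 0..127; larger values are not valid ASCII.
    static String fromByte(int code) {
        return new String(new byte[] {(byte) code}, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String viaBytes = fromByte(5);              // byte array approach
        String viaUnicode = "\u0005";               // Unicode escape
        String viaOctal = "\005";                   // octal escape, same notation Hive uses
        String viaCast = String.valueOf((char) 5);  // numeric cast

        // All four spellings denote the same one-character string.
        System.out.println(viaBytes.equals(viaUnicode));
        System.out.println(viaOctal.equals(viaUnicode));
        System.out.println(viaCast.equals(viaUnicode));
    }
}
```

The escape forms are usually the easiest to read when the delimiter is a fixed constant.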



split

public String[] split(String regex,
                      int limit)
Splits this string around matches of the given regular expression.
The array returned by this method contains each substring of this string that is terminated by another substring
that matches the given expression or is terminated by the end of the string. 
The substrings in the array are in the order in which they occur in this string. 
If the expression does not match any part of the input then the resulting array has just one element, namely this string.

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. 
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, 
and the array's last entry will contain all input beyond the last matched delimiter. 

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. 

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

The string "boo:and:foo", for example, yields the following results with these parameters:

Regex   Limit   Result
:       2       { "boo", "and:foo" }
:       5       { "boo", "and", "foo" }
:       -2      { "boo", "and", "foo" }
o       5       { "b", "", ":and:f", "", "" }
o       -2      { "b", "", ":and:f", "", "" }
o       0       { "b", "", ":and:f" }
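The limit parameter matters when parsing \001-delimited rows, because a row whose last columns are empty ends with consecutive delimiters. The following sketch assumes a row format with \001 between fields. The class name SplitHiveRow and the helper parseRow are hypothetical.

```java
import java.util.regex.Pattern;

public class SplitHiveRow {
    static final String FIELD_DELIM = "\u0001";

    // Splits one row into its columns. A negative limit keeps trailing
    // empty strings, so empty columns at the end of the row are preserved.
    // Pattern.quote is not strictly needed for \u0001, but it guards against
    // delimiters that are regex metacharacters, such as '|'.
    static String[] parseRow(String line) {
        return line.split(Pattern.quote(FIELD_DELIM), -1);
    }

    public static void main(String[] args) {
        // Sample row with two trailing empty columns (illustrative data).
        String line = "1" + FIELD_DELIM + "alice" + FIELD_DELIM + FIELD_DELIM;

        // Negative limit: all four columns are returned.
        System.out.println(parseRow(line).length);

        // Default split (limit 0): trailing empty columns are discarded.
        System.out.println(line.split(FIELD_DELIM).length);
    }
}
```

With the default limit of zero, the column count can vary from row to row depending on how many trailing values are empty, so a negative limit is generally the safer choice for fixed-width schemas.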


References:

http://stackoverflow.com/questions/1635764/string-parsing-in-java-with-delimeter-tab-t-using-split

http://zhidao.baidu.com/link?url=kSHmhRmwFMEsqlNfz3AIjlNdAX_zufuZEQCJ0zcecgACwn0yn-TFvnPv5FAROnC6LeOUK3TQgdEbjdDDYKise_
