Built-in System Functions in Hive 3.1.2 and Registering UDFs Automatically with the System
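I previously wrote a post on how to use UDFs: https://lizhiyong.blog.csdn.net/article/details/126186377
One of the more important classes there is GenericUDF. That post walked through the general workflow: extend the class and implement the algorithm yourself, build a Jar, load the Jar into Hive, register the function in Hive, and call it from HQL. User-written functions need this whole series of maneuvers, so how is it that Hive's built-in functions can be used by tenants directly, with no registration at all?
Once this is understood, the most commonly used UDFs can be registered in Hive automatically, which avoids the tedium of repeatedly loading Jars and registering functions. In particular, a self-registered UDF seems by default to take effect only in the current database; to use it from another database you have to call it as database_name.udf_name, which is not very convenient.
In IDEA, press Shift twice to search for a Java class. I will use the RPAD function as the example.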
package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.hadoop.hive.ql.exec.Description;

/**
 * UDFRpad.
 *
 */
@Description(name = "rpad", value = "_FUNC_(str, len, pad) - " +
    "Returns str, right-padded with pad to a length of len",
    extended = "If str is longer than len, the return value is shortened to "
    + "len characters.\n"
    + "In case of empty pad string, the return value is null.\n"
    + "Example:\n"
    + " > SELECT _FUNC_('hi', 5, '??') FROM src LIMIT 1;\n"
    + " 'hi???'\n"
    + " > SELECT _FUNC_('hi', 1, '??') FROM src LIMIT 1;\n"
    + " 'h'\n"
    + " > SELECT _FUNC_('hi', 5, '') FROM src LIMIT 1;\n"
    + " null")
public class GenericUDFRpad extends GenericUDFBasePad {
  public GenericUDFRpad() {
    super("rpad");
  }

  @Override
  protected void performOp(
      StringBuilder builder, int len, String str, String pad) {
    int pos = str.length();
    // Copy the text
    builder.append(str, 0, pos);
    // Copy the padding
    while (pos < len) {
      builder.append(pad);
      pos += pad.length();
    }
    builder.setLength(len);
  }
}
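That locates the class. It extends GenericUDFBasePad, and the Java source makes it fairly clear what it does: performOp copies str, keeps appending pad on the right until the length reaches at least len, and then truncates the result to exactly len characters.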
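To try the padding loop without a Hive build on the classpath, here is a minimal standalone sketch. The class name RpadSketch and the static rpad method are made up for illustration; the loop body is copied from performOp, and the null and empty-pad check imitates the guard in the parent class's evaluate method. Negative lengths are not handled here.

```java
// Standalone sketch of the right-padding loop; no Hive classes are involved.
// RpadSketch and rpad are hypothetical names used only for this illustration.
final class RpadSketch {

    // Returns str right-padded with pad to exactly len characters,
    // or null when an input is null or pad is empty.
    static String rpad(String str, int len, String pad) {
        if (str == null || pad == null || pad.isEmpty()) {
            return null;
        }
        StringBuilder builder = new StringBuilder();
        int pos = str.length();
        // Copy the text
        builder.append(str, 0, pos);
        // Copy the padding; this may overshoot len by part of a pad string
        while (pos < len) {
            builder.append(pad);
            pos += pad.length();
        }
        // Cut back to exactly len characters (also shortens an over-long str)
        builder.setLength(len);
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(rpad("hi", 5, "??")); // hi???
        System.out.println(rpad("hi", 1, "??")); // h
        System.out.println(rpad("hi", 5, ""));   // null
    }
}
```

The three calls in main are the same three examples listed in the @Description annotation, so the output can be compared against the documented behavior.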
Its parent class:
package org.apache.hadoop.hive.ql.udf.generic;
public abstract class GenericUDFBasePad extends GenericUDF {
private transient Converter converter1;
private transient Converter converter2;
private transient Converter converter3;
private Text result = new Text();
private String udfName;
private StringBuilder builder;
public GenericUDFBasePad(String _udfName) {
this.udfName = _udfName;
this.builder = new StringBuilder();
}
@Override
public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
if (arguments.length != 3) {
throw new UDFArgumentException(udfName + " requires three arguments. Found :"
+ arguments.length);
}
converter1 = checkTextArguments(arguments, 0);
converter2 = checkIntArguments(arguments, 1);
converter3 = checkTextArguments(arguments, 2);
return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
}
@Override
public Object evaluate(DeferredObject[] arguments) throws HiveException {
Object valObject1 = arguments[0].get();
Object valObject2 = arguments[1].get();
Object valObject3 = arguments[2].get();
if (valObject1 == null || valObject2 == null || valObject3 == null) {
return null;
}
Text str = (Text) converter1.convert(valObject1);
IntWritable lenW = (IntWritable) converter2.convert(valObject2);
Text pad = (Text) converter3.convert(valObject3);
if (str == null || pad == null || lenW == null || pad.toString().isEmpty()) {
return null;
}
int len = lenW.get();
builder.setLength(0);
performOp(builder, len, str.toString(), pad.toString());
result.set(builder.toString());
return result;
}
@Override
public String getDisplayString(String[] children) {
return getStandardDisplayString(udfName, children);
}
protected abstract void performOp(
StringBuilder builder, int len, String str, String pad);
// Convert input arguments to Text, if necessary.
private Converter checkTextArguments(ObjectInspector[] arguments, int i)
throws UDFArgumentException {
if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
throw new UDFArgumentTypeException(i, "Only primitive type arguments are accepted but "
+ arguments[i].getTypeName() + " is passed.");
}
Converter converter = ObjectInspectorConverters.getConverter((PrimitiveObjectInspector) arguments[i],
PrimitiveObjectInspectorFactory.writableStringObjectInspector);
return converter;
}
private Converter checkIntArguments(ObjectInspector[] arguments, int i)
throws UDFArgumentException {
if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
throw new UDFArgumentTypeException(i, "Only primitive type arguments are accepted but "
+ arguments[i].getTypeName() + " is passed.");
}
PrimitiveCategory inputType = ((PrimitiveObjectInspector) arguments[i]).getPrimitiveCategory();
Converter converter;
switch (inputType) {
case INT:
case SHORT:
case BYTE:
converter = ObjectInspectorConverters.getConverter((PrimitiveObjectInspector) arguments[i],
PrimitiveObjectInspectorFactory.writableIntObjectInspector);
break;
default:
throw new UDFArgumentTypeException(i + 1, udfName
+ " only takes INT/SHORT/BYTE types as " + (i + 1) + "-ths argument, got "
+ inputType);
}
return converter;
}
}
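Like any ordinary UDF, it extends the GenericUDF class, which will not be covered again here.
Following the trail, you can see that most of Hive's built-in functions live in the org.apache.hadoop.hive.ql.udf.generic package (the older UDF-style classes such as UDFSubstr sit in the neighboring org.apache.hadoop.hive.ql.udf package):
From the names of the Java classes you can tell which function each one implements.
For example, the Trim function: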
package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedExpressions;
import org.apache.hadoop.hive.ql.exec.vector.expressions.StringTrim;

/**
 * UDFTrim.
 *
 */
@Description(name = "trim",
    value = "_FUNC_(str) - Removes the leading and trailing space characters from str ",
    extended = "Example:\n"
    + " > SELECT _FUNC_(' facebook ') FROM src LIMIT 1;\n" + " 'facebook'")
@VectorizedExpressions({ StringTrim.class })
public class GenericUDFTrim extends GenericUDFBaseTrim {
  public GenericUDFTrim() {
    super("trim");
  }

  @Override
  protected String performOp(String val) {
    return StringUtils.strip(val, " ");
  }
}
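Needless to say, this is the trim function that strips spaces. Clearly Hive's built-in functions are not very different from user-defined UDFs; underneath, they extend the same classes. The open source community has simply written Hive's commonly used functions ahead of time.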
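One detail worth noticing in performOp: StringUtils.strip(val, " ") removes only the characters listed in its second argument, so only the space character U+0020 is stripped, while tabs and newlines are left alone. The sketch below reimplements that behavior with the standard library only, so it runs without Commons Lang, and compares it with String.trim(). TrimSketch and stripSpaces are hypothetical names, not part of Hive.

```java
// Standalone sketch of "strip only spaces"; TrimSketch and stripSpaces are
// hypothetical names that stand in for StringUtils.strip(val, " ").
final class TrimSketch {

    // Removes leading and trailing U+0020 characters and nothing else.
    static String stripSpaces(String val) {
        int start = 0;
        int end = val.length();
        while (start < end && val.charAt(start) == ' ') {
            start++;
        }
        while (end > start && val.charAt(end - 1) == ' ') {
            end--;
        }
        return val.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + stripSpaces("  facebook  ") + "]"); // [facebook]
        // Tabs are not spaces, so they survive stripSpaces ...
        System.out.println(stripSpaces("\tfacebook\t").length());    // 10
        // ... whereas String.trim() drops every char <= U+0020, tabs included
        System.out.println("\tfacebook\t".trim().length());          // 8
    }
}
```

If that reading of the source is right, values padded with tabs would need something like regexp_replace rather than trim to be cleaned up in HQL.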
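At this point we have found the package where Hive's built-in functions live.
Again taking RPAD as the example: in IDEA, Alt+F7 shows where the class is used:
Clearly the GenericUDFRpad class is passed to the registerGenericUDF method. Judging by the method's name, it must be closely tied to function registration.
Jump into the class that makes the call: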
package org.apache.hadoop.hive.ql.exec;
/**
* FunctionRegistry.
*/
public final class FunctionRegistry {
private static final Logger LOG = LoggerFactory.getLogger(FunctionRegistry.class);
/*
* PTF variables
* */
public static final String LEAD_FUNC_NAME = "lead";
public static final String LAG_FUNC_NAME = "lag";
public static final String LAST_VALUE_FUNC_NAME = "last_value";
public static final String UNARY_PLUS_FUNC_NAME = "positive";
public static final String UNARY_MINUS_FUNC_NAME = "negative";
public static final String WINDOWING_TABLE_FUNCTION = "windowingtablefunction";
private static final String NOOP_TABLE_FUNCTION = "noop";
private static final String NOOP_MAP_TABLE_FUNCTION = "noopwithmap";
private static final String NOOP_STREAMING_TABLE_FUNCTION = "noopstreaming";
private static final String NOOP_STREAMING_MAP_TABLE_FUNCTION = "noopwithmapstreaming";
private static final String MATCH_PATH_TABLE_FUNCTION = "matchpath";
public static final Set<String> HIVE_OPERATORS = new HashSet<String>();
static {
HIVE_OPERATORS.addAll(Arrays.asList(
"+", "-", "*", "/", "%", "div", "&", "|", "^", "~",
"and", "or", "not", "!",
"=", "==", "<=>", "!=", "<>", "<", "<=", ">", ">=",
"index"));
}
// registry for system functions
private static final Registry system = new Registry(true);
static {
system.registerGenericUDF("concat", GenericUDFConcat.class);
system.registerUDF("substr", UDFSubstr.class, false);
system.registerUDF("substring", UDFSubstr.class, false);
system.registerGenericUDF("substring_index", GenericUDFSubstringIndex.class);
system.registerUDF("space", UDFSpace.class, false);
system.registerUDF("repeat", UDFRepeat.class, false);
system.registerUDF("ascii", UDFAscii.class, false);
system.registerGenericUDF("lpad", GenericUDFLpad.class);
system.registerGenericUDF("rpad", GenericUDFRpad.class);
system.registerGenericUDF("levenshtein", GenericUDFLevenshtein.class);
system.registerGenericUDF("soundex", GenericUDFSoundex.class);
system.registerGenericUDF("size", GenericUDFSize.class);
system.registerGenericUDF("round", GenericUDFRound.class);
system.registerGenericUDF("bround", GenericUDFBRound.class);
system.registerGenericUDF("floor", GenericUDFFloor.class);
system.registerUDF("sqrt", UDFSqrt.class, false);
system.registerGenericUDF("cbrt", GenericUDFCbrt.class);
system.registerGenericUDF("ceil", GenericUDFCeil.class);
system.registerGenericUDF("ceiling", GenericUDFCeil.class);
system.registerUDF("rand", UDFRand.class, false);
system.registerGenericUDF("abs", GenericUDFAbs.class);
system.registerGenericUDF("sq_count_check", GenericUDFSQCountCheck.class);
system.registerGenericUDF("enforce_constraint", GenericUDFEnforceConstraint.class);
system.registerGenericUDF("pmod", GenericUDFPosMod.class);
system.registerUDF("ln", UDFLn.class, false);
system.registerUDF("log2", UDFLog2.class, false);
system.registerUDF("sin", UDFSin.class, false);
system.registerUDF("asin", UDFAsin.class, false);
system.registerUDF("cos", UDFCos.class, false);
system.registerUDF("acos", UDFAcos.class, false);
system.registerUDF("log10", UDFLog10.class, false);
system.registerUDF("log", UDFLog.class, false);
system.registerUDF("exp", UDFExp.class, false);
system.registerGenericUDF("power", GenericUDFPower.class);
system.registerGenericUDF("pow", GenericUDFPower.class);
system.registerUDF("sign", UDFSign.class, false);
system.registerUDF("pi", UDFPI.class, false);
system.registerUDF("degrees", UDFDegrees.class, false);
system.registerUDF("radians", UDFRadians.class, false);
system.registerUDF("atan", UDFAtan.class, false);
system.registerUDF("tan", UDFTan.class, false);
system.registerUDF("e", UDFE.class, false);
system.registerGenericUDF("factorial", GenericUDFFactorial.class);
system.registerUDF("crc32", UDFCrc32.class, false);
system.registerUDF("conv", UDFConv.class, false);
system.registerUDF("bin", UDFBin.class, false);
system.registerUDF("chr", UDFChr.class, false);
system.registerUDF("hex", UDFHex.class, false);
system.registerUDF("unhex", UDFUnhex.class, false);
system.registerUDF("base64", UDFBase64.class, false);
system.registerUDF("unbase64", UDFUnbase64.class, false);
system.registerGenericUDF("sha2", GenericUDFSha2.class);
system.registerUDF("md5", UDFMd5.class, false);
system.registerUDF("sha1", UDFSha1.class, false);
system.registerUDF("sha", UDFSha1.class, false);
system.registerGenericUDF("aes_encrypt", GenericUDFAesEncrypt.class);
system.registerGenericUDF("aes_decrypt", GenericUDFAesDecrypt.class);
system.registerUDF("uuid", UDFUUID.class, false);
system.registerGenericUDF("encode", GenericUDFEncode.class);
system.registerGenericUDF("decode", GenericUDFDecode.class);
system.registerGenericUDF("upper", GenericUDFUpper.class);
system.registerGenericUDF("lower", GenericUDFLower.class);
system.registerGenericUDF("ucase", GenericUDFUpper.class);
system.registerGenericUDF("lcase", GenericUDFLower.class);
system.registerGenericUDF("trim", GenericUDFTrim.class);
system.registerGenericUDF("ltrim", GenericUDFLTrim.class);
system.registerGenericUDF("rtrim", GenericUDFRTrim.class);
system.registerGenericUDF("length", GenericUDFLength.class);
system.registerGenericUDF("character_length", GenericUDFCharacterLength.class);
system.registerGenericUDF("char_length", GenericUDFCharacterLength.class);
system.registerGenericUDF("octet_length", GenericUDFOctetLength.class);
system.registerUDF("reverse", UDFReverse.class, false);
system.registerGenericUDF("field", GenericUDFField.class);
system.registerUDF("find_in_set", UDFFindInSet.class, false);
system.registerGenericUDF("initcap", GenericUDFInitCap.class);
system.registerUDF("like", UDFLike.class, true);
system.registerGenericUDF("likeany", GenericUDFLikeAny.class);
system.registerGenericUDF("likeall", GenericUDFLikeAll.class);
system.registerGenericUDF("rlike", GenericUDFRegExp.class);
system.registerGenericUDF("regexp", GenericUDFRegExp.class);
system.registerUDF("regexp_replace", UDFRegExpReplace.class, false);
system.registerUDF("replace", UDFReplace.class, false);
system.registerUDF("regexp_extract", UDFRegExpExtract.class, false);
system.registerUDF("parse_url", UDFParseUrl.class, false);
system.registerGenericUDF("nvl", GenericUDFNvl.class);
system.registerGenericUDF("split", GenericUDFSplit.class);
system.registerGenericUDF("str_to_map", GenericUDFStringToMap.class);
system.registerGenericUDF("translate", GenericUDFTranslate.class);
system.registerGenericUDF(UNARY_PLUS_FUNC_NAME, GenericUDFOPPositive.class);
system.registerGenericUDF(UNARY_MINUS_FUNC_NAME, GenericUDFOPNegative.class);
system.registerGenericUDF("day", UDFDayOfMonth.class);
system.registerGenericUDF("dayofmonth", UDFDayOfMonth.class);
system.registerUDF("dayofweek", UDFDayOfWeek.class, false);
system.registerGenericUDF("month", UDFMonth.class);
system.registerGenericUDF("quarter", GenericUDFQuarter.class);
system.registerGenericUDF("year", UDFYear.class);
system.registerGenericUDF("hour", UDFHour.class);
system.registerGenericUDF("minute", UDFMinute.class);
system.registerGenericUDF("second", UDFSecond.class);
system.registerUDF("from_unixtime", UDFFromUnixTime.class, false);
system.registerGenericUDF("to_date", GenericUDFDate.class);
system.registerUDF("weekofyear", UDFWeekOfYear.class, false);
system.registerGenericUDF("last_day", GenericUDFLastDay.class);
system.registerGenericUDF("next_day", GenericUDFNextDay.class);
system.registerGenericUDF("trunc", GenericUDFTrunc.class);
system.registerGenericUDF("date_format", GenericUDFDateFormat.class);
// Special date formatting functions
system.registerUDF("floor_year", UDFDateFloorYear.class, false);
system.registerUDF("floor_quarter", UDFDateFloorQuarter.class, false);
system.registerUDF("floor_month", UDFDateFloorMonth.class, false);
system.registerUDF("floor_day", UDFDateFloorDay.class, false);
system.registerUDF("floor_week", UDFDateFloorWeek.class, false);
system.registerUDF("floor_hour", UDFDateFloorHour.class, false);
system.registerUDF("floor_minute", UDFDateFloorMinute.class, false);
system.registerUDF("floor_second", UDFDateFloorSecond.class, false);
system.registerGenericUDF("date_add", GenericUDFDateAdd.class);
system.registerGenericUDF("date_sub", GenericUDFDateSub.class);
system.registerGenericUDF("datediff", GenericUDFDateDiff.class);
system.registerGenericUDF("add_months", GenericUDFAddMonths.class);
system.registerGenericUDF("months_between", GenericUDFMonthsBetween.class);
system.registerUDF("get_json_object", UDFJson.class, false);
system.registerUDF("xpath_string", UDFXPathString.class, false);
system.registerUDF("xpath_boolean", UDFXPathBoolean.class, false);
system.registerUDF("xpath_number", UDFXPathDouble.class, false);
system.registerUDF("xpath_double", UDFXPathDouble.class, false);
system.registerUDF("xpath_float", UDFXPathFloat.class, false);
system.registerUDF("xpath_long", UDFXPathLong.class, false);
system.registerUDF("xpath_int", UDFXPathInteger.class, false);
system.registerUDF("xpath_short", UDFXPathShort.class, false);
system.registerGenericUDF("xpath", GenericUDFXPath.class);
system.registerGenericUDF("+", GenericUDFOPPlus.class);
system.registerGenericUDF("-", GenericUDFOPMinus.class);
system.registerGenericUDF("*", GenericUDFOPMultiply.class);
system.registerGenericUDF("/", GenericUDFOPDivide.class);
system.registerGenericUDF("%", GenericUDFOPMod.class);
system.registerGenericUDF("mod", GenericUDFOPMod.class);
system.registerUDF("div", UDFOPLongDivide.class, true);
system.registerUDF("&", UDFOPBitAnd.class, true);
system.registerUDF("|", UDFOPBitOr.class, true);
system.registerUDF("^", UDFOPBitXor.class, true);
system.registerUDF("~", UDFOPBitNot.class, true);
system.registerUDF("shiftleft", UDFOPBitShiftLeft.class, true);
system.registerUDF("shiftright", UDFOPBitShiftRight.class, true);
system.registerUDF("shiftrightunsigned", UDFOPBitShiftRightUnsigned.class, true);
system.registerGenericUDF("grouping", GenericUDFGrouping.class);
system.registerGenericUDF("current_database", UDFCurrentDB.class);
system.registerGenericUDF("current_date", GenericUDFCurrentDate.class);
system.registerGenericUDF("current_timestamp", GenericUDFCurrentTimestamp.class);
system.registerGenericUDF("current_user", GenericUDFCurrentUser.class);
system.registerGenericUDF("current_groups", GenericUDFCurrentGroups.class);
system.registerGenericUDF("logged_in_user", GenericUDFLoggedInUser.class);
system.registerGenericUDF("restrict_information_schema", GenericUDFRestrictInformationSchema.class);
system.registerGenericUDF("current_authorizer", GenericUDFCurrentAuthorizer.class);
system.registerGenericUDF("isnull", GenericUDFOPNull.class);
system.registerGenericUDF("isnotnull", GenericUDFOPNotNull.class);
system.registerGenericUDF("istrue", GenericUDFOPTrue.class);
system.registerGenericUDF("isnottrue", GenericUDFOPNotTrue.class);
system.registerGenericUDF("isfalse", GenericUDFOPFalse.class);
system.registerGenericUDF("isnotfalse", GenericUDFOPNotFalse.class);
system.registerGenericUDF("if", GenericUDFIf.class);
system.registerGenericUDF("in", GenericUDFIn.class);
system.registerGenericUDF("and", GenericUDFOPAnd.class);
system.registerGenericUDF("or", GenericUDFOPOr.class);
system.registerGenericUDF("=", GenericUDFOPEqual.class);
system.registerGenericUDF("==", GenericUDFOPEqual.class);
system.registerGenericUDF("<=>", GenericUDFOPEqualNS.class);
system.registerGenericUDF("!=", GenericUDFOPNotEqual.class);
system.registerGenericUDF("<>", GenericUDFOPNotEqual.class);
system.registerGenericUDF("<", GenericUDFOPLessThan.class);
system.registerGenericUDF("<=", GenericUDFOPEqualOrLessThan.class);
system.registerGenericUDF(">", GenericUDFOPGreaterThan.class);
system.registerGenericUDF(">=", GenericUDFOPEqualOrGreaterThan.class);
system.registerGenericUDF("not", GenericUDFOPNot.class);
system.registerGenericUDF("!", GenericUDFOPNot.class);
system.registerGenericUDF("between", GenericUDFBetween.class);
system.registerGenericUDF("in_bloom_filter", GenericUDFInBloomFilter.class);
// Utility UDFs
system.registerUDF("version", UDFVersion.class, false);
// Aliases for Java Class Names
// These are used in getImplicitConvertUDFMethod
system.registerUDF(serdeConstants.BOOLEAN_TYPE_NAME, UDFToBoolean.class, false, UDFToBoolean.class.getSimpleName());
system.registerUDF(serdeConstants.TINYINT_TYPE_NAME, UDFToByte.class, false, UDFToByte.class.getSimpleName());
system.registerUDF(serdeConstants.SMALLINT_TYPE_NAME, UDFToShort.class, false, UDFToShort.class.getSimpleName());
system.registerUDF(serdeConstants.INT_TYPE_NAME, UDFToInteger.class, false, UDFToInteger.class.getSimpleName());
system.registerUDF(serdeConstants.BIGINT_TYPE_NAME, UDFToLong.class, false, UDFToLong.class.getSimpleName());
system.registerUDF(serdeConstants.FLOAT_TYPE_NAME, UDFToFloat.class, false, UDFToFloat.class.getSimpleName());
system.registerUDF(serdeConstants.DOUBLE_TYPE_NAME, UDFToDouble.class, false, UDFToDouble.class.getSimpleName());
system.registerUDF(serdeConstants.STRING_TYPE_NAME, UDFToString.class, false, UDFToString.class.getSimpleName());
// following mapping is to enable UDFName to UDF while generating expression for default value (in operator tree)
// e.g. cast(4 as string) is serialized as UDFToString(4) into metastore, to allow us to generate appropriate UDF for
// UDFToString we need the following mappings
// Rest of the types e.g. DATE, CHAR, VARCHAR etc are already registered
system.registerUDF(UDFToString.class.getSimpleName(), UDFToString.class, false, UDFToString.class.getSimpleName());
system.registerUDF(UDFToBoolean.class.getSimpleName(), UDFToBoolean.class, false, UDFToBoolean.class.getSimpleName());
system.registerUDF(UDFToDouble.class.getSimpleName(), UDFToDouble.class, false, UDFToDouble.class.getSimpleName());
system.registerUDF(UDFToFloat.class.getSimpleName(), UDFToFloat.class, false, UDFToFloat.class.getSimpleName());
system.registerUDF(UDFToInteger.class.getSimpleName(), UDFToInteger.class, false, UDFToInteger.class.getSimpleName());
system.registerUDF(UDFToLong.class.getSimpleName(), UDFToLong.class, false, UDFToLong.class.getSimpleName());
system.registerUDF(UDFToShort.class.getSimpleName(), UDFToShort.class, false, UDFToShort.class.getSimpleName());
system.registerUDF(UDFToByte.class.getSimpleName(), UDFToByte.class, false, UDFToByte.class.getSimpleName());
system.registerGenericUDF(serdeConstants.DATE_TYPE_NAME, GenericUDFToDate.class);
system.registerGenericUDF(serdeConstants.TIMESTAMP_TYPE_NAME, GenericUDFTimestamp.class);
system.registerGenericUDF(serdeConstants.TIMESTAMPLOCALTZ_TYPE_NAME, GenericUDFToTimestampLocalTZ.class);
system.registerGenericUDF(serdeConstants.INTERVAL_YEAR_MONTH_TYPE_NAME, GenericUDFToIntervalYearMonth.class);
system.registerGenericUDF(serdeConstants.INTERVAL_DAY_TIME_TYPE_NAME, GenericUDFToIntervalDayTime.class);
system.registerGenericUDF(serdeConstants.BINARY_TYPE_NAME, GenericUDFToBinary.class);
system.registerGenericUDF(serdeConstants.DECIMAL_TYPE_NAME, GenericUDFToDecimal.class);
system.registerGenericUDF(serdeConstants.VARCHAR_TYPE_NAME, GenericUDFToVarchar.class);
system.registerGenericUDF(serdeConstants.CHAR_TYPE_NAME, GenericUDFToChar.class);
// Aggregate functions
system.registerGenericUDAF("max", new GenericUDAFMax());
system.registerGenericUDAF("min", new GenericUDAFMin());
system.registerGenericUDAF("sum", new GenericUDAFSum());
system.registerGenericUDAF("$SUM0", new GenericUDAFSumEmptyIsZero());
system.registerGenericUDAF("count", new GenericUDAFCount());
system.registerGenericUDAF("avg", new GenericUDAFAverage());
system.registerGenericUDAF("std", new GenericUDAFStd());
system.registerGenericUDAF("stddev", new GenericUDAFStd());
system.registerGenericUDAF("stddev_pop", new GenericUDAFStd());
system.registerGenericUDAF("stddev_samp", new GenericUDAFStdSample());
system.registerGenericUDAF("variance", new GenericUDAFVariance());
system.registerGenericUDAF("var_pop", new GenericUDAFVariance());
system.registerGenericUDAF("var_samp", new GenericUDAFVarianceSample());
system.registerGenericUDAF("covar_pop", new GenericUDAFCovariance());
system.registerGenericUDAF("covar_samp", new GenericUDAFCovarianceSample());
system.registerGenericUDAF("corr", new GenericUDAFCorrelation());
system.registerGenericUDAF("regr_slope", new GenericUDAFBinarySetFunctions.RegrSlope());
system.registerGenericUDAF("regr_intercept", new GenericUDAFBinarySetFunctions.RegrIntercept());
system.registerGenericUDAF("regr_r2", new GenericUDAFBinarySetFunctions.RegrR2());
system.registerGenericUDAF("regr_sxx", new GenericUDAFBinarySetFunctions.RegrSXX());
system.registerGenericUDAF("regr_syy", new GenericUDAFBinarySetFunctions.RegrSYY());
system.registerGenericUDAF("regr_sxy", new GenericUDAFBinarySetFunctions.RegrSXY());
system.registerGenericUDAF("regr_avgx", new GenericUDAFBinarySetFunctions.RegrAvgX());
system.registerGenericUDAF("regr_avgy", new GenericUDAFBinarySetFunctions.RegrAvgY());
system.registerGenericUDAF("regr_count", new GenericUDAFBinarySetFunctions.RegrCount());
system.registerGenericUDAF("histogram_numeric", new GenericUDAFHistogramNumeric());
system.registerGenericUDAF("percentile_approx", new GenericUDAFPercentileApprox());
system.registerGenericUDAF("collect_set", new GenericUDAFCollectSet());
system.registerGenericUDAF("collect_list", new GenericUDAFCollectList());
system.registerGenericUDAF("ngrams", new GenericUDAFnGrams());
system.registerGenericUDAF("context_ngrams", new GenericUDAFContextNGrams());
system.registerGenericUDAF("compute_stats", new GenericUDAFComputeStats());
system.registerGenericUDAF("bloom_filter", new GenericUDAFBloomFilter());
system.registerUDAF("percentile", UDAFPercentile.class);
// Generic UDFs
system.registerGenericUDF("reflect", GenericUDFReflect.class);
system.registerGenericUDF("reflect2", GenericUDFReflect2.class);
system.registerGenericUDF("java_method", GenericUDFReflect.class);
system.registerGenericUDF("array", GenericUDFArray.class);
system.registerGenericUDF("assert_true", GenericUDFAssertTrue.class);
system.registerGenericUDF("assert_true_oom", GenericUDFAssertTrueOOM.class);
system.registerGenericUDF("map", GenericUDFMap.class);
system.registerGenericUDF("struct", GenericUDFStruct.class);
system.registerGenericUDF("named_struct", GenericUDFNamedStruct.class);
system.registerGenericUDF("create_union", GenericUDFUnion.class);
system.registerGenericUDF("extract_union", GenericUDFExtractUnion.class);
system.registerGenericUDF("case", GenericUDFCase.class);
system.registerGenericUDF("when", GenericUDFWhen.class);
system.registerGenericUDF("nullif", GenericUDFNullif.class);
system.registerGenericUDF("hash", GenericUDFHash.class);
system.registerGenericUDF("murmur_hash", GenericUDFMurmurHash.class);
system.registerGenericUDF("coalesce", GenericUDFCoalesce.class);
system.registerGenericUDF("index", GenericUDFIndex.class);
system.registerGenericUDF("in_file", GenericUDFInFile.class);
system.registerGenericUDF("instr", GenericUDFInstr.class);
system.registerGenericUDF("locate", GenericUDFLocate.class);
system.registerGenericUDF("elt", GenericUDFElt.class);
system.registerGenericUDF("concat_ws", GenericUDFConcatWS.class);
system.registerGenericUDF("sort_array", GenericUDFSortArray.class);
system.registerGenericUDF("sort_array_by", GenericUDFSortArrayByField.class);
system.registerGenericUDF("array_contains", GenericUDFArrayContains.class);
system.registerGenericUDF("sentences", GenericUDFSentences.class);
system.registerGenericUDF("map_keys", GenericUDFMapKeys.class);
system.registerGenericUDF("map_values", GenericUDFMapValues.class);
system.registerGenericUDF("format_number", GenericUDFFormatNumber.class);
system.registerGenericUDF("printf", GenericUDFPrintf.class);
system.registerGenericUDF("greatest", GenericUDFGreatest.class);
system.registerGenericUDF("least", GenericUDFLeast.class);
system.registerGenericUDF("cardinality_violation", GenericUDFCardinalityViolation.class);
system.registerGenericUDF("width_bucket", GenericUDFWidthBucket.class);
system.registerGenericUDF("from_utc_timestamp", GenericUDFFromUtcTimestamp.class);
system.registerGenericUDF("to_utc_timestamp", GenericUDFToUtcTimestamp.class);
system.registerGenericUDF("unix_timestamp", GenericUDFUnixTimeStamp.class);
system.registerGenericUDF("to_unix_timestamp", GenericUDFToUnixTimeStamp.class);
system.registerGenericUDF("internal_interval", GenericUDFInternalInterval.class);
system.registerGenericUDF("to_epoch_milli", GenericUDFEpochMilli.class);
// Generic UDTF's
system.registerGenericUDTF("explode", GenericUDTFExplode.class);
system.registerGenericUDTF("replicate_rows", GenericUDTFReplicateRows.class);
system.registerGenericUDTF("inline", GenericUDTFInline.class);
system.registerGenericUDTF("json_tuple", GenericUDTFJSONTuple.class);
system.registerGenericUDTF("parse_url_tuple", GenericUDTFParseUrlTuple.class);
system.registerGenericUDTF("posexplode", GenericUDTFPosExplode.class);
system.registerGenericUDTF("stack", GenericUDTFStack.class);
system.registerGenericUDTF("get_splits", GenericUDTFGetSplits.class);
//PTF declarations
system.registerGenericUDF(LEAD_FUNC_NAME, GenericUDFLead.class);
system.registerGenericUDF(LAG_FUNC_NAME, GenericUDFLag.class);
system.registerGenericUDAF("row_number", new GenericUDAFRowNumber());
system.registerGenericUDAF("rank", new GenericUDAFRank());
system.registerGenericUDAF("dense_rank", new GenericUDAFDenseRank());
system.registerGenericUDAF("percent_rank", new GenericUDAFPercentRank());
system.registerGenericUDAF("cume_dist", new GenericUDAFCumeDist());
system.registerGenericUDAF("ntile", new GenericUDAFNTile());
system.registerGenericUDAF("first_value", new GenericUDAFFirstValue());
system.registerGenericUDAF("last_value", new GenericUDAFLastValue());
system.registerWindowFunction(LEAD_FUNC_NAME, new GenericUDAFLead());
system.registerWindowFunction(LAG_FUNC_NAME, new GenericUDAFLag());
system.registerTableFunction(NOOP_TABLE_FUNCTION, NoopResolver.class);
system.registerTableFunction(NOOP_MAP_TABLE_FUNCTION, NoopWithMapResolver.class);
system.registerTableFunction(NOOP_STREAMING_TABLE_FUNCTION, NoopStreamingResolver.class);
system.registerTableFunction(NOOP_STREAMING_MAP_TABLE_FUNCTION, NoopWithMapStreamingResolver.class);
system.registerTableFunction(WINDOWING_TABLE_FUNCTION, WindowingTableFunctionResolver.class);
system.registerTableFunction(MATCH_PATH_TABLE_FUNCTION, MatchPathResolver.class);
// Arithmetic specializations are done in a convoluted manner; mark them as built-in.
system.registerHiddenBuiltIn(GenericUDFOPDTIMinus.class);
system.registerHiddenBuiltIn(GenericUDFOPDTIPlus.class);
system.registerHiddenBuiltIn(GenericUDFOPNumericMinus.class);
system.registerHiddenBuiltIn(GenericUDFOPNumericPlus.class);
// mask UDFs
system.registerGenericUDF(GenericUDFMask.UDF_NAME, GenericUDFMask.class);
system.registerGenericUDF(GenericUDFMaskFirstN.UDF_NAME, GenericUDFMaskFirstN.class);
system.registerGenericUDF(GenericUDFMaskLastN.UDF_NAME, GenericUDFMaskLastN.class);
system.registerGenericUDF(GenericUDFMaskShowFirstN.UDF_NAME, GenericUDFMaskShowFirstN.class);
system.registerGenericUDF(GenericUDFMaskShowLastN.UDF_NAME, GenericUDFMaskShowLastN.class);
system.registerGenericUDF(GenericUDFMaskHash.UDF_NAME, GenericUDFMaskHash.class);
}
}
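Clearly this is where Hive's system functions are registered. The source shows that several hundred of them are registered in this static block, which runs once when the FunctionRegistry class is loaded. A function that has already been registered can of course be used directly.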
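The mechanism is plain Java: a static initializer fills a registry when the class is first initialized, so every later lookup finds the entries without anyone having called a register method. The following is a toy sketch of that idea, not Hive's Registry class. StaticRegistrySketch, SYSTEM and call are hypothetical names, and lower-casing the function name on lookup is a choice made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy registry filled by a static block; all names here are hypothetical.
final class StaticRegistrySketch {

    // Function name -> implementation, shared by the whole JVM.
    private static final Map<String, UnaryOperator<String>> SYSTEM = new LinkedHashMap<>();

    static {
        // Runs once, when the class is initialized; callers never register anything.
        SYSTEM.put("upper", s -> s.toUpperCase(Locale.ROOT));
        SYSTEM.put("ucase", s -> s.toUpperCase(Locale.ROOT)); // alias, same implementation
        SYSTEM.put("lower", s -> s.toLowerCase(Locale.ROOT));
    }

    // Looks the function up by name and applies it to the argument.
    static String call(String name, String arg) {
        UnaryOperator<String> fn = SYSTEM.get(name.toLowerCase(Locale.ROOT));
        if (fn == null) {
            throw new IllegalArgumentException("Invalid function " + name);
        }
        return fn.apply(arg);
    }

    public static void main(String[] args) {
        System.out.println(call("upper", "hive")); // HIVE
        System.out.println(call("UCASE", "hive")); // HIVE
        System.out.println(call("lower", "HIVE")); // hive
    }
}
```

Adding one more put line to the static block is all it takes for a new name to resolve everywhere, which is the same lever the rest of this post pulls on FunctionRegistry.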
The class of course has other methods as well:
public static FunctionInfo registerPermanentFunction(String functionName,
    String className, boolean registerToSession, FunctionResource[] resources) {
  return system.registerPermanentFunction(functionName, className, registerToSession, resources);
}

public static void unregisterPermanentFunction(String functionName) throws HiveException {
  system.unregisterFunction(functionName);
  unregisterTemporaryUDF(functionName);
}

/**
 * Unregisters all the functions under the database dbName
 * @param dbName specified database name
 * @throws HiveException
 */
public static void unregisterPermanentFunctions(String dbName) throws HiveException {
  system.unregisterFunctions(dbName);
}

/**
 * Registers the appropriate kind of temporary function based on a class's
 * type.
 *
 * @param functionName name under which to register function
 *
 * @param udfClass class implementing UD[A|T]F
 *
 * @return true if udfClass's type was recognized (so registration
 *         succeeded); false otherwise
 */
public static FunctionInfo registerTemporaryUDF(
    String functionName, Class<?> udfClass, FunctionResource... resources) {
  return SessionState.getRegistryForWrite().registerFunction(
      functionName, udfClass, resources);
}

public static void unregisterTemporaryUDF(String functionName) throws HiveException {
  if (SessionState.getRegistry() != null) {
    SessionState.getRegistry().unregisterFunction(functionName);
  }
}
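These are the methods invoked under the hood when you run commands like the following (leave out temporary and the function is permanent):
create temporary function udf_name as 'package.ClassName'; -- register a temporary function
desc function extended udf_name; -- show information about the UDF
drop temporary function if exists udf_name; -- drop the UDF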
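Clearly system functions can be used directly because their registration is hard-coded in the Java source. Note also that system functions and permanent functions share the same system instance of the Registry class, whereas temporary functions go into the registry held by the current SessionState.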
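The split between the shared registry and the per-session registry can be pictured with the toy model below. It is a sketch under assumptions, not Hive's implementation: ScopedRegistrySketch, Session, registerPermanent, registerTemporary and resolve are hypothetical names, class names are stored as plain strings, and the lookup order (session first, then system) is an assumption of this sketch rather than something the code above shows.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of one JVM-wide registry plus one registry per session.
// Every name in this class is hypothetical.
final class ScopedRegistrySketch {

    // Stands in for the static `system` Registry: built-ins and permanent functions.
    private static final Map<String, String> SYSTEM = new HashMap<>();

    static {
        SYSTEM.put("rpad", "GenericUDFRpad"); // hard-coded built-in
    }

    // Permanent functions land in the same shared map as the built-ins.
    static void registerPermanent(String qualifiedName, String className) {
        SYSTEM.put(qualifiedName, className);
    }

    // Stands in for the registry that belongs to one session.
    static final class Session {
        private final Map<String, String> temporary = new HashMap<>();

        void registerTemporary(String name, String className) {
            temporary.put(name, className);
        }

        // Assumption of this sketch: session entries are consulted before the shared ones.
        String resolve(String name) {
            String className = temporary.get(name);
            return className != null ? className : SYSTEM.get(name);
        }
    }

    public static void main(String[] args) {
        Session a = new Session();
        Session b = new Session();
        a.registerTemporary("my_tmp", "com.example.MyTmpUdf");
        registerPermanent("mydb.my_perm", "com.example.MyPermUdf");

        System.out.println(a.resolve("my_tmp"));       // com.example.MyTmpUdf
        System.out.println(b.resolve("my_tmp"));       // null
        System.out.println(b.resolve("mydb.my_perm")); // com.example.MyPermUdf
        System.out.println(b.resolve("rpad"));         // GenericUDFRpad
    }
}
```

In this model a temporary function disappears with its session, while anything placed in the shared map is visible to every session, which is why putting a UDF into the static block makes it behave like a built-in.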
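At this point we have found the system functions that Hive registers automatically, and established that registering a system function means calling methods of the Registry class such as registerUDF, registerGenericUDF, registerGenericUDTF, registerTableFunction and registerHiddenBuiltIn.
Even without understanding every detail, we can already make a fairly confident guess at how to register our own UDF as a system function so that it can be used directly. The approach is simple:
Near the code above, take calls like these as a model: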
system.registerGenericUDF("rpad", GenericUDFRpad.class);
system.registerGenericUDAF("collect_set", new GenericUDAFCollectSet());
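Then write a few similar lines of your own:
system.registerGenericUDF("function_name_in_hql", YourJavaClass.class);
That registers your function as a system function, so you no longer have to go through the tedious manual registration every time you use it.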
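After changing this source, it has to be recompiled. A savvy Java developer will know the shortcut: ignore compile errors elsewhere in the project and simply swap the freshly compiled .class file into the hive-exec Jar in place of the original. Because the static block references your UDF class directly, that class must also be on Hive's classpath. Whether you do a full build with mvn clean install -DskipTests or use some other trick is up to you.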
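By now, SQL Boys should have a clear picture of which functions ship with Hive, and need no longer wonder why functions that exist in Oracle cannot be used in Hive. Important things are worth saying three times:
Hive is not a database!!!
Hive is not a database!!!
Hive is not a database!!!
For those doing secondary development on platforms and components, it is now also clear how to add to Hive's system function library and make the platform more powerful.
When reposting, please credit the source: https://lizhiyong.blog.csdn.net/article/details/127501392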