Flink Advanced Series -- The Class Loading Mechanism

The Flink source code version used in this article is 1.15-SNAPSHOT; readers can clone it from GitHub themselves.

To explain Flink's class loading mechanism, you first need some understanding of the JDK's class loading mechanism.

I recommend reading an earlier post of mine: an in-depth, source-level look at Java's class loading mechanism (covering both JDK 8 and JDK 11).

Next, let's look at the inheritance hierarchy of Flink's class loaders:
[Figure: inheritance hierarchy of Flink's user code class loaders]
FlinkUserCodeClassLoader extends URLClassLoader; its loadClass() method is implemented as follows:

@Override
public final Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
	try {
		synchronized (getClassLoadingLock(name)) {
			return loadClassWithoutExceptionHandling(name, resolve);
		}
	} catch (Throwable classLoadingException) {
		classLoadingExceptionHandler.accept(classLoadingException);
		throw classLoadingException;
	}
}

protected Class<?> loadClassWithoutExceptionHandling(String name, boolean resolve)
		throws ClassNotFoundException {
	// Essentially this still calls ClassLoader's loadClass() method,
	// i.e. FlinkUserCodeClassLoader itself still follows the parent-delegation model
	return super.loadClass(name, resolve);
}

FlinkUserCodeClassLoader has two subclasses: ParentFirstClassLoader and ChildFirstClassLoader.

ParentFirstClassLoader

public static class ParentFirstClassLoader extends FlinkUserCodeClassLoader {

	ParentFirstClassLoader(
			URL[] urls, ClassLoader parent, Consumer<Throwable> classLoadingExceptionHandler) {
		// Directly reuses the logic of the parent class FlinkUserCodeClassLoader
		super(urls, parent, classLoadingExceptionHandler);
	}

	static {
		ClassLoader.registerAsParallelCapable();
	}
}

As you can see, the Parent-First loading strategy simply follows the standard parent-delegation model: user code is loaded by the user code class loader, Flink's framework classes are loaded by the Application ClassLoader, and a class referenced from user code is first delegated to the Flink framework's class loader and only loaded by the user code class loader if the parent cannot resolve it.

The benefit of parent delegation is that the class loader hierarchy guarantees a corresponding hierarchy among the loaded classes, which keeps the Java runtime environment safe. But in an environment with as tangled a dependency tree as a Flink application, parent delegation is not always appropriate. For example, the Flink-Kafka connector pulled into a program always depends on a fixed Kafka version, while user code may pull in a lower or higher Kafka version to match the one actually in use. Different versions of the same component may define a class differently even though its fully qualified name is identical; under parent delegation, the version shipped with the Flink framework is loaded first, which leads to baffling compatibility problems such as NoSuchMethodError or IllegalAccessError.
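
The root of the problem is that the JVM identifies a class by its fully qualified name together with its defining class loader. The following minimal sketch (which assumes a kafka-clients jar at the hypothetical path /tmp/kafka-clients.jar; adjust it to a jar you actually have) shows that the same fully qualified name loaded by two independent loaders yields two distinct Class objects. This is exactly the property that lets a user code class loader carry its own copy of a dependency:

import java.net.URL;
import java.net.URLClassLoader;

public class ClassIdentityDemo {
	public static void main(String[] args) throws Exception {
		// Hypothetical jar path used only for illustration.
		URL[] jarUrls = {new URL("file:///tmp/kafka-clients.jar")};

		// Two loaders that do not delegate to each other (parent = null means
		// only the bootstrap loader is consulted before the jar itself).
		try (URLClassLoader loaderA = new URLClassLoader(jarUrls, null);
				URLClassLoader loaderB = new URLClassLoader(jarUrls, null)) {
			Class<?> a = loaderA.loadClass("org.apache.kafka.clients.producer.KafkaProducer");
			Class<?> b = loaderB.loadClass("org.apache.kafka.clients.producer.KafkaProducer");

			// Same fully qualified name, different defining loaders -> distinct classes.
			System.out.println(a == b); // false
			System.out.println(a.getName().equals(b.getName())); // true
		}
	}
}

Mixing instances of such "same name but different loader" classes is also why these conflicts can surface at runtime as ClassCastException, in addition to NoSuchMethodError or IllegalAccessError.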

For this reason, Flink implements the ChildFirstClassLoader and makes it the default strategy. It breaks the parent-delegation model so that classes in user code are loaded first; the official documentation calls this "Inverted Class Loading".

ChildFirstClassLoader

ChildFirstClassLoader implements the child-first logic by overriding FlinkUserCodeClassLoader's loadClassWithoutExceptionHandling() method.

@Override
protected Class<?> loadClassWithoutExceptionHandling(String name, boolean resolve)
		throws ClassNotFoundException {

	// First use findLoadedClass to check whether the class with fully qualified name `name` has already been loaded
	Class<?> c = findLoadedClass(name);

	if (c == null) {
		// Check whether the class to be loaded starts with one of the prefixes in the alwaysParentFirstPatterns set.
		// If so, call the superclass's loadClassWithoutExceptionHandling method to load it parent-first
		for (String alwaysParentFirstPattern : alwaysParentFirstPatterns) {
			if (name.startsWith(alwaysParentFirstPattern)) {
				return super.loadClassWithoutExceptionHandling(name, resolve);
			}
		}

		try {
			// If the class does not start with any prefix in alwaysParentFirstPatterns,
			// call URLClassLoader's findClass method to load it from the user code URLs
			c = findClass(name);
		} catch (ClassNotFoundException e) {
			// If the call to findClass() fails, finally fall back to the superclass's
			// loadClassWithoutExceptionHandling method and load the class parent-first
			c = super.loadClassWithoutExceptionHandling(name, resolve);
		}
	} else if (resolve) {
		resolveClass(c);
	}

	return c;
}

As you can see, if users still want certain classes to "follow tradition" and be loaded via the parent-delegation model, they have to rely on the alwaysParentFirstPatterns set.

That set is specified mainly by the following two configuration options:

  • classloader.parent-first-patterns.default, whose default value is:
@Documentation.Section(Documentation.Sections.EXPERT_CLASS_LOADING)
public static final ConfigOption<List<String>> ALWAYS_PARENT_FIRST_LOADER_PATTERNS =
		ConfigOptions.key("classloader.parent-first-patterns.default")
				.stringType()
				.asList()
				.defaultValues(
						ArrayUtils.concat(
								new String[] {
									"java.",
									"scala.",
									"org.apache.flink.",
									"com.esotericsoftware.kryo",
									"org.apache.hadoop.",
									"javax.annotation.",
									"org.xml",
									"javax.xml",
									"org.apache.xerces",
									"org.w3c",
									"org.rocksdb."
								},
								PARENT_FIRST_LOGGING_PATTERNS))
				.withDeprecatedKeys("classloader.parent-first-patterns")
				.withDescription(
						"A (semicolon-separated) list of patterns that specifies which classes should always be"
								+ " resolved through the parent ClassLoader first. A pattern is a simple prefix that is checked against"
								+ " the fully qualified class name. This setting should generally not be modified. To add another pattern we"
								+ " recommend to use \"classloader.parent-first-patterns.additional\" instead.");
								
@Internal
public static final String[] PARENT_FIRST_LOGGING_PATTERNS =
		new String[] {
			"org.slf4j",
			"org.apache.log4j",
			"org.apache.logging",
			"org.apache.commons.logging",
			"ch.qos.logback"
		};

Modifying this value is generally not recommended.

  • classloader.parent-first-patterns.additional

If users want additional classes to "follow tradition" and be loaded via the parent-delegation model, they can specify them with this option, separated by semicolons.

@Documentation.Section(Documentation.Sections.EXPERT_CLASS_LOADING)
public static final ConfigOption<List<String>> ALWAYS_PARENT_FIRST_LOADER_PATTERNS_ADDITIONAL =
		ConfigOptions.key("classloader.parent-first-patterns.additional")
				.stringType()
				.asList()
				.defaultValues()
				.withDescription(
						"A (semicolon-separated) list of patterns that specifies which classes should always be"
								+ " resolved through the parent ClassLoader first. A pattern is a simple prefix that is checked against"
								+ " the fully qualified class name. These patterns are appended to \""
								+ ALWAYS_PARENT_FIRST_LOADER_PATTERNS.key()
								+ "\".");

If needed, simply configure classloader.parent-first-patterns.additional in conf/flink-conf.yaml.
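
For example, to load classes under the made-up prefixes com.example.shaded. and com.example.legacy. (purely illustrative names) parent-first, the entry in conf/flink-conf.yaml would look like this:

classloader.parent-first-patterns.additional: com.example.shaded.;com.example.legacy.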

Flink merges these two options into the alwaysParentFirstPatterns set:

public static String[] getParentFirstLoaderPatterns(Configuration config) {
	List<String> base = config.get(ALWAYS_PARENT_FIRST_LOADER_PATTERNS);
	List<String> append = config.get(ALWAYS_PARENT_FIRST_LOADER_PATTERNS_ADDITIONAL);
	return mergeListsToArray(base, append);
}
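
The sketch below shows how the merge behaves; com.example.shaded. is a made-up prefix used only for illustration:

import java.util.Collections;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.CoreOptions;

public class ParentFirstPatternsDemo {
	public static void main(String[] args) {
		Configuration conf = new Configuration();
		// Append one additional parent-first prefix (a made-up name).
		conf.set(
				CoreOptions.ALWAYS_PARENT_FIRST_LOADER_PATTERNS_ADDITIONAL,
				Collections.singletonList("com.example.shaded."));

		// The defaults ("java.", "scala.", "org.apache.flink.", ...) come first,
		// followed by the additional patterns.
		for (String pattern : CoreOptions.getParentFirstLoaderPatterns(conf)) {
			System.out.println(pattern);
		}
	}
}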

Using IDEA, it is easy to trace the call path that leads to this method:

ClientUtils.buildUserCodeClassLoader() -> FlinkUserCodeClassLoaders.create() -> FlinkUserCodeClassLoaders.childFirst() -> ChildFirstClassLoader()

// The ClientUtils.buildUserCodeClassLoader() method
public static URLClassLoader buildUserCodeClassLoader(
		List<URL> jars, List<URL> classpaths, ClassLoader parent, Configuration configuration) {
	URL[] urls = new URL[jars.size() + classpaths.size()];
	for (int i = 0; i < jars.size(); i++) {
		urls[i] = jars.get(i);
	}
	for (int i = 0; i < classpaths.size(); i++) {
		urls[i + jars.size()] = classpaths.get(i);
	}
	// Here the alwaysParentFirstPatterns set is obtained
	final String[] alwaysParentFirstLoaderPatterns =
			CoreOptions.getParentFirstLoaderPatterns(configuration);
	// Here the class loading strategy is read; its default value is child-first
	final String classLoaderResolveOrder =
			configuration.getString(CoreOptions.CLASSLOADER_RESOLVE_ORDER);
	FlinkUserCodeClassLoaders.ResolveOrder resolveOrder =
			FlinkUserCodeClassLoaders.ResolveOrder.fromString(classLoaderResolveOrder);
	final boolean checkClassloaderLeak =
			configuration.getBoolean(CoreOptions.CHECK_LEAKED_CLASSLOADER);
	return FlinkUserCodeClassLoaders.create(
			resolveOrder,
			urls,
			parent,
			alwaysParentFirstLoaderPatterns,
			NOOP_EXCEPTION_HANDLER,
			checkClassloaderLeak);
}

@Documentation.Section(Documentation.Sections.EXPERT_CLASS_LOADING)
public static final ConfigOption<String> CLASSLOADER_RESOLVE_ORDER =
		ConfigOptions.key("classloader.resolve-order")
				.stringType()
				.defaultValue("child-first")
				.withDescription(
						"Defines the class resolution strategy when loading classes from user code, meaning whether to"
								+ " first check the user code jar (\"child-first\") or the application classpath (\"parent-first\")."
								+ " The default settings indicate to load classes first from the user code jar, which means that user code"
								+ " jars can include and load different dependencies than Flink uses (transitively).");

That's all for this article. Thanks for reading!
