Understanding Generics, Type Erasure, and Bridge Methods

http://www.znetdevelopment.com/blogs/2011/06/24/java-understanding-generics-type-erasure-and-bridge-methods/

 

Generics are one of the more prominent features of JDK 5; they let you specify a type as a parameter. In JDK 1.4, for example, a List was always a List of Objects, and anything you took out of it had to be cast. JDK 5 improved on this by letting you state the element type: you can declare a List<User> to say that the list will contain User objects. However, this is largely syntactic sugar. Those familiar with C++ will know templates, but there is a big difference between C++ and Java. In C++, each template instantiation is a distinct type generated at compile time, so List<User> really is a different type from List<Address>. In Java, those two expressions share a single runtime class. The reason is that Java applies a technique called type erasure at compile time, so at runtime the type arguments do not exist (this is semi-misleading, so more on this in a bit). Type erasure was chosen for backwards compatibility with older libraries: a JDK 5 application can use a JDK 1.4 library, and vice versa, without every library having to be updated. One of Java's strengths is the availability of external libraries, and Java did not want to lose that.

Before we get into type erasure, let’s look at the different forms of generics in Java. There are 5 main types of generic expressions:

TypeVariables

public <E extends Data> E getInstance(Class<E> clazz);
MyData data = getInstance(MyData.class);

A TypeVariable is a placeholder for a type that is supplied later. In this common example, the getInstance method is declared to return a type E, so I can assign the result to a MyData variable without casting. Java does this by analyzing the code at compile time and automatically inserting the cast into the byte code at the call site. Since the types are erased, the cast still has to happen.
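As a minimal sketch of the type variable case, the following fills in a body for getInstance. The nested Data and MyData classes are hypothetical stand-ins for the article's types, and creating the instance through the no-argument constructor is only an assumption about what such a method might do.

```java
class TypeVariableDemo {
    // Hypothetical stand-ins for the article's Data and MyData types.
    static class Data { }
    static class MyData extends Data { }

    // E is inferred from the Class argument at each call site.
    static <E extends Data> E getInstance(Class<E> clazz) {
        try {
            // Assumption: instances are created through the no-arg constructor.
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No cast appears in the source; the compiler inserts a
        // CHECKCAST to MyData at this call site.
        MyData data = getInstance(MyData.class);
        System.out.println(data.getClass().getSimpleName());
    }
}
```

The method's erased return type is Data, so the assignment to MyData only type-checks at runtime because of the cast the compiler adds.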

ParameterizedTypes

public class Factory<E extends Data> {
    public E create() { ... }
}
 
Factory<MyData> factory = new Factory<MyData>();
MyData data = factory.create();

A ParameterizedType is made up of a raw type (i.e. Factory) and one or more type arguments. Classes declare one or more type variables (i.e. E) with optional bounds (i.e. Data). A class declared this way may then be parameterized to produce a generic instance of that type (i.e. Factory<MyData>). When this is done, the compiler substitutes MyData for E, so the return type of create() becomes MyData.

WildcardType

public Object lookup(Class<? extends Data> clazz);

WildcardType expressions are similar to TypeVariables and ParameterizedTypes; however, they use a question mark in place of a named type. This allows the expression to match a range of types based on the class hierarchy. For example, a parameter of type Class<Data> would not accept lookup(MyData.class), since Class<MyData> is not a Class<Data>. With the wildcard Class<? extends Data>, MyData does extend Data, so it matches. Wildcards have two forms: a lower bound and an upper bound. A lower bound is written ? super Type and means that the type must be Type or one of its supertypes. This is used less often in practice, but it does have its place, such as Comparable<? super T>. An upper bound, ? extends Type, is the more popular form and means that the type must be Type or a subclass of it.
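The two wildcard forms can be sketched as follows. The count and fill helpers and the nested Data and MyData classes are hypothetical names chosen for illustration; they are not part of the article's code.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardDemo {
    // Hypothetical stand-ins for the article's Data and MyData types.
    static class Data { }
    static class MyData extends Data { }

    // Upper bound: accepts List<Data>, List<MyData>, and so on.
    // Elements can safely be read as Data.
    static int count(List<? extends Data> source) {
        int n = 0;
        for (Data d : source) {
            n++;
        }
        return n;
    }

    // Lower bound: accepts List<MyData>, List<Data>, or List<Object>.
    // A MyData can safely be written into any of them.
    static void fill(List<? super MyData> sink) {
        sink.add(new MyData());
    }

    public static void main(String[] args) {
        List<Data> dataList = new ArrayList<Data>();
        fill(dataList);                       // List<Data> is a valid sink
        System.out.println(count(dataList));  // and a valid source
    }
}
```

A rough rule of thumb is that ? extends suits parameters you only read from, and ? super suits parameters you only write to.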

GenericArrayType

public <E> E[] toArray(List<E> list);
MyData[] data = toArray(new ArrayList<MyData>());

GenericArrayType is basically an array type whose component type is itself generic, i.e. a type variable (E[]) or a parameterized type (List<String>[]). The component type may be any of those expressions, including another generic array type.
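Because of erasure, a method cannot simply write new E[n], so a working toArray needs some other way to learn the component type. The sketch below adds a Class<E> parameter for that purpose; that extra parameter is an assumption made for this example and is not part of the signature shown above.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

class GenericArrayDemo {
    // The Class<E> token is an added assumption: it carries the
    // component type that erasure removes from E at runtime.
    @SuppressWarnings("unchecked")
    static <E> E[] toArray(List<E> list, Class<E> type) {
        // Array.newInstance returns Object, so an unchecked cast is needed.
        E[] result = (E[]) Array.newInstance(type, list.size());
        return list.toArray(result);
    }

    public static void main(String[] args) {
        String[] names = toArray(Arrays.asList("a", "b"), String.class);
        System.out.println(names.length);
    }
}
```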

Class

Yes, Class itself is a type expression. In fact, all five of these expressions share the java.lang.reflect.Type interface: Class implements it, and the other four are sub-interfaces of it.

Java exposes all of these expressions as types so that they can be retrieved at runtime. For example, a Class can return its declared TypeVariable expressions, including their bounds. Methods and constructors give access to their generic return type and generic parameter types, as do fields. This allows applications to analyze that data at runtime, for example to resolve injection points. This is how dependency-injection frameworks such as CDI and Spring make use of generic type information.
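As a rough illustration of reading such a type at runtime, the sketch below inspects a field declared as List<User>. The Holder class, its users field, and the elementType helper are hypothetical names used only for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

class GenericReflectionDemo {
    // Hypothetical types for illustration.
    static class User { }
    static class Holder {
        List<User> users;
    }

    // Returns the first type argument of a field's declared generic type,
    // or null if the field is not declared with a parameterized type.
    static Type elementType(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            Type type = field.getGenericType();
            if (type instanceof ParameterizedType) {
                return ((ParameterizedType) type).getActualTypeArguments()[0];
            }
            return null;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The declared element type survives in the class file
        // even though the field's erased type is plain List.
        System.out.println(elementType(Holder.class, "users"));
    }
}
```

Note that this reads the declaration of the field, not the object stored in it; an ArrayList instance itself still carries no element type.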

You may be asking yourself what happened to type erasure. If the types are erased at runtime so that List<User> equals List<Address>, then how can that data be acquired at runtime? The answer lies in the bytecode and class file format definition. A method definition, for example, contains a code section and a signature section. The code is the actual byte code of the method. Within the byte code, Java erases the types. Thus, the following code results in the following byte code (notice the automatic casting and loss of data types).

List<User> users = new ArrayList<User>();
users.add(new User());
User user = users.get(0);
 
List<Address> addrs = new ArrayList<Address>();
assert(users.getClass().equals(addrs.getClass()));

NEW java/util/ArrayList
DUP
INVOKESPECIAL java/util/ArrayList.<init>()V
ASTORE 1
ALOAD 1
NEW User
DUP
INVOKESPECIAL User.<init>()V
INVOKEINTERFACE java/util/List.add(Ljava/lang/Object;)Z
POP
ALOAD 1
ICONST_0
INVOKEINTERFACE java/util/List.get(I)Ljava/lang/Object;
CHECKCAST User
ASTORE 2

The assert statement would also pass, since both lists report the same class, ArrayList.class, due to type erasure. What about the method signature, then? A method carries two definitions: the bytecode declaration and the generic signature. The bytecode declaration is the same as in JDK 1.4 and uses type erasure to define the non-generic values. The generic signature, on the other hand, contains the full generic expression. Thus, at runtime, via reflection, Java can look up that expression to generate the resulting Type instances. For example, the following method results in the following definitions:

public <E extends Data> E getInstance(Class<? extends E> clazz);
Declaration: getInstance(Ljava/lang/Class;)LData;
Generic: <E:LData;>(Ljava/lang/Class<+TE;>;)TE;

The format of the generic signature is defined by the class file format specification, which gives a specific encoding for each of the 5 type expressions. You can see, though, how the declaration contains no generics due to type erasure. Wait…but it says the return type is Data, not Object…wouldn't that erase to Object? What a good question. A type variable erases to its first bound, and only to Object when it has no bound. In this example, I said that E extends Data, so the method will always return at least something of type Data, and the compiler uses Data as the erased return type. At each call site, however, the compiler still inserts a cast from Data to the concrete type chosen for E.
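Both views of the method are visible through reflection. The following sketch declares a method with the same shape as the one above (the null-returning body and the nested Data class are placeholders for illustration) and compares its erased return type with its generic return type.

```java
import java.lang.reflect.Method;

class SignatureDemo {
    // Hypothetical stand-in for the article's Data type.
    static class Data { }

    // Same shape as the article's method; the body is a placeholder.
    static <E extends Data> E getInstance(Class<? extends E> clazz) {
        return null;
    }

    static Method method() {
        try {
            return SignatureDemo.class.getDeclaredMethod("getInstance", Class.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method m = method();
        // Erased view, taken from the bytecode declaration.
        System.out.println(m.getReturnType().getSimpleName());     // Data
        // Generic view, taken from the generic signature.
        System.out.println(m.getGenericReturnType().getTypeName()); // E
        System.out.println(m.getGenericParameterTypes()[0].getTypeName());
    }
}
```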

One misconception among first-time developers is that generic types are real, distinct classes at runtime, as if there were no type erasure. They therefore assume that a value always has its declared type. This is not always true. The compiler ensures it holds for code it has type-checked. At runtime, however, nothing protects us against reflection. For example, you could declare a List<User> to ensure the values in the list are Users. Through reflection, you can get the add method from List.class and then invoke it, passing in an Address. The JVM will not stop you, because the byte code carries no type arguments; to the JVM at that point, it is just an Object. However, if you were to iterate that list as User, it would fail with a ClassCastException, because the compiler-inserted cast would fail on the Address instance. So, if you are planning on integrating with other libraries, beware of this scenario.
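The scenario can be reproduced in a few lines. In this sketch, User and Address are empty hypothetical classes, and readFailsAfterReflectiveAdd is a helper name invented for the example.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class HeapPollutionDemo {
    // Hypothetical types for illustration.
    static class User { }
    static class Address { }

    // Returns true if reading the polluted list as User throws.
    static boolean readFailsAfterReflectiveAdd() {
        List<User> users = new ArrayList<User>();
        try {
            // The erased method is add(Object), so the JVM accepts any object.
            Method add = List.class.getMethod("add", Object.class);
            add.invoke(users, new Address());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        try {
            // The compiler inserted a CHECKCAST to User for this assignment.
            User user = users.get(0);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readFailsAfterReflectiveAdd());
    }
}
```

The failure shows up where the element is read, not where it was added, which can make this kind of bug hard to trace.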

That essentially is type erasure within Java. There is one more piece we have not touched and that is bridge methods. Let’s look at a simple example to demonstrate what I mean:

public interface Context<E> {
    E getInstance();
}
 
public interface DataContext<E extends Data> extends Context<E> {
    E getInstance();
}
 
public class TestContext implements DataContext<Test> {
    public Test getInstance() { ... }
}

That example seems harmless enough, and if you think about the resulting byte code, you would expect TestContext to have a single method named getInstance. Well, you would be wrong. In fact, there are three methods, each named getInstance and each taking no arguments. Now you are probably thinking that this breaks Java's rules, since the methods have the same signature and differ only in their return types. At the source level, you are right: the language does not allow that. So, how does this work? This is where bridge methods enter. The class file format identifies a method by its name and its full descriptor, including the return type, so the three methods can coexist there. The compiler emits the getInstance method that returns Test (i.e. the type argument). It then emits two more getInstance methods: one that returns Data (the bound of E in DataContext) and one that returns Object (the erasure of E in Context). Those two methods are marked as bridge methods, and they simply invoke the main method, casting as necessary. In fact, if you reflectively get the declared methods of the TestContext class, you get all three. However, Method.isBridge returns true for two of them.
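This can be checked with reflection. The sketch below nests the article's types inside one class so that it is self-contained; the Data and Test classes are assumed to be empty, with Test extending Data, and the getInstance body simply returns a new Test in place of the elided one.

```java
import java.lang.reflect.Method;

class BridgeDemo {
    // Assumed shapes for the article's Data and Test types.
    static class Data { }
    static class Test extends Data { }

    interface Context<E> {
        E getInstance();
    }

    interface DataContext<E extends Data> extends Context<E> {
        E getInstance();
    }

    static class TestContext implements DataContext<Test> {
        public Test getInstance() {
            return new Test(); // placeholder body
        }
    }

    // Counts the compiler-generated bridge methods named getInstance.
    static int countBridges() {
        int bridges = 0;
        for (Method m : TestContext.class.getDeclaredMethods()) {
            if (m.getName().equals("getInstance") && m.isBridge()) {
                bridges++;
            }
        }
        return bridges;
    }

    public static void main(String[] args) {
        for (Method m : TestContext.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName()
                    + " " + m.getName() + "() bridge=" + m.isBridge());
        }
        System.out.println(countBridges());
    }
}
```

The order in which getDeclaredMethods returns the three methods is not specified, so the printed lines may appear in any order.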

So what is bridging used for? The answer lies in how interfaces work. Since the erased definition of DataContext is essentially Data getInstance(), other code may be interacting with just that interface even though the implementation type is TestContext or any other context. That code expects the method to be Data getInstance() within the byte code. For that to work, Java needs that method to exist in the concrete implementation class. This affects methods that take parameters (i.e. setInstance(E instance)) even more. In pre-JDK 5 days, you would just have used Object getInstance() and void setInstance(Object), and it would have worked. In JDK 5, since the compiler uses the bounds of the generics to define the erased type, it results in setInstance(Object), setInstance(Data), and setInstance(Test). To make this work, it must use bridge methods.
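The parameter case can be sketched the same way. Here the setInstance methods are added to hypothetical versions of the interfaces, and a raw Context reference is used to reach the setInstance(Object) bridge directly; bridgeRejects is a helper name invented for this example.

```java
class SetterBridgeDemo {
    // Assumed shapes for the article's Data and Test types.
    static class Data { }
    static class Test extends Data { }

    interface Context<E> {
        void setInstance(E instance);
    }

    interface DataContext<E extends Data> extends Context<E> {
        void setInstance(E instance);
    }

    static class TestContext implements DataContext<Test> {
        Test value;

        public void setInstance(Test instance) {
            value = instance;
        }
    }

    // Calls through the raw interface, which dispatches to the
    // setInstance(Object) bridge. The bridge casts its argument to Test
    // before delegating, so a wrong argument type fails inside the bridge.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean bridgeRejects(Object argument) {
        Context raw = new TestContext();
        try {
            raw.setInstance(argument);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(bridgeRejects(new Test()));
        System.out.println(bridgeRejects("not a Test"));
    }
}
```

Without the bridge, a caller compiled against Context.setInstance(Object) would find no matching method in TestContext at all.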

And that is generics, type erasure, and bridging in a nutshell. I hope this helps you understand JDK 5 generics better.
